[EDP] Data discovery component
EDP can have several sources of data for processing. Data can be pulled from Swift, GlusterFS, or a NoSQL database such as Cassandra or HBase. To provide unified access to this data, we'll introduce a component responsible for discovering the data location and providing the right configuration for the Hadoop cluster.
This blueprint covers the engine implementation; it does not cover all of the required plugins.
Supported Data Source Types:
* HDFS
* HDFS on an existing cluster
* NoSQL databases
* Distributed storages (Swift, Ceph, Gluster)
* RDBMS via Apache Sqoop
Data Source Types are introduced via a plugin mechanism.
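As a purely illustrative sketch (class and method names are assumptions, not the actual plugin interface), a data source type plugin might expose hooks like this:

    # Hypothetical sketch only: names are assumptions made for
    # illustration and are not the implemented plugin interface.
    import abc


    class DataSourceTypePlugin(abc.ABC):
        """Base class a data source type plugin could implement."""

        #: Category handled by the plugin, e.g. "distributed_storage".
        category = None

        @abc.abstractmethod
        def get_hadoop_configuration(self, data_source):
            """Return the configuration entries the Hadoop cluster needs
            to access this data source (URLs, auth properties, etc.)."""

        @abc.abstractmethod
        def transform_credentials(self, credentials_json):
            """Convert the stored JSON credentials into the format
            expected by this data source type."""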
Data Source Object
* Category (NoSQL database, RDBMS, Distributed file storage, existing Hadoop cluster)
* Type
* Description
* Credentials
* Possible representations (for example, a Hive table)
* List of jobs using this data source
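For illustration only, a registered data source object could be serialized roughly as follows; the field names and values here are assumptions based on the attribute list above, not a defined schema:

    # Purely illustrative example of a registered data source object.
    example_data_source = {
        "category": "distributed_storage",
        "type": "swift",
        "description": "Input logs stored in a Swift container",
        "credentials": {"user": "edp-user", "password": "secret"},
        "representations": ["hive_table"],
        "jobs": ["word-count"],
    }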
Methods
* Register a data source
* Get credentials for a data source. Before returning credentials, the plugin responsible for this data source type transforms the JSON object into the right format.
* Assign to a job
* Detach from a job
* Get the list of available components
* Delete a data source
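A minimal sketch of how these operations might map onto a Python interface for the discovery component (all names are hypothetical assumptions, not the implemented API):

    # Hypothetical interface for the data discovery component;
    # method names are illustrative assumptions only.
    class DataDiscoveryService:

        def register_data_source(self, data_source):
            """Validate and store a new data source object."""

        def get_credentials(self, data_source_id):
            """Return credentials, letting the responsible plugin
            transform the stored JSON into the required format."""

        def assign_to_job(self, data_source_id, job_id):
            """Attach the data source to a job."""

        def detach_from_job(self, data_source_id, job_id):
            """Detach the data source from a job."""

        def list_components(self):
            """List the available data source type plugins."""

        def delete_data_source(self, data_source_id):
            """Remove a data source that is no longer referenced."""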
Blueprint information
- Status: Complete
- Approver: Sergey Lukjanov
- Priority: High
- Drafter: Alexander Kuznetsov
- Direction: Approved
- Assignee: Alexander Kuznetsov
- Definition: Approved
- Series goal: Accepted for 0.3
- Implementation: Implemented
- Milestone target: 0.3
- Started by: Sergey Lukjanov
- Completed by: Sergey Lukjanov
Whiteboard
Gerrit topic: https:/
Addressed by: https:/
Added REST API for job and data source with simple validation