[EDP] Data discovery component

Registered by Alexander Kuznetsov

EDP can have several sources of data for processing. Data can be pulled from Swift, GlusterFS or a NoSQL database such as Cassandra or HBase. To provide unified access to this data, we’ll introduce a component responsible for discovering the data location and providing the right configuration for the Hadoop cluster.

This blueprint covers the engine implementation only; not all plugins are required.

Supported Data Source Types:
  * HDFS
  * HDFS on existing cluster
  * NoSQL databases
  * Distributed storage (Swift, Ceph, Gluster)
  * RDBMS with Apache Sqoop

Data Source Types are introduced via a plugin mechanism.
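
A minimal sketch of how such a plugin mechanism could look, assuming one plugin class per data source type. The names DataSourcePlugin, register_plugin, get_plugin and the "datasource.location" key are hypothetical and are not taken from the actual implementation.

```python
from abc import ABC, abstractmethod


class DataSourcePlugin(ABC):
    """Hypothetical base class: one subclass per data source type."""

    type_name = None  # for example "hdfs" or "swift"

    @abstractmethod
    def hadoop_config(self, location):
        """Return configuration properties for the given data location."""


# Maps a type name to the plugin class that handles it.
_PLUGINS = {}


def register_plugin(plugin_cls):
    """Class decorator that makes a data source type available."""
    _PLUGINS[plugin_cls.type_name] = plugin_cls
    return plugin_cls


def get_plugin(type_name):
    """Instantiate the plugin for a type, or fail for unknown types."""
    if type_name not in _PLUGINS:
        raise ValueError("no plugin registered for type %r" % type_name)
    return _PLUGINS[type_name]()


@register_plugin
class HdfsPlugin(DataSourcePlugin):
    type_name = "hdfs"

    def hadoop_config(self, location):
        # "datasource.location" is a placeholder key, not a real Hadoop property.
        return {"datasource.location": location}


config = get_plugin("hdfs").hadoop_config("hdfs://namenode:8020/input")
```

With this layout, supporting a new type means adding one decorated class; the engine itself only calls get_plugin.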

Data Source Object
  * Category (NoSQL database, RDBMS, Distributed file storage, existing Hadoop cluster)
  * Type
  * Description
  * Credentials
  * Possible representations, for example a Hive table
  * List of jobs using this data source
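
The fields above could be modelled roughly as follows. This is only a sketch: the class name, attribute names and Python types are assumptions, and the blueprint does not prescribe a storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataSource:
    """Hypothetical in-memory form of the Data Source Object."""

    category: str  # e.g. "NoSQL database", "RDBMS", "Distributed file storage"
    type: str      # e.g. "swift", "cassandra"
    description: str = ""
    credentials: Dict[str, str] = field(default_factory=dict)
    representations: List[str] = field(default_factory=list)  # e.g. "Hive table"
    job_ids: List[str] = field(default_factory=list)  # jobs using this source


source = DataSource(
    category="Distributed file storage",
    type="swift",
    description="Raw logs container",
    credentials={"user": "demo", "password": "secret"},
    representations=["Hive table"],
)
```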

Methods
  * Register data source
  * Get credentials for data source. Before the credentials are returned, the plugin responsible for this data source type transforms the JSON object into the right format.
  * Assign to job
  * Detach from job
  * Get list of available components
  * Delete data source
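
The operations listed above can be sketched as a small in-memory registry. Everything here is hypothetical: the class and method names, the per-type formatter functions standing in for plugins, the reading of "available components" as the registered data sources, and the choice to refuse deletion while a source is still assigned to a job.

```python
import json


class DataSourceRegistry:
    """In-memory sketch of the data source operations (names hypothetical)."""

    def __init__(self, credential_formatters=None):
        # Maps a data source type to a function that reshapes its credentials.
        self._formatters = credential_formatters or {}
        self._sources = {}

    def register(self, name, source_type, credentials_json):
        self._sources[name] = {
            "type": source_type,
            "credentials": json.loads(credentials_json),
            "jobs": set(),
        }

    def get_credentials(self, name):
        source = self._sources[name]
        # Fall back to a plain copy when no formatter exists for the type.
        formatter = self._formatters.get(source["type"], dict)
        return formatter(source["credentials"])

    def assign_to_job(self, name, job_id):
        self._sources[name]["jobs"].add(job_id)

    def detach_from_job(self, name, job_id):
        self._sources[name]["jobs"].discard(job_id)

    def list_sources(self):
        return sorted(self._sources)

    def delete(self, name):
        # Assumption: a source that is still in use must be detached first.
        if self._sources[name]["jobs"]:
            raise ValueError("data source %r is still assigned to jobs" % name)
        del self._sources[name]


def to_prefixed_keys(creds):
    # Hypothetical target format: prefix every key with "example."
    return {"example." + key: value for key, value in creds.items()}


registry = DataSourceRegistry({"swift": to_prefixed_keys})
registry.register("logs", "swift", '{"user": "demo", "password": "secret"}')
registry.assign_to_job("logs", "job-1")
formatted = registry.get_credentials("logs")
```

Keeping the formatters keyed by type mirrors the idea that the plugin for a given data source type decides the final credential format.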

Blueprint information

Status: Complete
Approver: Sergey Lukjanov
Priority: High
Drafter: Alexander Kuznetsov
Direction: Approved
Assignee: Alexander Kuznetsov
Definition: Approved
Series goal: Accepted for 0.3
Implementation: Implemented
Milestone target: 0.3
Started by: Sergey Lukjanov
Completed by: Sergey Lukjanov

This blueprint contains Public information 
Everyone can see this information.