Add streaming tags to mapreduce workflows

Registered by Trevor McKay

Oozie supports streaming mapreduce. Savanna should allow the streaming tag to be specified for mapreduce jobs.

This tag allows arbitrary scripts or executables to be specified as the mapper and reducer, in place of Java classes. The specified files must either already exist on the execution node, be bundled in the /lib directory of the job, or be referenced in the <file> and <archive> tags of the workflow (see the edp-oozie-files-and-archives blueprint).
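As a rough illustration of the workflow fragment this implies, the sketch below builds an Oozie map-reduce action containing a <streaming> element with <mapper> and <reducer> children. It is not Savanna's actual workflow generator: the function names, the ${jobTracker} and ${nameNode} placeholders, and the example paths are all assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def build_streaming_element(mapper, reducer):
    """Build a <streaming> element naming the mapper and reducer executables."""
    streaming = ET.Element("streaming")
    ET.SubElement(streaming, "mapper").text = mapper
    ET.SubElement(streaming, "reducer").text = reducer
    return streaming


def build_mapreduce_action(mapper, reducer, files=()):
    """Build a <map-reduce> action (hypothetical helper, not Savanna code)."""
    action = ET.Element("map-reduce")
    # Placeholder properties; a real workflow would resolve these at submit time.
    ET.SubElement(action, "job-tracker").text = "${jobTracker}"
    ET.SubElement(action, "name-node").text = "${nameNode}"
    action.append(build_streaming_element(mapper, reducer))
    # Scripts that are not already on the execution node can be shipped as files.
    for path in files:
        ET.SubElement(action, "file").text = path
    return action


action = build_mapreduce_action("/bin/cat", "/usr/bin/wc", files=["scripts/extra.sh"])
xml_text = ET.tostring(action, encoding="unicode")
print(xml_text)
```

Here /bin/cat and /usr/bin/wc stand in for executables assumed to exist on every execution node, while scripts/extra.sh stands in for a script that has to be shipped with the job.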

Blueprint information

Status: Complete
Approver: Sergey Lukjanov
Priority: Medium
Drafter: Trevor McKay
Direction: Approved
Assignee: Trevor McKay
Definition: Approved
Series goal: Accepted for icehouse
Implementation: Implemented
Milestone target: 2014.1
Started by: Sergey Lukjanov
Completed by: Sergey Lukjanov

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/edp-oozie-streaming-mapreduce,n,z

Addressed by: https://review.openstack.org/69035
    Allow boolean "streaming" in Job JSON

Addressed by: https://review.openstack.org/69477
    Add <streaming> tag generation to mapreduce workflow

Addressed by: https://review.openstack.org/69712
    Extract configs beginning with "savanna." from job_configs['configs']

Addressed by: https://review.openstack.org/69727
    Generate streaming tag in mapreduce job

Addressed by: https://review.openstack.org/69960
    Add validation check for streaming elements on MapReduce without libs

Addressed by: https://review.openstack.org/70829
    Add integration test for streaming mapreduce
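Two of the changes listed above, extracting configs that begin with "savanna." from job_configs['configs'] and validating streaming elements on a MapReduce job without libs, can be sketched as follows. This is a minimal sketch under stated assumptions, not the merged code: the key names savanna.streaming.mapper and savanna.streaming.reducer, both function names, and the rule that a job without libs must name both executables are hypothetical.

```python
# Hypothetical key names; the keys used by the merged changes may differ.
MAPPER_KEY = "savanna.streaming.mapper"
REDUCER_KEY = "savanna.streaming.reducer"


def split_configs(job_configs, prefix="savanna."):
    """Separate prefixed keys from the rest of job_configs['configs']."""
    configs = job_configs.get("configs", {})
    internal = {k: v for k, v in configs.items() if k.startswith(prefix)}
    passthrough = {k: v for k, v in configs.items() if not k.startswith(prefix)}
    return internal, passthrough


def missing_streaming_keys(internal, libs):
    """Return the streaming keys that are absent when the job has no libs.

    Assumption: a job with libs can supply Java classes, so it needs no
    streaming settings; a job without libs must name both executables.
    """
    if libs:
        return []
    required = [MAPPER_KEY, REDUCER_KEY]
    return [key for key in required if not internal.get(key)]


job_configs = {
    "configs": {
        "savanna.streaming.mapper": "/bin/cat",
        "mapred.reduce.tasks": "1",
    }
}
internal, passthrough = split_configs(job_configs)
missing = missing_streaming_keys(internal, libs=[])
print(missing)
```

In this sketch the prefixed keys are kept out of the Hadoop configuration and used only to build the streaming element, and a non-empty result from missing_streaming_keys would be reported as a validation error.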
