Support for Amazon's MapReduce in the Cloud

Registered by Duncan McGreggor

txAWS aims to provide a full cloud API for developers who need the benefits of asynchronous programming in their applications and scripts. Supporting Amazon Elastic MapReduce (Amazon's hosted Hadoop service) in txAWS is part of this effort.

Here are the basic steps as outlined by Amazon (edited extensively):

 * Develop your data processing application. Amazon Elastic MapReduce lets you develop job flows. There is a Python sample application called "similarity" that is a good place to see the workflow involved in using Amazon's MapReduce:
  http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2274&categoryID=263

 * Upload your data and your processing application into Amazon S3. Amazon S3 provides reliable, scalable, easy-to-use storage for your input and output data.

 * Start an Amazon Elastic MapReduce “job flow” (using the txAWS API). Choose the number and type of Amazon EC2 instances you want, specify the location of your data and/or application on Amazon S3, and start the flow.

 * Monitor the progress of your job flow(s) from the txAWS API. After the job flow is done, retrieve the output from Amazon S3.
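
The steps above can be sketched in Python. This is only an illustration of the request a txAWS client might need to build, not an existing txAWS API: the JobFlowRequest class, its field names, the is_finished helper, and the bucket names are all hypothetical. The parameter names follow the Elastic MapReduce RunJobFlow query action as best understood, and the state names are those reported for job flows. Signing and sending the request are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Job flow states after which no further progress is expected.
TERMINAL_STATES = frozenset({"COMPLETED", "FAILED", "TERMINATED"})


@dataclass
class JobFlowRequest:
    """Hypothetical container for job flow settings; not part of txAWS."""

    name: str
    instance_count: int
    master_instance_type: str
    slave_instance_type: str
    jar: str  # location of the application on Amazon S3
    args: List[str] = field(default_factory=list)
    log_uri: str = ""

    def to_query_params(self) -> Dict[str, str]:
        """Flatten the settings into query parameters for RunJobFlow."""
        params = {
            "Action": "RunJobFlow",
            "Name": self.name,
            "Instances.InstanceCount": str(self.instance_count),
            "Instances.MasterInstanceType": self.master_instance_type,
            "Instances.SlaveInstanceType": self.slave_instance_type,
            "Steps.member.1.Name": self.name + "-step",
            "Steps.member.1.HadoopJarStep.Jar": self.jar,
        }
        # List values are numbered from 1 in the flattened form.
        for index, arg in enumerate(self.args, start=1):
            key = "Steps.member.1.HadoopJarStep.Args.member.%d" % index
            params[key] = arg
        if self.log_uri:
            params["LogUri"] = self.log_uri
        return params


def is_finished(state: str) -> bool:
    """Return True once a job flow has reached a terminal state."""
    return state in TERMINAL_STATES


# Example: a four-instance job flow reading its input from S3.
request = JobFlowRequest(
    name="similarity",
    instance_count=4,
    master_instance_type="m1.small",
    slave_instance_type="m1.small",
    jar="s3://example-bucket/app/job.jar",
    args=["--input", "s3://example-bucket/input"],
)
params = request.to_query_params()
```

In an asynchronous client, a monitoring loop would presumably poll the job flow state and stop once is_finished returns True, then fetch the output from Amazon S3.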

Blueprint information

Status:
Not started
Approver:
None
Priority:
Undefined
Drafter:
Duncan McGreggor
Direction:
Needs approval
Assignee:
Duncan McGreggor
Definition:
New
Series goal:
None
Implementation:
Unknown
Milestone target:
0.5

Whiteboard

Work Items
