Support for Amazon's MapReduce in the Cloud
txAWS aims to provide a full cloud API for developers who need the benefits of asynchronous programming in their applications and/or scripts. Providing support for MapReduce (via Amazon Elastic MapReduce, Amazon's hosted Hadoop service) in txAWS is part of this effort.
Here are the basic steps as outlined by Amazon (edited extensively):
* Develop your data processing application. Amazon Elastic MapReduce runs your application as a "job flow", a sequence of Hadoop processing steps. There is a Python sample application called "similarity", which is a good place to see the workflow involved in using Amazon's MapReduce service. Here's the URL:
http://
* Upload your data and your processing application into Amazon S3. Amazon S3 provides reliable, scalable, easy-to-use storage for your input and output data.
* Start an Amazon Elastic MapReduce “job flow” (using the txAWS API). You will need to choose the number and type of Amazon EC2 instances you want, specify the location of your data and/or application on Amazon S3, and start the flow. A sketch of what this might look like follows this list.
* Monitor the progress of your job flow(s) using the txAWS API. Once the job flow is done, retrieve the output from Amazon S3.
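The job-flow API described in the last two steps does not exist in txAWS yet (see the related bugs below), so the following is only a hypothetical sketch of what a Twisted-style asynchronous client for it might look like. The module txaws.emr.client, the class EMRClient, and the methods run_job_flow and describe_job_flow are assumed names introduced only for illustration; the S3 bucket and key locations are placeholders.

    # Hypothetical sketch only: txAWS does not yet ship an Elastic MapReduce
    # client. EMRClient, run_job_flow, and describe_job_flow are assumed names
    # used to illustrate the proposed workflow, not existing txAWS APIs.
    from twisted.internet import reactor, task
    from twisted.internet.defer import inlineCallbacks

    from txaws.credentials import AWSCredentials
    from txaws.emr.client import EMRClient  # proposed module, does not exist yet


    @inlineCallbacks
    def run_similarity_job(creds):
        client = EMRClient(creds)

        # Step 3: start a job flow, pointing at code and data already in S3.
        # The bucket and key names below are placeholders.
        job_flow_id = yield client.run_job_flow(
            name="similarity-example",
            log_uri="s3://my-bucket/logs/",
            instance_count=4,
            instance_type="m1.small",
            steps=[{
                "mapper": "s3://my-bucket/code/mapper.py",
                "reducer": "s3://my-bucket/code/reducer.py",
                "input": "s3://my-bucket/input/",
                "output": "s3://my-bucket/output/",
            }])

        # Step 4: poll until the job flow finishes; the output can then be
        # fetched from S3 with the existing txAWS S3 client.
        state = None
        while state not in ("COMPLETED", "FAILED", "TERMINATED"):
            yield task.deferLater(reactor, 30, lambda: None)
            state = yield client.describe_job_flow(job_flow_id)

        print("Job flow %s finished in state %s" % (job_flow_id, state))


    if __name__ == "__main__":
        creds = AWSCredentials()  # reads the AWS keys from the environment
        d = run_similarity_job(creds)
        d.addBoth(lambda _: reactor.stop())
        reactor.run()

Because every call returns a Deferred, starting and polling a job flow never blocks the reactor, which is the main benefit an asynchronous API like this is meant to provide.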
Blueprint information
- Status: Not started
- Approver: None
- Priority: Undefined
- Drafter: Duncan McGreggor
- Direction: Needs approval
- Assignee: Duncan McGreggor
- Definition: New
- Series goal: None
- Implementation: Unknown
- Milestone target: 0.5
- Started by: None
- Completed by: None
Related branches
Related bugs
- Bug #484428: Define the primary workflows for using Amazon MapReduce (New)
- Bug #484469: Add Job Flow API support (New)