Improvements for Pulling Down Content and Generating Grammatical Data

Registered by Duncan McGreggor

A good deal of work is needed to make webXcreta more performant. This includes, but is not limited to, the following:
 * adding new client code from the latest Twisted release
 * improving the code that scrapes Technorati for the top blogs
 * improving the code that hits all the sites and gets RSS feed links from them (async batching)
 * improving the code that pulls down RSS/Atom feeds (async batching)
 * doing a better job of splitting up the natural language processing jobs whenever content is pulled down
 * figuring out a better, more efficient means of storing processed data
 * being more efficient about reading the stored data and using it to generate new, random content
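
As a rough illustration of the "async batching" idea in the list above, here is a minimal sketch that caps how many downloads are in flight at once. It uses Python's standard asyncio library rather than Twisted, purely so that the example is self-contained; it is not webXcreta's actual code. The names fetch_in_batches, fake_fetch, and demo are hypothetical, and fake_fetch only stands in for a real HTTP request.

```python
import asyncio


async def fetch_in_batches(urls, fetch, max_concurrent=10):
    """Call fetch(url) for every URL with at most max_concurrent in flight.

    Returns a dict mapping each URL to its result, or to the exception
    that fetch raised for that URL.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded_fetch(url):
        # Wait for a free slot before starting the download.
        async with semaphore:
            return await fetch(url)

    # return_exceptions=True keeps one failing site from aborting the rest.
    results = await asyncio.gather(
        *(bounded_fetch(url) for url in urls), return_exceptions=True
    )
    return dict(zip(urls, results))


async def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP request (assumption, no network).
    await asyncio.sleep(0)
    if "bad" in url:
        raise ValueError("unreachable: " + url)
    return "<rss>" + url + "</rss>"


def demo():
    urls = [
        "http://example.com/a",
        "http://example.com/bad",
        "http://example.com/b",
    ]
    return asyncio.run(fetch_in_batches(urls, fake_fetch, max_concurrent=2))
```

Calling demo() returns one entry per URL, with the failed site mapped to its exception instead of stopping the whole batch. In Twisted, a similar limit could probably be expressed with a DeferredSemaphore, but that is an assumption about how the project might approach it.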

Blueprint information

Status: Not started
Approver: Duncan McGreggor
Priority: Medium
Drafter: Duncan McGreggor
Direction: Approved
Assignee: Duncan McGreggor
Definition: Approved
Series goal: Accepted for trunk
Implementation: Unknown
Milestone target: None
