webXcreta

Improvements for Pulling Down Content and Generating Grammatical Data

Registered by Duncan McGreggor on 2012-03-15

Several parts of webXcreta need work to make it more performant. These include, but are not limited to, the following:
* adopting the new client code from the latest Twisted release
* improving the code that scrapes Technorati for the top blogs
* improving the code that visits each site and extracts its RSS feed links (async batching)
* improving the code that pulls down RSS/Atom feeds (async batching)
* splitting up the natural language processing jobs more effectively whenever content is pulled down
* finding a better, more efficient way to store processed data
* reading the stored data more efficiently when using it to generate new, random content
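The two "async batching" items above come down to fetching many URLs concurrently while capping how many requests are in flight at once. What follows is a minimal sketch of that idea under stated assumptions. It uses the standard library's asyncio rather than Twisted. In Twisted, the analogous building block is `twisted.internet.defer.DeferredSemaphore`. The names `fetch_in_batches` and `fetch` are hypothetical and are not existing webXcreta code.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def fetch_in_batches(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[T]],
    max_concurrent: int = 10,
) -> List[T]:
    """Hypothetical helper: run fetch(url) for every URL, with at most
    max_concurrent calls in progress at any moment.

    asyncio stands in for Twisted here; the same pattern maps onto
    twisted.internet.defer.DeferredSemaphore.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> T:
        # Wait for a free slot before starting this request.
        async with semaphore:
            return await fetch(url)

    # gather() runs all the wrapped calls concurrently and returns
    # their results in the same order as the input URLs.
    return list(await asyncio.gather(*(bounded(u) for u in urls)))
```

For example, `asyncio.run(fetch_in_batches(feed_urls, fetch_feed, max_concurrent=10))` would download every feed with at most ten requests open at a time. Here `fetch_feed` is an assumed coroutine that performs the actual HTTP request. A semaphore is used instead of fixed-size chunks, so one slow site frees its slot as soon as it finishes rather than holding up an entire batch.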

Blueprint information

Status:: Not started

Approver:: Duncan McGreggor

Priority:: Medium

Drafter:: Duncan McGreggor

Direction:: Approved

Assignee:: Duncan McGreggor

Definition:: Approved

Series goal:: Accepted for trunk

Implementation:: Unknown

Milestone target:: None

Related bugs

* Bug #955721: Re-architect Job Flow (status: New)
* Bug #956142: Use twisted.web.client.Agent instead of getPage (status: New)
