Improvements for Pulling Down Content and Generating Grammatical Data
Registered by Duncan McGreggor
There's a bunch of work that needs to be done to make webXcreta more performant. This includes, but is not limited to, the following:
* adopting the new HTTP client code (twisted.web.client.Agent) from the latest Twisted release in place of getPage
* improving the code that scrapes Technorati for the top blogs
* improving the code that hits all the sites and gets RSS feed links from them (async batching)
* improving the code that pulls down RSS/Atom feeds (async batching)
* splitting up the natural language processing jobs more effectively whenever any content is pulled down
* finding a better, more efficient means of storing processed data
* reading the stored data more efficiently and using it to generate new, random content
Blueprint information
- Status: Not started
- Approver: Duncan McGreggor
- Priority: Medium
- Drafter: Duncan McGreggor
- Direction: Approved
- Assignee: Duncan McGreggor
- Definition: Approved
- Series goal: Accepted for trunk
- Implementation: Unknown
- Milestone target: None
- Started by
- Completed by
Related branches
Related bugs
Bug #955721: Re-architect Job Flow (New)
Bug #956142: Use twisted.web.client.Agent instead of getPage (New)
Sprints
Whiteboard