Better archive crawler

Registered by Robbie Williamson

Speed up changelog generation for changelogs.ubuntu.com. This is of particular importance to users. Currently the changelogs are generated only every 8 hours, which is not ideal. We should either use launchpadlib to query only for the latest changes and run the job much more often (e.g. every hour or every 30 minutes), or move the service into Launchpad directly. In the latter case we need to ensure that it does not hit the database too hard, so pre-generated static files are a plus. The reference bug is #401043.
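As a rough illustration of the first option, the incremental query could look something like the sketch below. It assumes anonymous access to the Launchpad "devel" web service API and uses getPublishedSources() with its created_since_date filter. The function names, the consumer name "changelog-crawler-sketch" and the 30 minute interval are hypothetical and are not taken from the deployed script.

```python
from datetime import datetime, timedelta, timezone


def since_cutoff(now, interval_minutes):
    """Return the timestamp marking the start of the window to query."""
    return now - timedelta(minutes=interval_minutes)


def recent_source_publications(cutoff, series_name="lucid"):
    """Yield (source name, version) for sources published since `cutoff`.

    Needs network access and the launchpadlib package, which is imported
    lazily so that since_cutoff() stays usable without it.
    """
    from launchpadlib.launchpad import Launchpad

    # The consumer name is an arbitrary, hypothetical identifier.
    lp = Launchpad.login_anonymously(
        "changelog-crawler-sketch", "production", version="devel"
    )
    ubuntu = lp.distributions["ubuntu"]
    series = ubuntu.getSeries(name_or_version=series_name)
    publications = ubuntu.main_archive.getPublishedSources(
        distro_series=series,
        status="Published",
        created_since_date=cutoff.isoformat(),
    )
    for pub in publications:
        yield pub.source_package_name, pub.source_package_version


if __name__ == "__main__":
    # Assumed schedule: the job runs every 30 minutes.
    cutoff = since_cutoff(datetime.now(timezone.utc), 30)
    for name, version in recent_source_publications(cutoff):
        print(name, version)
```

Querying by date keeps each run proportional to the number of new uploads rather than to the size of the archive, which is what would make a 30 minute or hourly schedule feasible.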

Extraction of desktop and command-not-found data is currently out of scope and is partly covered by the software-center-repository-based-metadata spec.

Blueprint information

Status:
Started
Approver:
None
Priority:
Low
Drafter:
Michael Vogt
Direction:
Approved
Assignee:
Canonical Foundations Team
Definition:
Approved
Series goal:
Accepted for lucid
Implementation:
Beta Available
Milestone target:
None
Started by:
Michael Vogt

Whiteboard

Work items:
[mvo] start with changelogs: DONE
[mvo] port 3extract_changelogs from sh to launchpadlib/python: DONE
[mvo] ensure it does not need a full pool/ but instead uses getPublishedSources() from launchpadlib and simple HTTP GET: DONE
[mvo] ensure it does provide pool for srcpkg and binary package links for the changelogs: DONE
[mvo] ensure it does not re-download binary debs for inspection when it has already inspected the same version on a different arch: DONE
support populating the initial pool by inspecting the full archive once: POSTPONED
write python-apt based inspection tool that checks if all changelogs are present: TODO
[mvo] move away from rookery to a different machine with dpkg-source v3 format and current launchpadlib: DONE
[mvo] deploy on changelogs.ubuntu.com (blocked): DONE
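
Two of the work items above (pool-style links, and skipping binary debs already inspected on another arch) can be sketched roughly as follows. This is only an illustration under assumptions: the helper names are hypothetical, and the path layout (pool/<component>/<prefix>/<source>/<source>_<version>/changelog, with any epoch stripped from the version) is assumed rather than taken from the actual script.

```python
def pool_prefix(source_name):
    """Return the pool subdirectory: 'lib' + one letter for lib*, else one letter."""
    if source_name.startswith("lib"):
        return source_name[:4]
    return source_name[:1]


def changelog_path(component, source_name, version):
    """Build the assumed relative changelog path for a source package."""
    # Drop an epoch such as "1:" because it is not part of file names.
    version_no_epoch = version.split(":", 1)[-1]
    return "pool/%s/%s/%s/%s_%s/changelog" % (
        component,
        pool_prefix(source_name),
        source_name,
        source_name,
        version_no_epoch,
    )


def packages_to_fetch(publications, seen):
    """Filter (name, version, arch) tuples down to those not yet inspected.

    `seen` is a set of (name, version) keys and is updated in place, so the
    same version published for a second arch is skipped.
    """
    todo = []
    for name, version, arch in publications:
        key = (name, version)
        if key in seen:
            continue
        seen.add(key)
        todo.append((name, version, arch))
    return todo


if __name__ == "__main__":
    pubs = [("apt", "0.7.25", "i386"), ("apt", "0.7.25", "amd64")]
    print(packages_to_fetch(pubs, set()))
    print(changelog_path("main", "apt", "0.7.25"))
```

Keying the cache on name and version only, not on arch, is what avoids the repeated download. It relies on the assumption that the changelog is identical across arches for a given version.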
