How do we handle reverting packages in the devel release that introduce regressions?

Registered by Steve Langasek

We've talked about reverting uploads as a tool for correcting regressions in the development release. This practice has not been adopted as widely as it probably should be. Discuss the reasons and come up with clear policies on when reverting is the right answer (as well as when it isn't), so we can act on these policies quickly and with confidence, minimizing the development velocity lost across the team.

Blueprint information

Status: Started
Approver: Steve Langasek
Priority: Medium
Drafter: Colin Watson
Direction: Needs approval
Assignee: Colin Watson
Definition: Pending Approval
Series goal: Accepted for raring
Implementation: Started
Milestone target: None
Started by: Colin Watson

Whiteboard

Premise: this cycle we will put in place a system that lands packages in the development release quickly and with high confidence that they work
 * we should be measuring developer satisfaction with this
 * desirable for the system to work quickly
 * if we turn up the dial that lets things through quickly, we will occasionally let through bad uploads that break everyone's systems

Open questions:
 - What are the limits on reverting? Make sure these are understood by everyone
 - Who should make the call that a revert needs to be considered? QA? (The developer working the bug may be too close to it to make this call directly?)
  - This needs to be somebody who understands the problem following root-cause analysis; at the very least we need to know what to revert (which is often non-obvious). --cjwatson
  [vorlon] the point I mean to express here is that somebody should be making a call that the impact of a particular bug warrants *considering* a revert. We obviously don't want to put ourselves in the situation of being expected to revert the world - though in some cases we may indeed want to revert a set of related uploads before we've pinned down exactly which change is to blame.
 - Should we document revert go/no-go discussions, so we can refine this policy in the future? (Not "incident reports" - something much more lightweight)

* examples:
 * a lightdm bug broke things; the team worked on it for 2 hours and then decided to revert
 * webkit 1.9 wasn't building on ARM; we could have rolled back to webkit 1.8, but that would have required rebuilding its reverse dependencies again.
  * this was a development series that, per the security team, wasn't acceptable to ship
  * here, "revert" would have meant unwinding a lot of related changes
 * late in the 12.10 cycle, a bug in grub caused os-prober to fail, so booting Windows failed; this was a serious bug introduced in grub 2.0.
  * took a while to understand that it was a grub bug; we lived with it for a while.

We need to have a clear idea of *what* is to be reverted before trying to revert.

Criteria for doing a revert:
 * Sunset period for considering reversion: beyond some period of time (order of days, a week?), reversion takes you into a new state, rather than returning to a known-good state
 * high confidence that the reversion will fix it
 * in the developer's assessment, the revert can get through the buildds and be published before a proper fix would be

(?)
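The tentative criteria above can be read as a simple go/no-go check. The sketch below is only an illustration of that reading, not an agreed policy: the names (`RevertCandidate`, `should_revert`) are hypothetical, and the 7-day `SUNSET_PERIOD` is an assumption picked from the "order of days, a week?" range the whiteboard floats. It returns the reasons against reverting, which could also feed a lightweight revert log.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: one week, taken from the "order of days, a week?" suggestion.
SUNSET_PERIOD = timedelta(days=7)


@dataclass
class RevertCandidate:
    package: str                # hypothetical source package name
    uploaded_at: datetime       # when the regressing upload was published
    confident_revert_fixes: bool  # high confidence the revert fixes the regression
    est_revert_hours: float     # estimated time to build and publish the revert
    est_fix_hours: float        # estimated time to build and publish a proper fix


def should_revert(c: RevertCandidate, now: datetime) -> tuple[bool, list[str]]:
    """Apply the whiteboard criteria; return (go, reasons against reverting)."""
    reasons = []
    if now - c.uploaded_at > SUNSET_PERIOD:
        reasons.append("past sunset period: a revert creates a new state "
                       "rather than restoring a known-good one")
    if not c.confident_revert_fixes:
        reasons.append("no high confidence that the revert fixes the regression")
    if c.est_revert_hours >= c.est_fix_hours:
        reasons.append("a proper fix is likely to be published first")
    return (not reasons, reasons)
```

Returning the reasons rather than a bare boolean is a deliberate choice: a no-go decision then carries its justification with it, which is what the "document revert go/no-go discussions" question asks for.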

Work Items

Work items:
[adconrad] Implement push-button reverting of single packages: TODO
[adconrad] Make push-button reverting find all binary rdeps and revert/rebuild those as well (this will require parsing versioned build-dependencies to decide whether each one needs a revert rather than a rebuild): TODO
[vorlon] Take to ubuntu-devel for discussion of how we trigger the consideration of a revert: TODO
[cjwatson] Establish lightweight log of revert discussions (https://wiki.ubuntu.com/UbuntuDevelopment/RevertLog): DONE
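The revert-versus-rebuild decision in the second work item can be sketched as follows, assuming one reading of it: a reverse build-dependency can simply be rebuilt against the reverted version unless its versioned build-dependency (for example `>= 1.9`) rules that version out, in which case it must be reverted too. This is not the design of the actual tool; `classify_rdeps` and the input shape are hypothetical. To stay self-contained, the sketch reimplements dpkg's version-comparison algorithm; real tooling would more likely use python-apt's `apt_pkg.version_compare`.

```python
from typing import Dict, Optional, Tuple


def _order(ch: str) -> int:
    # dpkg ordering for non-digit characters: '~' sorts before everything,
    # even the end of the string; letters sort before other symbols.
    if ch == "" or ch.isdigit():
        return 0
    if ch.isalpha():
        return ord(ch)
    if ch == "~":
        return -1
    return ord(ch) + 256


def _verrevcmp(a: str, b: str) -> int:
    def at(s: str, k: int) -> str:
        return s[k] if k < len(s) else ""

    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        # Compare the non-digit prefixes character by character.
        while (at(a, i) and not at(a, i).isdigit()) or (at(b, j) and not at(b, j).isdigit()):
            ac, bc = _order(at(a, i)), _order(at(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Compare the digit runs numerically, ignoring leading zeros.
        while at(a, i) == "0":
            i += 1
        while at(b, j) == "0":
            j += 1
        while at(a, i).isdigit() and at(b, j).isdigit():
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if at(a, i).isdigit():
            return 1
        if at(b, j).isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(v: str) -> Tuple[int, str, str]:
    # [epoch:]upstream[-revision]; the revision follows the last hyphen.
    epoch, sep, rest = v.partition(":")
    if not sep:
        epoch, rest = "0", v
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return (ea > eb) - (ea < eb)
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


_OPS = {"<<": lambda c: c < 0, "<=": lambda c: c <= 0, "=": lambda c: c == 0,
        ">=": lambda c: c >= 0, ">>": lambda c: c > 0}


def classify_rdeps(reverted_to: str,
                   rdeps: Dict[str, Optional[Tuple[str, str]]]) -> Dict[str, str]:
    """Map each rdep (name -> (op, version) constraint, or None) to a plan."""
    plan = {}
    for name, constraint in rdeps.items():
        if constraint is None or _OPS[constraint[0]](compare_versions(reverted_to, constraint[1])):
            plan[name] = "rebuild"  # the reverted version still satisfies its build-dep
        else:
            plan[name] = "revert"   # it needs the newer version, so it must go back too
    return plan
```

For example, with hypothetical rdeps `{"app-a": None, "app-b": (">=", "1.9.0"), "app-c": (">=", "1.8")}` and a revert to `1.8.1-0ubuntu1`, only `app-b` would need reverting; the other two would just be rebuilt.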