Maverick Kernel Bug Processing Improvements

Registered by Jeremy Foshee on 2010-04-22

From the Lucid Blueprint: "It is still apparent that the incoming volume of kernel bugs remains problematic to manage. The ratio of incoming bugs to resources still doesn't scale. The goal of this spec is to re-evaluate our current bug management work flow and practices and determine a more effective way to manage kernel bugs."

The number of bug reports will only continue to grow as we pull more features into the kernel. With KMS we have seen a significant increase in modesetting bugs, and we have seen an increase in DRM-related bugs due to the backport into the .32 kernel. These will all need to be addressed. The goal of this spec is to continue improving our automated processes while defining better processes to address these and other subsets of bug reports.

Blueprint information

Status:
Complete
Approver:
Pete Graner
Priority:
Essential
Drafter:
Jeremy Foshee
Direction:
Approved
Assignee:
Jeremy Foshee
Definition:
Approved
Series goal:
Accepted for maverick
Implementation:
Implemented
Milestone target:
ubuntu-10.10
Started by
Pete Graner on 2010-05-27
Completed by
Jeremy Foshee on 2010-11-30

Whiteboard

Work Items ubuntu-10.10-beta:
[jeremyfoshee] Kernel Triage Summit: send heads up e-mail concerning 'triager summit':DONE
[jeremyfoshee] Kernel Triage Summit: plan dates for 'triager summit' based on SME input:DONE
[jeremyfoshee] Kernel Triage Summit: determine venue (#ubuntu-classroom?) for summit:DONE
[jeremyfoshee] Kernel Triage Summit: set schedule and communicate to all involved:DONE

Work items for ubuntu-10.10:
[leannogasawara] Lucid: apport/arsenal -- consolidate crash reports as we do for coredumps:POSTPONED
[apw] Lucid: kerneloops -- ensure ubuntu oopses are detected correctly:POSTPONED
[apw] Lucid: kerneloops -- move to bugs only coming through launchpad:POSTPONED
[apw] Lucid: c-o-d -- can we build some bisect points between releases:DONE
[jeremyfoshee] Lucid: documentation -- re-organize kernel team wiki pages:DONE
[jeremyfoshee] Duplicate bugs: work with forums moderators on duplicates in bugs:POSTPONED
[jeremyfoshee] Duplicate bugs: begin encouragement of bug commenters to file new bugs:DONE
[jeremyfoshee] Duplicate bugs: develop wiki documentation for reasoning on removal of duplicates:DONE
[jeremyfoshee] Duplicate bugs: add links to above document in arsenal scripts:POSTPONED
[jeremyfoshee] Duplicate bugs: add apport tag to suggest not duplicating:POSTPONED
[jeremyfoshee] Root cause analysis: work with forums moderators to identify trending topics:POSTPONED
[jeremyfoshee] Root cause analysis: work with SMEs to identify documentation shortcomings for triager identification of root causes:POSTPONED
[jeremyfoshee] Root cause analysis: work with Kernel Team to get documentation in order so that forums moderators have a landing zone for their forum users:DONE
[jeremyfoshee] Root cause analysis: work with forum moderators to begin pointing users to the 'landing zone' on the wiki:POSTPONED
[jeremyfoshee] Automated Bug Processing: improve arsenal scripts until we are processing all open statuses daily:POSTPONED
[jeremyfoshee] Automated Bug Processing: Improve documentation on why the automated processing is necessary:DONE
[jeremyfoshee] Automated Bug Processing: Add link to arsenal scripts to point users to the explanation pages:POSTPONED
[jeremyfoshee] Automated Bug Processing: work with the Kernel Team to come up with a further breakdown of kernel subsystems and tags:DONE
[brad-figg] Automated Bug Processing: change expiration time to 30 days:DONE
[jeremyfoshee] Automated Bug Processing: begin running SHA1 script to gather all bugs with an upstream commit:DONE
[jeremyfoshee] Automated Bug Processing: begin running patch processing script to identify and clear attachments that are marked as patches incorrectly:DONE
[jeremyfoshee] Automated Bug Processing: determine what is needed to identify bugs that can be 'carpetbombed' over the release cycle to test for fixed issues:POSTPONED
[canonical-kernel-team] Bug 614421 - update apport hook:POSTPONED

Kernel Process improvements for the M cycle
===========================================

Proposals:
 * 'triager summit'
   - bring those interested in triaging specific sub-systems together with the area experts who consume their triage, so they can learn from each other
   - 'kernel developer day', perhaps on a weekend day or out of hours, to aid community participation
   - target some specific areas, e.g. wifi, audio, etc.
   - looking over specific bugs may be beneficial
   - ensure we get the already active triage community involved and keep the schedule sensitive to them
 * removal of duplicates
   - non-merged bugs often do have value for the kernel, e.g. multiple copies of logs
   - avoidance of dog-piling
   - duplication would be a triager option, not an end-user one, as end users are not necessarily aware of the differences h/w can make to the root cause of an issue
   - better documentation for triagers on how to truly determine a duplicate
   - spoke with the forums admins to help educate forum users by pointing them to the proper docs
   - forums tend to produce trending topics; try to get those topics brought to our attention
 * processing of all open statuses
   - it is not possible to hand-process every bug; automation for the first contact is an unfortunate necessity
 * Improvements to arsenal scripts
 * subsystem tagging

ACTIONS:
 * modify apport package hook to post a message suggesting not to mark the bug as a duplicate [JFo :)]
  - possibly SRU it for Lucid
  - general clean-up of linux package apport hook
  - don't ask whether it is a regression; ask which release it last worked in. mdz suggested: "If you have confirmed that this problem is NOT present in an earlier version of Ubuntu, please select it from the list: [...]"
  - display kernel oops messages in the beginning rather than at the end (Bug 539896)
 * review the 'new bug' arsenal scripts
  - ensure they are hitting the most common cases in the appropriate way
 * decide on new length of expiration (30 days?) (arsenal script change)
  - can we expire development bugs quicker?
  - can we carpet bomb all 'development' release bugs for every upload?
   - "a new kernel was uploaded for maverick, changes are at this URL, could you retest"
 * identify bugs with sha1
  - team to help write up documentation on how to identify where they are coming from (bryceh has an arsenal script that will help here)
 * review bugs with patches
  - how to work out if the patches are good (bdmurray has a script (arsenal?) that may address this)
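
Two of the actions above (the 30-day expiration and identifying bugs that mention a sha1) can be illustrated with a small sketch. This is not the arsenal code and does not talk to Launchpad. The names find_commit_ids, is_expirable and EXPIRY are hypothetical, and it assumes the bug text and the last-updated time have already been fetched by some other means. It only matches full 40-character sha1s, so abbreviated commit ids would be missed.

```python
import re
from datetime import datetime, timedelta

# Assumption: an upstream commit is referenced by its full 40-character
# hexadecimal sha1. The \b anchors keep longer hex runs from matching.
SHA1_RE = re.compile(r"\b[0-9a-f]{40}\b")

# Assumption: the 30-day expiration period proposed in the actions above.
EXPIRY = timedelta(days=30)


def find_commit_ids(text):
    """Return the distinct full-length sha1 strings found in text,
    lowercased, in order of first appearance."""
    found = []
    for candidate in SHA1_RE.findall(text.lower()):
        if candidate not in found:
            found.append(candidate)
    return found


def is_expirable(last_updated, now, expiry=EXPIRY):
    """Return True if at least `expiry` has passed since last_updated."""
    return now - last_updated >= expiry


if __name__ == "__main__":
    # Made-up comment text and dates, for illustration only.
    comment = ("Looks like this was fixed upstream by commit "
               "0123456789abcdef0123456789abcdef01234567.")
    print(find_commit_ids(comment))
    print(is_expirable(datetime(2010, 5, 1), datetime(2010, 6, 15)))
```

A real script would still need to decide which statuses are eligible for expiration and whether a matched sha1 really is a commit in the upstream tree; neither is attempted here.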
