pre-built image job reliability
Pre-built images are submitted to LAVA for testing each day, for example:
http://
According to our reports view:
http://
Our failure rate for these test jobs ranges from 50% to about 65%. We should analyze these jobs for common causes of failure, in the same way we did for health job failures. That investigation should identify the most common causes and let us make adjustments to LAVA to correct them.
Blueprint information
- Status: Complete
- Approver: None
- Priority: High
- Drafter: Andy Doan
- Direction: Approved
- Assignee: Spring Zhang
- Definition: Approved
- Series goal: Accepted for trunk
- Implementation: Implemented
- Milestone target: 2012.07
- Started by: Spring Zhang
- Completed by: Spring Zhang
Related bugs
Bug #1019630: apt proxy failures (Fix Released)
Bug #1027906: OSError exception when atexit (Won't Fix)
Whiteboard
[qzhang, 20120719] Result on https:/
[qzhang, 20120720] Origen 27 jobs: 25883~25908; Panda 25 jobs: 25495-25519, 15 nano and 10 leb; Snowball 25 jobs: 25858~25882
[qzhang, 20120723] The previous jobs are invalid because they all failed on updated dispatcher code, so they are being re-run. Origen 25 jobs: 26226~26251; Snowball 25 jobs: 26253~26277
[qzhang, 20120729] Converted the wiki results to a spreadsheet at https:/
Meta:
Headline: pre-built image testing improved
Acceptance: we have metrics (if not code fixes) for the most common LAVA failures in pre-built image testing
Roadmap id: CARD-128
Work Items
pick a few daily builds of Origen and re-submit 25 jobs for each build: DONE
pick a few daily builds of Panda and re-submit 25 jobs for each build: DONE
pick a few daily builds of Snowball and re-submit 25 jobs for each build: DONE
create a spreadsheet of failures organized by "lava failure", "image failure", "don't know": DONE
write a summary of the most common problems we are seeing: DONE
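The tallying described in the work items could be sketched roughly as follows. This is only an illustration, not the spreadsheet or tooling actually used: the record layout (the "id", "passed" and "category" keys), the summarize helper and the sample records are all assumptions made for the example. Only the three category names come from the work item above.

```python
from collections import Counter

# Category names taken from the work item wording; everything else is hypothetical.
CATEGORIES = ("lava failure", "image failure", "don't know")


def summarize(jobs):
    """Return (failure_rate, per-category failure counts) for a list of job records.

    Each record is a dict with assumed keys "id" and "passed", plus
    "category" for failed jobs. Failed jobs without a category are
    counted as "don't know".
    """
    if not jobs:
        return 0.0, Counter()
    failures = [job for job in jobs if not job["passed"]]
    counts = Counter(job.get("category", "don't know") for job in failures)
    return len(failures) / len(jobs), counts


# Placeholder records for illustration only; these are not real LAVA results.
sample_jobs = [
    {"id": 1, "passed": True},
    {"id": 2, "passed": False, "category": "lava failure"},
    {"id": 3, "passed": False, "category": "image failure"},
    {"id": 4, "passed": False},
]

rate, counts = summarize(sample_jobs)
print(f"failure rate: {rate:.0%}")
for category in CATEGORIES:
    print(f"{category}: {counts[category]}")
```

Sorting the counts (for example with counts.most_common()) would give a starting point for the summary of the most common problems.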