LAVA Dispatcher

pre-built image job reliability

Registered by Andy Doan on 2012-07-01

Pre-built images are submitted to LAVA for testing each day, for example:
http://validation.linaro.org/lava-server/dashboard/streams/private/team/linaro/pre-built-leb-origen

According to our reports view:
http://validation.linaro.org/lava-server/scheduler/reports

The failure rate for these test jobs ranges from 50% to about 65%. We should analyze these jobs for common causes of failure, in the same way that we did for health job failures. That investigation should identify the most common causes of failure and let us adjust LAVA to correct them.

Blueprint information

Status: Complete

Approver: None

Priority: High

Drafter: Andy Doan

Direction: Approved

Assignee: Spring Zhang

Definition: Approved

Series goal: Accepted for trunk

Implementation: Implemented

Milestone target: 2012.07

Started by: Spring Zhang on 2012-07-19

Completed by: Spring Zhang on 2012-07-27

Related bugs

Bug #1019630: apt proxy failures (Fix Released)
Bug #1027906: OSError exception when atexit (Won't Fix)

Whiteboard

[qzhang, 20120719] Results are on https://wiki.linaro.org/Platform/Validation/PrebuiltImageReliability
[qzhang, 20120720] Origen 27 jobs: 25883-25908; Panda 25 jobs: 25495-25519, 15 nano and 10 leb; Snowball 25 jobs: 25858-25882
[qzhang, 20120723] The previous jobs are invalid because they all failed on updated dispatcher code; re-running them now. Origen 25 jobs: 26226-26251; Snowball 25 jobs: 26253-26277
[qzhang, 20120729] Converted the wiki results to a spreadsheet at https://docs.google.com/spreadsheet/ccc?key=0AqSRlHjy1cqjdDh5bXVoUkxWY01iZ3U5bEs2c0ZCbWc.

Meta:
Headline: pre-built image testing improved
Acceptance: we have metrics (if not code fixes) for the most common LAVA failures in pre-built image testing
Roadmap id: CARD-128

Work Items

pick a few daily builds of Origen and re-submit 25 jobs for each build: DONE
pick a few daily builds of Panda and re-submit 25 jobs for each build: DONE
pick a few daily builds of Snowball and re-submit 25 jobs for each build: DONE
create a spreadsheet of failures organized by "lava failure", "image failure", "don't know": DONE
write a summary of the most common problems we are seeing: DONE
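
The categorization work item above could be tallied with a small script. This is only a sketch under stated assumptions: the three category names come from the work item, but the input format (pairs of job id and category, with None for a job that completed), the function name summarize, and the sample job ids are hypothetical and are not taken from the actual spreadsheet or from LAVA.

```python
from collections import Counter

# Category names taken from the work item; everything else is hypothetical.
CATEGORIES = ("lava failure", "image failure", "don't know")


def summarize(jobs):
    """Tally failures per category for (job_id, category) pairs.

    A category of None means the job completed successfully.
    """
    total = 0
    failures = Counter()
    for job_id, category in jobs:
        total += 1
        if category is None:
            continue
        if category not in CATEGORIES:
            raise ValueError(f"job {job_id}: unknown category {category!r}")
        failures[category] += 1
    failed = sum(failures.values())
    return {
        "total": total,
        "failed": failed,
        "failure_rate": failed / total if total else 0.0,
        "by_category": {name: failures[name] for name in CATEGORIES},
    }


# Made-up sample data, not real LAVA job ids or results.
sample = [
    (1, None),
    (2, "lava failure"),
    (3, "image failure"),
    (4, "lava failure"),
]
report = summarize(sample)
print(report)
```

Keeping "don't know" as an explicit category, rather than dropping unclassified jobs, keeps the failure rate honest while the investigation is still in progress.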

Subscribers

Linaro Validation Team