LAVA Scheduler (deprecated)

Force all LAVA devices to recheck health status at next opportunity

Registered by Paul Larson on 2012-03-08

After upgrading the system, we want to confirm that it is behaving well and that the upgrade has not introduced obvious problems that would interfere with normal job processing. Health checks let us do that.

If every device's health status is set to unknown before the scheduler daemon is brought back online, idle devices will start a health check as soon as the scheduler comes back up. A device that was already processing a job during the update (using the old, pre-update code) will finish that job, and its next job (using the newly deployed code) will be a health check.

If the health check fails, the board will be marked offline and this will need to be investigated before allowing the board to continue processing normal jobs.
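
The behaviour described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the scheduler's actual code: the Device class, the status strings, and the functions force_health_recheck, next_job, and record_health_result are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical status values; the real scheduler's names may differ.
UNKNOWN, PASS, FAIL = "unknown", "pass", "fail"


@dataclass
class Device:
    hostname: str
    health: str = PASS
    online: bool = True
    busy: bool = False  # True while a pre-upgrade job is still running


def force_health_recheck(devices):
    """Mark every device's health as unknown (done before the scheduler restarts)."""
    for device in devices:
        device.health = UNKNOWN


def next_job(device):
    """Return what the device should run next, or None if it cannot take a job."""
    if not device.online or device.busy:
        return None
    return "health-check" if device.health == UNKNOWN else "normal-job"


def record_health_result(device, passed):
    """Apply a health check result; a failure takes the board offline."""
    device.health = PASS if passed else FAIL
    if not passed:
        device.online = False
```

In this sketch an idle device gets a health check immediately, a busy device gets one as soon as its current job finishes, and a device whose check fails stays offline until someone investigates and brings it back.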

Blueprint information

Status: Complete

Approver: Paul Larson

Priority: Medium

Drafter: Paul Larson

Direction: Approved

Assignee: Michael Hudson-Doyle

Definition: Approved

Series goal: Accepted for trunk

Implementation: Implemented

Milestone target: 2012.03

Started by: Michael Hudson-Doyle on 2012-03-16

Completed by: Michael Hudson-Doyle on 2012-04-02

Whiteboard

Meta:
Headline: It is possible to force health checks to run on all boards after a deployment
Acceptance: There is a Django admin action to set the health of all devices to unknown
Roadmap id: LAVA2012-LAVA-HEALTH-MANAGEMENT

Work Items

Work items:
add admin action to set all devices health to unknown: DONE
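
As a rough idea of what such an admin action could look like: a Django admin action is a plain function taking (modeladmin, request, queryset), and it is enabled by listing it in the actions attribute of a ModelAdmin. The sketch below is an assumption, not the code that was merged. The health_status field name and the HEALTH_UNKNOWN value are hypothetical, and FakeQuerySet exists only so the example runs without Django installed.

```python
# Hypothetical constant; the real Device model may name or encode this differently.
HEALTH_UNKNOWN = 0


def set_health_unknown(modeladmin, request, queryset):
    """Admin action: mark the health of every selected device as unknown."""
    # On a real QuerySet, update() changes all selected rows in one query.
    queryset.update(health_status=HEALTH_UNKNOWN)


# Label shown in the admin's action drop-down.
set_health_unknown.short_description = "Set health of selected devices to unknown"


class FakeQuerySet:
    """Stand-in for a Django QuerySet, used only to exercise the sketch."""

    def __init__(self, rows):
        self.rows = rows

    def update(self, **fields):
        for row in self.rows:
            row.update(fields)
        return len(self.rows)
```

With a real model, selecting all devices in the admin change list and running the action would then force a health check on every board.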

Subscribers

Fathi Boudra

Linaro Validation Team