Force all LAVA devices to recheck health status at next opportunity
When we upgrade the system, we want to confirm that it still behaves well and that the upgrade has not introduced any obvious problems that would interfere with normal job progression. Health checks let us do that.
If the health status of every device is set to unknown before the scheduler daemon is brought back online, idle devices will start a health check as soon as the scheduler comes back up. A device that was already processing a job during the update (which runs on the old, pre-update code) will complete that job, but its next job (which runs on the newly deployed code) will be a health check.
If the health check fails, the board will be marked offline, and the failure will need to be investigated before the board is allowed to resume processing normal jobs.
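The behaviour described above can be sketched as a small, self-contained model. This is only an illustration of the intended logic, not the scheduler's actual code: the `Device`, `Health`, and `State` types and the function names are hypothetical and do not come from LAVA.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    UNKNOWN = "unknown"
    PASS = "pass"
    FAIL = "fail"


class State(Enum):
    IDLE = "idle"
    RUNNING = "running"
    OFFLINE = "offline"


@dataclass
class Device:
    # Hypothetical stand-in for a LAVA device record.
    hostname: str
    state: State = State.IDLE
    health: Health = Health.PASS


def force_health_checks(devices):
    """Mark every device so that its next job is a health check."""
    for device in devices:
        device.health = Health.UNKNOWN


def next_job_kind(device):
    """Return the kind of job the device should start next, or None."""
    if device.state is not State.IDLE:
        # Running devices finish their current job first; offline ones wait.
        return None
    if device.health is Health.UNKNOWN:
        return "health-check"
    return "normal"


def record_health_check(device, passed):
    """Apply a health check result: a failure takes the board offline."""
    if passed:
        device.health = Health.PASS
        device.state = State.IDLE
    else:
        device.health = Health.FAIL
        device.state = State.OFFLINE
```

In this model, calling `force_health_checks` while the scheduler is down means an idle board gets `"health-check"` from `next_job_kind` straight away, while a running board gets it once it returns to idle.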
Blueprint information
- Status: Complete
- Approver: Paul Larson
- Priority: Medium
- Drafter: Paul Larson
- Direction: Approved
- Assignee: Michael Hudson-Doyle
- Definition: Approved
- Series goal: Accepted for trunk
- Implementation: Implemented
- Milestone target: 2012.03
- Started by: Michael Hudson-Doyle
- Completed by: Michael Hudson-Doyle
Whiteboard
Meta:
Headline: It is possible to force health checks to run on all boards after a deployment
Acceptance: there is a Django admin action to set all devices' health to unknown
Roadmap id: LAVA2012-
Work Items
Work items:
add admin action to set all devices health to unknown: DONE
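A minimal sketch of what such an admin action might look like follows. Django admin actions are plain functions that take `(modeladmin, request, queryset)`. The `health_status` field name and the `HEALTH_UNKNOWN` value are assumptions made for illustration and may not match the real device model.

```python
# Hypothetical constant; the value stored by the real model may differ.
HEALTH_UNKNOWN = 0


def set_health_unknown(modeladmin, request, queryset):
    """Admin action: mark the health of the selected devices as unknown."""
    # `health_status` is an assumed field name on the device model.
    queryset.update(health_status=HEALTH_UNKNOWN)


# Label shown in the admin's action drop-down.
set_health_unknown.short_description = (
    "Set health status of selected devices to unknown"
)

# In a Django project the action would be listed on the device's
# ModelAdmin, for example:
#
#     class DeviceAdmin(admin.ModelAdmin):
#         actions = [set_health_unknown]
```

Because the action operates on the selected queryset, choosing every device in the admin change list before running it would cover all boards after a deployment.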