Manually trigger job on board in investigating state

Registered by Le Chi Thu

We will be able to put a board into a new health state named 'investigating'. In this state we can manually trigger the health check job or an alternative job. The job can also be run for a number of iterations, with a pause between runs.
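The iteration-with-pause behaviour described above could be sketched roughly as follows. This is only an illustration, not the scheduler's actual code: the function name `run_repeated`, its parameters, and the idea that a job is a callable returning pass/fail are all assumptions made for the example.

```python
import time
from typing import Callable, List


def run_repeated(job: Callable[[], bool],
                 iterations: int,
                 pause_seconds: float,
                 sleep: Callable[[float], None] = time.sleep) -> List[bool]:
    """Run `job` a fixed number of times, pausing between runs.

    Hypothetical helper: `job` stands in for a health check job (or an
    alternative job) and returns True on pass, False on fail.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")

    results = []
    for i in range(iterations):
        results.append(job())
        # Pause only between runs, not after the final one.
        if i < iterations - 1:
            sleep(pause_seconds)
    return results
```

For instance, `run_repeated(check, iterations=10, pause_seconds=60)` would run a hypothetical `check` callable ten times with a minute between runs, and the list of results gives a rough picture of how flaky the board is.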

Blueprint information

Status:
Complete
Approver:
Fathi Boudra
Priority:
High
Drafter:
Le Chi Thu
Direction:
Approved
Assignee:
Le Chi Thu
Definition:
Approved
Series goal:
Accepted for trunk
Implementation:
Implemented
Milestone target:
2012.05
Started by
Le Chi Thu
Completed by
Fathi Boudra

Whiteboard

On Wed, 21 Mar 2012 15:16:13 -0500, Paul Larson <email address hidden> wrote:
> On Wed, Mar 21, 2012 at 3:08 PM, Michael Hudson-Doyle <
> <email address hidden>> wrote:
>
> > On Wed, 21 Mar 2012 09:28:28 +0000, Dave Pigott <email address hidden>
> > wrote:
> > > > beaglexm03 - hardware problem ? network not working well.
> > >
> > > Might have a problem with the board. Some very odd serial behaviour
> > > which went away after a reset. Have re-enabled to see if it is a
> > > hardware problem. Next step will be to replace the sd card.
> >
> > I wonder if we should have a kind of 'sticky' or 'suspect' health status
> > we can stick a board into where it will just run health jobs over and
> > over -- if we put a board in this state for a few hours or a day it
> > might be useful to gauge how flaky it is?
> >
> I like it! would this be something we manually trigger, or would it be
> automatically put into that state? Would it rely on someone exiting it
> from that state?

I was thinking something entirely manual, i.e. it would be a manual
action to put a board into this state and a manual action to take it out
of it again.

We could also allow defining a health job on the device, I think we
talked about this when first adding health jobs but didn't see a need.
If Chi Thu wants to do this, it'll be easy enough to add too.

Cheers,

Meta:
Headline: Add looping of health check jobs
Acceptance: Health check jobs are run in a loop to stress test the device.
Roadmap Id: LAVA2012-LAVA-HEALTH-MANAGEMENT


Work Items

Work items:
Study the scheduler code base : DONE
Add investigation state to the Device state : DONE
Update the device status web page to allow putting the device into investigating mode : DONE
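As a rough illustration of the work items above, an 'investigating' state with purely manual entry and exit might look like the sketch below. It is not taken from the scheduler code base: the `HealthState` and `Device` names, the other state values, and the choice to ignore automated results while investigating are all assumptions made for the example.

```python
from enum import Enum


class HealthState(Enum):
    # Hypothetical set of states; only 'investigating' comes from the blueprint.
    UNKNOWN = "unknown"
    PASS = "pass"
    FAIL = "fail"
    INVESTIGATING = "investigating"


class Device:
    """Hypothetical device model, not the scheduler's real one."""

    def __init__(self, hostname: str) -> None:
        self.hostname = hostname
        self.health = HealthState.UNKNOWN

    def start_investigating(self) -> None:
        # Manual action: an operator puts the board into the state.
        self.health = HealthState.INVESTIGATING

    def stop_investigating(self) -> None:
        # Manual action: an operator takes the board out of the state.
        if self.health is not HealthState.INVESTIGATING:
            raise ValueError("device is not in the investigating state")
        self.health = HealthState.UNKNOWN

    def can_trigger_manually(self) -> bool:
        # Jobs may be triggered by hand only while investigating.
        return self.health is HealthState.INVESTIGATING

    def record_health_result(self, passed: bool) -> None:
        # Assumption: job results never move the board out of
        # 'investigating'; only stop_investigating() does.
        if self.health is HealthState.INVESTIGATING:
            return
        self.health = HealthState.PASS if passed else HealthState.FAIL
```

Keeping both transitions as explicit operator actions matches the whiteboard discussion, where entering and leaving the state are each described as a manual step.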
