LAVA Scheduler (deprecated)

Monitoring Of Scheduler Queues

Registered by Andy Doan on 2012-07-01

Bug #1015532 is a good example of a place where we lack important monitoring in LAVA. We need to monitor the job queues in the scheduler to make sure jobs are executed in a timely manner. We should check for things like:

* no devices of a given device type online
* job queues growing too large (and therefore taking too long to execute)
* jobs that seem to be hung

If any of these situations occurs, we should email an alert so that the team is aware of it.
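The three checks and the email alert could be sketched roughly as follows. This is only an illustration, not the script that was deployed: the `Device` and `Job` records, the status strings, the threshold values, and the function names `find_problems` and `build_alert` are all assumptions made for the example and do not reflect the actual LAVA scheduler schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from email.message import EmailMessage
from typing import List, Optional


# Hypothetical records; the real scheduler tables will differ.
@dataclass
class Device:
    hostname: str
    device_type: str
    online: bool


@dataclass
class Job:
    job_id: int
    device_type: str
    status: str  # assumed values: "SUBMITTED" (queued) or "RUNNING"
    submit_time: datetime
    start_time: Optional[datetime] = None


# Hypothetical thresholds; tune these for the actual lab.
MAX_QUEUE_LENGTH = 10
MAX_QUEUE_WAIT = timedelta(hours=6)
MAX_RUN_TIME = timedelta(hours=24)


def find_problems(devices: List[Device], jobs: List[Job], now: datetime) -> List[str]:
    """Return one human-readable line per detected problem."""
    problems = []

    # Check 1: device types that have no device online.
    for device_type in sorted({d.device_type for d in devices}):
        if not any(d.online for d in devices if d.device_type == device_type):
            problems.append(f"no devices online for device type {device_type}")

    # Check 2: queues that are too long, or jobs that have waited too long.
    queues = defaultdict(list)
    for job in jobs:
        if job.status == "SUBMITTED":
            queues[job.device_type].append(job)
    for device_type in sorted(queues):
        queued = queues[device_type]
        if len(queued) > MAX_QUEUE_LENGTH:
            problems.append(f"queue for {device_type} has {len(queued)} jobs")
        for job in queued:
            if now - job.submit_time > MAX_QUEUE_WAIT:
                problems.append(f"job {job.job_id} has been queued too long")

    # Check 3: running jobs that have exceeded the maximum run time.
    for job in jobs:
        if job.status == "RUNNING" and job.start_time is not None:
            if now - job.start_time > MAX_RUN_TIME:
                problems.append(f"job {job.job_id} may be hung")

    return problems


def build_alert(problems: List[str], sender: str, recipient: str) -> Optional[EmailMessage]:
    """Build the alert email, or return None when there is nothing to report."""
    if not problems:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"LAVA scheduler alert: {len(problems)} problem(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(problems))
    return msg
```

A periodic job could call `find_problems` with the current device and job lists and, if `build_alert` returns a message, hand it to `smtplib.SMTP.send_message`. Passing `now` in explicitly keeps the checks easy to test.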

Blueprint information

Status: Complete

Approver: Andy Doan

Priority: Medium

Drafter: None

Direction: Approved

Assignee: Andy Doan

Definition: Approved

Series goal: Accepted for trunk

Implementation: Implemented

Milestone target: 2012.07

Started by: Andy Doan on 2012-07-25

Completed by: Andy Doan on 2012-07-25

Whiteboard

[2012-07-26]: I have this set up as a cron job on control, under /home/doanac/lava-scripts.

Meta:
Headline: Monitoring added for LAVA job queues
Acceptance: Alerts will be raised when the scheduler is experiencing overloaded queues.
Roadmap id: CARD-128

Work Items

Work items:
create query to list device types that have no devices online: DONE
create query for job queues growing too large or having a job queued too long: DONE
create query to find jobs that seem to be hung: DONE
report this information: DONE

Subscribers

Linaro Validation Team