External monitoring of LAVA uptime status and notification

Registered by Paul Larson on 2012-02-21

We would like to be notified whenever LAVA is down, whether the cause is the server, the infrastructure, the network connection, or anything else. This requires an external service that we can rely on.
As an example of this kind of service, we discussed looking into Pingdom at Linaro Connect.

Blueprint information

Status:
Complete
Approver:
Paul Larson
Priority:
Medium
Drafter:
Paul Larson
Direction:
Approved
Assignee:
Paul Larson
Definition:
Approved
Series goal:
Accepted for trunk
Implementation:
Implemented
Milestone target:
2012.03
Started by
Paul Larson on 2012-02-21
Completed by
Paul Larson on 2012-03-08

Whiteboard

Since this is just for our internal monitoring purposes, there should be no code change needed in LAVA, and nothing in the release itself.

[pwlars, 2012-03-05] Pingdom was the tool originally suggested at Connect as worth evaluating, and I have been experimenting with it since then. It works well for our purposes, so we are going with it. If a better solution is found in the future, it could replace Pingdom or simply be added alongside the existing setup. Stats are already publicly visible here: http://stats.pingdom.com/o15ezbsnv8tb

Meta:
Headline: N/A
Acceptance: When LAVA is offline, the team should receive an email. We should also have good metrics on the level of availability.
Roadmap id: LAVA2012-LAB-MONITORING
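
To make the acceptance criteria concrete, here is a minimal sketch of what an external check and an availability metric amount to. It is illustrative only: the chosen service (Pingdom) does the probing and emailing for us, and this is not a description of how it works internally. The function names `is_up` and `availability_percent` are hypothetical, and the sketch assumes that any HTTP status below 400 counts as "up".

```python
import urllib.error
import urllib.request


def is_up(url, timeout=10.0):
    """Return True if the URL answers with an HTTP status below 400.

    Assumption: connection failures, timeouts, and HTTP error
    statuses all count as "down".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx
        # responses, and OSError subclasses for socket-level failures.
        return False


def availability_percent(results):
    """Return the percentage of successful checks in a list of booleans."""
    if not results:
        raise ValueError("no check results recorded")
    successes = sum(1 for ok in results if ok)
    return 100.0 * successes / len(results)
```

A scheduler running outside the lab would call `is_up` at a fixed interval, record each result, and send an email to the team when a check fails; `availability_percent` over the recorded results gives the availability figure.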


Work Items

Work items:
Look into Pingdom: DONE
[fboudra] Get a mailing list for critical events: DONE
Look at other possibilities and decide on what we will use: DONE
Set up a solution and test that it works: DONE

This blueprint contains Public information 
Everyone can see this information.