Degraded Hardware Notifications aka 'things you'd rather not see'

Registered by Dimitri John Ledkov on 2012-04-30

== Degraded Hardware Notifications ==

There are multiple ways hardware can degrade on the machine you are
using.

* degraded RAID
* 100% disk space usage
* 100% inode usage
* S.M.A.R.T. - failing hardware
* etc (?!)

== Current notifications ==

* are sometimes annoying (endless popups on the desktop for SMART)
* are non-existent, AFAIK (inodes)
* are not configured out of the box (degraded RAID, http://pad.lv/535417)

== Proposed notifications ==

Sometimes a user may not have the permissions to rectify the problem.
A notification should persist until the problem is fixed.
These notifications are important across all installation types.
These notifications should not replicate Nagios/Check_Mk monitoring.

* system indicators (Ubuntu Desktop)
* byobu plugin (Ubuntu Server)
* MOTD (Ubuntu Core)
* landscape notifications
* others? (virtual machines, cloud instances, etc.)
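
The persistence requirement above (a notification stays until the problem is fixed, regardless of which channel displays it) could be modelled roughly as follows. This is only a sketch under assumptions: the class `ActiveWarnings`, the `(check_id, healthy, message)` tuple shape and the check identifiers are hypothetical and do not come from any existing Ubuntu component.

```python
from typing import Dict, Iterable, Tuple


class ActiveWarnings:
    """Tracks warnings that stay active until the underlying check passes again."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}

    def update(self, results: Iterable[Tuple[str, bool, str]]) -> None:
        """Apply check results given as (check_id, healthy, message).

        A warning is removed only when its own check reports healthy;
        a check that is missing from `results` leaves the warning in place.
        """
        for check_id, healthy, message in results:
            if healthy:
                self._active.pop(check_id, None)
            else:
                self._active[check_id] = message

    def render(self) -> str:
        """Plain-text summary that any channel (indicator, byobu, MOTD) could show."""
        return "\n".join(f"[{cid}] {msg}" for cid, msg in sorted(self._active.items()))
```

Keeping the state separate from the display means each front end only needs to render the current set, and a user without permission to fix the problem still sees it on every login.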

== UDS session ==

* Discuss what notifications we would like to implement
* How/where/who will implement them

Blueprint information

Status: Not started
Approver: Colin Watson
Priority: Low
Drafter: Dimitri John Ledkov
Direction: Approved
Assignee: Dimitri John Ledkov
Definition: Approved
Series goal: Accepted for saucy
Implementation: Not started
Milestone target: None

Whiteboard

- jodh: In tandem with this idea, we should also consider updating the 'system-summary' (in the friendly-recovery package), along with adding new options to the recovery menu to allow common issues to be resolved. This is of particular importance if the user for whatever reason misses notifications, and on next reboot finds their machine is effectively inoperable (and thus unable to display desktop notifications since booting to the desktop is no longer possible).
- jodh: need careful and repeatable (and ideally automated) tests for all possible failure scenarios.
- jodh: We should document expected behaviour for all possible failure scenarios. This is not only useful for users but would also aid development of tests.

Etherpad notes for processing:

 http://people.ubuntu.com/~dmitrij.ledkov/hardware-notification.png

=== Hardware criticality levels ===

High
    - Degraded RAID
    - 100% full filesystem (/usr /home /tmp /var /boot), Block errors on USB devices
    - SMART

Low (or just too difficult right now)
    - RAM?
    - fans
    - CPU temps
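
For the degraded RAID item in the High group, one possible detection approach for Linux software RAID (md) is to parse the status fields in `/proc/mdstat`, where an array line such as `[2/1] [U_]` indicates a missing member. The sketch below assumes that format. The function name `degraded_arrays` is hypothetical, and hardware RAID controllers are not covered.

```python
import re
from typing import Dict

# Array header line, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
_DEVICE = re.compile(r"^(md\S*)\s*:")
# Status field, e.g. "[2/1] [U_]" = 2 members expected, 1 active, second one missing
_STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> Dict[str, str]:
    """Map each degraded md array name to its member flags (such as "U_")."""
    degraded: Dict[str, str] = {}
    current = None
    for line in mdstat_text.splitlines():
        header = _DEVICE.match(line)
        if header:
            current = header.group(1)
            continue
        status = _STATUS.search(line)
        if status and current is not None:
            expected, active = int(status.group(1)), int(status.group(2))
            flags = status.group(3)
            if active < expected or "_" in flags:
                degraded[current] = flags
    return degraded


def read_mdstat(path: str = "/proc/mdstat") -> str:
    """Read the kernel's md status; returns an empty string if md is not present."""
    try:
        with open(path, encoding="ascii", errors="replace") as fh:
            return fh.read()
    except OSError:
        return ""
```

Calling `degraded_arrays(read_mdstat())` returns an empty dictionary on a healthy system or on one without md arrays, so the same check can run on every installation type.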

Dmitrijs, please fill out the below section with an explanation of this feature that would be appropriate to include in the Ubuntu release notes.

Release Notes:

(?)

Work Items

Work items:
write design and architecture specification: POSTPONED
design monitoring daemon API: POSTPONED
implement monitoring daemon: POSTPONED
[mpt] design PC graphical client <https://wiki.ubuntu.com/DiskWarnings>: DONE
implement PC graphical client: BLOCKED
design MTA / email client (text / scope): POSTPONED
design Byobu notification indicator client: POSTPONED
implement byobu notification client: POSTPONED
design MOTD / Wall notification client: POSTPONED
implement MOTD / Wall notification client: POSTPONED
plan Landscape integration: POSTPONED
implement landscape integration: POSTPONED
design (possibly structured) log client: POSTPONED
implement log client: POSTPONED
plan/design integration with system-summary script in the friendly-recovery package: POSTPONED
plan/design integration with recovery boot option: POSTPONED
plan dpkg/debconf integration for critical warnings during dpkg run: POSTPONED
implement dpkg/debconf integration: POSTPONED
[mpt] design priority/text/notification types for Hard-disk out of space notification <https://wiki.ubuntu.com/DiskWarnings#full>: DONE
implement Hard-disk out of space notification: POSTPONED
provide test-case for Hard-disk out of space notification: POSTPONED
create pointers to solutions how to resolve Hard-disk out of space notification: POSTPONED
[mpt] design priority/text/notification types for Hard-disk out of inodes notification <https://wiki.ubuntu.com/DiskWarnings#inodes>: DONE
implement Hard-disk out of inodes notifications: POSTPONED
provide test-case for Hard-disk out of inodes notification: POSTPONED
create pointers to solutions how to resolve Hard-disk out of inodes: POSTPONED
[mpt] design priority/text/notification types for RAID failures <https://wiki.ubuntu.com/DiskWarnings#raid>: DONE
implement RAID failure notifications: POSTPONED
provide test-cases for RAID notifications: POSTPONED
[mpt] improve design priority/text/notification types for SMART failures <https://wiki.ubuntu.com/DiskWarnings#smart>: DONE
implement SMART failure notifications: POSTPONED
provide test-cases for SMART notifications: POSTPONED
implement ability to ignore notifications: POSTPONED
implement ability to customise notification channels: POSTPONED
create pointers to solution how to resolve RAID notifications: BLOCKED
improve identification of failed drives, symbolic->physical resolution: POSTPONED
