= Debugging MAAS =
Debugging failures in MAAS is hard. We should aim for everything we ship with MAAS (i.e. anything outside its core), debugging tools included, to work through the API. Customers (or PES) should not have to perform brain surgery on MAAS to diagnose, for example, a tagging issue. Any solution we come up with has to work on site, in production.
Examples of failures:
* node fails to boot
* node fails to commission
* power scripts not working
* DNS not working
== Ideas ==
- IPMI console (via conserver), viewable in the UI on request
- Net console via a kernel parameter (netconsole)
- Better error messages
- Keep track of the major events in the lifecycle of a node
- Add a command-line option ("debug mode") to "block" during enlistment/
- MaaSTest permanently in debug mode so you can always get a back door into the system.
- cloud-init logs available in the region controller/UI
- Consolidate logs into one file (→ (r)syslog)
- Log level changes on the fly (needs a bug filed)
- Set software clock on enlistment
- Customize the DHCP template from the UI
- Improved notifications UI
- SOS report (in saucy main / https:/
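A minimal sketch of the "consolidate logs into (r)syslog" and "log level changes on the fly" ideas, using only the Python standard library. It assumes a local (r)syslog listening on UDP port 514 and uses SIGUSR1 as the toggle signal; the logger name, the signal choice and the helpers (`make_syslog_handler`, `toggle_debug`, `install_sigusr1_toggle`) are hypothetical, not existing MAAS code.

```python
import logging
import logging.handlers
import signal

# Hypothetical logger name; real MAAS logger names may differ.
LOGGER_NAME = "maas.debug-sketch"


def make_syslog_handler(address=("localhost", 514)):
    """Return a handler that forwards log records to (r)syslog over UDP.

    Assumes a syslog daemon is listening at `address`; UDP sends do not
    fail if nothing is listening, so this is safe to construct anywhere.
    """
    handler = logging.handlers.SysLogHandler(address=address)
    handler.setFormatter(
        logging.Formatter("%(name)s[%(process)d]: %(levelname)s %(message)s")
    )
    return handler


def toggle_debug(logger):
    """Flip the logger between INFO and DEBUG without restarting the process."""
    new_level = logging.INFO if logger.level == logging.DEBUG else logging.DEBUG
    logger.setLevel(new_level)
    return new_level


def install_sigusr1_toggle(logger):
    """Hypothetical wiring: `kill -USR1 <pid>` toggles the log level.

    Must be called from the main thread (a Python signal-module restriction).
    """
    signal.signal(signal.SIGUSR1, lambda signum, frame: toggle_debug(logger))


if __name__ == "__main__":
    log = logging.getLogger(LOGGER_NAME)
    log.setLevel(logging.INFO)
    log.addHandler(make_syslog_handler())
    install_sigusr1_toggle(log)
    log.info("logging to syslog; send SIGUSR1 to toggle DEBUG")
```

With something like this in place, an operator could flip a running daemon between INFO and DEBUG without a restart; per the API-first goal, the same toggle would eventually need to be reachable through the API rather than only via a signal.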
== Work items ==
- IPMI console on demand, net console by default (guesstimate 2w): TODO
- Replace mod_wsgi, as it doesn't support websockets (replace it with what?): TODO
- State management "threads" for each node, with audited state history in the DB (guesstimate 2w): TODO
- Audit log UI / DB for a node (guesstimate 1w): TODO
- Better error messages (review): TODO
- Debug mode (back door, IPMI console on) (2w): TODO
- Consolidate logs to syslog (tiered logging via syslog, stored in PostgreSQL) (2w): TODO
- cloud-init logs in UI/API: redirect cloud-init logs and build a UI (2w): TODO
- Log level changes on the fly (no estimate yet): TODO
- Set software clock on enlistment/
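A rough sketch of the per-node state history and audit log items, under stated assumptions: `sqlite3` stands in for PostgreSQL so the example is self-contained, and the table `node_event`, its columns, the helper functions and the state names are all hypothetical. Each transition is stored as one row recording the old and new state, so both the current state and the full audit trail come from the same table.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; a real MAAS table would live in PostgreSQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS node_event (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    node_id     TEXT NOT NULL,
    old_state   TEXT,
    new_state   TEXT NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    created     TEXT NOT NULL
)
"""


def open_audit_db(path=":memory:"):
    """Open (or create) the audit database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def current_state(conn, node_id):
    """Return the node's latest recorded state, or None if it has no events."""
    row = conn.execute(
        "SELECT new_state FROM node_event WHERE node_id = ? "
        "ORDER BY id DESC LIMIT 1",
        (node_id,),
    ).fetchone()
    return row[0] if row else None


def record_transition(conn, node_id, new_state, description=""):
    """Append a state transition to the audit log; return the previous state."""
    old_state = current_state(conn, node_id)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO node_event "
            "(node_id, old_state, new_state, description, created) "
            "VALUES (?, ?, ?, ?, ?)",
            (node_id, old_state, new_state, description,
             datetime.now(timezone.utc).isoformat()),
        )
    return old_state


def history(conn, node_id):
    """Return (old_state, new_state, description) tuples in insertion order."""
    return conn.execute(
        "SELECT old_state, new_state, description FROM node_event "
        "WHERE node_id = ? ORDER BY id",
        (node_id,),
    ).fetchall()
```

Storing transitions rather than only a mutable "state" column is the design choice that makes the audit log UI cheap: the UI (and the API) just render `history()` for a node, and a failure such as "node fails to commission" shows up as the last transition with its description.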