Improve Suspend and Hibernate Kernel Debug

Registered by Colin Ian King

Debugging suspend (S3) and hibernate (S4) failures is notoriously difficult, and the existing pm-debug mechanism via the RTC only goes so far. S3/S4 debugging is tedious and time-consuming, so it is worth spending some effort on kernel debug facilities that make it easier. We should develop more flexible debugging that lets us get state out of machines when we don't have the luxury of a serial console or JTAG.

Blueprint information

Status:
Complete
Approver:
Chris Van Hoof
Priority:
Medium
Drafter:
Colin Ian King
Direction:
Approved
Assignee:
Colin Ian King
Definition:
Approved
Series goal:
Accepted for oneiric
Implementation:
Implemented
Milestone target:
ubuntu-11.10
Started by
Chris Van Hoof
Completed by
Colin Ian King

Whiteboard

There's a pstore filesystem recently introduced in the kernel that may make it possible to store crash dumps in a nonvolatile memory area. An introductory article is at http://lwn.net/Articles/434821/.
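
As a rough illustration of how saved records could be collected after a failed resume, here is a minimal sketch. It assumes the pstore filesystem is mounted at /sys/fs/pstore (the usual location on current kernels, but an assumption here) and that each record appears as a regular file; the function name `read_pstore_records` is hypothetical.

```python
import os


def read_pstore_records(pstore_dir="/sys/fs/pstore"):
    """Return {filename: text} for every regular file in pstore_dir.

    The default mount point is an assumption; pass another directory
    if pstore is mounted elsewhere. Returns an empty dict if the
    directory does not exist (e.g. pstore is not available).
    """
    records = {}
    if not os.path.isdir(pstore_dir):
        return records
    for name in sorted(os.listdir(pstore_dir)):
        path = os.path.join(pstore_dir, name)
        if not os.path.isfile(path):
            continue
        # Records may contain non-UTF-8 bytes, so decode leniently.
        with open(path, "r", errors="replace") as fh:
            records[name] = fh.read()
    return records


if __name__ == "__main__":
    for name, text in read_pstore_records().items():
        print("==", name, "(%d bytes)" % len(text))
```

Reading the records typically requires root, and whether anything is stored at all depends on the machine providing a pstore backend.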

It might be worth considering the following patch: https://lkml.org/lkml/2011/1/25/236. According to Linus, this will never make it into the upstream tree because he feels the data on the HDD is more important than the kernel crash info, but distros can work it into their own kernels. It looks like an ugly BIOS hack, and it is not clear whether it will play well with UEFI.

Problem: S3 & S4 Debug is NOT easy!

We need to do better at debugging these sorts of failure modes. We need
some new mechanisms to better find out what goes wrong.

We have classically used the RTC to debug the most difficult parts of
the process. Beyond that, we typically need extra hardware, such as an
added serial port or a 'port 80 card'. This hardware is not accessible
to normal users.

- BIOS vendors use JTAG and port 80 cards
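
For reference, the RTC technique (enabled through /sys/power/pm_trace) stores a hash of the last trace point in the RTC, and on the next boot the kernel logs which source locations or devices match that hash. The sketch below pulls those lines out of a saved dmesg. The exact log wording ("hash matches ...") is an assumption about the kernel output and may differ between versions; `find_pm_trace_matches` is a hypothetical helper name.

```python
import re

# Assumed wording of the kernel's pm_trace report lines, e.g.
# "hash matches <file>:<line>" or "hash matches device <name>".
HASH_MATCH = re.compile(r"hash matches (?:device )?(\S+)")


def find_pm_trace_matches(dmesg_text):
    """Return the locations/devices reported as matching the RTC hash."""
    matches = []
    for line in dmesg_text.splitlines():
        found = HASH_MATCH.search(line)
        if found:
            matches.append(found.group(1))
    return matches


if __name__ == "__main__":
    import sys
    # Usage: dmesg | python3 this_script.py
    for match in find_pm_trace_matches(sys.stdin.read()):
        print(match)
```

Because many trace points can hash to the same value, the matches are candidates to investigate rather than a definitive answer, which is part of why this method only goes so far.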

The proposal is to use an additional kernel driver to help record printk
information from the suspend/resume path.

One suggestion is to use the BIOS EC (embedded controller) as a new
'sensor' for this. The Intel Management Engine could also be used,
though there is little access to it.

Kdump is not useful here, because the issue comes up early in the boot
process, before kdump can come into play.

PC speaker and LEDs
- wiggle a few bits and get state out
- hardware to detect state changes
  - video is slow: 2-3 characters of state a second
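
To make "wiggle a few bits" concrete, here is a minimal sketch of one possible encoding, not the scheme the prototype driver used. It assumes two output lines (for example two keyboard LEDs): one acts as a clock and the other carries data, so a receiver can sample the data line on each rising clock edge without needing precise timing. All names are hypothetical.

```python
def encode_frames(data):
    """Turn bytes into a list of (clock, data) LED states, MSB first.

    Each bit becomes two frames: clock high with the bit on the data
    line, then clock low with the data line held steady.
    """
    frames = []
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            frames.append((1, bit))
            frames.append((0, bit))
    return frames


def decode_frames(frames):
    """Recover bytes by sampling the data line on rising clock edges."""
    bits = []
    prev_clock = 0
    for clock, data in frames:
        if clock and not prev_clock:
            bits.append(data)
        prev_clock = clock
    out = bytearray()
    # Only complete bytes are returned; trailing bits are dropped.
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    frames = encode_frames(b"S3")
    print(len(frames), "frames ->", decode_frames(frames))
```

With two frames per bit, one character costs 16 LED state changes, so the achievable rate is bounded by how fast the recording hardware can sample the LEDs.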

We would need some hardware to record the lights/sound at a sensible
speed.
- have another pc record the audio and interpret it
  - leverage ham radio code already done to interpret sound
  - need kernel driver
      - have driver in kernel by default
      - add instrumentation as part of custom kernel? Or maybe in by
        default but disabled
      - apport cannot collect this by default, but a tool that decodes
        it could
      - web service to decode audio text
      - attach audio to bug (large files), high bit rate MP3, FLAC
          - in LP (Launchpad): detect, convert and delete
  - need to have in-depth debug and instrumentation - can't be a module
    ksplice? systemtap? debug kernel PPA?

Tool to analyze and suggest where suspend/resume is failing to help
guide people through debug phase
- automate colin

ACPI pstore driver, only newer machines
- can utilize on machines that implement it

For machines that don't have speakers
- how many machines have pc speakers anymore? (Lots of newer machines
  don't)
- fall back to keyboard leds (caps lock etc)
- some machines don't have leds
- VGA block cursor flash (backlight usually comes on late)
    - can't do on UEFI class 3 BIOS
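
A pre-test for which output devices a machine offers could start from the LED class directory in sysfs. The sketch below assumes keyboard LEDs are exposed under /sys/class/leds with names ending in "capslock", "numlock" or "scrolllock", as on current kernels; that naming is an assumption and may not hold on older kernels. The function names are hypothetical.

```python
import os

# Assumed name suffixes for keyboard LEDs in the LED class directory.
KEYBOARD_LED_SUFFIXES = ("capslock", "numlock", "scrolllock")


def pick_keyboard_leds(names):
    """Return the sorted subset of LED names that look like keyboard LEDs."""
    return sorted(n for n in names if n.endswith(KEYBOARD_LED_SUFFIXES))


def find_keyboard_leds(leds_dir="/sys/class/leds"):
    """List keyboard LEDs under leds_dir, or [] if it does not exist."""
    if not os.path.isdir(leds_dir):
        return []
    return pick_keyboard_leds(os.listdir(leds_dir))


if __name__ == "__main__":
    leds = find_keyboard_leds()
    if leds:
        print("keyboard LEDs available:", ", ".join(leds))
    else:
        print("no keyboard LEDs found; another output device is needed")
```

An empty result only means no LED was registered under the assumed names, so it should be treated as a hint to try another fallback rather than proof that the hardware has no LEDs.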

Investigate video DDC pin

Deleted work items:
colin-king -> pre-test program to confirm what output devices they may have
   Assuming LED and PC speaker for the moment. Auto detection is a luxury
canonical-kernel-team -> look at providing switch for current suspend/resume output into new output driver
   No need for this, driver built into SystemTap solution

Work items for ubuntu-11.10:
[colin-king] Investigate ACPI + pstore for debug status saving: DONE
[colin-king] S3 early resume keyboard LED sanity check: DONE
[colin-king] suspend GPIO LED debug investigation, can we use these LEDs for debug (no): DONE
[colin-king] investigate LED reading hardware: DONE
[colin-king] prototype LED console output driver: DONE
[kamalmostafa] prototype sound output driver for inclusion in kernel in general: POSTPONED
[colin-king] review of percentage of machines which have pcspkr (~15-20%): DONE
[colin-king] review of percentage of laptop/netbook machines which have keyboard LEDs (~55%): DONE
[colin-king] investigate how to add *lots* of instrumentation into S3/S4 debug kernel: DONE
[colin-king] write tool to examine instrumented kernel and give S3/S4 analysis: DONE
[sconklin] investigate video DDC pin: POSTPONED
