Thermal Framework Design Architecture

Registered by Steve Jahnke on 2011-04-19

Based on (thm-framework-scope), create a design architecture for porting the thermal framework to ARM-based systems. From sysfs, the result should look like any other system using the thermal framework (cooling devices assigned to temperature sensors). In the kernel, however, the question is how best to map on-die temperature sensors (including DDR and silicon) to cooling devices, where a cooling device is any part of the device that can scale frequency or turn off hardware.
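
As a rough illustration of the "cooling devices assigned to temperature sensors" model, the following is a minimal user-space sketch in C. All type and function names here (temp_sensor, cooling_device, sensor_bind, sensor_throttle) are hypothetical and chosen for this example only; they are not the kernel thermal framework's API and not the structures defined by this blueprint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the kernel's thermal API. */
enum sensor_kind { SENSOR_SILICON, SENSOR_DDR };

struct cooling_device {
    const char *name;
    unsigned int max_state;   /* deepest throttling level supported */
    unsigned int cur_state;   /* 0 means no throttling */
};

struct temp_sensor {
    const char *name;
    enum sensor_kind kind;
    struct cooling_device *cdevs[4];  /* devices bound to this sensor */
    size_t num_cdevs;
};

/* Bind a cooling device to a sensor; returns 0 on success, -1 if full. */
static int sensor_bind(struct temp_sensor *s, struct cooling_device *cd)
{
    assert(s != NULL && cd != NULL);
    if (s->num_cdevs >= sizeof(s->cdevs) / sizeof(s->cdevs[0]))
        return -1;
    s->cdevs[s->num_cdevs++] = cd;
    return 0;
}

/* Request a throttling level on every bound device, clamped per device. */
static void sensor_throttle(struct temp_sensor *s, unsigned int state)
{
    for (size_t i = 0; i < s->num_cdevs; i++) {
        struct cooling_device *cd = s->cdevs[i];
        cd->cur_state = state > cd->max_state ? cd->max_state : state;
    }
}
```

In this sketch, a CPU frequency-scaling device and a GFX device could both be bound to one on-die sensor, and a single throttle request would be clamped to whatever each device supports. Whether memory-related items belong in such a mapping or stay in their own drivers is exactly the open design decision described next.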

Ultimately, a decision has to be made on how far to take the thermal framework (if at all) in mapping on-die resources. The options are to map resources in the framework, to have the individual drivers do it all (and therefore no mapping), or to use some hybrid (for example, CPU and maybe the interconnect bus and GFX as resources, with memory-related items handled by a driver).

Open questions:
1. Is any firmware component required to shut down the device in a critical state (the current "panic" state)? Would this be required to prevent possible memory/device damage due to extreme thermal states? How is this handled today (the driver has some role, but does it need a firmware part)?
Answer: There will always be a hardware component that does not rely on ANY apps processor software (kernel or user space) to catch a panic thermal condition and reset the system. The thermal framework will not need to worry about how to handle a panic state.

2. Should TrustZone be part of thermal management to prevent tampering with temperature settings before boot? If so, can on-die temp sensors be accessible behind TrustZone?

3. Need to research how the LPDDR2 drivers modify DDR timings today: modified timings are an area that security hardware/software examines to see whether a breach has occurred.
Answer: LPDDR2 drivers are done independently and outside the thermal-framework scope; thermal framework is intended to keep the SOC operating within designed parameters, not affect the timings of the DDR. This is handled in the LPDDR2 driver directly.

4. Do we consider separate implementations for "panic"-type operations (where device damage can occur 'quickly') and for normal thermal management, where the device is expected to run occasionally out of spec, which itself needs management? Are there liability reasons why we might want to do this?
Answer: Basically yes - panic is handled in hardware, while the thermal framework/"governor" must handle any run-time operating state.

5. To what level are we doing SOC thermal management versus full-system thermal management? Do we expect the end user of the device to have a separate solution, so that the focus of this project should be just device thermal management, with a means to signal a higher-level (product-specific) thermal solution?
Answer: This is really intended as SOC specific thermal management, but can be easily scaled to incorporate the entire system, much like the existing thermal manager does today.
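
The split described in the answer to question 4 (panic handled in hardware, run-time states handled by the framework/"governor") can be sketched as a simple step function. This is a minimal sketch under stated assumptions: the function name governor_state and the millidegree trip-point values used in the usage note are hypothetical, and no critical/panic handling appears because that is assumed to be done by hardware.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical run-time governor step: given ascending passive trip points
 * (millidegrees Celsius), return the throttling level for a temperature.
 * The level is the number of trip points at or below the reading.
 * Panic/critical shutdown is deliberately absent: it is assumed to be
 * handled by hardware, independent of this software path.
 */
static unsigned int governor_state(const int *trips_mc, size_t n, int temp_mc)
{
    unsigned int state = 0;

    assert(trips_mc != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (temp_mc >= trips_mc[i])
            state++;
    }
    return state;
}
```

With illustrative trip points of 85000, 95000 and 105000, a reading of 100000 yields level 2, which a caller could then pass to whatever cooling devices are bound to that sensor.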

Hardware:
- If the on-die temp sensor generates an interrupt on a thermal event, this can be used to detect a critical thermal state during bootup, or to handle question 1 above. If the user-space manager dies, or if the kernel-space manager (if used) is corrupted in some fashion, this interrupt can be used as the final fail-safe to detect a thermal event and shut the system down before damage occurs.

Note:
- The existing thermal framework does not take thermal panic conditions into account; it simply turns the fan to the highest speed and appears to expect the BIOS to take over if there is a true thermal over-limit condition. However, the hooks are in place to allow the thermal framework to make these decisions as well. We will need to decide whether the thermal framework will just monitor and try to control temperature (as it does today) and leave any overheating conditions to some other function (the likely choice, given questions such as how to detect a thermal event during bootup), or try to do it all in the thermal framework.

Blueprint information

Status:
Complete
Approver:
Amit Kucheria
Priority:
High
Drafter:
Steve Jahnke
Direction:
Approved
Assignee:
Steve Jahnke
Definition:
Approved
Series goal:
Accepted for trunk
Implementation:
Implemented
Milestone target:
2011.09
Started by
Amit Kucheria on 2011-06-03
Completed by
Mounir Bsaibes on 2011-09-16

Whiteboard

RESULTS
  See the above and the full specification

This work item was moved to https://blueprints.launchpad.net/linaro-pm-wg/+spec/block-diagram-for-kernel-thermal-framwork
[sjahnke] Create (system) block diagram of kernel thermal framework for ARM: TODO

Work Items

Work items:
[sjahnke] Research existing kernel thermal framework: DONE
[sjahnke] Define structures and API for ARM based kernel thermal framework: DONE
