Thermal management algorithm

Registered by Steve Jahnke

Add a heuristic algorithm based on SOC characteristics; start with the definition and then move on to a general implementation.

The thermal framework (kernel thermal.c) uses a predictive algorithm as its default for adjusting fan speed; this default can be overridden. The algorithm is fairly aggressive because it simply adjusts fan speed in anticipation of heating. It does not have to consider the user experience or performance impact of limiting operating points, entering idle, or any other SOC-only technique that may be used. We expect to need a different default algorithm for cooling solutions that do not rely on a fan.

Blueprint information

Status:
Complete
Approver:
Amit Kucheria
Priority:
Medium
Drafter:
Steve Jahnke
Direction:
Approved
Assignee:
Steve Jahnke
Definition:
Review
Series goal:
Accepted for trunk
Implementation:
Implemented
Milestone target:
2011.06
Started by
Amit Kucheria
Completed by
Amit Kucheria

Whiteboard

RESULTS:
The governor depends heavily on both the thermal characteristics of the specific SOC and the specific cooling mechanism used. By default, the cooling mechanism reduces the top-end speed of the SOC by removing operating points in the DVFS framework, but the same effect could be achieved through idle injection or a number of other system-level features that may be present. In all cases, however, the following thermal zones are defined:

FATAL_ZONE: This zone indicates that the on-die temperature has reached a point where the device must be rebooted, letting the ROM or the bootloader run so that the device can cool.

PANIC_ZONE: This zone indicates that a near-fatal temperature is approaching; all necessary cooling agents should be applied to bring the temperature down to an acceptable level.

ALERT_ZONE: This zone indicates that the die is at a temperature that may need more aggressive cooling agents to hold or lower it.

MONITOR_ZONE: This zone is used as a monitoring zone and may or may not use cooling agents to hold the current temperature.

SAFE_ZONE: This zone is the optimal thermal zone. It allows the device to run at maximum levels without applying any cooling agent strategies.

NO_ACTION: Means just that: no action was taken for the temperature that was passed in.
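
The zone list above can be sketched in C as a simple threshold lookup. This is only an illustration under stated assumptions, not the code in the thermal tree: the threshold values, the four-entry operating point table, and the function names classify_temp(), governor_update() and max_opp_for_zone() are all hypothetical, and treating NO_ACTION as "the zone did not change" is one possible reading of the definition.

```c
/* Zone names come from the whiteboard; their numeric order is an assumption. */
enum thermal_zone {
    NO_ACTION = 0,
    SAFE_ZONE,
    MONITOR_ZONE,
    ALERT_ZONE,
    PANIC_ZONE,
    FATAL_ZONE
};

/* Hypothetical thresholds in millidegrees Celsius; real values are SOC specific. */
#define MONITOR_TEMP_MC  70000
#define ALERT_TEMP_MC    85000
#define PANIC_TEMP_MC   100000
#define FATAL_TEMP_MC   115000

/* Hypothetical DVFS table with four operating points, indexed 0 (slowest) to 3. */
#define MAX_OPP_INDEX 3

/* Map a die temperature to the zone it falls in. */
enum thermal_zone classify_temp(int temp_mc)
{
    if (temp_mc >= FATAL_TEMP_MC)
        return FATAL_ZONE;
    if (temp_mc >= PANIC_TEMP_MC)
        return PANIC_ZONE;
    if (temp_mc >= ALERT_TEMP_MC)
        return ALERT_ZONE;
    if (temp_mc >= MONITOR_TEMP_MC)
        return MONITOR_ZONE;
    return SAFE_ZONE;
}

/*
 * Feed one temperature sample to the governor. Returns the new zone when
 * the zone changed, or NO_ACTION when it did not (an assumed reading of
 * NO_ACTION).
 */
enum thermal_zone governor_update(enum thermal_zone *current, int temp_mc)
{
    enum thermal_zone next = classify_temp(temp_mc);

    if (next == *current)
        return NO_ACTION;
    *current = next;
    return next;
}

/* Highest operating point index allowed in a zone (illustrative policy). */
int max_opp_for_zone(enum thermal_zone zone)
{
    switch (zone) {
    case ALERT_ZONE:
        return MAX_OPP_INDEX - 1;  /* remove the top operating point */
    case PANIC_ZONE:
    case FATAL_ZONE:
        return 0;                  /* slowest point only; FATAL also needs a reboot */
    default:
        return MAX_OPP_INDEX;      /* SAFE_ZONE, MONITOR_ZONE, NO_ACTION: no cap */
    }
}
```

A caller would start with the current zone set to SAFE_ZONE, call governor_update() for each sensor reading, and apply max_opp_for_zone() only when the return value is not NO_ACTION. Idle injection or another cooling agent could replace the operating point cap without changing the zone logic.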

The code for OMAP-based devices is present in the thermal tree under the governor sub-directory.

Work Items

Work items:
[sjahnke] define basic algorithm in whiteboard: DONE
[sjahnke] implement basic algorithm to be used as a kernel default policy or in user-space: DONE
