misc: make it possible to use the range field of hrtimer in the clockevent driver
An hrtimer can specify a range within which it should fire, but the clock event driver only gets the maximum value of this time window. The clock event driver should take advantage of this range to synchronize the Linux wakeup with the wakeups of other parts of the system.
Blueprint information
- Status: Complete
- Approver: Vincent Guittot
- Priority: Medium
- Drafter: Amit Daniel Kachhap
- Direction: Approved
- Assignee: Dmitry Antipov
- Definition: Review
- Series goal: Accepted for trunk
- Implementation: Implemented
- Milestone target: 2012.02
- Started by: Amit Kucheria
- Completed by: Mounir Bsaibes
Whiteboard
Hrtimer satisfies a wide range of requirements, so optimizing it for power saving is an interesting task. Basically, the clock event driver exposes the interface used to schedule the actual next event interrupt. The time at which this interrupt is scheduled is passed as a parameter by the core timer code after it has checked the other registered timers along with their ranges. The range field of an hrtimer is an important tool for reducing the number of timer expirations, and thus the number of wakeups, by merging timers together.
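As an illustration of the gap described above, the following minimal sketch shows the two interfaces involved: the hrtimer side can carry a slack window, while a clock event device's set_next_event() hook is only handed a single value. The names follow include/linux/hrtimer.h and include/linux/clockchips.h, but arm_with_range() and dummy_set_next_event() are invented for this example; this is not a drop-in driver.

#include <linux/hrtimer.h>
#include <linux/clockchips.h>

/*
 * hrtimer side: the caller can express slack, i.e. "fire anywhere in
 * [expires, expires + slack_ns]".
 */
static void arm_with_range(struct hrtimer *timer, ktime_t expires, u64 slack_ns)
{
	hrtimer_start_range_ns(timer, expires, slack_ns, HRTIMER_MODE_REL);
}

/*
 * clock event side: the driver's ->set_next_event() hook receives a single
 * delta in clock cycles for the next expiry, so the range information is
 * no longer visible at this level.
 */
static int dummy_set_next_event(unsigned long cycles,
				struct clock_event_device *evt)
{
	/* program the hardware comparator with 'cycles'; no range available */
	return 0;
}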
Some of the activities identified are:
* Use the sleep_range APIs instead of plain sleeps in drivers and test applications (check feasibility); see the sketch after this list. Having more such timers will help with benchmarking.
* Use the range field of the timers (along with any synchronization needed) in the clock event driver itself before scheduling the next event interrupt.
* Benchmark and measure the C-state idle wakeups and the number of process wakeups.
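As a hedged illustration of the first item above, such a conversion in a driver polling loop typically looks like the sketch below; the register, the "ready" bit, and the timing values are invented for this example.

#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/io.h>

/*
 * Hypothetical polling loop: msleep(1) is rounded up to the jiffy tick,
 * while usleep_range() uses hrtimers and lets the caller state how much
 * slack is acceptable, so the wakeup can be coalesced with others.
 */
static int wait_for_ready(void __iomem *status_reg)
{
	int retries = 100;

	while (retries--) {
		if (readl(status_reg) & 0x1)	/* assumed "ready" bit */
			return 0;
		/* sleep at least 1 ms, but allow up to 2 ms of slack */
		usleep_range(1000, 2000);
	}
	return -ETIMEDOUT;
}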
A good way to test the benefit of adding a range parameter to the clock event is to check how it helps synchronize the wake-up/sleep sequence of the cores in the same cluster. The main benefit should be an increase in the time spent in the cluster power-down mode.
Start a thread on each core that uses an hrtimer with a range to simulate periodic activity. Using the range in the clock event driver should help synchronize the wakeup of the cores and improve the time spent in cluster-off mode.
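A minimal sketch of such a test module is shown below, assuming one kthread pinned to each online CPU; the thread name, the 1 ms period and the 500 us slack are invented for illustration, and thread cleanup is omitted.

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/sched.h>
#include <linux/err.h>

/* Per-CPU load generator: wakes up roughly every 1 ms with 500 us of slack */
static int periodic_thread(void *unused)
{
	while (!kthread_should_stop()) {
		ktime_t expires = ktime_set(0, NSEC_PER_MSEC);

		set_current_state(TASK_INTERRUPTIBLE);
		/* the slack lets wakeups on different CPUs be aligned */
		schedule_hrtimeout_range(&expires, 500 * NSEC_PER_USEC,
					 HRTIMER_MODE_REL);
	}
	return 0;
}

static int __init range_test_init(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		struct task_struct *t = kthread_create(periodic_thread, NULL,
						       "hrt-range/%d", cpu);

		if (IS_ERR(t))
			return PTR_ERR(t);
		kthread_bind(t, cpu);	/* pin one thread to each core */
		wake_up_process(t);
	}
	return 0;
}
module_init(range_test_init);
MODULE_LICENSE("GPL");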
Benchmark results obtained with oprofile on a Panda board:
There is a sample kernel module that creates a lot of kernel threads, each of which does schedule_
With the default parameters (128 threads, 1 ms timeout) and a jiffies-backed timeout, the profiling results are here:
For a high-resolution-timer-backed timeout, the profiling results are here:
The results were obtained on the current Linus tree. It is quite clear that high-resolution timers create a much more CPU-intensive workload than legacy jiffies-backed timers (despite possible interrupt coalescing when a reasonable interval value is used). So there is a strong suspicion that converting a lot of code from jiffies-backed to high-resolution timers will increase accuracy and decrease latency at the cost of higher CPU usage and therefore higher power consumption.
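For reference, the two timeout paths being compared roughly look like the sketch below; this is illustrative only, not the actual test module, and the helper names are invented.

#include <linux/sched.h>
#include <linux/jiffies.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>

/* jiffies-backed: rounded to the timer tick, cheap, easily coalesced */
static void sleep_jiffies(unsigned int ms)
{
	set_current_state(TASK_INTERRUPTIBLE);
	schedule_timeout(msecs_to_jiffies(ms));
}

/* hrtimer-backed: precise wakeup, reprograms the clock event device */
static void sleep_hires(unsigned int ms)
{
	ktime_t t = ms_to_ktime(ms);

	set_current_state(TASK_INTERRUPTIBLE);
	schedule_hrtimeout(&t, HRTIMER_MODE_REL);
}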
Per Amit, we should close this blueprint as implemented, based on the conclusion reached by Dmitry.
Work Items
Work items for 2012.01:
[dmitry.antipov] implement core high-res timers feature to return time spent in usleep_range(): DONE
[dmitry.antipov] adapt OMAP I2C driver to use this feature: DONE
[dmitry.antipov] adapt SMSC95XX driver to use this feature: DONE
[dmitry.antipov] convert MMC/SD host driver to usleep_range(): DONE
Work items for 2012.02:
[dmitry.antipov] Push mmc, omap_i2c and smsc95xx changes to mainline: INPROGRESS
Work items for 2012.03:
[dmitry.antipov] Find and classify the kernel-wide timers, other than the hrtimer subsystem, that use the clock event driver to register the h/w timer: TODO
[dmitry.antipov] Implement a test case/benchmark to check whether high-res timers should be preferred over jiffies-backed timers in different situations for different kinds of workload: TODO
[dmitry.antipov] Create a preliminary design or even a sample implementation of putting the range field in the clock event driver: TODO
[dmitry.antipov] Get the above design reviewed before proceeding further: TODO
[dmitry.antipov] Compare perf and oprofile results: TODO