misc: make it possible to use the hrtimer range field in the clockevent driver

Registered by Vincent Guittot on 2011-01-17

The hrtimer can specify a range within which it should fire, but the clockevent driver only gets the maximum value of this time window. The clockevent driver should take advantage of this range to synchronize the Linux wake-up with the wake-up of other parts of the system.

Blueprint information

Status:
Complete
Approver:
Vincent Guittot
Priority:
Medium
Drafter:
Amit Daniel Kachhap
Direction:
Approved
Assignee:
Dmitry Antipov
Definition:
Review
Series goal:
Accepted for trunk
Implementation:
Implemented
Milestone target:
2012.02
Started by
Amit Kucheria on 2012-02-27
Completed by
Mounir Bsaibes on 2012-03-06

Whiteboard

Hrtimer satisfies a wide range of requirements, so optimizing it for power saving is a worthwhile task. The clock event driver exposes the interface that schedules the next event interrupt. The core timer code passes the time for this interrupt as a parameter, after checking the other registered timers along with their ranges. The hrtimer range field is important for merging timers together, which reduces the number of timer firings and therefore the number of wakeups.
http://lwn.net/Articles/299590
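
To illustrate how a range lets several timers share one wakeup, here is a minimal user-space sketch in Python. It is a simulation, not kernel code: the function name coalesce_wakeups and the (soft, hard) tuple representation are assumptions made for this example. Each timer may fire anywhere in its window; the sketch programs a wakeup at the earliest pending hard expiry and lets every timer whose soft expiry has already passed fire at the same instant.

```python
from typing import List, Tuple


def coalesce_wakeups(windows: List[Tuple[int, int]]) -> List[int]:
    """Return wakeup instants such that every (soft, hard) window contains one.

    Hypothetical model: 'soft' is the earliest acceptable expiry and 'hard'
    the latest, both in arbitrary time units.
    """
    wakeups: List[int] = []
    # Process timers in order of their hard (latest) expiry.
    for soft, hard in sorted(windows, key=lambda w: w[1]):
        if soft > hard:
            raise ValueError("soft expiry must not exceed hard expiry")
        # The last wakeup is <= hard because of the sort order, so the
        # timer is already served if that wakeup is also >= soft.
        if not wakeups or wakeups[-1] < soft:
            wakeups.append(hard)
    return wakeups


if __name__ == "__main__":
    with_range = [(100, 150), (120, 160), (140, 200), (300, 300), (310, 400)]
    no_range = [(hard, hard) for _, hard in with_range]
    print(len(coalesce_wakeups(with_range)), "wakeups with ranges")
    print(len(coalesce_wakeups(no_range)), "wakeups without ranges")
```

In this model a timer with no range (soft equal to hard) can only share a wakeup with a timer whose window covers exactly that instant, which is why widening the ranges reduces the wakeup count.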

Some identified activities are:

* Use range-based sleep APIs such as usleep_range() instead of plain sleeps in drivers and test applications (check feasibility). More timers of this type will help with benchmarking.

* Use the range field of timers (along with any synchronization needed) in the clock event driver itself before scheduling the next event interrupt.

* Benchmark and measure the C-state idle wakeups and the number of process wakeups.
A good way to test the benefit of adding a range parameter to the clock event is to check how it helps to synchronize the wake-up/sleep sequence of the cores in the same cluster. The main benefit should be increased use of the cluster power-down mode.
Start a thread on each core that uses an hrtimer with a range to simulate periodic activity. Using the range in the clock event should help to synchronize the wake-ups of the cores and increase the time spent in cluster-off mode.
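
The per-core experiment described above can be modelled in user space before any driver work. The Python sketch below is only an illustration under stated assumptions: align_expiry, cluster_wakeups, and the idea of a shared alignment grid are hypothetical and do not come from this blueprint or from any existing kernel interface. Each core runs one periodic timer with its own phase, and a stand-in for the clock event driver picks a grid-aligned instant inside the timer's range when one exists.

```python
from typing import List


def align_expiry(soft: int, hard: int, grid: int) -> int:
    """Pick the latest grid-aligned instant in [soft, hard], else fall back to hard."""
    aligned = (hard // grid) * grid
    return aligned if aligned >= soft else hard


def cluster_wakeups(phases: List[int], period: int, slack: int,
                    grid: int, horizon: int) -> List[int]:
    """Return the distinct instants at which any core in the cluster wakes up.

    phases: start offset of the periodic activity on each core (hypothetical)
    slack:  width of the timer range; 0 models a timer without a range
    """
    instants = set()
    for phase in phases:
        t = phase
        while t < horizon:
            # Window for this period is [t, t + slack].
            instants.add(align_expiry(t, t + slack, grid))
            t += period
    return sorted(instants)


if __name__ == "__main__":
    for slack in (0, 500):
        wakeups = cluster_wakeups([0, 300], period=1000, slack=slack,
                                  grid=500, horizon=3000)
        print(f"slack={slack}: {len(wakeups)} cluster wakeups at {wakeups}")
```

Fewer distinct wakeup instants mean longer intervals in which every core is idle at once, which is the precondition for entering cluster-off mode. Whether a shared grid is the right synchronization mechanism is an open design question; the sketch only shows why the range is needed for any such scheme.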

Benchmark results obtained with oprofile on Panda board:

There is a sample kernel module that creates many kernel threads, each of which calls schedule_timeout_uninterruptible() or schedule_hrtimeout_range():

https://wiki.linaro.org/WorkingGroups/PowerManagement/Doc?action=AttachFile&do=view&target=timeoutbench.c

With default parameters (128 threads, 1 ms timeout) and a jiffies-backed timeout, the profiling results are here:

https://wiki.linaro.org/WorkingGroups/PowerManagement/Doc?action=AttachFile&do=get&target=jif-128-1.log

For the timeout backed by high-resolution timers, the profiling results are here:

https://wiki.linaro.org/WorkingGroups/PowerManagement/Doc?action=AttachFile&do=get&target=hrt-128-1.log

The results were obtained on the current Linus tree. It is quite clear that high-resolution timers create a much more CPU-intensive workload than legacy jiffies-backed timers (despite possible interrupt coalescing when a reasonable interval value is used). So there is a strong suspicion that converting a lot of code from jiffies-backed to high-resolution timers will increase accuracy and decrease latency at the cost of higher CPU usage and therefore higher power consumption.
Per Amit, we should close this blueprint as implemented, based on the conclusion reached by Dmitry.


Work Items

Work items for 2012.01:
[dmitry.antipov] implement core high-res timers feature to return time spent in usleep_range(): DONE
[dmitry.antipov] adapt OMAP I2C driver to use this feature: DONE
[dmitry.antipov] adapt SMSC95XX driver to use this feature: DONE
[dmitry.antipov] convert MMC/SD host driver to usleep_range(): DONE

Work items for 2012.02:
[dmitry.antipov] Push mmc, omap_i2c and smsc95xx changes to mainline: INPROGRESS

Work items for 2012.03:
[dmitry.antipov] Find and classify the kernel-wide timers, other than the hrtimer subsystem, that use the clock event driver to register the h/w timer: TODO
[dmitry.antipov] Implement a test case/benchmark to check whether high-res timers should be preferred over jiffies-backed timers in different situations and for different kinds of workload: TODO
[dmitry.antipov] Create a preliminary design or even a sample implementation of putting the range field in the clock event driver: TODO
[dmitry.antipov] Get the above design reviewed before proceeding further: TODO
[dmitry.antipov] Comparisons between perf and oprofile results: TODO
