sched: modify timer and workqueue framework to allow migration

Registered by Vincent Guittot

This Blueprint has been moved to JIRA: https://cards.linaro.org/browse/PMWG-236

- Remove the last use cases where timers and work items can be stuck on a CPU
- Allow migration of a timer that is re-armed while it is in the running state
- Create a preferred CPU function in the scheduler
- Use the preferred CPU function in workqueue framework
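
As a rough illustration of the "preferred CPU" idea in the list above, the following is a toy userspace model, not the kernel code from the patchset. The function name select_preferred_cpu(), the cpu_idle array and the linear scan are all assumptions made for this sketch; a real scheduler helper would more likely consult the scheduler's own idle state and CPU topology.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy userspace model (NOT kernel code): choose a CPU for a timer or
 * work item so that an idle CPU is not woken up needlessly.
 *
 * cpu_idle[i] is true when CPU i is idle. The name, the array and the
 * simple linear scan are hypothetical and only illustrate the policy.
 */
int select_preferred_cpu(const bool *cpu_idle, size_t ncpus, size_t this_cpu)
{
    assert(cpu_idle != NULL);
    assert(this_cpu < ncpus);

    /* The local CPU is already busy: stay here, no migration needed. */
    if (!cpu_idle[this_cpu])
        return (int)this_cpu;

    /* The local CPU is idle: look for any CPU that is already awake. */
    for (size_t i = 0; i < ncpus; i++) {
        if (!cpu_idle[i])
            return (int)i;
    }

    /* Every CPU is idle: nothing to gain from moving, keep it local. */
    return (int)this_cpu;
}
```

In this model a caller on an idle CPU 0 with a busy CPU 1 would be told to use CPU 1, while a caller on a busy CPU keeps its work local. The fallback to the local CPU when everything is idle is a design choice of the sketch, made so that the function always returns a valid CPU.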

Related slides: http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/08/lpc2012-sched-timer-workqueue.pdf
Related discussion at LPC: http://summit.linuxplumbersconf.org/lpc-2012/meeting/90/lpc2012-sched-timer-workqueue/

Blueprint information

Status:
Complete
Approver:
Amit Kucheria
Priority:
High
Drafter:
None
Direction:
Approved
Assignee:
Viresh Kumar
Definition:
Superseded
Series goal:
Accepted for trunk
Implementation:
Good progress
Milestone target:
2013.05
Started by
Amit Kucheria
Completed by
Serge Broslavsky

Whiteboard

Meta:
Headline: sched: modify timer and workqueue framework to allow migration to non-idle cpus
Acceptance: TBD

- Initial review comments have come in on the patchset. Need to look at them closely once the IKS activity is finished.
[dzin, Jan 25, 2013] Has a code review been done?
[dzin, Jan 25, 2013] Please add Headline and Acceptance.
[vireshk, Jan 26, 2013] Got the patches reviewed; based on the feedback, the conclusion is to add a new family of APIs such as queue_work_on_any_cpu(). Currently getting these implemented and tested. Also trying to get power numbers for the migration and no-migration cases to prove the concept. Currently stuck on an issue with mdelay() on TC2.

[vireshk, Feb 01, 2013] Had a discussion with Amit and Vincent about the test case and realized we need to work on some real time use case. Now looking at wakeup events on an idle system. Got some training from Vincent on using KernelShark yesterday :)

[vireshk, Feb 04, 2013] Going to try the mdelay() bug on a PandaBoard: INPROGRESS

[lorenzo-pieralisi, Feb 12, 2013] Which version of the patches got into mainline for 2013.01, and what was missing? Or, to put it another way, what is in the v4 updates?

[vireshk, Feb 12, 2013] Nothing went into the Linaro releases, as this patchset isn't merged into the master branch.

[lorenzo-pieralisi, Feb 12, 2013] Asked for my own information, since the work item "Get patches reviewed and accepted in mainline: DONE" below was marked as done.

[lorenzo-pieralisi, Feb 12, 2013] Another point: 1200 drivers were found that would break with the new code. What's the plan to convert them? Is that done?

[vireshk, Feb 12, 2013] The final conclusion was to add another set of APIs without disturbing the existing ones. So we will add new APIs such as queue_work_on_any_cpu() and then fix workqueue users one by one (only the ones we care about).

[vireshk, March 20, 2013]
- Posted V3 of this patchset. Initial comments are positive and we also saw improved power figures with it. https://lkml.org/lkml/2013/3/18/364
- The following patches were part of it:

df83ebf sched: Create sched_select_non_idle_cpu() to give preferred CPU for power saving
98a1ac8 timer: hrtimer: Don't check idle_cpu() before calling get_nohz_timer_target()
238c3d7 workqueue: Add helpers to schedule work on any cpu
821e796 PHYLIB: queue work on any cpu
e58b2bc mmc: queue work on any cpu
982859d block: queue work on any cpu
ed4fe9e fbcon: queue work on any cpu
807139e timer: Migrate running timer

- Waiting for the review comments to complete before sending V4 with all the fixups; that version might be included in mainline.

The timer migration patch was dropped, for the following reasons.
There are three issues with migrating running timers:
- del_timer_sync() might misinterpret completion of the timer callback
- Timer callbacks must be serialized
- Thomas Gleixner NACKed it earlier because of RT response time

We could figure out a solution to the first problem, but the others remain unsolved. The second one can probably be solved as well with some hacks, but the third one is a real problem :)

Work Items

Work items for 2012.09:
Study workqueue pinning to local CPU: DONE
Study CPU topology and sched domains in linux kernel: DONE
Implement sched_select_cpu() for selecting preferred CPU: DONE
Update workqueue code to use sched_select_cpu() to get preferred CPU: DONE
Create module to test work queuing to different CPUs: DONE
Analyse what's lacking in the current approach: DONE
Send patches to kernel mailing lists: DONE
Modify timer framework to allow migration: DONE
Test timer migration with help of previous module created: DONE

Work items for 2013.01:
Get patches reviewed and accepted in mainline: DONE
Analyse users of the workqueue subsystem to check which drivers would break if work is migrated to other CPUs (1200 drivers found in total): DONE
Implement queue_work_on_any_cpu() type interfaces: DONE

Work items for 2013.03:
Generate power numbers to prove workqueue scheduling on non-idle cpu: DONE
Send next version V3 of patches: DONE
Waiting for review comments: DONE
Send V4 without creating sched_select_non_idle_cpu() routine: DONE
Get comments for it over mainline: DONE

Work items for 2013.05:
Send V5 with a new flag: WQ_UNBOUND_FOR_POWER_SAVE: DONE
Patches pushed/accepted by Tejun Heo: DONE

Work items for 2013.06:
Try to migrate running timer: INPROGRESS
