CPU Hot Plug Latency on Android

Registered by Zach Pfeffer

Why?
In order to test big.LITTLE MP we need to run the CPU hotplug latency test

Context?
big.LITTLE MP effort.

What gets produced?
The CPU hotplug latency test and its dependencies will be ported to Android

Where will the work get put?
android.git.linaro.org and all applicable Linaro builds

Blueprint information

Status:
Complete
Approver:
Zach Pfeffer
Priority:
High
Drafter:
None
Direction:
Approved
Assignee:
Naresh Kamboju
Definition:
Approved
Series goal:
Accepted for trunk
Implementation:
Implemented
Milestone target:
2013.05
Started by
Amit Pundir
Completed by
Naresh Kamboju

Whiteboard

Notes:
[2012/8/27 pfefferz] Put notes here.
[2012/09/03 pundiramit] cyclictest ported to Android locally. Waiting for a baseline to give this script a test run.
[2012/09/04 pundiramit] sysbench ported to Android locally. Tested on b.L FM on local machine.
[2012/09/24 pundiramit] yet to be tested on relevant baselines.
[2012/09/24 pfefferz] Moving to 12.10, been blocked on kernel config integration.
[2012/10/2 pfefferz] Looks like there are some issues that only the ARM LT can sort out, but they are busy on other issues. Amit to follow up.
[2012/10/15 pundiramit] nkambo started a thread with me, dzin, amitk and Vincent. He's been routed to Tixy for further updates. He plans to get in touch with Tixy to sort it out.
[2012/10/15 pfefferz] No update on prioritization from dzin. With Tixy blocked we need to come up with a plan.
[2012/10/22 pfefferz] This is moving, kernel config is getting updated.
[2012/11/14 asac]: what's the status on this one? Amit?
[2012/11/26 pundiramit] Hotplug and Tracing configs enabled in linux-linaro-tracking. Some issues related to function tracing yet to be sorted out, BLOCKED on that. amitk, vincent sorting it out.
[2012/11/26 vishalbhoj] Moving the BP to 12.12 since the kernel patches for Hotplug events need to be ported to latest kernel by the KWG and then the scripts could be validated on our builds.
[2012/12/04 pundiramit] patches to fix broken function tracer on thumb builds have now landed in linux-linaro. Waiting for patches (from Naresh/Vincent) for test_cpu_hotplug_latency to use function tracer instead of hotplug events.
[2012/12/13 pundiramit] INPROGRESS https://bugs.launchpad.net/linaro-big.little.mp/+bug/1087149
[2012/12/17 pfefferz] Moving to 13.01 due to broken code. PMWG is fixing.
[2013/01/07 pundiramit] Quoting from ongoing email discussion, "dzin: Does this issue need to be followed up? Tixy: Well, function tracing is still broken..."
[2013/01/08 pundiramit] Vincent reported that he verified CPU hotplug script from Chander on TC2 with CPU Idle disabled due to hang issues.
[2013/01/09 pundiramit] AmitK asked Chander to look into why hotplug, cpuidle and ftrace aren't getting along.
[2013/01/21 pundiramit] Naresh to verify Chander's hotplug script on supported platforms.
[2013/01/25 pundiramit] nkambo verified Chander's script with some custom kernel config options that are not enabled by default in vexpress_bL_mp_defconfig, including CPUIDLE disabled.
[2013/01/25 pundiramit] amitk: We shouldn't mark this as done until we are able to reliably hotplug _with_ cpuidle enabled.
[2013/1/28 pfefferz] Vishal will follow up to close this guy out.

[2013/2/25 nkambo] Tixy (Jon Medhurst) (tixy) wrote on 2013-02-20: Re: [Bug 1068595] Re: Enabling function tracer kills the system on TC2 #23
On Tue, 2013-02-19 at 18:19 +0000, Dietmar Eggemann wrote:
> # cd /sys/kernel/debug/tracing
> # echo bL_cpu_power_down > set_ftrace_notrace
> # echo __sync_range >> set_ftrace_notrace
> # echo __bL_cpu_down >> set_ftrace_notrace
> # echo __bL_outbound_leave_critical >> set_ftrace_notrace
>
> Then you can use the function tracer.

That works for me too on today's ARM Landing Team tree. Was that list of
functions obtained using the method described in:
http://lkml.org/lkml/2013/2/18/414

Now we have to find out what the actual problem is. I'll spend some more
time looking at this. First stab in the dark, ftrace on functions which
are called when cache is disabled? The above seem to be the functions
used after cpu_proc_fin() is called in tc2_pm_down() ...
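
The workaround quoted above could be scripted on the target. This is a minimal sketch, assuming it runs as root with debugfs mounted at /sys/kernel/debug; the helper name exclude_from_ftrace and its tracing_dir parameter are illustrative and do not come from the bug thread. The first write truncates the filter (like `>`), later writes append (like `>>`).

```python
import os

# Function names taken from the workaround quoted above.
NOTRACE_FUNCTIONS = [
    "bL_cpu_power_down",
    "__sync_range",
    "__bL_cpu_down",
    "__bL_outbound_leave_critical",
]


def exclude_from_ftrace(functions, tracing_dir="/sys/kernel/debug/tracing"):
    """Write each name to set_ftrace_notrace (hypothetical helper)."""
    path = os.path.join(tracing_dir, "set_ftrace_notrace")
    for index, name in enumerate(functions):
        # "w" replaces the existing filter, "a" adds to it.
        mode = "w" if index == 0 else "a"
        with open(path, mode) as notrace:
            notrace.write(name + "\n")
```

Calling exclude_from_ftrace(NOTRACE_FUNCTIONS) before enabling the function tracer mirrors the four echo commands in the quoted message.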

[2013/2/25 nkambo] Tixy (Jon Medhurst) (tixy) wrote on 2013-02-20: #24
dont-trace-when-caches-disabled.hack Edit (546 bytes, text/plain)
Yes, the problem seems to be tracing functions when the caches are disabled, because the tracer doesn't crash when the attached hack is applied. This makes the trace hook return immediately if caches are disabled. (This isn't a proper fix and there are other trace hooks which would need modifying even if it were.)

[2013/2/25 nkambo]
Dietmar Eggemann (dietmar-eggemann) wrote on 2013-02-20: #25
I didn't know about http://lkml.org/lkml/2013/2/18/414 , just saw it later. I used a Dstream debugger and every time the kernel hung after echo function > current_tracer I checked unsigned long ip in trace_function() . After having found the 4 culprits mentioned above, function tracer started working for me.

[2013/2/25] The bugs linked to this blueprint are still ongoing. Testing depends on a stable kernel with ftrace support, so this blueprint can't be closed for 13.02.

[2013/3/18 nkambo]
The bugs linked to this blueprint are yet to be fixed.
Once they are fixed, we can test these scripts and release the test results.

[2013/4/12 nkambo] I have updated the bugs linked to this blueprint with the latest test information and kernel crash log.

Enabling function tracer kills the system on TC2
    https://bugs.launchpad.net/linaro-landing-team-arm/+bug/1068595
missing trace on function "cpu_hotplug_done"
    https://bugs.launchpad.net/linaro-big-little-system/+bug/1087149
exynos : rcu stall with 3.8-rc2
    https://bugs.launchpad.net/linaro-power-kernel/+bug/1102347

[2013/4/24 nkambo] All the linked bugs depended on a common fix, which has resolved the issue. I have tested the latest vexpress-mp kernel and confirmed that the bug is no longer reproducible.

[bug comments]

cpu_hotplug_latency.sh used to crash the kernel after applying patch [1].
I have tested the stall.sh and cpu_hotplug_latency.sh test files and confirmed that the current vexpress-mp kernel no longer crashes.

[tixy wrote]
The linaro kernel trees now have the files mcpm.c, dcscb.c and tc2_pm.c all excluded from tracing. This wasn't sufficient to fix the issue; however, after further debugging I found that adding notrace to cpu_init() prevents any lockups or crashes. I tested this by toggling current_tracer between 'nop' and 'function' in an endless loop whilst running the bbench+audio workbench test.

The patch has been posted to the linux-arm-kernel list:
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-April/163983.html

Fixed in linux-linaro tree by: https://git.linaro.org/gitweb?p=kernel/linux-linaro-tracking.git;a=commit;h=3a58649068df52162ef63dd0f1c98d231f667b05
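
The tracer toggling used to verify the fix could look roughly like the sketch below. It assumes root access on the target and debugfs mounted at /sys/kernel/debug; the helper name toggle_tracer, the tracing_dir parameter and the bounded iteration count are illustrative (the original test ran an endless loop under a workload, which is not reproduced here).

```python
import itertools
import os


def toggle_tracer(iterations, tracing_dir="/sys/kernel/debug/tracing"):
    """Alternate current_tracer between 'function' and 'nop' (hypothetical helper)."""
    path = os.path.join(tracing_dir, "current_tracer")
    tracers = itertools.cycle(["function", "nop"])
    for _ in range(iterations):
        # Each write switches the active tracer.
        with open(path, "w") as current_tracer:
            current_tracer.write(next(tracers))
```

An even iteration count leaves the tracer set to 'nop', so the system is not left tracing when the loop ends.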

Build
http://snapshots.linaro.org/android/~linaro-android/vexpress-linaro-mp/266/

Linux kernel version:
Linux localhost 3.9.0-rc7-00198-g03d27d8 #1 SMP Fri Apr 19 06:05:40 UTC 2013 armv7l GNU/Linux

I have attached test log to this bug: cpu_hotplug_latency_test_bug_1068595.log

Now that all the attached bugs are fixed, it is time to unblock the blueprint.
I still have to work on the test case verdict, which will be addressed in the 13.05 cycle.

[2013/5/08 nkambo] The function tracer creates a trace.dat file containing a huge amount of trace data, which can be decoded using kernelshark or powertop. I am currently exploring options to read it on the target and add a pass/fail verdict.

[2013/05/20 nkambo]
vincent wrote:
Regarding the cpu hotplug tests, we need to define several latency
thresholds based on current measurements (have you got some results ?)
and target value:
-a low latency threshold which should be reached in idle system (and
at the end all the time) otherwise the test is failed. I have in mind
30ms (for tc2)
-an acceptable latency value which should never be reached even in a
loaded system otherwise the loaded test are failed: i have in mind
500ms (for tc2)

We also need to check the load that must be generated by the script
during the tests.
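
The two thresholds proposed above can be turned into a verdict along these lines. This is a sketch only: the function name verdict, the input as a list of latencies in milliseconds, and the choice to fail a sample that equals the limit are assumptions, not the behaviour of the actual test script.

```python
IDLE_THRESHOLD_MS = 30.0     # proposed TC2 limit for an idle system
LOADED_THRESHOLD_MS = 500.0  # proposed TC2 limit for a loaded system


def verdict(latencies_ms, loaded):
    """Return 'pass' if every sample is below the applicable limit (hypothetical helper)."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    limit = LOADED_THRESHOLD_MS if loaded else IDLE_THRESHOLD_MS
    # The worst sample decides the result.
    return "pass" if max(latencies_ms) < limit else "fail"
```

For example, verdict([12.0, 45.0], loaded=False) fails because 45 ms exceeds the idle limit, while the same samples pass with loaded=True.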

Vincent's comments need to be addressed.

[2013/05/20 nkambo]
Pass/fail verdict created.
A Python script on the host machine takes test_load.txt as input, parses the results and prints pass or fail.
Test integration into LAVA will be handled in the BP below:
https://blueprints.launchpad.net/linaro-big-little-system/+spec/integrate-cpu-hotplug-latency-tests-lava

Meta:
Roadmap id: CARD-191
Headline: CPU hotplug latency has been ported to Android and is available in relevant baselines
Acceptance: CPU hotplug latency and its dependencies have been ported to Android and the code is available in relevant baselines


Work Items

Work items for 2013.01:
Figure out which baselines need the CPU hotplug latency test (can be done anywhere): DONE
Integrate http://sysbench.sourceforge.net/ into all Android baselines: DONE
Integrate https://rt.wiki.kernel.org/index.php/Cyclictest into all Android baselines: DONE
Integrate https://wiki.linaro.org/WorkingGroups/PowerManagement/Doc/Hotplug?action=AttachFile&do=view&target=test_cpu_hotplug_latency.sh into private repository: DONE
[tixy] Fix conflicts in kernel configuration: DONE
[tixy] Fix broken function tracing on linux-linaro builds: DONE

Work items for 2013.02:
[chander-kashyap] Fix test script to use function tracer instead of hotplug events: DONE

Work items for 2013.05:
Add pass/fail verdict: DONE
Test in relevant builds: DONE
Integrate into LAVA: POSTPONED
Document manual test flow: POSTPONED
