Current documentation for “Linaro Toolchain”

To classify a blueprint as documentation, set the Implementation status to “Informational”. When the blueprint's Definition status is marked “Approved”, it will appear in this listing.

Add support for GProf to AArch64 backend of GCC
AArch64 bootstrap for Linaro GCC
This blueprint is to cover the initial post-bootstrap work for AArch64 in GCC. Proposed topics include:  * IFUNC  * Stack protection support  * GProf support  * Boehm GC See the dependent blueprints below for more.
Add EEMBC Network for Linaro Toolchain Benchmarks
We now have a license to EEMBC Network. Add to cbuild.
Add SPEC 2006 for Linaro Toolchain Benchmarks
Chris from LLVM prefers SPEC 2006. We don't run it currently as some of the tests require more than 1 GB of RAM and swapping in a test is bad. We have a license for SPEC 2006. Harness up similar to SPEC 2000, disable the excessive memory benchmarks, add to cbuild, and add to our regular runs.
GCC can calculate the final value of a loop counter and use that in later optimisations. The value is calculated in the original signed type which, due to overflow rules, can lead to a strange looking value which is not supported and reduced by later optimisations. Improve either through reducing the calculated va...
Cortex-A15 Theme for Linaro GCC
Meta Blueprint covering work scheduled for the Cortex-A15 Focus Iteration
Disable peeling for Linaro GCC
Peeling aligns memory pointers at run time both to give a performance boost where the CPU is faster with aligned accesses, and to support CPUs that can only do aligned vector accesses. ARM supports unaligned vector access for no penalty over aligned and a small penalty over aligned-with-alignment hints. As a first...
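As an illustration of what peeling for alignment means, here is a minimal hand-written sketch in C. It is not GCC output: the function name `add_const_peeled` and the 16-byte alignment target are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: add k to every element of a[0..n).
 * The 16-byte alignment target is an assumption for illustration. */
void add_const_peeled(int32_t *a, size_t n, int32_t k)
{
    size_t i = 0;

    /* Peeled prologue: scalar iterations until a + i is 16-byte aligned. */
    while (i < n && ((uintptr_t)(a + i) & 15u) != 0) {
        a[i] += k;
        i++;
    }

    /* Main loop: four elements per step, each group starting on an
     * aligned address. This is the part a vectoriser would turn into
     * aligned vector loads and stores. */
    for (; i + 4 <= n; i += 4) {
        a[i] += k;
        a[i + 1] += k;
        a[i + 2] += k;
        a[i + 3] += k;
    }

    /* Epilogue: leftover elements that do not fill a whole group. */
    for (; i < n; i++)
        a[i] += k;
}
```

The prologue and its run-time alignment check are the overhead that disabling peeling would remove on a CPU where unaligned vector access costs nothing extra.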
LRA is a proposed replacement to reload in GCC. We should investigate whether it actually provides a benefit to x86 and x86_64, and if it does then work out what benefit turning it on will give to ARM, and the steps needed to turn it on.
Improve conditional execution code generation (cond-exec) especially for store-flag sequences. This is in the cases that we compare values with 0 or with other registers. It might be possible to improve these in certain cases to avoid conditional instructions and replace them with equivalent arithmetic instructions....
ARM have submitted support for aarch64 to GDB trunk. This should be accepted and released as part of GDB 7.6 in six months' time. It would be nice to have a stable backport in the interim. Investigate backporting support to Linaro GDB 7.5, the cost of keeping the backport up to date, and recommend what we should do.
There are a few libraries and programs out there that have hand-written NEON chunks. Look into these and see if the vectoriser can replace them or what needs to be added.
String routines everywhere for Cortex String Routines
Take the string routines developed last cycle and make them available in GLIBC, Bionic, and Newlib. We have a set of string routines that as a package are a worthwhile improvement on Cortex-A devices. The next step is to get them everywhere such as GLIBC, Bionic, and Newlib. Discuss the target libcs, the legal issu...
We tell people to enable -mfpu=neon by default. In some cases this code runs slower than the non-NEON code. Investigate and fix.
Qualify the Arndale board for Linaro Toolchain Build Automation
We want to switch to the Cortex-A15 as our standard build and benchmark configuration. The Arndale board with the Exynos 5250 CPU is now available. Test the stability and performance to see about being our baseline.
Investigate whether SMS can use the rtl loop infrastructure to avoid unnecessary memory dependencies.
Switch tcpandas to armhf for Linaro Toolchain Build Automation
The tcpandas are currently armel based. Switch to match the ursas which are armhf. Includes updating to Precise and changing the kernel. Might be superseded by switching to LAVA.
We would like to be able to track performance regressions along certain parameters in terms of GCC for Cortex A9. When run on a Cortex A9, the following should be true: * A9 vs A8: code tuned with -mtune=cortex-a9 should run faster than the same code tuned with -mtune=cortex-a8 * ARMv7 vs ARMv5: code built with -...
Trunk reporting for Linaro Toolchain Benchmarks
We now build and benchmark trunk and the youngest release branch each week. Add a graph and automatic email on regressions.
Upstream crosstool-NG aarch64 support for Linaro Toolchain Binaries
We're holding the aarch64 support in the Linaro crosstool-NG branch. Finish the job by upstreaming. Note that upstream has recently added arbitrary version support for some libraries which can replace the hacks for random versions in config/.
64 bit divide by constant for Linaro GCC
GCC can convert a 32 bit divide by constant into the corresponding multiplies and shifts. Implement the same for 64 bit values. PENDING: we found this but michaelh1 can't remember where.
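A minimal sketch of the multiply-and-shift technique for an unsigned 64 bit divide by the constant 10, written in portable C. The helper names `mulhi_u64` and `udiv10` are hypothetical, and the code shows the arithmetic identity only; it is not the sequence GCC emits.

```c
#include <stdint.h>

/* High 64 bits of the 128-bit product a * b, using only 64-bit arithmetic. */
static uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum of the middle partial products plus the carry out of the low
     * word; this sum cannot overflow 64 bits. */
    uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;

    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* x / 10 without a divide instruction.
 * 0xCCCCCCCCCCCCCCCD is ceil(2^67 / 10), so the quotient is the high
 * half of the product shifted right by a further 3 bits. */
uint64_t udiv10(uint64_t x)
{
    return mulhi_u64(x, 0xCCCCCCCCCCCCCCCDull) >> 3;
}
```

On a 32 bit target the four partial products map onto 32x32->64 multiplies, which is why the 64 bit case costs more than the 32 bit one.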
Add IFUNC support for AArch64 for Linaro Binutils
Add support for IFUNCs to the GNU Toolchain for AArch64.
Add support to libssp for AArch64
Add to Newlib for Cortex String Routines
Newlib is good as many other projects source from it. Make the string routines available there.
Backport any conditional execution work done by ARM into GCC 4.5
Detect smin / umin idiom for Linaro GCC
Detect and optimise idioms like: #define min(x, y) ((x) <= (y)) ? (x) : (y) unsigned int foo (unsigned int i, unsigned int x, unsigned int y) {   return i < (min (x, y)); } int bar (int i, int x, int y) {   return i < (min (x, y)); } See https://code.launchpad.net/~ramana/gcc-linaro/47-smin-umin-idiom/+merge/10...
Optimization of x264 codec for ARM - parameters setup for Linaro Multimedia codec optimization
Community optimisation of x264 codec for ARM, initial study and exploitation of system parameters to implement real-time encoder for video conferencing.
In some conditions the compiler generates a pair of conditional stores with the opposite condition codes. These could be folded into one unconditional store. Seen in libav in vp8.e
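A minimal C sketch of the pattern, assuming a simple if/else-style source shape; the function names are hypothetical and the actual libav code is not reproduced here. The first function corresponds to a pair of stores predicated on opposite conditions, the second to the folded form with one unconditional store.

```c
/* Two stores to the same address guarded by opposite conditions.
 * With conditional execution this can become a pair of predicated
 * stores with opposite condition codes. */
void store_two_conditional(int *p, int cond, int a, int b)
{
    if (cond)
        *p = a;
    if (!cond)
        *p = b;
}

/* Folded form: select the value first, then store once unconditionally. */
void store_one_unconditional(int *p, int cond, int a, int b)
{
    int v = cond ? a : b;
    *p = v;
}
```

Both functions always write exactly once to `*p`, so the fold does not introduce a store that the original would have skipped.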
CoreMark regresses in Thumb-2 mode when using the LR regnum. Otherwise it's a good improvement. Investigate and fix. The idea is to use LR register as a general purpose register fit for use in a number of cases. The change proposed was the one upstream http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg05706.h...
Review the status of the GDB testsuite. Compare the results returned by x86_64-none-linux-gnu native, arm-none-linux-gnueabihf native and remote, arm-none-linux-gnueabi native and remote.
The TSC have asked for a GCC optimisation option that gives code that performs well across all of the Cortex-A series. Discuss what that means, how to balance across the targets, how to present it to the user, approach for implementing, and how to qualify the results. Is it worth discussing adding -mcpu=native at th...
Enable hot/cold partitioning when doing a profile guided optimisation build. One feature of PGO is to see what code is hot and what is cold and then split this into different sections. This is difficult on ARM due to the constant pools. Implement.
Improve GCC's auto increment/decrement pass, with particular emphasis on NEON loads and stores. Status: - Requires a change to the Cortex A9 scheduling description that can only really be provided by ARM employees. - Current patches posted here: http://lists.linaro.org/pipermail/linaro-toolchain/2011-Decembe...
Investigate and improve the current constant pool generation and code. The current constant pool placement code doesn't take profile info into consideration or whether something gets placed in the innermost kernel of a loop etc. There are a few cases when it does get placed in the middle of the innermost loop in...
Improve IV opts #1 for Linaro GCC
Finish the already upstream induction variable opts patch by backporting it.
Investigate the register allocator with respect to choice of Thumb1 vs Thumb2 instructions as discussed in the TSC commentary and write a proposal of what can be done (1MM).
The Neon max and min intrinsics could be represented as actual RTL (s/umax and s/umin ) for all types except for polynomial types rather than the current unspec form. In general they could also end up being folded into the GIMPLE form for max / min if possible.
Investigate performance of -funroll-loops, alone and in combination with -fvariable-expansion-in-unroller. Potentially tweak default parameters and/or implementation, also taking into account similar changes in the CodeSourcery toolchain. If it proves useful, work on enabling unrolling by default upstream.
memcpy in memmove for Cortex String Routines
Maxim is adding, or has added, support to glibc for calling memcpy() from the forward or backward direction in memmove(). Enable on ARM.
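A minimal sketch of the idea, not the glibc implementation: memmove() picks a copy direction from how the buffers overlap, then runs a plain copy in that direction. The name `sketch_memmove` is hypothetical, and the byte loops stand in for the optimised forward and backward memcpy() routines.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for memmove(): choose the copy direction so
 * that overlapping source bytes are read before they are overwritten. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)d;
    uintptr_t sa = (uintptr_t)s;

    if (da <= sa || da - sa >= n) {
        /* Destination starts at or before the source, or the buffers
         * do not overlap: a forward copy is safe. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination starts inside the source: copy backward. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

For example, moving four bytes from the start of "abcdefgh" to offset 2 takes the backward path and yields "ababcdgh".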
Look at https://bugs.launchpad.net/gcc-linaro/+bug/886124 for more information. There are a number of places where this shows up as an issue in the headroom analysis that was done with respect to benchmarks.
NEON instruction coverage for Linaro GCC
Check the coverage of the NEON instruction set by the vectoriser and backend, including different operand types such as registers and constants. There are two parts to vectorisation - detecting the patterns in the original code, and using those patterns in the backend. We know all of the operations that NEON imple...
Report benchmark results in LAVA for Linaro Toolchain Benchmarks
Use LAVA for reporting and visualization of toolchain benchmark results. This is the first step to using LAVA for automating the toolchain benchmarks. Use the available reporting tools/APIs in LAVA for storing and visualizing benchmark results. All other steps for building, running and extracting results will be ...
Shrink-wrapping is a feature that was enabled upstream in GCC 4.7 but not for the ARM port. This requires at the minimum the definition of a new backend pattern "simple_return" to enable this feature to work. However this requires co-ordination with the way in which the epilogue is being generated which is being rew...
Add GDB and QEMU for meta-linaro
Add Linaro GDB and QEMU into meta-linaro. Ensure Linaro QEMU is used for all QEMU tasks in the system. Look at the difference in configuration and patches between OpenEmbedded and Ubuntu. Ideally there should be a 'best' build which is the maximum of the two. Update and share.
The A9 NEON pipeline has a bypass which can make the result of a multiply (or MLA?) quickly available to a following MLA. Describe this in the pipeline so that SMS can use it. PENDING: Michael can't find the bypass in the A9 NEON TRM and doesn't know the right term for it.
Check for any outstanding upstream enhancement requests and implement. Please fill out the summary when the work is started.
Firefox as a toolchain benchmark for Linaro Toolchain Benchmarks
Set up a working cross-compilation environment for Firefox. Do a first benchmark. Present the results to the group and evaluate if the results/findings are useful. If Firefox is too hard for cross-compiling, then switch to a WebKit based browser: Chromium or QtWebKit. Investigate if Mozilla can share their test con...
Fast tracepoints for Linaro GDB
Add tracepoint and then fast tracepoint support to GDB server.
GDB non-stop debugging for Linaro GDB
Non-stop debugging allows one thread to be debugged while others continue in the background.
Improve CRC16 for Linaro GCC
A simple bitwise CRC16 like the one used in a popular embedded benchmark has a range of possible improvements we can do in the middle end and backend. See the sandbox page at: https://wiki.linaro.org/MichaelHope/Sandbox/CRC16 for more. The initial steps are to do a hand written version to see the optimum and investigate...
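For reference, a minimal sketch of a bitwise CRC16 loop of the kind described. The parameters are an assumption: this uses the common CRC-16/ARC variant (reflected polynomial 0xA001, initial value 0), which may differ from what the benchmark uses, and the name `crc16_bitwise` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/ARC: one shift and one conditional XOR per input bit.
 * Polynomial and initial value are assumptions for illustration. */
uint16_t crc16_bitwise(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            else
                crc >>= 1;
        }
    }
    return crc;
}
```

The inner loop is a test, a shift, and a conditional XOR, which makes it a natural candidate for conditional execution or a branch-free mask-and-XOR sequence.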
Improve libgmp for Linaro Toolchain Miscellanies
GMP is a multi-precision math library used in GCC, guile, python-crypto, gch, maxima, darcs, and other places. It currently has an ARMv4 backend. Investigate the use of GMP, the percentage time spent in it, and any potential gains.
Investigate how effectively GCC uses the NEON narrowing arithmetic instructions. Implement, upstream, and backport.
Room for a private call for Linaro GCC
Room booking for a private call.
Add support for the ARMv5 saturated math operations
Add ARMv6 SIMD support for Linaro GCC
Add GCC support for the short-vector SIMD instructions that work on core registers.
Atom comparison for Linaro Toolchain Miscellanies
Benchmark ARM GCC against Atom GCC and report
This blueprint is for investigating and improving block memory operations like memset and memclr generated by GCC and for backporting improvements done for unaligned access from upstream.
Benchmarking the toolchain for Linaro Toolchain Benchmarks
We do benchmarks internally using a mix of custom scripts and a mix of boards. This should be cleaned up and taken over by the Infrastructure group. The runs should be mainly automatic, continuous, archived, and have basic daily reporting. Discuss what to benchmark, the platforms to run on, how to ensure consistency...
Add Cortex-A9 with SMP support to OpenOCD for Linaro Toolchain Miscellanies
OpenOCD is a JTAG based debugger. It currently has decent Cortex-A8 on OMAP3530 support. Add support for the A9, a selection of A9 based boards, and SMP.
Profiler backtracing for Linaro Toolchain Miscellanies
Demonstrate support for good backtracing in a profiling tool
A popular embedded benchmark uses a lot of function level static variables that can be transformed into local variables. The speed up is very significant. Our 4.4 had a -fremove-local-statics option that did this. The original discussion is here:  http://lists.linaro.org/pipermail/linaro-toolchain/2010-July/00005...
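A minimal sketch of the transformation, with hypothetical function names; it is not taken from the benchmark or from the -fremove-local-statics implementation. The statics qualify because each one is written before it is read on every call and its address never escapes, so no value carries over between calls.

```c
/* Before: function-level statics force loads and stores to memory
 * on every loop iteration. */
int sum_squares_static(const int *v, int n)
{
    static int acc;
    static int i;

    acc = 0;                       /* always written before being read */
    for (i = 0; i < n; i++)
        acc += v[i] * v[i];
    return acc;
}

/* After: the same variables as ordinary locals, which the compiler
 * is free to keep in registers. */
int sum_squares_local(const int *v, int n)
{
    int acc = 0;
    int i;

    for (i = 0; i < n; i++)
        acc += v[i] * v[i];
    return acc;
}
```

The two functions return the same result for every input; only the storage class of `acc` and `i` changes.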
CoreSight STM Support for Linaro Toolchain Miscellanies
Add support for the CoreSight System Trace Macrocell.

61 blueprint(s) listed.