Current documentation for “Linaro GCC”

To classify a blueprint as documentation, set the Implementation status to “Informational”. When the blueprint's Definition status is marked “Approved”, it will appear in this listing.

Add support for GProf to AArch64 backend of GCC
AArch64 bootstrap for Linaro GCC
This blueprint covers the initial post-bootstrap work for AArch64 in GCC. Proposed topics include: IFUNC, stack protection support, GProf support, and Boehm GC. See the dependent blueprints below for more.
GCC can calculate the final value of a loop counter and use that in later optimisations. The value is calculated in the original signed type which, due to overflow rules, can lead to a strange looking value which is not supported and reduced by later optimisations. Improve either through reducing the calculated va...
Cortex-A15 Theme for Linaro GCC
Meta Blueprint covering work scheduled for the Cortex-A15 Focus Iteration
Disable peeling for Linaro GCC
Peeling aligns memory pointers at run time both to give a performance boost where the CPU is faster with aligned accesses, and to support CPUs that can only do aligned vector accesses. ARM supports unaligned vector access for no penalty over aligned and a small penalty over aligned-with-alignment hints. As a first...
LRA is a proposed replacement for reload in GCC. We should investigate whether it actually provides a benefit to x86 and x86_64 and, if it does, work out what benefit turning it on will give to ARM and the steps needed to turn it on.
Improve conditional execution code generation (cond-exec) especially for store-flag sequences. This is in the cases that we compare values with 0 or with other registers. It might be possible to improve these in certain cases to avoid conditional instructions and replace them with equivalent arithmetic instructions....
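As a rough illustration of the kind of rewrite meant here, the sketch below computes the store-flag value "x != 0" once with a conditional select and once with plain arithmetic. The function names are hypothetical and this is only one well-known identity, not necessarily the sequence the blueprint has in mind.

```c
#include <stdint.h>

/* Store-flag written as a conditional: typically a compare plus
   conditional moves (or an IT block in Thumb-2). */
static uint32_t is_nonzero_cond(uint32_t x)
{
    return x != 0 ? 1u : 0u;
}

/* Same result using arithmetic only: (x | -x) has its top bit set
   exactly when x is non-zero, so shifting it down yields 0 or 1. */
static uint32_t is_nonzero_arith(uint32_t x)
{
    return (x | (0u - x)) >> 31;
}
```

Whether the arithmetic form is actually cheaper depends on the core and on the surrounding code, so it would need measuring rather than assuming.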
There are a few libraries and programs out there that have hand-written NEON chunks. Look into these and see if the vectoriser can replace them or what needs to be added.
We tell people to enable -mfpu=neon by default. In some cases this code runs slower than the non-NEON code. Investigate and fix.
Investigate whether SMS can use the rtl loop infrastructure to avoid unnecessary memory dependencies.
We would like to be able to track performance regressions along certain parameters in terms of GCC for Cortex A9. When run on a Cortex A9, the following should be true: * A9 vs A8: code tuned with -mtune=cortex-a9 should run faster than the same code tuned with -mtune=cortex-a8 * ARMv7 vs ARMv5: code built with -...
64 bit divide by constant for Linaro GCC
GCC can convert a 32 bit divide by constant into the corresponding multiplies and shifts. Implement the same for 64 bit values. PENDING: we found this but michaelh1 can't remember where.
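For reference, the existing 32 bit transformation can be sketched by hand as follows, using division by 10 as the example. The function name is hypothetical, and the constant shown is the standard reciprocal for this one divisor; the compiler derives the multiplier and shift per divisor.

```c
#include <stdint.h>

/* Unsigned 32 bit divide by the constant 10 without a divide instruction.
   0xCCCCCCCD is ceil(2^35 / 10), so the high part of the 64 bit product,
   shifted right by 35 in total, equals x / 10 for every 32 bit x. */
static uint32_t div10_by_multiply(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

The 64 bit case needs the high half of a 64x64 bit product, which is the part that makes it more work on a 32 bit target.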
Add support to libssp for AArch64
Backport any conditional execution work done by ARM into GCC 4.5
Detect smin / umin idiom for Linaro GCC
Detect and optimise idioms like:

    #define min(x, y) ((x) <= (y)) ? (x) : (y)

    unsigned int foo (unsigned int i, unsigned int x, unsigned int y)
    {
      return i < (min (x, y));
    }

    int bar (int i, int x, int y)
    {
      return i < (min (x, y));
    }

See
In some conditions the compiler generates a pair of conditional stores with the opposite condition codes. These could be folded into one unconditional store. Seen in libav in vp8.e
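A minimal source-level sketch of the pattern, assuming the two stores target the same address under opposite conditions. The function names are hypothetical and this is not the libav code itself.

```c
/* As written: each arm stores to *p, which can become a pair of
   conditional stores with opposite condition codes. */
static void store_pair(int *p, int cond, int a, int b)
{
    if (cond)
        *p = a;
    else
        *p = b;
}

/* Folded: select the value first, then do one unconditional store. */
static void store_folded(int *p, int cond, int a, int b)
{
    int value = cond ? a : b;
    *p = value;
}
```

Both functions leave the same value in *p; the second form needs only one store instruction.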
CoreMark regresses in Thumb-2 mode when using the LR regnum. Otherwise it's a good improvement. Investigate and fix. The idea is to use LR register as a general purpose register fit for use in a number of cases. The change proposed was the one upstream
The TSC have asked for a GCC optimisation option that gives code that performs well across all of the Cortex-A series. Discuss what that means, how to balance across the targets, how to present it to the user, approach for implementing, and how to qualify the results. Is it worth discussing adding -mcpu=native at th...
Enable hot/cold partitioning when doing a profile guided optimisation build. One feature of PGO is to see what code is hot and what is cold and then split this into different sections. This is difficult on ARM due to the constant pools. Implement.
Improve GCC's auto increment/decrement pass, with particular emphasis on NEON loads and stores. Status: - Requires a change to the Cortex A9 scheduling description that can only really be provided by ARM employees. - Current patches posted here:
Investigate and improve the current constant pool generation and code. The current constant pool placement code doesn't take profile info into consideration, or whether something gets placed in the innermost kernel of a loop, etc. There are a few cases when it does get placed in the middle of the innermost loop in...
Improve IV opts #1 for Linaro GCC
Finish the already upstream induction variable opts patch by backporting it.
Investigate the register allocator with respect to choice of Thumb1 vs Thumb2 instructions as discussed in the TSC commentary and write a proposal of what can be done (1MM).
The NEON max and min intrinsics could be represented as actual RTL (s/umax and s/umin) for all types except polynomial types, rather than the current unspec form. In general they could also end up being folded into the GIMPLE form for max / min if possible.
Investigate performance of -funroll-loops, alone and in combination with -fvariable-expansion-in-unroller. Potentially tweak default parameters and/or implementation, also taking into account similar changes in the CodeSourcery toolchain. If it proves useful, work on enabling unrolling by default upstream.
Look at for more information. There are a number of places where this shows up as an issue in the headroom analysis that was done with respect to benchmarks.
NEON instruction coverage for Linaro GCC
Check the coverage of the NEON instruction set by the vectoriser and backend, including different operand types such as registers and constants. There are two parts to vectorisation - detecting the patterns in the original code, and using those patterns in the backend. We know all of the operations that NEON imple...
Shrink-wrapping is a feature that was enabled upstream in GCC 4.7 but not for the ARM port. This requires at the minimum the definition of a new backend pattern "simple_return" to enable this feature to work. However this requires co-ordination with the way in which the epilogue is being generated which is being rew...
The A9 NEON pipeline has a bypass which can make the result of a multiply (or MLA?) quickly available to a following MLA. Describe this in the pipeline so that SMS can use it. PENDING: Michael can't find the bypass in the A9 NEON TRM and doesn't know the right term for it.
Check for any outstanding upstream enhancement requests and implement. Please fill out the summary when the work is started.
Improve CRC16 for Linaro GCC
A simple bitwise CRC16, like the one used in a popular embedded benchmark, has a range of possible improvements we can make in the middle end and backend. See the sandbox page at: for more. The initial steps are to do a hand written version to see the optimum and investigate...
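For readers unfamiliar with the shape of such code, here is a minimal bitwise CRC16 sketch. It assumes the common reflected polynomial 0xA001 with a zero initial value (CRC-16/ARC); the benchmark's own routine may differ in polynomial, bit order, or structure, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold one byte into the CRC, one bit per loop iteration. */
static uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++) {
        if (crc & 1u)
            crc = (uint16_t)((crc >> 1) ^ 0xA001u);
        else
            crc >>= 1;
    }
    return crc;
}

/* CRC of a whole buffer, starting from an initial value of zero. */
static uint16_t crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = crc16_update(crc, data[i]);
    return crc;
}
```

The data-dependent branch inside the eight-iteration loop is the obvious place for improvement, for example by turning it into a mask and an unconditional XOR.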
Investigate how effectively GCC uses the NEON narrowing arithmetic instructions. Implement, upstream, and backport.
Room for a private call for Linaro GCC
Room booking for a private call.
Add support for the ARMv5 saturated math operations
Add ARMv6 SIMD support for Linaro GCC
Add GCC support for the short-vector SIMD instructions that work on core registers.
This blueprint is for investigating and improving block memory operations like memset and memclr generated by GCC and for backporting improvements done for unaligned access from upstream.
A popular embedded benchmark uses a lot of function level static variables that can be transformed into local variables. The speed-up is very significant. Our 4.4 had a -fremove-local-statics option that did this. The original discussion is here:
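Separately from that discussion, the transformation itself can be illustrated with a small hand-written sketch. The function names are hypothetical, and the example assumes the statics are always written before they are read on each call and that their addresses never escape, which is what makes the rewrite safe.

```c
/* Before: function level statics force a load and store to memory
   on every loop iteration. */
static int sum_squares_static(const int *v, int n)
{
    static int i;
    static int acc;

    acc = 0;
    for (i = 0; i < n; i++)
        acc += v[i] * v[i];
    return acc;
}

/* After: the same logic with locals, which can live in registers. */
static int sum_squares_local(const int *v, int n)
{
    int acc = 0;

    for (int i = 0; i < n; i++)
        acc += v[i] * v[i];
    return acc;
}
```

Both versions return the same result for the same input; only the storage class of the temporaries changes.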

37 blueprint(s) listed.