Current documentation for “Linaro GCC”
To classify a blueprint as documentation, set the Implementation status to “Informational”. When the blueprint's Definition status is marked “Approved”, it will appear in this listing.
Add support for GProf to AArch64 backend of GCC
for Linaro GCC
Add support for GProf to AArch64 backend of GCC
|
|
AArch64 bootstrap
for Linaro GCC
This blueprint is to cover the initial post-bootstrap work for AArch64 in GCC. Proposed topics include:
* IFUNC
* Stack protection support
* GProf support
* Boehm GC
See the dependent blueprints below for more.
|
|
Better end of loop counter optimisation
for Linaro GCC
GCC can calculate the final value of a loop counter and use that in later optimisations. The value is calculated in the original signed type which, due to overflow rules, can lead to a strange-looking value that later optimisations neither support nor reduce.
Improve either through reducing the calculated va...
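As a rough illustration of what "calculating the final value of a loop counter" means, the sketch below compares a counting loop with a closed-form expression for the counter's value after the loop. The function names are hypothetical, the code assumes a positive step and no signed overflow, and it is not GCC's actual implementation.

```c
/* Loop form: count from start towards limit in steps of step.
   Assumes step > 0 and that no intermediate value overflows int. */
int final_value_loop(int start, int limit, int step)
{
    int i;
    for (i = start; i < limit; i += step)
        ;                       /* empty body: only the counter matters */
    return i;
}

/* Closed form: the same value computed without running the loop. */
int final_value_closed(int start, int limit, int step)
{
    if (start >= limit)
        return start;           /* the loop body never runs */
    int trips = (limit - start + step - 1) / step;  /* ceiling division */
    return start + trips * step;
}
```

Once the final value is a plain expression, later passes can fold or propagate it, which is why an awkward-looking result in the signed type matters.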
|
|
Cortex-A15 Theme
for Linaro GCC
Meta Blueprint covering work scheduled for the Cortex-A15 Focus Iteration
|
|
Disable peeling
for Linaro GCC
Peeling aligns memory pointers at run time both to give a performance boost where the CPU is faster with aligned accesses, and to support CPUs that can only do aligned vector accesses.
ARM supports unaligned vector access for no penalty over aligned and a small penalty over aligned-with-alignment hints. As a first...
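A hand-written sketch of the shape that peeling gives a loop is shown below. The function name, the 16-byte alignment target and the four-element main loop are assumptions chosen for illustration; the main loop is scalar code standing in for an aligned vector loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum an array in three phases: peeled prologue, main loop, epilogue. */
int32_t sum_with_peeling(const int32_t *p, size_t n)
{
    int32_t sum = 0;
    size_t i = 0;

    /* Peeled iterations: scalar steps until p + i is 16-byte aligned. */
    while (i < n && ((uintptr_t)(p + i) % 16) != 0) {
        sum += p[i];
        i++;
    }

    /* Main loop: four elements per step, starting at an aligned address. */
    for (; i + 4 <= n; i += 4)
        sum += p[i] + p[i + 1] + p[i + 2] + p[i + 3];

    /* Epilogue: the remaining 0 to 3 elements. */
    for (; i < n; i++)
        sum += p[i];

    return sum;
}
```

Disabling peeling amounts to dropping the first phase and letting the main loop start at whatever address it is given.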
|
|
Investigate LRA in GCC for ARM
for Linaro GCC
LRA is a proposed replacement for reload in GCC. We should investigate whether it actually provides a benefit to x86 and x86_64, and if it does, work out what benefit turning it on will give to ARM and the steps needed to turn it on.
|
|
Improve generation of conditional execution instructions
for Linaro GCC
Improve conditional execution code generation (cond-exec), especially for store-flag sequences. This applies to cases where we compare values with 0 or with other registers. In certain cases it might be possible to avoid conditional instructions and replace them with equivalent arithmetic instructions....
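One example of replacing a store-flag sequence with arithmetic is sketched below for the "not equal to zero" case. The function names are hypothetical, and the blueprint does not say which sequences GCC would actually choose; this only shows that such an equivalent exists.

```c
#include <stdint.h>

/* Store-flag written as a conditional: 1 if x is non-zero, else 0. */
uint32_t ne_zero_conditional(uint32_t x)
{
    return x != 0 ? 1u : 0u;
}

/* Same result from arithmetic only: (x | -x) has its top bit set
   exactly when x is non-zero, so shifting that bit down gives 0 or 1. */
uint32_t ne_zero_arithmetic(uint32_t x)
{
    return (x | (0u - x)) >> 31;
}
```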
|
|
Check common programs for areas the vectoriser could improve
for Linaro GCC
There are a few libraries and programs out there that have hand-written NEON chunks. Look into these and see if the vectoriser can replace them, or what needs to be added.
|
|
Fix any NEON vs core regressions
for Linaro GCC
We tell people to enable -mfpu=neon by default. In some cases this code runs slower than the non-NEON code. Investigate and fix.
|
|
Improve SMS on code with memory dependencies
for Linaro GCC
Investigate whether SMS can use the rtl loop infrastructure to avoid unnecessary memory dependencies.
|
|
Track and investigate performance regression areas for GCC
for Linaro GCC
We would like to be able to track GCC performance regressions for the Cortex A9 along certain parameters.
When run on a Cortex A9, the following should be true:
* A9 vs A8: code tuned with -mtune=cortex-a9 should run faster than the same code tuned with -mtune=cortex-a8
* ARMv7 vs ARMv5: code built with -...
|
|
64 bit divide by constant
for Linaro GCC
GCC can convert a 32 bit divide by constant into the corresponding multiplies and shifts. Implement the same for 64 bit values.
PENDING: we found this but michaelh1 can't remember where.
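For reference, the 32-bit transformation can be written by hand as below for division by 10. The function name is hypothetical; the constant 0xCCCCCCCD and the shift of 35 are the well-known values for unsigned 32-bit division by 10, not something taken from this blueprint. A 64-bit version would need the high half of a 64x64-bit multiply, which this sketch does not attempt.

```c
#include <stdint.h>

/* Unsigned 32-bit n / 10 as a widening multiply followed by a shift.
   0xCCCCCCCD is ceil(2^35 / 10), so the product shifted right by 35
   equals the quotient for every 32-bit n. */
uint32_t div10_mul_shift(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}
```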
|
|
AArch64 GCC support for Stack Protection
for Linaro GCC
Add support to libssp for AArch64
|
|
Backport conditional execution work
for Linaro GCC
Backport any conditional execution work done by ARM into GCC 4.5
|
|
Detect smin / umin idiom
for Linaro GCC
Detect and optimise idioms like:
#define min(x, y) ((x) <= (y)) ? (x) : (y)
unsigned int foo (unsigned int i, unsigned int x, unsigned int y)
{
  return i < (min (x, y));
}
int bar (int i, int x, int y)
{
  return i < (min (x, y));
}
See https://code.launchpad.net/~ramana/gcc-linaro/47-smin-umin-idiom/+merge/10...
|
|
Equivalent opposite condition detection
for Linaro GCC
In some conditions the compiler generates a pair of conditional stores with the opposite condition codes. These could be folded into one unconditional store.
Seen in libav in vp8.e
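At the source level the pattern looks roughly like the sketch below: two stores to the same location guarded by opposite conditions, and the single unconditional store they could be folded into. The function names are hypothetical and the code is not taken from libav.

```c
/* Two stores under opposite conditions: exactly one of them executes. */
void store_two_conditionals(int cond, int *dst, int a, int b)
{
    if (cond)
        *dst = a;
    if (!cond)
        *dst = b;
}

/* Folded form: select the value first, then store once unconditionally. */
void store_one_unconditional(int cond, int *dst, int a, int b)
{
    *dst = cond ? a : b;
}
```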
|
|
Fix EPILOGUE_USES regression in CoreMark
for Linaro GCC
CoreMark regresses in Thumb-2 mode when using the LR regnum. Otherwise it's a good improvement. Investigate and fix. The idea is to use LR register as a general purpose register fit for use in a number of cases. The change proposed was the one upstream http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg05706.h...
|
|
Generic tuning for all Cortex-A devices
for Linaro GCC
The TSC have asked for a GCC optimisation option that gives code that performs well across all of the Cortex-A series. Discuss what that means, how to balance across the targets, how to present it to the user, approach for implementing, and how to qualify the results. Is it worth discussing adding -mcpu=native at th...
|
|
Hot/cold partitioning in PGO
for Linaro GCC
Enable hot/cold partitioning when doing a profile guided optimisation build.
One feature of PGO is to see what code is hot and what is cold and then split this into different sections. This is difficult on ARM due to the constant pools. Implement.
|
|
Improve the auto increment/decrement pass
for Linaro GCC
Improve GCC's auto increment/decrement pass, with particular emphasis on NEON loads and stores.
Status:
- Requires a change to the Cortex A9 scheduling description that can only really be provided by ARM employees.
- Current patches posted here:
http://lists.linaro.org/pipermail/linaro-toolchain/2011-Decembe...
|
|
Improve constant pool support
for Linaro GCC
Investigate and improve the current constant pool generation and code.
The current constant pool placement code doesn't take profile info into consideration, or whether something gets placed in the innermost kernel of a loop etc. There are a few cases when it does get placed in the middle of the innermost loop in...
|
|
Improve IV opts #1
for Linaro GCC
Finish the already upstream induction variable opts patch by backporting it.
|
|
Improve the register choice in the allocator
for Linaro GCC
Investigate the register allocator with respect to choice of Thumb1 vs Thumb2 instructions as discussed in the TSC commentary and write a proposal of what can be done (1MM).
|
|
Improve the Neon max and min intrinsics.
for Linaro GCC
The Neon max and min intrinsics could be represented as actual RTL (s/umax and s/umin) for all types except polynomial types, rather than the current unspec form. In general they could also end up being folded into the GIMPLE form for max / min if possible.
|
|
Investigate performance of -funroll-loops, alone and in combination with -fvariable-expansion-in-unroller. Potentially tweak default parameters and/or implementation, also taking into account similar changes in the CodeSourcery toolchain. If it proves useful, work on enabling unrolling by default upstream.
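The effect of the two options can be approximated by hand as in the sketch below: a reduction loop, and the same loop unrolled by two with the accumulator split into two copies. The function names and the unroll factor are assumptions for illustration; the compiler's own choices depend on its parameters.

```c
#include <stddef.h>
#include <stdint.h>

/* Original loop: every add depends on the previous one. */
uint32_t sum_rolled(const uint32_t *a, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    return acc;
}

/* Unrolled by two with the accumulator expanded into acc0 and acc1,
   so the two adds in each iteration are independent of each other. */
uint32_t sum_unrolled_expanded(const uint32_t *a, size_t n)
{
    uint32_t acc0 = 0, acc1 = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        acc0 += a[i];
        acc1 += a[i + 1];
    }
    if (i < n)                  /* leftover element when n is odd */
        acc0 += a[i];
    return acc0 + acc1;
}
```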
|
|
Prefer movw movt over literal pools where possible.
for Linaro GCC
Look at https://bugs.launchpad.net/gcc-linaro/+bug/886124 for more information. There are a number of places where this shows up as an issue in the headroom analysis that was done with respect to benchmarks.
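As background, the sketch below models in portable C how a movw/movt pair builds a 32-bit constant from two 16-bit immediates instead of loading it from a literal pool. The helper names are hypothetical; they only mimic the register effects of the two instructions.

```c
#include <stdint.h>

/* Model of MOVW: write a 16-bit immediate to the low half, zero the top. */
uint32_t model_movw(uint16_t imm16)
{
    return imm16;
}

/* Model of MOVT: write a 16-bit immediate to the top half, keep the low half. */
uint32_t model_movt(uint32_t reg, uint16_t imm16)
{
    return (reg & 0xFFFFu) | ((uint32_t)imm16 << 16);
}

/* Build any 32-bit constant with one MOVW followed by one MOVT. */
uint32_t build_constant(uint32_t value)
{
    uint32_t reg = model_movw((uint16_t)(value & 0xFFFFu));
    return model_movt(reg, (uint16_t)(value >> 16));
}
```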
|
|
NEON instruction coverage
for Linaro GCC
Check the coverage of the NEON instruction set by the vectoriser and backend, including different operand types such as registers and constants.
There are two parts to vectorisation - detecting the patterns in the original code, and using those patterns in the backend. We know all of the operations that NEON imple...
|
|
Turn on Shrink Wrapping for Linaro 47 and upstream 48
for Linaro GCC
Shrink-wrapping is a feature that was enabled upstream in GCC 4.7 but not for the ARM port. At a minimum, this requires the definition of a new backend pattern "simple_return" to enable this feature to work. However, this requires co-ordination with the way in which the epilogue is being generated which is being rew...
|
|
Add multiply pipeline bypass
for Linaro GCC
The A9 NEON pipeline has a bypass which can make the result of a multiply (or MLA?) quickly available to a following MLA.
Describe this in the pipeline so that SMS can use it.
PENDING: Michael can't find the bypass in the A9 NEON TRM and doesn't know the right term for it.
|
|
<arm_neon.h>/intrinsics improvements
for Linaro GCC
Check for any outstanding upstream enhancement requests and implement.
Please fill out the summary when the work is started.
|
|
Improve CRC16
for Linaro GCC
A simple bitwise CRC16, like the one used in a popular embedded benchmark, has a range of possible improvements that we can make in the middle end and backend.
See the sandbox page at:
https://wiki.linaro.org/MichaelHope/Sandbox/CRC16
for more. The initial steps are to do a hand written version to see the optimum and investigate...
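For readers unfamiliar with the pattern, a simple bitwise CRC16 looks roughly like the sketch below. The blueprint does not say which variant the benchmark uses, so this assumes the common CRC-16/ARC parameters (reflected polynomial 0xA001, initial value 0); the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold one byte into the CRC, one bit at a time. */
uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++) {
        if (crc & 1u)
            crc = (uint16_t)((crc >> 1) ^ 0xA001u);  /* shift, apply polynomial */
        else
            crc = (uint16_t)(crc >> 1);              /* shift only */
    }
    return crc;
}

/* CRC of a whole buffer, starting from an initial value of 0. */
uint16_t crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = crc16_update(crc, data[i]);
    return crc;
}
```

The data-dependent branch inside the inner loop is the kind of code where conditional or arithmetic replacements could pay off.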
|
|
Improve vectoriser narrowing operations
for Linaro GCC
Investigate how effectively GCC uses the NEON narrowing arithmetic instructions. Implement, upstream, and backport.
|
|
Room for a private call
for Linaro GCC
Room booking for a private call.
|
|
ARMv5 saturating add/subtract support
for Linaro GCC
Add support for the ARMv5 saturated math operations
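The portable C below sketches the semantics of a signed 32-bit saturating add and subtract, which is the kind of source pattern such support would need to map onto instructions like QADD and QSUB. The function names are hypothetical, and whether GCC would recognise this exact form is not stated in the blueprint.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate result to the signed 32-bit range. */
static int32_t clamp_to_int32(int64_t wide)
{
    if (wide > INT32_MAX)
        return INT32_MAX;
    if (wide < INT32_MIN)
        return INT32_MIN;
    return (int32_t)wide;
}

/* Saturating add: the result sticks at the limits instead of wrapping. */
int32_t sat_add32(int32_t a, int32_t b)
{
    return clamp_to_int32((int64_t)a + (int64_t)b);
}

/* Saturating subtract, using the same clamping. */
int32_t sat_sub32(int32_t a, int32_t b)
{
    return clamp_to_int32((int64_t)a - (int64_t)b);
}
```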
|
|
Add ARMv6 SIMD support
for Linaro GCC
Add GCC support for the short-vector SIMD instructions that work on core registers.
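To show what "short-vector SIMD on core registers" means, the sketch below adds four packed bytes held in one 32-bit word, with each byte wrapping independently. This models the arithmetic result of an instruction such as UADD8 in portable C; it ignores the GE flags the real instruction sets, and the function name is hypothetical.

```c
#include <stdint.h>

/* Add four unsigned bytes packed in a and b, lane by lane, modulo 256. */
uint32_t add_packed_bytes(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of each lane; no carry can cross a lane boundary. */
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Fold in the top bit of each lane without propagating its carry. */
    return low7 ^ ((a ^ b) & 0x80808080u);
}
```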
|
|
Improve block memory operations by GCC
for Linaro GCC
This blueprint is for investigating and improving block memory operations like memset and memclr generated by GCC and for backporting improvements done for unaligned access from upstream.
|
|
Transform statics to locals
for Linaro GCC
A popular embedded benchmark uses a lot of function level static variables that can be transformed into local variables. The speed up is very significant.
Our 4.4 had a -fremove-local-statics option that did this. The original discussion is here:
http://lists.linaro.org/pipermail/linaro-toolchain/2010-July/00005...
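A minimal sketch of the transformation is given below. It assumes the static is always written before it is read on every call and that its address never escapes, which is what makes the local form equivalent; the function names are hypothetical and the exact conditions used by -fremove-local-statics are not given here.

```c
/* Before: a function-level static used only as scratch storage.
   Each call stores to memory and reloads from it. */
int scale_with_static(int x)
{
    static int tmp;
    tmp = x * 3;        /* always written before it is read */
    return tmp + 1;
}

/* After: the same variable as an automatic local, which can stay
   in a register for the whole function. */
int scale_with_local(int x)
{
    int tmp;
    tmp = x * 3;
    return tmp + 1;
}
```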
|
37 blueprint(s) listed.