Enable and tune vectorizer cost model on ARM

Registered by Ulrich Weigand

The vectorizer cost model is currently disabled by default. This can cause regressions on test cases that *can* be vectorized, but only in a very inefficient manner; in extreme cases, slowdowns of up to 20x have been reported.

However, enabling the cost model seems to expose regressions on some benchmarks, most likely because the back-end cost factors have not been properly tuned for ARM.

This blueprint asks to:
- Run a suite of benchmarks with the cost model enabled and investigate regressions.
- Tune back-end cost factors to eliminate those regressions (as far as possible).
- Once tuning is complete, enable the cost model on ARM by default.

Blueprint information

Michael Hope
Ulrich Weigand
Christophe Lyon
Series goal:
Accepted for 4.7
Milestone target:
4.7-2013.03
Started by
Matthew Gretton-Dann
Completed by
Matthew Gretton-Dann

[matthew-gretton-dann 2013-03-20] This is now tracked in http://cards.linaro.org/browse/TCWG-8

This currently blocks https://launchpad.net/gcc-linaro/+spec/disable-peeling

A thread about disabling peeling for unaligned accesses starts here:
http://gcc.gnu.org/ml/gcc/2012-12/msg00036.html
It turned into a discussion about implementing the vectorizer cost model correctly, so that it reflects the fact that unaligned loads/stores have no penalty over aligned ones.

The vectorizer cost model is enabled with the option -fvect-cost-model, which comes with a default cost model.
Since http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00592.html, the vectorizer cost model is enabled by default at -O3. It can be disabled with -fno-vect-cost-model.
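For the benchmark comparisons, a small kernel built both ways is a convenient starting point. The following is only an illustrative sketch: the function and the file name saxpy.c are hypothetical, and the two command lines in the comment use only the options named above (plus -std=c99, which the source needs for restrict and the loop-scoped index).

```c
#include <stddef.h>

/* Hypothetical test kernel, assumed to live in saxpy.c.
 *
 * Build variants for comparison:
 *   gcc -std=c99 -O3 -c saxpy.c                       (cost model on, the -O3 default)
 *   gcc -std=c99 -O3 -fno-vect-cost-model -c saxpy.c  (cost model off)
 */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    /* Simple candidate loop for the vectorizer: y[i] = y[i] + a * x[i]. */
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Timing the same kernel from both builds gives a first indication of whether the cost model changes the vectorization decision for that loop.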

In this default cost model, unaligned loads/stores cost 2, while the aligned ones cost 1.
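The difference between the default costs and an ARM-tuned variant can be sketched as follows. This is a simplified illustration and not GCC's actual hook: the enum, the function names and the body_cost helper are hypothetical, and only the numeric costs (2 versus 1 for unaligned accesses by default, 1 for everything in the ARM variant) come from this blueprint.

```c
/* Hypothetical access kinds; GCC's real enumeration has more entries. */
enum access_kind { ALIGNED_LOAD, ALIGNED_STORE, UNALIGNED_LOAD, UNALIGNED_STORE };

/* Default model as described above: unaligned accesses cost 2, aligned ones 1. */
static int default_access_cost(enum access_kind kind)
{
    switch (kind) {
    case UNALIGNED_LOAD:
    case UNALIGNED_STORE:
        return 2;
    default:
        return 1;
    }
}

/* ARM-tuned variant: unaligned accesses carry no penalty over aligned ones. */
static int arm_access_cost(enum access_kind kind)
{
    (void)kind; /* every access kind costs the same */
    return 1;
}

/* Sum the memory-access costs of one vectorized loop body under a given model. */
static int body_cost(int (*cost)(enum access_kind),
                     const enum access_kind *ops, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += cost(ops[i]);
    return total;
}
```

For a loop body with two unaligned loads and one aligned store, the default model charges 5 while the ARM variant charges 3, so the ARM variant is more likely to judge such a loop profitable to vectorize.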

This leads to the following work items:
- run benchmarks with the default cost model enabled
- run benchmarks with the default cost model disabled
- tune the cost model (i.e. implement an ARM version), and benchmark

[christophe-lyon 2013-02-11]
* Benchmark results with and without the default vectorizer cost model show little difference (SPEC2k, a popular embedded benchmark, CoreMark)
* I have implemented a new model where unaligned loads/stores cost 1.
* Benchmark results with the new model show little change (except for a 1.3% improvement in CoreMark)
* Patch proposed upstream and judged mostly OK

[christophe-lyon 2013-02-12]
* Patch accepted upstream, and committed as svn rev#195977, for gcc-4.8.

Headline: Update GCC's ARM backend to use new vectorizer cost model infrastructure
Acceptance: Patch accepted upstream and backported to GCC Linaro
Roadmap id: CARD-304


Work Items

Run benchmarks with default cost model enabled: DONE
Run benchmarks with default cost model disabled: DONE
Create an ARM cost model (aligned & unaligned accesses have the same cost): DONE
Benchmark ARM cost model: DONE
Send patch upstream and have it accepted: DONE
