Enable and tune vectorizer cost model on ARM

Registered by Ulrich Weigand on 2012-06-12

The vectorizer cost model is currently disabled by default. This can cause regressions on test cases that *can* be vectorized, but only very inefficiently; in extreme cases, slowdowns of up to 20x have been reported.

However, enabling the cost model seems to expose regressions on some benchmarks, most likely because the back-end cost factors have not been properly tuned for ARM.

This blueprint asks to:
- Run a suite of benchmarks with the cost model enabled and investigate regressions.
- Tune the back-end cost factors to eliminate those regressions (as far as possible).
- Once tuning is complete, enable the cost model on ARM by default.

Blueprint information

Michael Hope
Ulrich Weigand
Christophe Lyon
Series goal:
Accepted for 4.7
Milestone target:
4.7-2013.03
Started by
Matthew Gretton-Dann on 2013-02-03
Completed by
Matthew Gretton-Dann on 2013-03-14

[matthew-gretton-dann 2013-03-20] This is now tracked in http://cards.linaro.org/browse/TCWG-8

This currently blocks https://launchpad.net/gcc-linaro/+spec/disable-peeling

A thread about disabling peeling for unaligned accesses starts here:
http://gcc.gnu.org/ml/gcc/2012-12/msg00036.html
It turned into a discussion about implementing the vectorizer cost model correctly, so that it reflects the fact that unaligned loads/stores have no penalty over aligned ones.

The vectorizer cost model is enabled with the option -fvect-cost-model, which selects a default cost model.
Since http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00592.html, the vectorizer cost model is enabled by default at -O3. It can be disabled with -fno-vect-cost-model.
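
As a rough illustration of the kind of loop these options affect, here is a minimal C sketch. The function name, the file name kernel.c, and the -mfpu=neon and -std=c99 flags in the comment are assumptions made for this example and do not come from the blueprint; only -fvect-cost-model and -fno-vect-cost-model are taken from the text above. Because the loop reads src at an offset of one element, dst and src + 1 will usually not share the same alignment, so the vectorizer probably cannot align both streams by peeling.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Computes dst[i] += k * src[i + 1] for i in [0, n).
 * src must hold at least n + 1 elements.
 *
 * Possible builds to compare (flags other than the two cost-model
 * options are assumptions for this sketch):
 *   gcc -std=c99 -O3 -mfpu=neon -fvect-cost-model    -c kernel.c
 *   gcc -std=c99 -O3 -mfpu=neon -fno-vect-cost-model -c kernel.c
 */
void add_scaled_offset(int *restrict dst, const int *restrict src,
                       int k, size_t n)
{
    assert(dst != NULL && src != NULL);

    /* Simple counted loop with unit-stride accesses: a typical
       candidate for auto-vectorization. */
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i + 1];
}
```

Timing both builds of such a kernel is one way to see whether the cost model changes the vectorization decision for a given loop.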

In this default cost model, unaligned loads/stores cost 2, while the aligned ones cost 1.

This leads to the following work items:
- run benchmarks with the default cost model enabled
- run benchmarks with the default cost model disabled
- tune the cost model (i.e. implement an ARM version), and benchmark

[christophe-lyon 2013-02-11]
* Benchmark results with and without the default vectorizer cost model show little difference (Spec2k, a popular embedded benchmark, CoreMark)
* I have implemented a new model where unaligned loads/stores cost 1.
* Benchmark results show little change (except for a 1.3% improvement in CoreMark)
* Patch proposed upstream; mostly OK
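
The difference between the two models can be sketched with a toy calculation. This is not GCC's internal code: the enum, the function names, and the idea of summing per-access costs over one vector iteration are hypothetical simplifications for illustration. Only the cost values (unaligned 2 versus aligned 1 in the default model, and 1 for both in the new model) come from the notes above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical access kinds; GCC's own enumeration is richer. */
enum access_kind {
    ALIGNED_LOAD,
    ALIGNED_STORE,
    UNALIGNED_LOAD,
    UNALIGNED_STORE
};

/* Default model as described in this blueprint:
   unaligned accesses cost 2, aligned accesses cost 1. */
int default_access_cost(enum access_kind kind)
{
    switch (kind) {
    case UNALIGNED_LOAD:
    case UNALIGNED_STORE:
        return 2;
    default:
        return 1;
    }
}

/* Tuned model as described in this blueprint:
   aligned and unaligned accesses both cost 1. */
int flat_access_cost(enum access_kind kind)
{
    (void)kind; /* every access costs the same */
    return 1;
}

/* Sums the memory-access cost of one vector loop iteration
   under the given cost function (a simplification). */
int loop_access_cost(const enum access_kind *accesses, size_t count,
                     int (*cost_fn)(enum access_kind))
{
    int total = 0;

    assert(accesses != NULL && cost_fn != NULL);
    for (size_t i = 0; i < count; i++)
        total += cost_fn(accesses[i]);
    return total;
}
```

For a loop body with two unaligned loads and one aligned store, the default model gives a cost of 5 and the flat model gives 3, so the flat model makes a vectorized loop with unaligned accesses look cheaper relative to its scalar version.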

[christophe-lyon 2013-02-12]
* Patch accepted upstream, and committed as svn rev#195977, for gcc-4.8.

Headline: Update GCC's ARM backend to use new vectorizer cost model infrastructure
Acceptance: Patch accepted upstream and backported to GCC Linaro
Roadmap id: CARD-304


Work Items

Work items:
Run benchmarks with default cost model enabled: DONE
Run benchmarks with default cost model disabled: DONE
Create an ARM cost model (aligned & unaligned accesses have the same cost): DONE
Benchmark ARM cost model: DONE
Send patch upstream and have it accepted: DONE
