NEON performance for 11.11

Registered by Michael Hope

We have a list of near-term vectoriser improvements. Discuss the items that adapt existing vectoriser features to NEON, those that add new general-purpose features exposed by the ARM investigation, and those that are ARM-specific.

This blueprint contains tasks that don't warrant a blueprint of their own. To see the other topics that have been spun out, see:
 https://blueprints.launchpad.net/linaro/+spec/tr-toolchain-neon-performance-11.11

Blueprint information

Status:
Complete
Approver:
Michael Hope
Priority:
High
Drafter:
Ira Rosen
Direction:
Needs approval
Assignee:
Ira Rosen
Definition:
Approved
Series goal:
Accepted for 4.6
Implementation:
Implemented
Milestone target:
None
Started by
Michael Hope
Completed by
Michael Hope

Whiteboard

The blocks have been split into blueprints to match the new style.

Below are the original blocks before splitting. They are kept here
because blueprints don't have history.

Doubling multiply:
Use NEON doubling multiply instructions: DONE
Implement, upstream, and backport: DONE

Over-promotion:
Reduce over-promotion in multiplication: DONE
Implement, upstream, and backport: DONE
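
As an illustration of the over-promotion item above, here is a minimal sketch of the kind of loop it concerns. The function name widen_mult and the choice of element types are assumptions made for this example, not taken from the blueprint. C's integer promotion turns both 8-bit operands into int before the multiply, so a vectoriser that follows the promotions literally works on 32-bit lanes, even though the product of two 8-bit values fits in 16 bits and could be formed with a NEON widening multiply (VMULL) before being widened again for the store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: multiply two byte arrays into a 32-bit result.
   The multiply is done in int because of integer promotion, although
   the largest possible product, 255 * 255 = 65025, fits in 16 bits. */
void widen_mult(const uint8_t *restrict a, const uint8_t *restrict b,
                uint32_t *restrict out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        /* a[i] and b[i] are promoted to int before the multiply. */
        out[i] = (uint32_t)(a[i] * b[i]);
    }
}
```

Compiling with something like gcc -O3 -mfpu=neon -mfloat-abi=softfp -S and reading the assembly shows how a given compiler handles the loop. Which instructions appear depends on the GCC version and options.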

Fix vectorizer testsuite failures upstream: DONE

Reduce over-promotion of vector operations that could be done with narrower elements: DONE
Implement, upstream, and backport: DONE
Use NEON widening shift left instruction: INPROGRESS
Implement, upstream, and backport: TODO
Change the default vector size for NEON to 128 bits: INPROGRESS
Implement and upstream: DONE
Backport: TODO
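
For the widening shift left item in this block, a minimal sketch of a loop that such an instruction could cover follows. The function name widen_shift, the element types, and the shift amount are assumptions chosen for illustration. Each 16-bit input is widened to 32 bits and then shifted left by a constant, which matches what the NEON VSHLL (vector shift left long) instruction does in one step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: widen each 16-bit element to 32 bits and shift
   it left by a constant. Done naively this is a widening conversion
   followed by a separate 32-bit shift. */
void widen_shift(const uint16_t *restrict in, uint32_t *restrict out,
                 size_t n)
{
    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        /* The cast makes the widening explicit and keeps the shift unsigned. */
        out[i] = (uint32_t)in[i] << 3;
    }
}
```

Whether the compiler actually emits VSHLL here depends on its version and flags, so the generated assembly is the thing to check.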

Investigate how effectively GCC uses the NEON narrowing arithmetic instructions: DONE
Implement, upstream, and backport: POSTPONED
Investigate excessive use of vmov instructions: TODO
Implement, upstream, and backport: TODO

Peeling:
Improve peeling heuristic in the vectorizer - without cost model: DONE
Implement, upstream, and backport: DONE
Investigate if peeling is effective for NEON both with and without cost model: TODO
Implement any improvements, upstream, and backport: TODO

arm_neon.h:
Check for any upstream enhancement requests: TODO
Check if further work is needed: TODO
Do round 1: TODO
Do round 2: TODO

Coverage:
Add tests for NEON instructions that can be directly expressed in C: INPROGRESS
Document the current vectoriser coverage: INPROGRESS
Document the current NEON backend coverage: TODO

See http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01863.html for an example of missed instructions and the test cases.

