Linaro GCC

Improve the Neon vector permute intrinsics.

Registered by Ramana Radhakrishnan on 2012-05-26

The Neon vector permute intrinsics, namely

vtrn
vtbl
vtbx
vrev32/64/16
vzip
vuzp
vext

can be implemented using the generic __builtin_shuffle extension mechanism. This work should also make sure that all tests in this area work properly.

This is something that came up only in 4.7 and should have a reasonably high priority: it can improve our vectorization numbers in quite a few cases, and it should also improve quite a few uses of the intrinsics, because we now have a way of telling the vectorizer and the generic optimizers that this really isn't a builtin call but just a vector permute operation :)
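
As a rough illustration of the idea (a sketch using GCC's generic vector extensions, not the actual arm_neon.h implementation), permutes such as vrev64, vzip and vext reduce to a __builtin_shuffle with a constant mask. The type u8x8 and the helper names below are made up for this example; only __builtin_shuffle itself is the real GCC builtin.

```c
#include <stdint.h>

/* Hypothetical stand-in for uint8x8_t: eight 8-bit lanes in 64 bits. */
typedef uint8_t u8x8 __attribute__((vector_size(8)));

/* vrev64.8 style: reverse the byte order within the 64-bit vector. */
static inline u8x8 rev64_u8(u8x8 v)
{
    const u8x8 mask = {7, 6, 5, 4, 3, 2, 1, 0};
    return __builtin_shuffle(v, mask);
}

/* First half of a vzip.8 style result: interleave the low lanes of a and b.
   Mask indices 0..7 select from a, 8..15 select from b. */
static inline u8x8 zip_lo_u8(u8x8 a, u8x8 b)
{
    const u8x8 mask = {0, 8, 1, 9, 2, 10, 3, 11};
    return __builtin_shuffle(a, b, mask);
}

/* vext.8 style with an offset of 3: the top five lanes of a
   followed by the bottom three lanes of b. */
static inline u8x8 ext3_u8(u8x8 a, u8x8 b)
{
    const u8x8 mask = {3, 4, 5, 6, 7, 8, 9, 10};
    return __builtin_shuffle(a, b, mask);
}
```

Because the mask is a compile-time constant, the compiler sees a plain vector permute here rather than an opaque builtin call, which is the property the blueprint is after.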

Blueprint information

Status: Complete

Approver: Michael Hope

Priority: Medium

Drafter: Ramana Radhakrishnan

Direction: Approved

Assignee: Ramana Radhakrishnan

Definition: Approved

Series goal: Accepted for 4.7

Implementation: Implemented

Milestone target: None

Started by: Ramana Radhakrishnan on 2012-06-22

Completed by: Matthew Gretton-Dann on 2013-05-22

Whiteboard

[2013-05-22] This has been completed and is in FSF GCC 4.8

Work Items

Work items:
* Improve vrev64/32/16, vzip, vuzp, vtrn using __builtin_shuffle. Commit upstream: DONE
* Implement support for __builtin_shuffle in the C++ frontend: DONE
* Implement support for constexpr and __builtin_shuffle in the C++ frontend: DONE
* Improve support for vdup intrinsics in the backend: INPROGRESS
* Improve neon intrinsics tests: INPROGRESS
* Improve support for vld3 intrinsics in the backend. Do not move values to and from the stack gratuitously; done by tweaking costs so that lower-subreg lowers only the values you want: INPROGRESS
* Validate improvements obtained: TODO
* Backport to Linaro 4.7: TODO
