Implement constant vec permute operation for the vext instruction

Registered by Ramana Radhakrishnan

The vext instruction can be supported with the constant vector permute infrastructure that GCC has in FSF 4.7 and above. While this infrastructure already handles the other constant vector permute operations, it does not yet handle vext.

Blueprint information

Status: Complete
Approver: Michael Hope
Priority: Medium
Drafter: Ramana Radhakrishnan
Direction: Approved
Assignee: Christophe Lyon
Definition: Approved
Series goal: Accepted for 4.7
Implementation: Implemented
Milestone target: None
Started by: Michael Hope
Completed by: Christophe Lyon

Whiteboard

Background

GCC has generic vector permute support. A programmer can use it in two ways: explicitly, through the __builtin_shuffle language extension, or implicitly, by relying on the auto-vectorizer to generate vector permutes.
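As a rough illustration of the language extension, the sketch below uses GCC's vector_size attribute and both forms of __builtin_shuffle. The type name v4si and the function names are chosen here for illustration only, and the code needs a GCC that provides __builtin_shuffle (4.7 or later).

```c
#include <stdint.h>

/* Four 32-bit lanes in a 128-bit vector, the size of a Neon Q register. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* One-operand form: result[i] = a[mask[i]].
   This mask reverses the order of the four lanes. */
v4si reverse_lanes(v4si a)
{
    const v4si mask = {3, 2, 1, 0};
    return __builtin_shuffle(a, mask);
}

/* Two-operand form: indices 0..3 select from a, indices 4..7 from b.
   This mask interleaves the low halves of a and b. */
v4si interleave_low(v4si a, v4si b)
{
    const v4si mask = {0, 4, 1, 5};
    return __builtin_shuffle(a, b, mask);
}
```

Because both masks are compile-time constants, these are the kind of permutes the backend gets a chance to match against a dedicated instruction. Which instruction is actually emitted depends on the target and compiler version.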

The optimizers can detect a vector permute that can be done with a "constant" vector, i.e. one whose selector is known at compile time, and map it to a dedicated instruction. Examples are the Neon vrev{16/32/64} operations, which can be expressed as in the testcase gcc.target/arm/neon-vrev.c. The currently supported constant permutes are vrev16, vrev32, vrev64, vzip, vuzp and vtrn. If none of these match, we fall back to the generic vector permute instructions, vtbl and vtbx. GCC does the same for both the language extension and the vectorized case, as long as the backend reports that the operation is a supported constant vector permute.

However, the vtbl and vtbx instructions are expensive: they operate only on vector registers, so the mask for the generic shuffle must first be loaded into one. For an example of the difference in the code generated for the vrev cases, see http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01793.html .

The aim of this task is to do the same for the permute operations that are allowed with the Neon vext instruction.
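To make the target concrete: vext takes the upper lanes of its first operand followed by the lower lanes of its second, so the matching selector is a run of consecutive indices into the concatenation of the two inputs. The sketch below expresses two such permutes with __builtin_shuffle. The function names are illustrative only, and whether a given GCC emits vext for them depends on the backend support this blueprint describes.

```c
#include <stdint.h>

/* Four 32-bit lanes in a 128-bit vector, the size of a Neon Q register. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Lanes 1..3 of a followed by lane 0 of b.
   This is the shape of "vext.32 qd, qn, qm, #1". */
v4si ext_by_1(v4si a, v4si b)
{
    const v4si mask = {1, 2, 3, 4};
    return __builtin_shuffle(a, b, mask);
}

/* Lane 3 of a followed by lanes 0..2 of b.
   This is the shape of "vext.32 qd, qn, qm, #3". */
v4si ext_by_3(v4si a, v4si b)
{
    const v4si mask = {3, 4, 5, 6};
    return __builtin_shuffle(a, b, mask);
}
```

Compiling functions like these for a Neon target and inspecting the assembly is one way to see whether the permute was matched or fell back to vtbl.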

Thus, at a broad level, the tasks are the following.

 * Understand the Neon vext instruction and generate testcases using the generic __builtin_shuffle mechanism.
 * Understand the backend implementation. Look at how the functions arm_evpc_neon_vuzp, arm_evpc_neon_vrev etc. are used in arm.c.
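
The recognition step for vext can be pictured as a check on the constant selector. The following is a standalone sketch, not GCC code: is_vext_selector is a hypothetical helper, and it assumes the selector is given as plain indices into the concatenation of the two inputs. It ignores details a real backend routine would have to handle, such as endianness and the single-operand case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC.
   Returns true when sel[0..nelt-1] is a run of consecutive indices
   start, start+1, ... with 0 < start < nelt, and stores start in *offset.
   start == 0 would be a plain copy of the first operand, and
   start >= nelt would select only from the second operand. */
bool is_vext_selector(const unsigned *sel, size_t nelt, unsigned *offset)
{
    if (nelt == 0)
        return false;

    unsigned start = sel[0];
    if (start == 0 || start >= nelt)
        return false;

    for (size_t i = 1; i < nelt; i++) {
        if (sel[i] != start + i)
            return false;
    }

    *offset = start;
    return true;
}
```

For a four-lane vector, the selector {1, 2, 3, 4} is accepted with offset 1, while a reversal such as {3, 2, 1, 0} is rejected and would be left to the other matchers or to the vtbl fallback.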

