Linaro GCC

Implement constant vec permute operation for the vext instruction

Registered by Ramana Radhakrishnan on 2012-06-12

The Neon vext instruction can be supported through the constant vector permute infrastructure available in FSF GCC 4.7 and later. That infrastructure already handles all the other Neon constant vector permute operations, but it does not yet handle vext.

Blueprint information

Status:: Complete

Approver:: Michael Hope

Priority:: Medium

Drafter:: Ramana Radhakrishnan

Direction:: Approved

Assignee:: Christophe Lyon

Definition:: Approved

Series goal:: Accepted for 4.7

Implementation:: Implemented

Milestone target:: None

Started by: Michael Hope on 2012-08-22

Completed by: Christophe Lyon on 2012-09-05

Related branches

lp:~christophe-lyon/gcc-linaro/gcc-4.7-vec-permute-vext

Whiteboard

Background

GCC has generic vector permute support, which a programmer can use in two ways: explicitly, through the __builtin_shuffle language extension, or implicitly, by relying on the auto-vectorizer to generate vector permutes.
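A minimal sketch of the explicit route, assuming GCC 4.7 or later (__builtin_shuffle is a GCC extension; other compilers spell the equivalent differently). The type v4si and the three function names are illustrative only. The first two functions use constant masks, which the backend can try to match against a dedicated instruction; the third uses a mask known only at run time, which only a generic permute can implement.

```c
/* GCC generic vector type: four 32-bit ints in one 128-bit vector. */
typedef int v4si __attribute__ ((vector_size (16)));

/* One-operand form: result[i] = a[mask[i]].
   The mask is a compile-time constant, so the backend may look for a
   dedicated permute instruction instead of a generic table lookup. */
v4si reverse (v4si a)
{
  const v4si mask = { 3, 2, 1, 0 };
  return __builtin_shuffle (a, mask);
}

/* Two-operand form: indices 0..3 select from a, 4..7 select from b. */
v4si interleave_low (v4si a, v4si b)
{
  const v4si mask = { 0, 4, 1, 5 };
  return __builtin_shuffle (a, b, mask);
}

/* Variable mask: its value is unknown at compile time, so the compiler
   has to use a generic permute (on Neon, vtbl/vtbx). */
v4si permute_var (v4si a, v4si mask)
{
  return __builtin_shuffle (a, mask);
}
```

For example, reverse applied to { 10, 11, 12, 13 } yields { 13, 12, 11, 10 }, and interleave_low applied to { 10, 11, 12, 13 } and { 20, 21, 22, 23 } yields { 10, 20, 11, 21 }.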

If the optimizers detect a vector permute whose mask is a compile-time constant matching a dedicated instruction, such as the Neon vrev16/32/64 operations (see the testcase gcc.target/arm/neon-vrev.c), the backend emits that instruction directly. The currently supported constant permutes are vrev16, vrev32, vrev64, vzip, vuzp and vtrn. If none of these matches, GCC falls back to the generic vector permute instructions, vtbl and vtbx. GCC treats the language extension and the vectorized case the same way, provided the backend reports that the mask is a supported constant vector permute.
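To make "matching a constant mask" concrete, here is a minimal sketch of a vrev recognizer. vrevN reverses the order of the elements within each N-bit chunk, so for a group of g elements (g a power of two) the mask is perm[i] == i ^ (g - 1). The function match_vrev is hypothetical: it only illustrates the kind of decision a routine such as arm_evpc_neon_vrev has to make and is not GCC's actual code.

```c
#include <assert.h>

/* Hypothetical recognizer (not GCC's code).
   perm:     one-operand constant permute mask with nelt entries
   nelt:     number of vector elements (a power of two)
   elt_bits: element width in bits (8, 16 or 32)
   Returns the vrev chunk width (16, 32 or 64) or 0 if no vrev matches. */
unsigned match_vrev (const unsigned *perm, unsigned nelt, unsigned elt_bits)
{
  assert (nelt > 0 && (nelt & (nelt - 1)) == 0);

  unsigned diff = perm[0];     /* group size minus one */
  unsigned group = diff + 1;   /* elements reversed together */

  /* The group must be a power of two, at least 2 and at most nelt. */
  if (group < 2 || group > nelt || (group & diff) != 0)
    return 0;

  /* Reversal inside each aligned group flips the low index bits. */
  for (unsigned i = 0; i < nelt; i++)
    if (perm[i] != (i ^ diff))
      return 0;

  /* Only 16-, 32- and 64-bit chunks have a vrev instruction. */
  unsigned chunk = group * elt_bits;
  return (chunk == 16 || chunk == 32 || chunk == 64) ? chunk : 0;
}
```

For instance, the byte mask { 1, 0, 3, 2, 5, 4, 7, 6 } maps to vrev16, while a full reversal of 16 bytes would need a 128-bit chunk and is rejected.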

However, vtbl and vtbx are expensive: they operate only on vector registers, so the mask for the generic shuffle must first be materialized in a vector register before the lookup can run. For an example of the difference in generated code for the vrev cases, see http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01793.html .
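The following plain-C reference model (not Neon intrinsics) shows the byte-level behaviour of the two table lookups on 8-byte vectors, and why the generic path always needs an index vector as an extra operand. vtbl writes 0 for an out-of-range index, while vtbx leaves that destination byte unchanged. The function names model_vtbl and model_vtbx are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* table holds ntable bytes (8, 16, 24 or 32, i.e. one to four D registers).
   idx is the per-byte index vector: on hardware this mask must itself be
   loaded into a vector register before the lookup. */
void model_vtbl (uint8_t out[8], const uint8_t *table, unsigned ntable,
                 const uint8_t idx[8])
{
  assert (ntable % 8 == 0 && ntable >= 8 && ntable <= 32);
  for (int i = 0; i < 8; i++)
    out[i] = idx[i] < ntable ? table[idx[i]] : 0;   /* out of range: 0 */
}

void model_vtbx (uint8_t out[8], const uint8_t *table, unsigned ntable,
                 const uint8_t idx[8])
{
  assert (ntable % 8 == 0 && ntable >= 8 && ntable <= 32);
  for (int i = 0; i < 8; i++)
    if (idx[i] < ntable)                            /* out of range: keep */
      out[i] = table[idx[i]];
}
```

A dedicated permute such as vrev or vext encodes its pattern in the opcode or an immediate, so it needs no such index vector.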

The aim of this task is to do the same for the permutes that the Neon vext instruction can perform, so that they are emitted as vext rather than through vtbl/vtbx.
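A minimal reference model of what vext computes, assuming the usual description of the instruction: given two source vectors and an immediate, it returns nelt consecutive elements of their concatenation (first source in the low half), starting at the immediate. In __builtin_shuffle terms this is the two-operand mask { imm, imm + 1, ..., imm + nelt - 1 }; using the same vector for both sources gives a rotation. The helpers model_vext, vext_mask and apply_mask2 are hypothetical, written only to check that the two views agree.

```c
#include <assert.h>

/* vext model: out[i] = (lo:hi)[i + offset], lo supplying elements 0..nelt-1. */
void model_vext (unsigned *out, const unsigned *lo, const unsigned *hi,
                 unsigned nelt, unsigned offset)
{
  assert (offset < nelt);
  for (unsigned i = 0; i < nelt; i++)
    {
      unsigned k = i + offset;              /* index into lo:hi */
      out[i] = k < nelt ? lo[k] : hi[k - nelt];
    }
}

/* The same operation as a two-operand permute mask:
   consecutive indices starting at offset. */
void vext_mask (unsigned *mask, unsigned nelt, unsigned offset)
{
  assert (offset < nelt);
  for (unsigned i = 0; i < nelt; i++)
    mask[i] = offset + i;
}

/* Apply a two-operand mask in the __builtin_shuffle convention:
   0..nelt-1 pick from a, nelt..2*nelt-1 pick from b (indices mod 2*nelt). */
void apply_mask2 (unsigned *out, const unsigned *a, const unsigned *b,
                  const unsigned *mask, unsigned nelt)
{
  for (unsigned i = 0; i < nelt; i++)
    {
      unsigned k = mask[i] % (2 * nelt);
      out[i] = k < nelt ? a[k] : b[k - nelt];
    }
}
```

With a = { 0, 1, 2, 3 } and b = { 4, 5, 6, 7 }, an offset of 1 gives { 1, 2, 3, 4 } in both forms; passing a as both sources with offset 3 gives the rotation { 3, 0, 1, 2 }.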

At a broad level, the tasks are the following.

* Understand the Neon vext instruction and generate testcases using the generic __builtin_shuffle mechanism.
* Understand the backend implementation: look at how functions such as arm_evpc_neon_vuzp and arm_evpc_neon_vrev are used in arm.c.
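The recognizer-then-fallback structure can be sketched as follows. Everything here is hypothetical: struct perm_desc, its fields, match_vext and choose_permute are illustrative names in the spirit of the arm_evpc_* routines, not GCC's actual types or code. The sketch accepts a mask of consecutive indices starting inside the first operand, and for a single repeated operand it lets the indices wrap modulo nelt (a rotation); anything else falls back to the table lookup.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NELT 16

/* Hypothetical descriptor of a constant permute. */
struct perm_desc
{
  unsigned nelt;             /* elements per vector */
  bool one_vector_p;         /* both shuffle operands are the same vector */
  unsigned perm[MAX_NELT];   /* constant mask, entries < 2 * nelt */
};

/* Hypothetical vext recognizer; stores the immediate in *imm on success. */
bool match_vext (const struct perm_desc *d, unsigned *imm)
{
  assert (d->nelt >= 2 && d->nelt <= MAX_NELT);
  unsigned start = d->perm[0];

  /* start == 0 is the identity, which needs no instruction at all. */
  if (start == 0 || start >= d->nelt)
    return false;

  for (unsigned i = 1; i < d->nelt; i++)
    {
      unsigned want = start + i;
      if (d->one_vector_p)
        want %= d->nelt;     /* vext of a vector with itself: rotation */
      if (d->perm[i] != want)
        return false;
    }
  *imm = start;
  return true;
}

enum perm_insn { INSN_VEXT, INSN_VTBL };

/* Try the dedicated pattern first, then fall back to the generic lookup. */
enum perm_insn choose_permute (const struct perm_desc *d, unsigned *imm)
{
  if (match_vext (d, imm))
    return INSN_VEXT;
  return INSN_VTBL;
}
```

This sketch does not handle masks that start in the second operand and wrap into the first (for example { 5, 6, 7, 0 } with four elements), which a fuller implementation could catch by swapping the operands.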

