VZIP.<size> Qd, Qm
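After VZIP.16 Qd, Qm, with Qd = a0..a7 and Qm = b0..b7 beforehand:
Qd = a0 b0 a1 b1 a2 b2 a3 b3
Qm = a4 b4 a5 b5 a6 b6 a7 b7
On ARMv7, VZIP takes two operands and overwrites both of them. The Q-register form interleaves across the full 128 bits: Qd receives the interleaved low halves of the two inputs, and Qm receives the interleaved high halves. AArch64 splits this into two non-destructive instructions, ZIP1 (low halves) and ZIP2 (high halves), each of which writes a separate destination register.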
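To make the lane ordering concrete, here is a minimal scalar sketch in portable C of what VZIP.16 does to a pair of Q registers. It is only a model for checking results on a host machine, not NEON code; the function name vzip16_q_model and the array-of-lanes representation are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the ARMv7 instruction VZIP.16 Qd, Qm (hypothetical helper).
   Each Q register is modeled as eight 16-bit lanes, lane 0 first.
   Both arrays are overwritten, just as both registers are by the instruction. */
void vzip16_q_model(uint16_t qd[8], uint16_t qm[8])
{
    uint16_t zipped[16];

    /* Interleave across the full 128 bits: d0 m0 d1 m1 ... d7 m7. */
    for (int e = 0; e < 8; e++) {
        zipped[2 * e]     = qd[e];
        zipped[2 * e + 1] = qm[e];
    }

    /* The low 128 bits go back to Qd, the high 128 bits to Qm. */
    memcpy(qd, zipped, 8 * sizeof(uint16_t));
    memcpy(qm, zipped + 8, 8 * sizeof(uint16_t));
}
```

Calling it with qd = {0, 1, ..., 7} and qm = {10, 11, ..., 17} leaves qd = {0, 10, 1, 11, 2, 12, 3, 13} and qm = {4, 14, 5, 15, 6, 16, 7, 17}.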
@ Assume Q0-Q7 hold the 8 rows of the matrix.
@ We want to turn rows into columns.
@ Step 2: 16-bit zip (merging 16-bit chunks)
@ Pair up rows and interleave their 16-bit elements
VZIP.16 Q0, Q1    @ Interleave the 16-bit elements of rows 0 and 1
VZIP.16 Q2, Q3
VZIP.16 Q4, Q5
VZIP.16 Q6, Q7