3.1 Integer Instructions
3.1.1 Integer Arithmetic Instructions
3.1.2 Integer Multiplication
3.1.3 Integer Reductions
3.1.4 Integer Compare Instructions
3.1.5 Integer Logical Instructions
3.1.6 Integer Rotate/Shift Instructions
3.2 Floating Point Instructions
3.2.1 Floating Point Arithmetic
3.2.2 Floating Point Logic
3.2.3 Floating Point Division, Square Root, Log and Exponentials
3.2.4 FP Rounding and Conversion
3.2.5 Floating Point Compare
3.3 Load and Store Instructions
3.3.1 Integer Load Instructions
3.3.2 Integer Store Instructions
3.3.3 Data Movement Instructions
3.4 Formatting Instructions
3.4.1 Pack
3.4.2 Unpack
3.4.3 Merge
3.4.4 Splat
3.4.5 Permute and Shuffle
3.5 System Instructions
3.5.1 User-Level Cache Instructions
3.5.2 Move From/To Vector Status Registers
| Developer | Extension | Base ISA | Instructions | Register file | Processor |
| Intel | MMX | x86 | 57 | 8x64b (FP) | Pentium (1997) |
| Intel | SSE | x86 | 70 | 8x128b (XMM) | Pentium III (1999) |
| Intel | SSE2 | x86 | 116 | 8x128b (XMM) | Pentium IV (2000) |
| Intel | SSE3 | x86 | 13 | 8x128b (XMM) | Pentium IV Prescott (2004) |
| Motorola | Altivec | PowerPC | 162 | 32x128b (V) | MPC74xx aka G4 (1999), IBM 970 aka G5 (2003) |
| Intel | Multimedia Instructions | IPF (IA-64) | 47 (?) | 128x64bit | Merced |
| Jesús Corbal UPC | MOM | Alpha | 119 | 16 x 16 x 64b (VR), 2 x 192b (ACC) | not available (yet) |
| Operation | Altivec | MMX | SSE2 | MOM_64 | MOM_128 |
| Modulo Add/Sub | VADD 8,16,32 | PADD 8,16,32 | PADD 8,16,32,64 | M_V_ADD, M_VS_ADD U8,U16,S8,S16 | |
| Saturating Add/Sub | VADD U8,U16,U32,S8,S16,S32 | PADD U8,U16,S8,S16 | PADD U8,U16,S8,S16 | M_V_ADD, M_VS_ADD U8,U16,S8,S16 | |
| Average | VAVG U8,U16,U32,S8,S16,S32 | PAVG U8,U16 | PAVG U8,U16 | M_V_AVG U8,U16,S8,S16 | |
| Min/Max | VMAX U8,U16,U32,S8,S16,S32 | PMAX U8, S16 | PMAX U8, S16 | - | |
| Multiplication | VMULE, VMULO U8,U16,S8,S16 | PMULH,PMULL U16,S16 | PMULH,PMULL U16,S16 | M_V_MUL, M_VS_MUL S8,S16,S32 | |
| Multiply and accumulate | VMLADD,VMHADD S16 | PMADD S16 | PMADD S16 | M_V_MULA, M_VS_MULA S8,S16 | |
| Multiply and sum | VMSUM U8,U16,S8,S16 | - | - | M_V_MAD S8,S16 | |
| Sum Across | VSUM S8,S16,S32 | - | - | M_V_HADDA S8,S16 | |
| Sum of Absolute differences | | PSAD S8 | PSAD S8 | M_V_AADDA S8,S16 | |
| Operation | Altivec | MMX | SSE | SSE2 | MOM_64 |
| add/sub (modulo arithmetic) |
VADD |
PADD |
PADD |
M_V_ADD |
|
| (Vu8 + Vu8 ) → Vu8 | VADDUBM | PADDB |
PADDB | M_V_ADD_UW_B M_VS_ADD_UW_B |
|
| (Vu16 + Vu16 ) → Vu16 | VADDUHM | PADDW | PADDW | M_V_ADD_UW M_VS_ADD_UW |
|
| (Vu32 + Vu32) → Vu32 | VADDUWM | PADDD | PADDD | ||
| (Vu64 + Vu64) → Vu64 | PADDQ |
||||
| (Vs8 + Vs8) → Vs8 | M_V_ADD_SW_B M_VS_ADD_SW_B |
||||
| (Vs16 + Vs16) → Vs16 | M_V_ADD_SW_W M_VS_ADD_SW_W |
||||
| add/sub (saturating arithmetic) | VADD | PADD |
PADD |
M_V_ADD |
|
| (Vu8 + Vu8) ⇒ Vu8 | VADDUBS | PADDUSB |
PADDUSB | M_V_ADD_US_B M_VS_ADD_US_B |
|
| (Vu16 + Vu16) ⇒ Vu16 | VADDUHS | PADDUSW | PADDUSW | M_V_ADD_US_W M_VS_ADD_US_W |
|
| (Vu32 + Vu32) ⇒ Vu32 | VADDUWS | ||||
| (Vs8 + Vs8) ⇒ Vs8 | VADDSBS | PADDSB | PADDSB | M_V_ADD_SS_B M_VS_ADD_SS_B |
|
| (Vs16 + Vs16) ⇒ Vs16 | VADDSHS | PADDSW | PADDSW | M_V_ADD_SS_W M_VS_ADD_SS_W |
|
| (Vs32 + Vs32) ⇒ Vs32 | VADDSWS | ||||
| Add and carry out |
|||||
| C(Vu32 + Vu32) → Vu32 |
VADDCUW |
||||
| average | VAVG |
PAVG | PAVG |
M_AVG |
|
| (Vu8 avg Vu8) → Vu8 | VAVGUB |
PAVGB | PAVGB | M_V_AVG_U_B |
|
| (Vu16 avg Vu16) → Vu16 | VAVGUH | PAVGW | PAVGW | M_V_AVG_U_W | |
| (Vu32 avg Vu32) → Vu32 | VAVGUW | ||||
| (Vs8 avg Vs8) → Vs8 | VAVGSB | M_V_AVG_S_B | |||
| (Vs16 avg Vs16) → Vs16 | VAVGSH | M_V_AVG_S_W | |||
| (Vs32 avg Vs32) → Vs32 |
VAVGSW |
||||
| max |
VMAX |
PMAX |
PMAX |
||
| (Vu8 max Vu8) → Vu8 |
VMAXUB |
|
PMAXUB |
PMAXUB | |
| (Vs8 max Vs8) → Vs8 |
VMAXSB |
||||
| (Vu16 max Vu16) → Vu16 | VMAXUH |
||||
| (Vs16 max Vs16) → Vs16 | VMAXSH |
|
PMAXSW | PMAXSW | |
| (Vu32 max Vu32) → Vu32 | VMAXUW |
||||
| (Vs32 max Vs32) → Vs32 | VMAXSW |
||||
| min | VMIN | PMIN | PMIN | ||
| (Vu8 min Vu8) → Vu8 | VMINUB | PMINUB | PMINUB | ||
| (Vs8 min Vs8) → Vs8 | VMINSB | ||||
| (Vu16 min Vu16) → Vu16 | VMINUH | ||||
| (Vs16 min Vs16) → Vs16 | VMINSH | PMINSW | PMINSW | ||
| (Vu32 min Vu32) → Vu32 | VMINUW | ||||
| (Vs32 min Vs32) → Vs32 | VMINSW |
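
The table distinguishes modulo (→) from saturating (⇒) arithmetic. As a rough illustration, the following scalar Python sketch models one lane at a time; the function names are hypothetical and the code is a reference model under stated assumptions, not how any of these extensions is implemented. The rounding average uses `(x + y + 1) >> 1`, which is the behavior of PAVGB and VAVGUB.

```python
def add_modulo(a, b, bits=8):
    """Lane-wise wrap-around add (the table's -> rows), e.g. PADDB / VADDUBM."""
    mask = (1 << bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


def add_saturate_unsigned(a, b, bits=8):
    """Lane-wise unsigned saturating add (=> rows), e.g. PADDUSB / VADDUBS."""
    hi = (1 << bits) - 1
    return [min(x + y, hi) for x, y in zip(a, b)]


def add_saturate_signed(a, b, bits=8):
    """Lane-wise signed saturating add, e.g. PADDSB / VADDSBS."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [max(lo, min(x + y, hi)) for x, y in zip(a, b)]


def avg_unsigned(a, b):
    """Rounding average (x + y + 1) >> 1, as in PAVGB / VAVGUB."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]
```

For example, adding the unsigned bytes 250 and 10 wraps to 4 in the modulo form and clamps to 255 in the saturating form.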
| Operation | Altivec | MMX | SSE | SSE2 | MOM |
| Multiply |
|||||
| Vs8 x Vs8 → Vs8 | M_V_MUL_SS_B M_VS_MUL_SS_B |
||||
| Vs16 x Vs16 → Vs16 | M_V_MUL_SS_W M_VS_MUL_SS_W |
||||
| Vs32 x Vs32 → Vs32 | M_V_MUL_SS_D M_VS_MUL_SS_D |
||||
| Vu32 x Vu32 → Vu64 | PMULUDQ | ||||
| Truncation Multiply | |||||
| U16(Vu16 x Vu16) → Vu16 |
|
PMULHUW | |||
| U16(Vs16 x Vs16) → Vs16 | PMULHW | PMULHW | |||
| L16(Vs16 x Vs16) → Vs16 | PMULLW |
PMULLW | |||
| Even-Odd Multiply |
|||||
| E(Vu8) x E(Vu8) → Vu16 | VMULEUB | ||||
| E(Vu16) x E(Vu16) → Vu32 | VMULEUH | ||||
| E(Vs8) x E(Vs8) → Vs16 | VMULESB | ||||
| E(Vs16) x E(Vs16) → Vs32 | VMULESH | ||||
| O(Vu8) x O(Vu8) → Vu16 | VMULOUB | ||||
| O(Vu16) x O(Vu16) → Vu32 | VMULOUH | ||||
| O(Vs8) x O(Vs8) → Vs16 | VMULOSB | ||||
| O(Vs16) x O(Vs16) → Vs32 | VMULOSH |
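
The two multiplication styles in this table handle the double-width product differently: the x86 extensions return either the upper (U16) or lower (L16) half of each product, while Altivec multiplies only the even or odd elements and keeps the full result. A minimal Python sketch of both ideas follows; the function names are hypothetical, lanes are plain Python integers, and element 0 is assumed to be the first "even" element.

```python
def mul_high_low_s16(a, b):
    """Split each signed 16x16 product into its upper and lower 16 bits.

    Models the U16(...) and L16(...) rows (PMULHW and PMULLW). Both halves
    are returned as raw 16-bit patterns in the range 0..0xFFFF.
    """
    high, low = [], []
    for x, y in zip(a, b):
        p = x * y
        low.append(p & 0xFFFF)
        # Python's >> on negative ints is an arithmetic shift, so this
        # yields bits 31..16 of the 32-bit two's-complement product.
        high.append((p >> 16) & 0xFFFF)
    return high, low


def mul_even(a, b):
    """Multiply the even-numbered elements, keeping full-width products."""
    return [a[i] * b[i] for i in range(0, len(a), 2)]


def mul_odd(a, b):
    """Multiply the odd-numbered elements, keeping full-width products."""
    return [a[i] * b[i] for i in range(1, len(a), 2)]
```

Combining `mul_even` and `mul_odd` results covers every element without losing precision, at the cost of two instructions and a later merge.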
| Operation | Altivec | MMX | SSE | SSE2 | MOM |
| Multiply and Accumulate | |||||
| L16(Vs16 x Vs16 + Vu16) → Vs16 | VMLADDUHM | ||||
| U16(Vs16 x Vs16 + Vs16) ⇒ Vs16 | VMHADDSHS | ||||
| R+(Fs16 x Fs16) → Fs32 | PMADDWD | PMADDWD | |||
| R+(Vs8 x Vs8) ⇒ ACCs24 | M_V_MULA_B M_VS_MULA_B |
||||
| R+(Vs16 x Vs16) ⇒ ACCs48 | M_V_MULA_W M_VS_MULA_W |
||||
| Vector Multiply - Sum | |||||
| R+(Vu8 x Vu8) + Vu32 → Vu32 | VMSUMUBM | ||||
| R+(Vu16 x Vu16) + Vu32 → Vu32 | VMSUMUHM | ||||
| R+(Vu8 x Vs8) + Vs32 → Vs32 | VMSUMMBM | ||||
| R+(Vs16 x Vs16) + Vs32 → Vs32 | VMSUMSHM | ||||
| R+(Vu16 x Vu16) + Vu32 ⇒ Vu32 | VMSUMUHS | ||||
| R+(Vs16 x Vs16) + Vs32 ⇒ Vs32 | VMSUMSHS | ||||
| R+(Vs8 x Vs8 +Vs8 x Vs8 ) → Vs16 | M_V_MADD_S_B | ||||
| R+(Vs16 x Vs16+ Vs16 x Vs16) → Vs32 | M_V_MADD_S_W | ||||
| Vector Sum Across | |||||
| R+(Vs8) ⇒ Vs64 |
M_HADDA_S_B |
||||
| R+(Vs16) ⇒ Vs64 |
M_HADDA_S_W |
||||
| R+(Vs32) + Vs32 ⇒ Vs32 | VSUMSWS | ||||
| R+(Vs32) + E(Vs32) ⇒ E(Vs32) | VSUM2SWS | ||||
| R+(Vs8) + Vs32 ⇒ Vs32 | VSUM4SBS | ||||
| R+(Vs16) + Vs32 ⇒ Vs32 | VSUM4SHS | ||||
| Vector Sum of Absolute Differences | |||||
| R+(Abs(Ms8)) ⇒ Ms64 | PSADBW | PSADBW | M_AADDA_S_B |
||
| R+(Abs(Ms16)) ⇒ Ms64 | M_AADDA_S_W |
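
The R+ rows above combine a lane-wise operation with an additive reduction. The sketch below models two of them in plain Python: a pairwise multiply-add in the style of PMADDWD and a sum of absolute differences in the style of PSADBW. The function names are hypothetical, results are left as unbounded Python integers (no 32-bit wrap or saturation is applied), and the SAD model reduces the whole input to one value rather than one value per 64-bit group.

```python
def madd_pairs(a, b):
    """Multiply lane-wise, then add adjacent pairs of products.

    Models the R+(Fs16 x Fs16) -> Fs32 row (PMADDWD): each output lane is
    a[2i]*b[2i] + a[2i+1]*b[2i+1].
    """
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("expected two equal-length vectors with an even lane count")
    return [a[i] * b[i] + a[i + 1] * b[i + 1] for i in range(0, len(a), 2)]


def sum_abs_diff(a, b):
    """Sum of absolute lane differences, the reduction computed by PSADBW."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

The SAD reduction is the core of block-matching motion estimation, which is why it appears as a dedicated instruction in the multimedia extensions.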
| Operation | Altivec | MMX | SSE | SSE2 | MOM |
| Compare Greater than unsigned |
|||||
| m(Vu8>Vu8) → V8 |
VCMPGTUB |
M_CMPGT.m.u.b |
|||
| m(Vu16>Vu16) → V16 | VCMPGTUH | M_CMPGT.m.u.w | |||
| m(Vu32>Vu32) → V32 | VCMPGTUW | ||||
| Compare Greater than signed |
|||||
| m(Vs8>Vs8) → V8 | VCMPGTSB | PCMPGTB |
PCMPGTB | M_CMPGT.m.s.b | |
| m(Vs16>Vs16) → V16 | VCMPGTSH | PCMPGTW |
PCMPGTW | M_CMPGT.m.s.w | |
| m(Vs32>Vs32) → V32 | VCMPGTSW | PCMPGTD |
PCMPGTD | ||
| Compare Equal to |
|||||
| m(Vu8==Vu8) → V8 | VCMPEQUB |
PCMPEQB |
PCMPEQB | M_CMPEQ.m.u.b |
|
| m(Vu16==Vu16) → V16 | VCMPEQUH | PCMPEQW |
PCMPEQW | M_CMPEQ.m.u.w | |
| m(Vu32==Vu32) → V32 | VCMPEQUW | PCMPEQD |
PCMPEQD |
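
The m(...) notation means the compare produces a mask rather than a flag: each result lane is all ones where the condition holds and all zeros otherwise. The Python sketch below models that and shows the usual follow-up, a branch-free select built from AND, AND-with-complement and OR. The function names are hypothetical and lanes are assumed to be unsigned values that fit in `bits`.

```python
def cmp_gt_mask(a, b, bits=8):
    """Lane-wise greater-than producing all-ones / all-zeros masks."""
    ones = (1 << bits) - 1
    return [ones if x > y else 0 for x, y in zip(a, b)]


def select(mask, a, b, bits=8):
    """Pick a[i] where mask[i] is all ones, else b[i], using only logic ops."""
    ones = (1 << bits) - 1
    return [(m & x) | ((m ^ ones) & y) for m, x, y in zip(mask, a, b)]
```

Selecting with the mask of `a > b` yields a lane-wise maximum, which is how a missing min/max instruction can be synthesized from a compare.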
| Operation | Altivec | MMX | SSE | SSE2 | MOM |
| Logical AND |
|||||
| V & V → V |
VAND |
PAND |
PAND | M_AND.m.u.q |
|
| Logical OR | |||||
| V | V → V |
VOR |
POR |
POR | M_OR.m.u.q |
|
| Logical XOR |
|||||
| V ⊕ V |
VXOR |
PXOR |
PXOR | M_XOR.m.u.q |
|
| Logical AND with complement |
|||||
| !V & V → V |
VANDC |
PANDN |
PANDN | M_NAND.m.u.q |
|
| Logical NOR |
|||||
| !(V | V) → V |
VNOR |
| Operation | Altivec | MMX | SSE | SSE2 | MOM |
| Rotate Left |
|||||
| VRLB |
|||||
| VRLH |
|||||
| VRLW |
|||||
| Shift Left |
|||||
| V8 << V8 → V8 |
VSLB |
M_SLL.ms.u.b |
|||
| V16 << V16 → V16 | VSLH |
PSLLW |
PSLLW | M_SLL.ms.u.w | |
| V32 << V32 → V32 | VSLW |
PSLLD |
PSLLD | ||
| .V64 << .V64 → .V64 | PSLLQ |
PSLLQ | M_SLL.ms.u.q | ||
| V64 << imm8 → V64 | PSLLDQ |
||||
| Shift Right |
|||||
| V8 >> V8 → V8 | VSRB | ||||
| V16 >> V16 → V16 | VSRH |
PSRLW | PSRLW | ||
| V32 >> V32 → V32 | VSRW |
PSRLD |
PSRLD | ||
| .V64 >> .V64 → .V64 | PSRLQ | PSRLQ | |||
| V64 >> imm8 → V64 | PSRLDQ |
||||
| Shift Right Arithmetic |
|||||
| V8 _>> V8 → V8 | VSRAB |
M_SRA.ms.u.b |
|||
| V16 _>> V16 → V16 | VSRAH | PSRAW | PSRAW | M_SRA.ms.u.w | |
| V32 _>> V32 → V32 | VSRAW | PSRAD |
PSRAD | M_SRA.ms.u.d | |
| M_SRA.ms.u.q |
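
The difference between the logical and arithmetic right shifts, and the rotate that only Altivec provides, can be shown with a small Python model. This is a sketch with hypothetical function names; lanes are raw bit patterns of width `bits`, and a single shift count is applied to every lane, although the Altivec forms take a per-lane count.

```python
def shift_right_logical(v, n, bits=16):
    """Shift each lane right, filling with zeros (PSRLW / VSRH style)."""
    mask = (1 << bits) - 1
    return [(x & mask) >> n for x in v]


def shift_right_arith(v, n, bits=16):
    """Shift each lane right, replicating the sign bit (PSRAW / VSRAH style)."""
    mask = (1 << bits) - 1
    out = []
    for x in v:
        x &= mask
        if x >> (bits - 1):          # reinterpret the pattern as signed
            x -= 1 << bits
        out.append((x >> n) & mask)  # back to a raw bit pattern
    return out


def rotate_left(v, n, bits=8):
    """Rotate each lane left by n bits (VRLB / VRLH / VRLW style)."""
    mask = (1 << bits) - 1
    n %= bits
    return [(((x & mask) << n) | ((x & mask) >> (bits - n))) & mask for x in v]
```

For the 16-bit pattern 0x8000 shifted by 4, the logical form gives 0x0800 and the arithmetic form gives 0xF800.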
| Operation | Altivec | SSE | SSE2 | MOM |
| Vector Add |
||||
| Vfp32 + Vfp32 → Vfp32 |
VADDFP |
ADDPS ADDSS (scalar) |
||
| Vfp64 + Vfp64 → Vfp64 |
ADDPD ADDSD (scalar) |
|||
| Vector Sub |
||||
| Vfp32 - Vfp32 → Vfp32 | VSUBFP | SUBPS SUBSS (scalar) |
||
| Vfp64 - Vfp64 → Vfp64 | SUBPD SUBSD (scalar) |
|||
| Vector Multiply |
||||
| Vfp32 x Vfp32 → Vfp32 |
MULPS MULSS (scalar) |
|||
| Vfp64 x Vfp64 → Vfp64 | MULPD MULSD (scalar) |
|||
| Multiply and Add |
||||
| (Vfp32 x Vfp32 ) + Vfp32 → Vfp32 |
VMADDFP |
|||
| Multiply and Sub |
||||
| (Vfp32 x Vfp32 ) - Vfp32 → Vfp32 | VNMSUBFP |
|||
| Horizontal Add |
||||
| R+(Vfp32) → Vf32 |
HADDPS (SSE3) |
|||
| R+(Vfp64) → Vf64 | HADDPD (SSE3) |
|||
| Horizontal Sub |
||||
| R-(Vfp32) → Vf32 | HSUBPS (SSE3) |
|||
| R-(Vfp64) → Vf64 | HSUBPD (SSE3) | |||
| AddSub |
||||
| ADDSUBPS (SSE3) |
||||
| Vfp[127..64] + Vfp[127..64] / Vfp[63..0] - Vfp[63..0] → Vfp64 | ADDSUBPD (SSE3) |
|||
| Maximum |
||||
| (Vfp32 max Vfp32) -> Vfp32 |
VMAXFP |
MAXPS MAXSS(scalar) |
||
| (Vfp64 max Vfp64) -> Vfp64 | MAXPD MAXSD (scalar) |
|||
| Minimum | ||||
| (Vfp32 min Vfp32) -> Vfp32 | VMINFP |
MINPS MINSS (scalar) |
||
| (Vfp64 min Vfp64) -> Vfp64 | MINPD MINSD (scalar) |
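
The SSE3 horizontal and add/sub operations are easier to read as lane formulas. The Python sketch below models the single-precision forms on 4-element lists, with index 0 standing for the lowest lane; the function names are hypothetical and ordinary Python floats stand in for 32-bit values, so rounding can differ from hardware.

```python
def hadd_ps(a, b):
    """Horizontal add in the style of HADDPS: adjacent pairs of a, then of b."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


def hsub_ps(a, b):
    """Horizontal subtract in the style of HSUBPS."""
    return [a[0] - a[1], a[2] - a[3], b[0] - b[1], b[2] - b[3]]


def addsub_ps(a, b):
    """ADDSUBPS style: subtract in even lanes, add in odd lanes."""
    return [x - y if i % 2 == 0 else x + y for i, (x, y) in enumerate(zip(a, b))]
```

The alternating subtract/add pattern matches the real and imaginary parts of a complex multiplication, which is the usual motivation given for the add/sub form.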
| Operation | Altivec | SSE | SSE2 | MOM |
| Vector AND |
||||
| (Vfp32 & Vfp32) → Vfp32 |
ANDPS |
|||
| (Vfp64 & Vfp64 ) → Vfp64 | ANDPD | |||
| Vector AND NOT |
||||
| (Vfp32 ~& Vfp32) → Vfp32 | ANDNPS | |||
| (Vfp64 ~& Vfp64) → Vfp64 | ANDNPD | |||
| Vector OR |
||||
| (Vfp32 | Vfp32) → Vfp32 |
ORPS |
|||
| (Vfp64 | Vfp64) → Vfp64 | ORPD | |||
| Vector XOR |
||||
| (Vfp32 ^ Vfp32) → Vfp32 | XORPS |
|||
| (Vfp64 ^ Vfp64) → Vfp64 | XORPD |
| Operation | Altivec | SSE | SSE2 | MOM |
| Vector Divide |
||||
| Vfp32 / Vfp32 → Vfp32 |
DIVPS DIVSS (scalar) |
|||
| Vfp64 / Vfp64 → Vfp64 | DIVPD DIVSD (scalar) |
|||
| Vector Reciprocal Estimate |
||||
| 1/Vfp32 → Vfp32 |
VREFP |
RCPPS RCPSS (Scalar) |
||
| Vector Square Root | ||||
| Sqrt(Vfp32) → Vfp32 | SQRTPS SQRTSS (scalar) |
|||
| Sqrt(Vfp64) → Vfp64 | SQRTPD SQRTSD (scalar) |
|||
| Vector Reciprocal Square Root Estimate | ||||
| 1/Sqrt(Vfp32) → Vfp32 |
VRSQRTEFP |
RSQRTPS RSQRTSS (scalar) |
||
| Vector Log2 Estimate | ||||
| Log2(Vfp32) → Vfp32 |
VLOGEFP |
|||
| Vector 2 Raised to Exponent Estimate |
||||
| Exp2(Vfp32) → Vfp32 |
VEXPTEFP |
| Operation | Altivec | SSE | SSE2 | MOM |
| Round to FP Integer Nearest |
||||
| RoundN(Vfp32) → Vfp32 |
VRFIN |
|||
| Round to FP Integer toward Zero | ||||
| RoundZ(Vfp32) → Vfp32 | VRFIZ |
|||
| Round to FP Integer toward Positive Infinity |
||||
| Round+I(Vfp32) → Vfp32 | VRFIP | |||
| Round to FP Integer toward Minus Infinity | ||||
| Round-I(Vfp32) → Vfp32 | VRFIM |
|||
| Vector Convert to FP from Unsigned Fixed Point |
||||
| Vu32 → Vfp32 |
VCFUX |
|||
| Vector Convert to FP from Signed Fixed Point | ||||
| Vs32 → Vfp32 V.s32 → V.fp32 |
VCFSX | CVTPI2PS CVTSI2SS (scalar) |
||
| Vs64 → Vfp32 |
CVTDQ2PS |
|||
| Vs64 → Vfp64 |
CVTDQ2PD |
|||
| Vs32 → Vfp64 |
CVTPI2PD |
|||
| R32 → .Vfp64 | CVTSI2SD | |||
| Vector Convert to Unsigned Fixed Point Word Saturate |
||||
| Vfp32 ⇒Vu32 |
VCTUXS |
|||
| Vector Convert to Signed Fixed Point Word Saturate | ||||
| Vfp32 ⇒Vs32 | VCTSXS | CVTPS2PI CVTSS2SI (scalar) |
||
| Vfp32 ⇒Vs64 | CVTPS2DQ | |||
| Vfp64 → Vs32 .Vfp64 → R32 |
CVTPD2PI CVTSD2SI |
|||
| Vector Convert to Signed Fixed Point Word Truncate | ||||
| L64(Vfp32)→Vs32 .Vfp32→Rs32 |
CVTTPS2PI CVTTSS2SI (scalar) |
CVTTPS2DQ CVTTSD2SI |
||
| Vfp64→ 0 || Vs32 | CVTTPD2DQ |
|||
| Vfp64→ Fs32 | CVTTPD2PI |
|||
| Vector Convert from FP to FP |
||||
| Vfp64 → 0 || Vfp32 .Vfp64 → .Vfp32 |
CVTPD2PS CVTSD2SS |
|||
| Vfp64 → 0 || Vs64 | CVTPD2DQ | |||
| Vfp32 → Vfp64 .Vfp32 → .Vfp64 |
CVTPS2PD CVTSS2SD |
| Operation | Altivec | SSE | SSE2 | MOM |
| Vector Compare Greater than FP |
||||
| m(Vfp32 > Vfp32) → V32 |
VCMPGTFP |
|||
| Vector Compare Equal To FP |
||||
| m(Vfp32 == Vfp32) → V32 | VCMPEQFP |
CMPEQPS * | ||
| m(Vfp64 == Vfp64) → V64 | CMPEQPD * |
|||
| Vector Compare Greater than or Equal To FP |
||||
| m(Vfp32 >= Vfp32) → V32 | VCMPGEFP | |||
| Vector Compare Bounds FP |
||||
| m(Vfp32 <> Vfp32) → V32 | VCMPBFP |
|||
| Vector Compare Less than FP |
||||
| m(Vfp32 < Vfp32) → V32 | CMPLTPS |
|||
| m(Vfp64 < Vfp64) → V64 | CMPLTPD | |||
| Vector Compare Less than or Equal To FP | ||||
| m(Vfp32 <= Vfp32) → V32 | CMPLEPS |
|||
| m(Vfp64 <= Vfp64) → V64 | CMPLEPD | |||
| Vector Compare Unordered FP | ||||
| m(Vfp32 ? Vfp32) → V32 | CMPUNORDPS |
|||
| m(Vfp64 ? Vfp64) → V64 | CMPUNORDPD | |||
| Vector Compare Not Equal |
||||
| m(Vfp32 != Vfp32) → V32 | CMPNEQPS |
|||
| m(Vfp64 != Vfp64) → V64 | CMPNEQPD | |||
| Vector Compare Not Less than FP | ||||
| m(!(Vfp32 < Vfp32)) → V32 | CMPNLTPS |
|||
| m(!(Vfp64 < Vfp64)) → V64 | CMPNLTPD | |||
| Vector Compare Not Less than or Equal FP | ||||
| m(!(Vfp32 <= Vfp32)) → V32 | CMPNLEPS |
|||
| m(!(Vfp64 <= Vfp64)) → V64 | CMPNLEPD | |||
| Vector Compare Ordered FP | ||||
| m(!(Vfp32 ? Vfp32)) → V32 | CMPORDPS |
|||
| m(!(Vfp64 ? Vfp64)) → V64 | CMPORDPD | |||
| Scalar Compare |
||||
| CMPSS |
||||
| CMPSD | ||||
| Scalar Ordered Compare and Set Flags |
||||
| COMISS |
||||
| COMISD | ||||
| Scalar Unordered Compare and Set Flags | ||||
| single precision |
UCOMISS |
|||
| UCOMISD |
| Operation | Altivec | MMX | SSE | SSE2 | MOM |
| Load Vector Element Indexed |
|||||
| mem8 → V8 |
LVEBX |
||||
| mem16 → V16 |
LVEHX |
||||
| mem32 → V32 |
LVEWX |
||||
| Load Vector Indexed | |||||
| mem128 → V (forced alignment) |
LVX |
M_LD.m.u.q |
|||
| Load Vector Indexed LRU | |||||
| mem128 → V (forced alignment), transient | LVXL |
||||
| Load Vector for Shift Left |
|||||
| f(R+R) → V |
LVSL |
||||
| Load Vector for Shift Right |
|||||
| f(R+R) → V | LVSR |
| Operation | Altivec | MMX | MOM |
| Store Vector Element Integer Indexed |
|||
| V8 →mem8 |
STVEBX |
||
| V16 → mem16 |
STVEHX |
||
| V32 → mem32 |
STVEWX |
||
| Store Vector Indexed |
|||
| V → mem128 (forced alignment) |
STVX |
M_ST.m.u.q |
|
| Store Vector Indexed LRU |
|||
| V → mem128 (forced alignment), transient | STVXL |
| Operation | Altivec | MMX | SSE | SSE2 | SSE3 | MOM |
| Move |
||||||
| MM[31..0] → R[31..0] MM[31..0] → mem R[31..0]→ MM[31..0] mem → MM[31..0] |
MOVD | |||||
| MM[63..0] → MM[63..0] MM[63..0] → mem mem → MM[63..0] |
MOVQ | |||||
| XMM[127..0] → XMM[127..0] XMM[127..0] → mem128 mem128 → XMM[127..0] |
MOVDQA (aligned) MOVDQU (unaligned) |
LDDQU (unaligned load, supports splits between cache lines) |
||||
| MMX[63..0] → XMM[63..0] | MOVQ2DQ |
|||||
| XMM[63..0] → MMX[63..0] | MOVDQ2Q | |||||
| Move Aligned FP |
||||||
| XMM[127..0] → XMM[127..0] XMM[127..0]→ mem ( exception if unaligned) mem → XMM[127..0] ( exception if unaligned) |
MOVAPS (single precision) |
MOVAPD (double precision) |
||||
| Move Unaligned FP |
||||||
| XMM[127..0] → XMM[127..0] XMM[127..0] → mem128 mem128 → XMM[127..0] |
MOVUPS (single precision) |
MOVUPD (double precision) |
||||
| Move Aligned High FP | ||||||
| XMM[127..64] → mem64 mem128 → XMM[127..64] |
MOVHPS (single precision) |
MOVHPD (double precision) |
||||
| Move High to Low FP | ||||||
| XMM1[127-64] → XMM1[127-64] XMM2[127-64] → XMM1[63-0] |
MOVHLPS (single precision) |
|||||
| Move Aligned Low FP | ||||||
| XMM[63..0] → mem64 mem64 → XMM[63..0] |
MOVLPS (single precision) |
MOVLPD (double precision) |
||||
| Move Low to High FP | ||||||
| XMM2[63-0] → XMM1[127-64] XMM1[63-0] → XMM1[63-0] |
MOVLHPS (single precision) |
|||||
| Move and duplicate |
||||||
| XMM[63..0] → XMM[127..64], XMM[63..0] mem64[63..0] → XMM[127..64], XMM[63..0] |
MOVDDUP (SSE3) |
|||||
| XMM[127..96] → XMM[127..96], XMM[95..64] XMM[63..32] → XMM[63..32], XMM[31..0] |
MOVSHDUP (SSE3) |
|||||
| XMM[95..64] → XMM[127..96], XMM[95..64] XMM[31..0] → XMM[63..32], XMM[31..0] |
MOVSLDUP (SSE3) |
|||||
| Move Sign Mask To Integer FP |
||||||
| XMM[i*32-1] → R[i] |
MOVMSKPS (single precision) |
MOVMSKPD (double precision) |
||||
| Move Scalar FP | ||||||
| V.fp32/64 → V.fp32/64 V.fp32/64 → mem ( exception if unaligned) mem → V.fp32/64 ( exception if unaligned) |
MOVSS (single precision) |
MOVSD (double precision) |
||||
| Extract Word | ||||||
| MMX_select_by_imm[15-0]→r[15-0] 0x0000 → r[31-16] |
PEXTRW | |||||
| Insert Word | ||||||
| r32[15-0]→ MMX_select_by_imm[15-0] | PINSRW |
| Operation | Altivec | MMX | SSE | SSE2 | MOM |
| Pack Unsigned Integer Unsigned Modulo |
|||||
| (Vu16 || Vu16) → Vu8 |
VPKUHUM |
M_PCK.m.uw.b |
|||
| (Vu32 || Vu32) → Vu16 | VPKUWUM |
M_PCK.m.uw.w | |||
| Pack Unsigned Integer Unsigned Saturate | |||||
| (Vu16 || Vu16) ⇒ Vu8 | VPKUHUS |
||||
| (Vu32 || Vu32) ⇒ Vu16 | VPKUWUS |
||||
| Pack Signed Integer Unsigned Saturate | |||||
| (Vs16 || Vs16 ) ⇒ Vu8 |
VPKSHUS |
PACKUSWB | PACKUSWB | M_PCK.m.us.b | |
| (Vs32 || Vs32 ) ⇒ Vu16 | VPKSWUS |
M_PCK.m.us.w | |||
| Pack Signed Integer Signed Saturate | |||||
| (Vs16 || Vs16 ) ⇒ Vs8 | VPKSHSS |
PACKSSWB |
PACKSSWB | M_PCK.m.ss.b |
|
| (Vs32 || Vs32 ) ⇒ Vs16 | VPKSWSS |
PACKSSDW |
PACKSSDW | M_PCK.m.ss.w | |
| Pack Pixel |
|||||
| (V || V) → Vpixel |
VPKPX |
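
Pack instructions narrow each element to half its width and concatenate the two sources; the variants differ only in how out-of-range values are handled. The following Python sketch models the three integer cases from the table. The function names are hypothetical, and the output order (all of `a`, then all of `b`) is an assumption about element numbering that may not match either architecture's register layout.

```python
def _clamp(x, lo, hi):
    return max(lo, min(x, hi))


def pack_modulo(a, b, out_bits=8):
    """Keep only the low out_bits of each element (VPKUHUM style)."""
    mask = (1 << out_bits) - 1
    return [x & mask for x in list(a) + list(b)]


def pack_signed_saturate(a, b, out_bits=8):
    """Signed input clamped to the signed output range (PACKSSWB / VPKSHSS)."""
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return [_clamp(x, lo, hi) for x in list(a) + list(b)]


def pack_unsigned_saturate(a, b, out_bits=8):
    """Signed input clamped to the unsigned output range (PACKUSWB / VPKSHUS)."""
    hi = (1 << out_bits) - 1
    return [_clamp(x, 0, hi) for x in list(a) + list(b)]
```

The signed-to-unsigned form is the one typically used to convert 16-bit intermediate pixel values back to 8-bit samples, since negative results clamp to 0 and overflows clamp to 255.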
| Operation | Altivec | MMX | SSE | SSE2 | MOM |
| Unpack High Signed Integer |
|||||
| U(Vs8) → Vs16 |
VUPKHSB |
||||
| U(Vs16) → Vs32 |
VUPKHSH |
||||
| Unpack Low Signed Integer | |||||
| L(Vs8) → Vs16 | VUPKLSB |
||||
| L(Vs16) → Vs32 | VUPKLSH | ||||
| Unpack High Pixel | |||||
| U(Vpixel) → V32 |
VUPKHPX |
||||
| Unpack Low Pixel | |||||
| L(Vpixel) → V32 |
VUPKLPX |
| Operation | Altivec | MMX | SSE | SSE2 | MOM |
| Vector Merge High |
|||||
| U(V8) ∧∨ U(V8) → V |
VMRGHB |
PUNPCKHBW | PUNPCKHBW | M_UPCK.m.h.b | |
| U(V16) ∧∨ U(V16) → V | VMRGHH |
PUNPCKHWD | PUNPCKHWD | M_UPCK.m.h.w | |
| U(V32) ∧∨ U(V32) → V | VMRGHW |
PUNPCKHDQ | PUNPCKHDQ | ||
| U(V64) ∧∨ U(V64) → V | PUNPCKHQDQ | ||||
| U(Vfp32) ∧∨ U(Vfp32) → V | UNPCKHPS | ||||
| U(Vfp64) ∧∨ U(Vfp64) → V | UNPCKHPD | ||||
| Vector Merge Low Integer | |||||
| L(V8) ∧∨ L(V8) → V | VMRGLB |
PUNPCKLBW | PUNPCKLBW | M_UPCK.m.l.b | |
| L(V16) ∧∨ L(V16) → V | VMRGLH |
PUNPCKLWD | PUNPCKLWD | M_UPCK.m.l.w | |
| L(V32) ∧∨ L(V32) → V | VMRGLW |
PUNPCKLDQ | PUNPCKLDQ | ||
| L(V64) ∧∨ L(V64) → V | PUNPCKLQDQ | ||||
| L(Vfp32) ∧∨ L(Vfp32) → V | UNPCKLPS | ||||
| L(Vfp64) ∧∨ L(Vfp64) → V | UNPCKLPD |
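
The ∧∨ symbol denotes interleaving: elements are taken alternately from one half of each source. A minimal Python model follows, with hypothetical function names. It assumes index 0 is the lowest element, so "low" means the first half of each list; Altivec numbers elements from the most significant end, so its high/low naming maps onto the opposite halves.

```python
def _interleave(x, y):
    out = []
    for p, q in zip(x, y):
        out.extend((p, q))
    return out


def merge_low(a, b):
    """Interleave the low halves of a and b (PUNPCKL* style)."""
    half = len(a) // 2
    return _interleave(a[:half], b[:half])


def merge_high(a, b):
    """Interleave the high halves of a and b (PUNPCKH* style)."""
    half = len(a) // 2
    return _interleave(a[half:], b[half:])
```

Merging with a vector of zeros is the common idiom for zero-extending unsigned elements to twice their width, which is why the x86 extensions name these instructions "unpack".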
| Operation | Altivec | MMX/SSE | MOM |
| Vector Splat Integer |
|||
| VSPLTB |
|||
| VSPLTH |
|||
| VSPLTW |
|||
| Vector Splat Immediate Signed Integer | |||
| VSPLTISB |
|||
| VSPLTISH |
|||
| VSPLTISW |
| Operation | Altivec | MMX/SSE | MOM |
| Vector Shift Left |
|||
| VSL |
|||
| Vector Shift Right |
|||
| VSR | |||
| Vector Shift Left Double by Octet Immediate |
|||
| VSLDOI |
|||
| Vector Shift Left by Octet |
|||
| VSLO |
|||
| Vector Shift Right by Octet | |||
| VSRO |
| Operation | Altivec | SSE | SSE2 | MOM |
| Vector Permute |
||||
| (V8 || V8) [Vu8] →V8[i] | VPERM | |||
| Vector Shuffle |
||||
| PSHUFW |
||||
| PSHUFD |
||||
| PSHUFHW | ||||
| PSHUFLW | ||||
| SHUFPS (single precision) |
||||
| Move Byte Mask | ||||
| byte_mask → r32 | PMOVMSKB | |||
| Matrix Transpose |
||||
| M_TRANS.m.u.b |
||||
| M_TRANS.m.u.w |
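
The VPERM row, (V8 || V8) [Vu8] → V8[i], describes a byte-level table lookup: the two source vectors are concatenated into a 32-byte table and each byte of the control vector selects one entry. The Python sketch below models this, with a hypothetical function name; VPERM uses only the low 5 bits of each control byte, which the model reproduces.

```python
def vperm(a, b, control):
    """Byte permute over two 16-byte vectors, modeled on AltiVec VPERM."""
    if not (len(a) == len(b) == len(control) == 16):
        raise ValueError("expected three 16-byte vectors")
    table = list(a) + list(b)                   # 32-entry lookup table: a, then b
    return [table[c & 0x1F] for c in control]   # low 5 bits pick the entry
```

Because the control vector is data rather than an immediate, the same instruction can implement merges, splats, byte swaps and unaligned-load realignment, whereas the x86 shuffles in this table encode their pattern in an immediate operand.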
| Operation | Altivec | MMX | SSE | SSE2 | MOM |
| Data Stream Touch |
DST | ||||
| Data Stream Touch Transient |
DSTT | ||||
| Data Stream Touch for Store |
DSTST | ||||
| Data Stream Touch for Store Transient |
DSTSTT | ||||
| Data Stream Stop |
DSS | ||||
| DSSALL | |||||
| Flush |
|||||
| Flush and invalidate memory operand in cache |
CLFLUSH |
||||
| Prefetch |
|||||
| Prefetch data into caches |
PREFETCH | ||||
| Fence |
|||||
| serialize stores |
SFENCE | ||||
| serialize loads |
LFENCE |
||||
| serialize load and stores |
MFENCE |
||||
| Non Temporal byte Mask Store of Packed Integer |
|||||
| if(mask) V8[i] → mem8[i] (64 bits) | MASKMOVQ | ||||
| if(mask) V8[i] → mem8[i] (128 bits) | MASKMOVDQU | ||||
| Non temporal Store of Packed Integer |
|||||
| F → mem (no write allocate) | MOVNTQ | ||||
| V → mem (no write allocate) | MOVNTDQ |
||||
| R → mem (no write allocate) | MOVNTI | ||||
| Non temporal Store of Packed FP |
|||||
| F → mem (no write allocate) | MOVNTPS | ||||
| V → mem (no write allocate) | MOVNTPD |
| Operation | Altivec | MMX | SSE | SSE2 | MOM |
| Restore FP, MMX and SSE State | FXRSTOR | ||||
| Save FP, MMX and SSE State | FXSAVE | ||||
| Load SIMD Extension Control Status | LDMXCSR | ||||
| Store SSE Control Status | STMXCSR |
| Symbol | Operation |
| R+ | Additive reduction |
| R- | Subtractive reduction |
| Rx | Multiplicative reduction |
| { | Round to nearest (even) |
| E | even values |
| m | mask |
| . | scalar value |
| <> | Bounds |
| f | Partial permute |
| U | Upper part of bytes |
| L | Lower Part of bytes |
| ∧∨ | Interleave |
| ⇒ | Saturate Arithmetic |
| → | Modulo (Wrap Around) Arithmetic |
| ⊕ | Exclusive Or |
| _>> | Right Arithmetic Shift |