Intel Architecture Manual Updates: bfloat16 for Cooper Lake

Intel recently released a new version of its document for software developers revealing some additional details about its upcoming Xeon Scalable 'Cooper Lake-SP' processors. As it turns out, the new CPUs will support the AVX512_BF16 instructions and therefore the bfloat16 format. Meanwhile, the main intrigue here is that, at this point, AVX512_BF16 appears to be supported only by the Cooper Lake-SP microarchitecture, but not by its direct successor, the Ice Lake-SP microarchitecture.

bfloat16 is a truncated 16-bit version of the 32-bit IEEE 754 single-precision floating-point format. It preserves all eight exponent bits, but reduces the precision of the significand from 24 bits to 8 bits to save memory, bandwidth, and processing resources, while still retaining the same dynamic range. The bfloat16 format was designed primarily for machine learning and near-sensor computing applications, where precision is needed near zero but not so much at the maximum of the range. The number representation is supported by Intel's upcoming FPGAs as well as its Nervana neural network processors, and by Google's TPUs. Given that Intel supports the bfloat16 format across two of its product lines, it makes sense to support it elsewhere as well, which is what the company is going to do by adding AVX512_BF16 instruction support to its upcoming Xeon Scalable 'Cooper Lake-SP' platform.

AVX-512 Support Propagation by Various Intel CPUs
(a newer microarchitecture supports the extensions introduced by the older ones in its line)

Xeon (general-purpose):
- Skylake-SP: AVX512F, AVX512CD, AVX512BW, AVX512DQ, AVX512VL
- Cannon Lake: adds AVX512VBMI, AVX512IFMA
- Cascade Lake-SP: adds AVX512_VNNI
- Cooper Lake: adds AVX512_BF16
- Ice Lake: adds AVX512_VNNI, AVX512_VBMI2, AVX512_BITALG, AVX512+VAES, AVX512+GFNI, AVX512+VPCLMULQDQ, AVX512_VPOPCNTDQ (but not AVX512_BF16)

Xeon Phi:
- Knights Landing: AVX512F, AVX512CD, AVX512ER, AVX512PF
- Knights Mill: adds AVX512_4FMAPS, AVX512_4VNNIW

Source: Intel Architecture Instruction Set Extensions and Future Features Programming Reference (page 16)

The list of Intel's AVX512_BF16 Vector Neural Network Instructions includes VCVTNE2PS2BF16, VCVTNEPS2BF16, and VDPBF16PS. All of them can operate on 128-bit, 256-bit, or 512-bit vectors, and each width comes in plain, masked, and zero-masked intrinsic forms, so software developers can pick one of a total of nine versions based on their requirements.

Intel AVX512_BF16 Instructions

VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data

Intel C/C++ Compiler Intrinsic Equivalent:
VCVTNE2PS2BF16 __m128bh _mm_cvtne2ps_pbh (__m128, __m128);
VCVTNE2PS2BF16 __m128bh _mm_mask_cvtne2ps_pbh (__m128bh, __mmask8, __m128, __m128);
VCVTNE2PS2BF16 __m128bh _mm_maskz_cvtne2ps_pbh (__mmask8, __m128, __m128);
VCVTNE2PS2BF16 __m256bh _mm256_cvtne2ps_pbh (__m256, __m256);
VCVTNE2PS2BF16 __m256bh _mm256_mask_cvtne2ps_pbh (__m256bh, __mmask16, __m256, __m256);
VCVTNE2PS2BF16 __m256bh _mm256_maskz_cvtne2ps_pbh (__mmask16, __m256, __m256);
VCVTNE2PS2BF16 __m512bh _mm512_cvtne2ps_pbh (__m512, __m512);
VCVTNE2PS2BF16 __m512bh _mm512_mask_cvtne2ps_pbh (__m512bh, __mmask32, __m512, __m512);
VCVTNE2PS2BF16 __m512bh _mm512_maskz_cvtne2ps_pbh (__mmask32, __m512, __m512);

VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data

Intel C/C++ Compiler Intrinsic Equivalent:
VCVTNEPS2BF16 __m128bh _mm_cvtneps_pbh (__m128);
VCVTNEPS2BF16 __m128bh _mm_mask_cvtneps_pbh (__m128bh, __mmask8, __m128);
VCVTNEPS2BF16 __m128bh _mm_maskz_cvtneps_pbh (__mmask8, __m128);
VCVTNEPS2BF16 __m128bh _mm256_cvtneps_pbh (__m256);
VCVTNEPS2BF16 __m128bh _mm256_mask_cvtneps_pbh (__m128bh, __mmask8, __m256);
VCVTNEPS2BF16 __m128bh _mm256_maskz_cvtneps_pbh (__mmask8, __m256);
VCVTNEPS2BF16 __m256bh _mm512_cvtneps_pbh (__m512);
VCVTNEPS2BF16 __m256bh _mm512_mask_cvtneps_pbh (__m256bh, __mmask16, __m512);
VCVTNEPS2BF16…


