Intel recently published a new version of its documentation for software developers, revealing some additional details about its upcoming Xeon Scalable ‘Cooper Lake-SP’ processors. As it turns out, the new CPUs will support AVX512_BF16 instructions and therefore the bfloat16 format. Meanwhile, the main intrigue here is that, at this point, AVX512_BF16 appears to be supported only by the Cooper Lake-SP microarchitecture, but not by its direct successor, the Ice Lake-SP microarchitecture.
bfloat16 is a truncated 16-bit version of the 32-bit IEEE 754 single-precision floating-point format that preserves the eight exponent bits, but reduces the precision of the significand from 24 bits to 8 bits to save memory, bandwidth, and processing resources, while still retaining the same range. The bfloat16 format was designed primarily for machine learning and near-sensor computing applications, where precision is needed near zero but not so much at the maximum of the range. The number representation is supported by Intel’s upcoming FPGAs as well as its Nervana neural network processors, and by Google’s TPUs. Given that Intel already supports the bfloat16 format across two of its product lines, it makes sense to support it elsewhere as well, which is what the company is going to do by adding AVX512_BF16 instruction support to its upcoming Xeon Scalable ‘Cooper Lake-SP’ platform.
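The relationship between the two formats is easy to see at the bit level: a bfloat16 value is simply the top 16 bits of a float32 value. The following Python sketch illustrates this with plain truncation (the actual VCVTNEPS2BF16 instruction rounds to nearest even rather than truncating, and its NaN handling is not modeled here):

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    """Truncate an IEEE 754 float32 to bfloat16: keep the sign bit,
    all 8 exponent bits, and the top 7 mantissa bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bf16_bits_to_float(b: int) -> float:
    """Expand bfloat16 bits back to a float32 by zero-filling the low 16 bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

# The range survives (same 8-bit exponent), but precision drops:
print(bf16_bits_to_float(float_to_bf16_bits(1.0)))      # 1.0 is exactly representable
print(bf16_bits_to_float(float_to_bf16_bits(3.14159)))  # 3.140625: only ~3 decimal digits survive
print(bf16_bits_to_float(float_to_bf16_bits(1e38)))     # huge float32 values remain finite
```

Because the exponent field is untouched, any finite float32 maps to a finite bfloat16, which is exactly the "same range, less precision" trade-off described above.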
AVX-512 Support Propagation by Various Intel CPUs
(a newer microarchitecture supports the extensions of the older ones in its line)

| Line | Microarchitecture | AVX-512 Extensions |
|------|-------------------|--------------------|
| Xeon | Skylake-SP | AVX512F, AVX512CD, AVX512BW, AVX512DQ, AVX512VL |
| Xeon Phi | Knights Landing | AVX512F, AVX512CD, AVX512ER, AVX512PF |
| General | Cannon Lake | AVX512VBMI, AVX512IFMA |
| Xeon Phi | Knights Mill | AVX512_4FMAPS, AVX512_4VNNIW, AVX512_VPOPCNTDQ |
| Xeon | Cascade Lake-SP | AVX512_VNNI |
| Xeon | Cooper Lake | AVX512_BF16 |
| General | Ice Lake | AVX512_VNNI, AVX512_VBMI2, AVX512_BITALG, AVX512_VPOPCNTDQ, AVX512+VAES, AVX512+GFNI, AVX512+VPCLMULQDQ (not BF16) |

Source: Intel Architecture Instruction Set Extensions and Future Features Programming Reference (page 16)
The list of Intel’s AVX512_BF16 Vector Neural Network Instructions includes VCVTNE2PS2BF16, VCVTNEPS2BF16, and VDPBF16PS. All of them can be executed in 128-bit, 256-bit, or 512-bit mode, so software developers can pick one of a total of nine versions based on their requirements.
Intel AVX512_BF16 Instructions

| Instruction | Description |
|-------------|-------------|
| VCVTNE2PS2BF16 | Convert Two Packed Single Data to One Packed BF16 Data |
| VCVTNEPS2BF16 | Convert Packed Single Data to Packed BF16 Data |

(The programming reference also lists the Intel C/C++ Compiler intrinsic equivalent for each instruction.)
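The "NE" in the conversion mnemonics stands for round-to-nearest-even, which is what distinguishes these instructions from a plain truncation of the low 16 bits. A pure-Python model of the per-element behavior of VCVTNEPS2BF16 (a sketch only: NaN and embedded-rounding behavior of the real instruction is not reproduced, and real code would use the compiler intrinsics instead):

```python
import struct

def cvtneps2bf16(values):
    """Model VCVTNEPS2BF16: convert each float32 to a 16-bit bfloat16
    word using round-to-nearest-even on the discarded low 16 bits."""
    out = []
    for x in values:
        (b,) = struct.unpack("<I", struct.pack("<f", x))
        # Add 0x7FFF, plus 1 if the kept part's LSB is set, so that
        # exact halfway cases round toward an even result.
        rounded = (b + 0x7FFF + ((b >> 16) & 1)) >> 16
        out.append(rounded & 0xFFFF)
    return out

print([hex(w) for w in cvtneps2bf16([1.0, 3.14159])])
```

The 128-, 256-, and 512-bit encodings mentioned above differ only in how many of these per-element conversions run at once (4, 8, or 16 elements respectively).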