With the launch of their Ampere architecture and new A100 accelerator barely a month behind them, NVIDIA this morning is announcing the PCIe version of their accelerator as part of the start of the now-virtual ISC Digital conference for high performance computing. The more straight-laced counterpart to NVIDIA’s flagship SXM4 version of the A100 accelerator, the PCIe version of the A100 is designed to offer the A100 in a more traditional form factor for customers who need something they can plug into standardized servers. Overall the PCIe A100 offers the same peak performance as the SXM4 A100; however, with a lower 250 Watt TDP, real-world performance won’t be quite as high.
The obligatory counterpart to NVIDIA’s SXM form factor accelerators, NVIDIA’s PCIe accelerators serve to flesh out the other side of NVIDIA’s accelerator lineup. While NVIDIA would gladly sell everyone SXM-based accelerators – which would include the pricey NVIDIA HGX carrier board – there are still numerous customers who need to be able to use GPU accelerators in standard, PCIe-based rackmount servers. Or, for smaller workloads, customers don’t need the kind of 4-way and higher scalability offered by SXM form factor accelerators. So with their PCIe cards, NVIDIA can serve the rest of the accelerator market that their SXM products can’t reach.
The PCIe A100, in turn, is a full-fledged A100, just in a different form factor and with a more appropriate TDP. In terms of peak performance, the PCIe A100 is just as fast as its SXM4 counterpart; NVIDIA this time isn’t shipping this as a cut-down configuration with lower clockspeeds or fewer functional blocks than the flagship SXM4 version. As a result, the PCIe card brings everything the A100 offers to the table, with the same heavy focus on tensor operations, including the new higher precision TF32 and FP64 formats, as well as even faster integer inference.
NVIDIA Accelerator Specification Comparison

| | A100 (PCIe) | A100 (SXM4) | V100 (PCIe) | P100 (PCIe) |
|---|---|---|---|---|
| FP32 CUDA Cores | 6912 | 6912 | 5120 | 3584 |
| Boost Clock | 1.41GHz | 1.41GHz | 1.38GHz | 1.3GHz |
| Memory Clock | 2.4Gbps HBM2 | 2.4Gbps HBM2 | 1.75Gbps HBM2 | 1.4Gbps HBM2 |
| Memory Bus Width | 5120-bit | 5120-bit | 4096-bit | 4096-bit |
| Memory Bandwidth | 1.6TB/sec | 1.6TB/sec | 900GB/sec | 720GB/sec |
| VRAM | 40GB | 40GB | 16GB/32GB | 16GB |
| Single Precision | 19.5 TFLOPs | 19.5 TFLOPs | 14.1 TFLOPs | 9.3 TFLOPs |
| Double Precision | 9.7 TFLOPs (1/2 FP32 rate) | 9.7 TFLOPs (1/2 FP32 rate) | 7 TFLOPs (1/2 FP32 rate) | 4.7 TFLOPs (1/2 FP32 rate) |
| INT8 Tensor | 624 TOPs | 624 TOPs | N/A | N/A |
| FP16 Tensor | 312 TFLOPs | 312 TFLOPs | 112 TFLOPs | N/A |
| TF32 Tensor | 156 TFLOPs | 156 TFLOPs | N/A | N/A |
| Relative Performance (vs. SXM Version) | 90% | 100% | N/A | N/A |
| Interconnect | NVLink 3, 12 links (600GB/sec) | NVLink 3, 12 links (600GB/sec) | NVLink 2, 4 links (200GB/sec) | NVLink 1, 4 links (160GB/sec) |
| GPU | GA100 (826mm²) | GA100 (826mm²) | GV100 (815mm²) | GP100 (610mm²) |
| Transistor Count | 54.2B | 54.2B | 21.1B | 15.3B |
| TDP | 250W | 400W | 250W | 300W |
| Manufacturing Process | TSMC 7N | TSMC 7N | TSMC 12nm FFN | TSMC 16nm FinFET |
| Interface | PCIe 4.0 | SXM4 | PCIe 3.0 | SXM |
| Architecture | Ampere | Ampere | Volta | Pascal |
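The peak throughput and bandwidth figures in the table above follow directly from the core counts, clocks, and bus widths. As a rough sanity check (a sketch, not NVIDIA's own methodology), peak FP32 throughput is cores × 2 ops per clock (one fused multiply-add) × boost clock, and memory bandwidth is bus width in bytes × per-pin data rate:

```python
def peak_fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    """Peak FP32 rate: each CUDA core retires one FMA (2 FLOPs) per clock."""
    return cuda_cores * 2 * boost_ghz / 1000.0


def mem_bandwidth_tbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth: (bus width / 8 bits per byte) * effective data rate."""
    return bus_bits / 8 * gbps_per_pin / 1000.0


# A100 (the PCIe and SXM4 cards share the same GPU configuration)
print(round(peak_fp32_tflops(6912, 1.41), 1))  # ~19.5 TFLOPs, matching the table
print(round(mem_bandwidth_tbs(5120, 2.4), 2))  # ~1.54 TB/s, quoted as 1.6TB/sec
```

The small bandwidth discrepancy comes from rounding: the quoted 1.6TB/sec figure reflects a slightly higher effective memory data rate than the nominal 2.4Gbps used here.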
But because the dual-slot add-in card form factor is designed for lower-TDP products, offering less room for cooling and generally less access to power as well, the PCIe version of the A100 does have to ratchet down its TDP from 400W to 250W. That’s a large…