Over at the Hot Chips symposium Nvidia has been sharing a thing or two with the media. Quite interesting is a new photo series of the GP100. Now, do not confuse the GP100 with the GP102 that is housed in the Nvidia Titan X (Pascal).
The GP100 is the GPU that houses HBM2 and is referred to as BIG Pascal. Ryan over at Anandtech posted an actual die shot of the graphics processor, and man, that thing is just HUGE at 610 mm². Big Pascal is made on TSMC's 16 nanometer fab process.
Y'all ready? Here it is:
GP100 in particular comes with stacked HBM2 (16 GB in four stacks). The Pascal-based GPU holds 15.3 billion transistors, roughly double that of the biggest Maxwell chip, and it is huge at 610 mm². Nvidia rates performance at 5.3 TFLOPS for 64-bit floating point, 10.6 TFLOPS for 32-bit and 21.2 TFLOPS for 16-bit. The P100 has 4 MB of L2 cache and a total of 14 MB of register file memory. The following table provides a high-level comparison of Tesla P100 specifications against previous-generation Tesla GPU accelerators; I added the GP100 as a fully enabled product:
| Tesla Products | Tesla K40 | Tesla M40 | Tesla P100 | GP100 |
|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) |
| SMs | 15 | 24 | 56 | 60 |
| TPCs | 15 | 24 | 28 | 30 |
| FP32 CUDA Cores / SM | 192 | 128 | 64 | 64 |
| FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 | 3840 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1920 |
| Base Clock | 745 MHz | 948 MHz | 1328 MHz | ~1328 MHz |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | ~1480 MHz |
| Texture Units | 240 | 192 | 224 | 240 |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | 16 GB |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB |
| Register File Size / SM | 256 KB | 256 KB | 256 KB | 256 KB |
| Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB | 14336 KB |
| TDP | 235 Watts | 250 Watts | 300 Watts | ~300 Watts |
| Transistors | 7.1 billion | 8 billion | 15.3 billion | 15.3 billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² |
| Manufacturing Process | 28 nm | 28 nm | 16 nm | 16 nm |
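As a quick sanity check on those TFLOPS figures, here is a small Python sketch of my own that reproduces Nvidia's ratings from the Tesla P100 core count and boost clock in the table above. It assumes the usual 2 FLOPs per core per clock (fused multiply-add), with FP64 at half the FP32 rate and FP16 at double the FP32 rate, which matches GP100's 1:2:4 ratio:

```python
# Sanity check of the Tesla P100 throughput figures quoted above.
# Assumes 2 FLOPs per CUDA core per clock (FMA), FP64 at half and
# FP16 at double the FP32 rate, as on GP100.

fp32_cores  = 3584      # 56 SMs x 64 FP32 cores (Tesla P100)
boost_clock = 1480e6    # boost clock in Hz

fp32_tflops = fp32_cores * boost_clock * 2 / 1e12
fp64_tflops = fp32_tflops / 2
fp16_tflops = fp32_tflops * 2

print(f"FP32: {fp32_tflops:.1f} TFLOPS")   # ~10.6
print(f"FP64: {fp64_tflops:.1f} TFLOPS")   # ~5.3
print(f"FP16: {fp16_tflops:.1f} TFLOPS")   # ~21.2
```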
As the block diagram now shows, the GP100 features six graphics processing clusters (GPCs). Just look at the diagram and count along with me – each GPC holds 10 streaming multiprocessors (SMs) and then each SM has 64 CUDA cores and four texture units. Do the math and you’ll reach 640 shader processors per GPC and 3840 shader cores with 240 texture units in total.
- 6 GPCs × 10 SMs × 64 cores = 3840 shader processors in total.
Meaning the GP100 used on the Tesla P100 is not fully enabled. Nvidia is known to ship GPUs with disabled segments, as it helps them sell different SKUs; the Tesla P100 holds a shader count of 3584 and thus has 56 of the 60 SMs enabled.
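If you want to count along in code, here is a minimal Python sketch (the variable names are mine, purely for illustration) that reproduces the shader and texture-unit totals for the full GP100 and the cut-down Tesla P100 from the table:

```python
# Shader and texture-unit counts for a fully enabled GP100 versus the
# cut-down configuration used on the Tesla P100.

gpcs         = 6
sms_per_gpc  = 10
cores_per_sm = 64
tex_per_sm   = 4

full_sms   = gpcs * sms_per_gpc        # 60 SMs on a full GP100
full_cores = full_sms * cores_per_sm   # 3840 shader processors
full_tex   = full_sms * tex_per_sm     # 240 texture units

p100_sms   = 56                        # 4 SMs disabled on Tesla P100
p100_cores = p100_sms * cores_per_sm   # 3584 shader processors
p100_tex   = p100_sms * tex_per_sm     # 224 texture units

print(full_cores, full_tex)    # 3840 240
print(p100_cores, p100_tex)    # 3584 224
```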
Some additional photos can be spotted below, including an actual photo of the GP100 silicon. Click the thumbnails to enlarge.