Tenstorrent has unveiled its next-generation Wormhole processor for AI workloads, which promises to offer decent performance at a low price. The company currently offers two add-on PCIe cards carrying one or two Wormhole processors, as well as the TT-LoudBox and TT-QuietBox workstations aimed at software developers. The whole of today’s release is aimed at developers rather than those who will deploy Wormhole boards for their commercial workloads.
“It is always rewarding to get more of our products into developer hands. Releasing development systems with our Wormhole™ card helps developers scale up and work on multi-chip AI software,” said Jim Keller, CEO of Tenstorrent. “In addition to this launch, we’re excited that the tape-out and power-on for our second generation, Blackhole, is going very well.”
Each Wormhole processor packs 72 Tensix cores (each featuring five RISC-V cores supporting various data formats) with 108 MB of SRAM to deliver 262 FP8 TFLOPS at 1 GHz within a 160W thermal design power. A single-chip Wormhole n150 card carries 12 GB of GDDR6 memory with 288 GB/s of bandwidth.
Wormhole processors offer flexible scalability to meet the varying needs of workloads. In a standard workstation setup with four Wormhole n300 cards, the processors can merge to function as a single unit, appearing to the software as one unified, extensive network of Tensix cores. This configuration allows the accelerators to work on the same workload, be divided among four developers, or run up to eight distinct AI models simultaneously. A crucial feature of this scalability is that it operates natively, without the need for virtualization. In data center environments, Wormhole processors will scale both inside one machine using PCIe and outside of a single machine using Ethernet.
From a performance standpoint, Tenstorrent’s single-chip Wormhole n150 card (72 Tensix cores at 1 GHz, 108 MB SRAM, 12 GB GDDR6 at 288 GB/s) is capable of 262 FP8 TFLOPS at 160W, while the dual-chip Wormhole n300 board (128 Tensix cores at 1 GHz, 192 MB SRAM, an aggregate 24 GB GDDR6 at 576 GB/s) can offer up to 466 FP8 TFLOPS at 300W (according to Tom’s Hardware).
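As a quick sanity check on those figures, the short Python sketch below divides the quoted peak FP8 rate by the core count and by the TDP for each card. The dictionary and function names are illustrative only, and the inputs are the peak numbers quoted above, not measured throughput.

```python
# Peak figures as quoted in the article (not measured results).
N150 = {"cores": 72, "fp8_tflops": 262, "tdp_w": 160}
N300 = {"cores": 128, "fp8_tflops": 466, "tdp_w": 300}


def tflops_per_core(card):
    """Peak FP8 TFLOPS divided by the number of Tensix cores."""
    return card["fp8_tflops"] / card["cores"]


def tflops_per_watt(card):
    """Peak FP8 TFLOPS divided by the thermal design power in watts."""
    return card["fp8_tflops"] / card["tdp_w"]


for name, card in (("n150", N150), ("n300", N300)):
    print(name,
          round(tflops_per_core(card), 2), "TFLOPS/core,",
          round(tflops_per_watt(card), 2), "TFLOPS/W")
```

Both cards come out at roughly 3.64 peak FP8 TFLOPS per Tensix core, so the n300's 466 TFLOPS figure is consistent with its 128 cores running at the same 1 GHz clock. Per watt, the two cards land at about 1.64 and 1.55 peak TFLOPS respectively.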
To put that 466 FP8 TFLOPS at 300W figure into context, let’s compare it to what AI market leader Nvidia has to offer at this thermal design power. Nvidia’s A100 does not support FP8, but it does support INT8, and its peak performance is 624 TOPS (1,248 TOPS with sparsity). By contrast, Nvidia’s H100 supports FP8, and its peak performance is a massive 1,670 TFLOPS (3,341 TFLOPS with sparsity) at 300W, which is a big difference from Tenstorrent’s Wormhole n300.
There is a big catch, though. Tenstorrent’s Wormhole n150 is offered for $999, while the n300 is available for $1,399. By contrast, one Nvidia H100 card can retail for $30,000, depending on quantities. Of course, we do not know whether four or eight Wormhole processors can indeed deliver the performance of a single H100, though they would do so at 600W or 1200W TDP, respectively.
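To make the price argument concrete, here is a back-of-the-envelope sketch in Python using only the peak FP8 rates, TDPs and prices quoted in this article. It assumes, optimistically, that peak throughput adds up linearly across cards, which the article itself notes is unproven; the $30,000 H100 price is the rough retail figure cited above, and all names in the code are illustrative.

```python
import math

# Peak FP8 rate, TDP and price as quoted in the article.
# The H100 price is the article's approximate retail figure.
CARDS = {
    "n150": {"fp8_tflops": 262, "tdp_w": 160, "price_usd": 999},
    "n300": {"fp8_tflops": 466, "tdp_w": 300, "price_usd": 1399},
    "H100": {"fp8_tflops": 1670, "tdp_w": 300, "price_usd": 30000},
}


def tflops_per_dollar(name):
    """Peak FP8 TFLOPS per US dollar of purchase price."""
    card = CARDS[name]
    return card["fp8_tflops"] / card["price_usd"]


def cards_to_match(name, target="H100"):
    """Cards needed to reach the target's peak rate.

    Assumption: perfect linear scaling across cards.
    """
    return math.ceil(CARDS[target]["fp8_tflops"] / CARDS[name]["fp8_tflops"])


def match_summary(name, target="H100"):
    """Card count, combined TDP and combined price for matching the target."""
    count = cards_to_match(name, target)
    card = CARDS[name]
    return {
        "cards": count,
        "total_tdp_w": count * card["tdp_w"],
        "total_price_usd": count * card["price_usd"],
    }


for name in ("n150", "n300"):
    print(name, round(tflops_per_dollar(name), 3), match_summary(name))
```

On paper, four n300 cards (eight Wormhole processors) would be needed to exceed one H100's peak dense FP8 rate, at a combined 1,200W and $5,596. That is roughly six times more peak TFLOPS per dollar than the H100, at four times the power. Real workloads may scale very differently.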
In addition to cards, Tenstorrent offers developers pre-built workstations with four n300 cards: the less expensive Xeon-based TT-LoudBox with active cooling and the premium EPYC-powered TT-QuietBox with liquid cooling.
Sources: Tenstorrent, Tom’s Hardware