Hot Chips 2021 Live Blog: Machine Learning (Esperanto,…



11:08AM EDT – Welcome to Hot Chips! This is the annual conference all about the latest, greatest, and upcoming big silicon that gets us all excited. Stay tuned throughout Monday and Tuesday for our regular AnandTech Live Blogs.

11:08AM EDT – Event begins at 8:30am PT, so in about 22 minutes

11:25AM EDT – Starting here in about 5 minutes

11:30AM EDT – First up is a talk from Esperanto Technologies

11:31AM EDT – AI Accelerator – 1000 RISC-V cores on a chip

11:32AM EDT – 1088 RISC-V cores

11:32AM EDT – ET-Minion with tensor units

11:33AM EDT – 160 million bytes of SRAM onboard

11:33AM EDT – PCIe x8 Gen 4

11:33AM EDT – Up to 200 Tera-Ops

11:33AM EDT – Under 20 watts for inference

11:33AM EDT – Focus on recommendation models

11:34AM EDT – Traditionally run on x86

11:34AM EDT – These servers need add-in cards

11:34AM EDT – Low power budget per card

11:34AM EDT – Multiple data type support

11:34AM EDT – Dense and sparse workloads

11:34AM EDT – Be programmable

11:35AM EDT – Reduce off-die memory references

11:36AM EDT – Fixed-function hardware can quickly become obsolete

11:37AM EDT – hundreds of threads

11:38AM EDT – Limited parallelism with single large chips

11:38AM EDT – 1000s of RISC-V cores in Esperanto

11:38AM EDT – Large chips draw a lot of power

11:38AM EDT – Esperanto splits it across chips

11:38AM EDT – Allows for lower voltage, increasing efficiency

11:38AM EDT – Highest recommendation performance within 120 W across six chips

11:40AM EDT – TSMC 7nm FinFET

11:40AM EDT – drive down voltage per core

11:40AM EDT – C dynamic is tough

11:41AM EDT – Efficiency vs voltage – 0.34 V is best

11:42AM EDT – Inferences per second per watt

11:42AM EDT – One chip may use 275W at peak

11:42AM EDT – 0.75 volts is 164W per chip

11:43AM EDT – Most efficient point is at 8.5 W – 2.5x better perf than at 0.9 volts
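
The power figures above can be put in rough context with a little arithmetic. This is only a sketch: it assumes the 120 W budget is split evenly across the six chips, treats the quoted 0.34 as a supply voltage, and uses the textbook CMOS approximation that dynamic energy per operation scales with the square of the voltage (leakage and frequency effects are ignored). The function names are illustrative and do not come from the talk.

```python
# Rough power arithmetic using figures quoted in the talk.
# Assumption (not from the talk): dynamic energy per operation scales as V^2,
# the textbook CMOS approximation; leakage is ignored.

CARD_BUDGET_W = 120.0   # six-chip power budget quoted in the talk
CHIPS_PER_CARD = 6


def per_chip_budget(card_budget_w: float, chips: int) -> float:
    """Evenly split a card-level power budget across chips."""
    return card_budget_w / chips


def relative_dynamic_energy(v_low: float, v_high: float) -> float:
    """Energy per op at v_low relative to v_high under the V^2 assumption."""
    return (v_low / v_high) ** 2


if __name__ == "__main__":
    print("Per-chip budget (W):", per_chip_budget(CARD_BUDGET_W, CHIPS_PER_CARD))
    print("Energy/op at 0.34 V vs 0.9 V:",
          round(relative_dynamic_energy(0.34, 0.9), 3))
```

Under these assumptions each chip gets 20 W, which lines up with the "under 20 watts" figure quoted earlier, and an operation at 0.34 V costs roughly a seventh of the dynamic energy it would at 0.9 V.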

11:44AM EDT – 64-bit RISC-V processor, software-configurable L1 data cache

11:44AM EDT – In-order pipeline

11:44AM EDT – SMT2

11:45AM EDT – 300 MHz to 2 GHz

11:45AM EDT – can do 64 ops on one tensor instruction

11:45AM EDT – 64k ops

11:45AM EDT – 512-bit wide integer per cycle, 256-bit wide FP per cycle, per core
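
To relate these datapath widths to the headline throughput, here is a back-of-the-envelope sketch. The element sizes (8-bit integers, 16-bit and 32-bit floats) and the convention of counting one multiply-accumulate per lane per cycle as two ops are assumptions for illustration; the talk as captured here does not spell them out.

```python
# Lane counts implied by the quoted datapath widths, plus a peak estimate.
# Assumptions (not from the talk): 8-bit integer elements, 16/32-bit FP
# elements, and one multiply-accumulate per lane per cycle counted as 2 ops.

INT_WIDTH_BITS = 512
FP_WIDTH_BITS = 256
CORES = 1088


def lanes(width_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one datapath-wide operation."""
    return width_bits // element_bits


def peak_tera_ops(cores: int, lanes_per_core: int, clock_ghz: float,
                  ops_per_lane: int = 2) -> float:
    """Peak tera-ops per second: cores * lanes * ops per lane * GHz / 1000."""
    return cores * lanes_per_core * ops_per_lane * clock_ghz / 1000.0


if __name__ == "__main__":
    int8_lanes = lanes(INT_WIDTH_BITS, 8)
    print("INT8 lanes per core:", int8_lanes)
    print("FP32 lanes per core:", lanes(FP_WIDTH_BITS, 32))
    print("FP16 lanes per core:", lanes(FP_WIDTH_BITS, 16))
    for ghz in (0.3, 1.0, 2.0):
        print(f"{ghz} GHz ->", round(peak_tera_ops(CORES, int8_lanes, ghz), 1),
              "peak INT8 Tera-Ops")
```

With these assumptions the estimate passes 200 Tera-Ops somewhere between 1 GHz and 2 GHz, which is at least consistent with the "up to 200 Tera-Ops" and "300 MHz to 2 GHz" figures quoted above.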

11:46AM EDT – Eight cores on a chip form a neighborhood

11:46AM EDT – Before wire length became an issue

11:46AM EDT – Eight Minions share a single large instruction cache

11:46AM EDT – Much more efficient than having each core with its own I-cache

11:47AM EDT – Cooperative loads

11:47AM EDT – Custom instructions

11:47AM EDT – Four neighborhoods make a shire

11:47AM EDT – with 4 MB of shared SRAM

11:48AM EDT – Mesh interconnect on each shire

11:48AM EDT – SRAM banks can be partitioned as private L2 or shared L3

11:48AM EDT – Meshes run over the cores
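
The core, neighborhood, and shire hierarchy described above can be sketched as simple arithmetic. The shire count is derived here by assuming all 1088 cores sit in full shires, and the flat core numbering in `locate` is purely hypothetical; neither is stated in the talk.

```python
# Hierarchy arithmetic: 8 cores per neighborhood, 4 neighborhoods per shire.
# Assumptions (not from the talk): all 1088 cores live in full shires, and
# cores are numbered consecutively within a neighborhood, then within a shire.

CORES_PER_NEIGHBORHOOD = 8
NEIGHBORHOODS_PER_SHIRE = 4
SHIRE_SRAM_MB = 4
TOTAL_CORES = 1088

CORES_PER_SHIRE = CORES_PER_NEIGHBORHOOD * NEIGHBORHOODS_PER_SHIRE


def shire_count(total_cores: int) -> int:
    """Number of shires needed, assuming every shire is fully populated."""
    if total_cores % CORES_PER_SHIRE != 0:
        raise ValueError("core count is not a whole number of shires")
    return total_cores // CORES_PER_SHIRE


def locate(core_id: int) -> tuple[int, int, int]:
    """Map a hypothetical flat core id to (shire, neighborhood, core)."""
    shire, within_shire = divmod(core_id, CORES_PER_SHIRE)
    neighborhood, core = divmod(within_shire, CORES_PER_NEIGHBORHOOD)
    return shire, neighborhood, core


if __name__ == "__main__":
    shires = shire_count(TOTAL_CORES)
    print("Cores per shire:", CORES_PER_SHIRE)
    print("Shires:", shires)
    print("Per-shire shared SRAM in total (MB):", shires * SHIRE_SRAM_MB)
    print("Core 40 sits at:", locate(40))
```

Note that the per-shire SRAM total computed this way covers only the 4 MB shared banks, so it is not expected to match the full on-chip SRAM figure quoted earlier.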

11:48AM EDT – 16 LPDDR4X controllers

11:49AM EDT – 256-bit wide LPDDR4X

11:49AM EDT – Six chips and 24 LPDDR4 chips on a PCIe card with a PCIe switch

11:49AM EDT – 192 GB of accelerator memory

11:49AM EDT – 822 GB/s total memory bandwidth per PCIe card
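
The card-level memory figures can be broken down per chip with a quick calculation. This sketch assumes capacity and bandwidth are split evenly across the six chips and 24 DRAM packages, and that bandwidth is simply bus width times transfer rate; the implied transfer rate is a derived estimate, not a number given in the talk.

```python
# Per-chip breakdown of the card-level memory figures quoted in the talk.
# Assumptions (not from the talk): even split across chips and DRAM packages,
# and bandwidth = bus width in bytes * transfers per second.

CHIPS = 6
DRAM_PACKAGES = 24
CARD_CAPACITY_GB = 192
CARD_BANDWIDTH_GBS = 822
BUS_WIDTH_BITS = 256
CONTROLLERS = 16


def per_chip(value: float, chips: int = CHIPS) -> float:
    """Evenly divide a card-level quantity across the chips on the card."""
    return value / chips


def implied_transfer_rate_mts(bandwidth_gbs: float, bus_bits: int) -> float:
    """Transfers per second (in MT/s) needed to reach a bandwidth on a bus."""
    bytes_per_transfer = bus_bits / 8
    return bandwidth_gbs * 1000.0 / bytes_per_transfer


if __name__ == "__main__":
    chip_bw = per_chip(CARD_BANDWIDTH_GBS)
    print("Capacity per chip (GB):", per_chip(CARD_CAPACITY_GB))
    print("Capacity per DRAM package (GB):", CARD_CAPACITY_GB / DRAM_PACKAGES)
    print("DRAM packages per chip:", DRAM_PACKAGES // CHIPS)
    print("Bus bits per controller:", BUS_WIDTH_BITS // CONTROLLERS)
    print("Bandwidth per chip (GB/s):", chip_bw)
    print("Implied transfer rate (MT/s):",
          round(implied_transfer_rate_mts(chip_bw, BUS_WIDTH_BITS)))
```

Under these assumptions each chip sees 32 GB and 137 GB/s over its 256-bit interface, with each of the 16 controllers driving a 16-bit slice of the bus.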

11:50AM EDT – OCP versions

11:50AM EDT – How to deploy at scale

11:50AM EDT – 6 chips have a single heatspreader

11:51AM EDT – Software through many interfaces

11:52AM EDT – Esperanto projected performance

11:54AM EDT – Four high-performance ET-Maxions

11:54AM EDT – Full RV64GC ISA

11:54AM EDT – 24 billion transistors, 570 mm², 89 mask layers

11:54AM EDT – First silicon in bring-up

11:55AM EDT – A0 silicon in test

11:55AM EDT – Highest performance commercial RISC-V chip to date

11:55AM EDT – Early access for qualified customers later in 2021

11:56AM EDT – Q&A time

11:58AM EDT – Q: Do external memory and IO power add on top of the 20 W? A: IOs are included. The 20 W includes DRAM and other components

12:00PM EDT – Q: Why not BF16? A: Natively it does, but BF16 can be expanded to FP32 for compute and put back to BF16 in storage. Because we do inference – customer wants inference, doesn't need BF16
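
The "expand to FP32 for compute, store back as BF16" idea in this answer can be illustrated in software. This is a generic sketch of the BF16 format (the upper 16 bits of an IEEE 754 FP32 value), not Esperanto's hardware behavior; the function names and the round-to-nearest-even choice are assumptions.

```python
import struct

# BF16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of FP32.
# Assumption: round-to-nearest-even when narrowing; hardware may differ.


def fp32_to_bf16_bits(x: float) -> int:
    """Narrow a value to BF16 and return its 16 raw bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep NaNs as NaNs: force a mantissa bit so the payload cannot round away.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even, on the 16 bits being discarded.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(b: int) -> float:
    """Expand 16 raw BF16 bits to FP32 by appending 16 zero bits (exact)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def multiply_stored_bf16(a_bits: int, b_bits: int) -> int:
    """Expand two BF16 operands, multiply at higher precision, store as BF16."""
    product = bf16_bits_to_fp32(a_bits) * bf16_bits_to_fp32(b_bits)
    return fp32_to_bf16_bits(product)


if __name__ == "__main__":
    a = fp32_to_bf16_bits(3.140625)
    b = fp32_to_bf16_bits(2.0)
    print(hex(a), bf16_bits_to_fp32(a))
    print(bf16_bits_to_fp32(multiply_stored_bf16(a, b)))
```

Expansion is lossless because it only appends zero bits, so precision is lost only when narrowing for storage. Python floats are doubles, so the multiply here runs at higher precision than real FP32 hardware would.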

12:01PM EDT – Q: Data cache…


