11:08AM EDT – Welcome to Hot Chips! This is the annual conference all about the latest, greatest, and upcoming big silicon that gets us all excited. Stay tuned during Monday and Tuesday for our regular AnandTech Live Blogs.
11:08AM EDT – Event begins at 8:30am PT, so in about 22 minutes
11:25AM EDT – Starting here in about 5 minutes
11:30AM EDT – First up is a talk from Esperanto Technologies
11:31AM EDT – AI Accelerator – 1000 RISC-V cores on a chip
11:32AM EDT – 1088 RISC-V cores
11:32AM EDT – ET-Minion with tensor units
11:33AM EDT – 160 million bytes of SRAM onboard
11:33AM EDT – PCIe x8 Gen 4
11:33AM EDT – Up to 200 Tera-Ops
11:33AM EDT – Under 20 watts for inference
11:33AM EDT – focus on recommendation models
11:34AM EDT – traditionally run on x86
11:34AM EDT – these servers need add-in cards
11:34AM EDT – Low power budget per card
11:34AM EDT – Multiple data type support
11:34AM EDT – dense and sparse workloads
11:34AM EDT – be programmable
11:35AM EDT – reduce off-die memory references
11:36AM EDT – Fixed function hardware can quickly become obsolete
11:37AM EDT – hundreds of threads
11:38AM EDT – limited parallelism with single large chips
11:38AM EDT – 1000s of RISC-V cores in Esperanto
11:38AM EDT – Large chips have large power
11:38AM EDT – Esperanto splits it across chips
11:38AM EDT – allows for lower voltage, increasing efficiency
11:38AM EDT – Highest recommendation performance within 120W in six chips
11:40AM EDT – TSMC 7nm FinFET
11:40AM EDT – drive down voltage per core
11:40AM EDT – C dynamic is tough
11:41AM EDT – Efficiency vs voltage – 0.34 is best
11:42AM EDT – Inferences per second per watt
11:42AM EDT – One chip could use 275W at peak
11:42AM EDT – 0.75 volts is 164W per chip
11:43AM EDT – Most efficient point is at 8.5 W – 2.5x better perf than at 0.9 volts
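
The voltage argument above can be sketched with the textbook CMOS dynamic power relation, P ≈ C·V²·f. This is a rough illustration and not Esperanto's own model: the function name and the constant-capacitance, fixed-frequency simplification are assumptions, and only the two voltages and the 120W six-chip budget come from the talk.

```python
def dynamic_power_ratio(v_new, v_old, f_ratio=1.0):
    """Relative dynamic power under P ~ C * V^2 * f, with C held constant.

    Hypothetical helper for illustration; f_ratio is new/old clock frequency.
    """
    return (v_new / v_old) ** 2 * f_ratio


# Voltage-only scaling between two operating points mentioned in the talk.
ratio = dynamic_power_ratio(0.75, 0.9)

# Per-chip share of the 120W card budget spread across six chips.
watts_per_chip = 120 / 6

print(f"0.9 V -> 0.75 V at fixed frequency: {ratio:.2f}x dynamic power")
print(f"Per-chip budget on a six-chip card: {watts_per_chip:.0f} W")
```

In practice clock frequency usually falls along with voltage, so `f_ratio` would be below 1 and the saving larger than the voltage term alone suggests.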
11:44AM EDT – 64-bit RISC-V processor, software configurable L1 data cache
11:44AM EDT – in-order pipeline
11:44AM EDT – SMT2
11:45AM EDT – 300 MHz to 2 GHz
11:45AM EDT – can do 64 ops on one tensor instruction
11:45AM EDT – 64k ops
11:45AM EDT – 512-bit wide integer per cycle, 256-bit wide FP per cycle, per core
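
To put those datapath widths in terms of elements per cycle, here is a small sketch. The element sizes (8-bit integers, 16-bit and 32-bit floats) are assumptions for illustration and were not stated at this point in the talk; `lanes` is a hypothetical helper.

```python
def lanes(datapath_bits, element_bits):
    """Number of elements that fit in one datapath-wide operation."""
    return datapath_bits // element_bits


int8_lanes = lanes(512, 8)    # assumed 8-bit integer elements
fp16_lanes = lanes(256, 16)   # assumed 16-bit float elements
fp32_lanes = lanes(256, 32)   # assumed 32-bit float elements

print(int8_lanes, fp16_lanes, fp32_lanes)
```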
11:46AM EDT – Eight cores on a chip form a neighborhood
11:46AM EDT – before wire length became a problem
11:46AM EDT – Eight minions share a single large instruction cache
11:46AM EDT – much more efficient than having each core with its own I-cache
11:47AM EDT – cooperative loads
11:47AM EDT – custom instructions
11:47AM EDT – Four neighborhoods make a shire
11:47AM EDT – with 4 MB of shared SRAM
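
The core hierarchy described so far can be checked with a little arithmetic. This is a sketch under one assumption not confirmed in the talk notes: that all 1088 cores quoted earlier are minions grouped into shires. The constant names are made up for illustration.

```python
CORES_PER_NEIGHBORHOOD = 8
NEIGHBORHOODS_PER_SHIRE = 4
SHIRE_SRAM_MB = 4
TOTAL_MINION_CORES = 1088  # assumption: every quoted core sits in a shire

cores_per_shire = CORES_PER_NEIGHBORHOOD * NEIGHBORHOODS_PER_SHIRE
shires, leftover = divmod(TOTAL_MINION_CORES, cores_per_shire)
shared_sram_mb = shires * SHIRE_SRAM_MB

print(f"{cores_per_shire} cores per shire")
print(f"{shires} shires, {leftover} cores left over")
print(f"{shared_sram_mb} MB of shire-level shared SRAM")
```

Under that assumption the shire-level SRAM comes in below the 160 million bytes quoted for the whole chip; the talk notes do not break down where the rest sits.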
11:48AM EDT – mesh interconnect on each shire
11:48AM EDT – SRAM banks can be partitioned as private L2 or shared L3
11:48AM EDT – Meshes run over the cores
11:48AM EDT – 16 LPDDR4X controllers
11:49AM EDT – 256-bit wide LPDDR4X
11:49AM EDT – Six chips and 24 LPDDR4 chips on a PCIe card with a PCIe switch
11:49AM EDT – 192 GB of accelerator memory
11:49AM EDT – 822 GB/s total memory bandwidth per PCIe card
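
Breaking the card-level memory figures down per chip gives a feel for the design. This is a back-of-the-envelope sketch: it assumes the 256-bit interface and the 16 controllers are per chip and that capacity and bandwidth are split evenly, none of which is spelled out in the notes above.

```python
CHIPS_PER_CARD = 6
DRAM_PACKAGES_PER_CARD = 24
CARD_MEMORY_GB = 192
CARD_BANDWIDTH_GB_S = 822
INTERFACE_BITS = 256   # assumption: per accelerator chip
CONTROLLERS = 16       # assumption: per accelerator chip

memory_per_chip_gb = CARD_MEMORY_GB / CHIPS_PER_CARD
memory_per_package_gb = CARD_MEMORY_GB / DRAM_PACKAGES_PER_CARD
bandwidth_per_chip_gb_s = CARD_BANDWIDTH_GB_S / CHIPS_PER_CARD
bits_per_controller = INTERFACE_BITS // CONTROLLERS

# Transfer rate implied by moving 256 bits (32 bytes) per transfer.
implied_rate_gt_s = bandwidth_per_chip_gb_s / (INTERFACE_BITS / 8)

print(f"{memory_per_chip_gb:.0f} GB and {bandwidth_per_chip_gb_s:.0f} GB/s per chip")
print(f"{memory_per_package_gb:.0f} GB per DRAM package")
print(f"{bits_per_controller} bits per controller")
print(f"~{implied_rate_gt_s:.2f} GT/s implied transfer rate")
```

The implied rate is in the neighborhood of the fastest LPDDR4X speed grades, which suggests the even-split assumption is at least plausible.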
11:50AM EDT – OCP versions
11:50AM EDT – How to deploy at scale
11:50AM EDT – 6 chips have a single heatspreader
11:51AM EDT – Software through many interfaces
11:52AM EDT – Esperanto projected performance
11:54AM EDT – Four high-performance ET-Maxions
11:54AM EDT – Full RV64GC ISA
11:54AM EDT – 24 billion transistors, 570mm2, 89 mask layers
11:54AM EDT – First silicon in bring-up
11:55AM EDT – A0 silicon in test
11:55AM EDT – Highest performance commercial RISC-V chip to date
11:55AM EDT – Early Access for qualified customers later in 2021
11:56AM EDT – Q&A time
11:58AM EDT – Q: Do external memory and IO power add on top of the 20W? A: IOs are included. 20W includes DRAM and other components
12:00PM EDT – Q: Why not BF16? A: Natively it does, but BF16 would be expanded to FP32 for compute and put back to BF16 in storage. Because we do inference – customer wants inference, doesn't need BF16
12:01PM EDT – Q: Data cache…