Home Update Hot Chips 2020 Live Blog: Baidu Kunlun AI Processor (4:30pm…

Update

Hot Chips 2020 Live Blog: Baidu Kunlun AI Processor (4:30pm…

August 25, 2020

393

07:29PM EDT – Last session of Hot Chips is all about ML inference. Starting with Baidu, and its Kunlun AI processor

07:30PM EDT – We’ve heard of Baidu’s Kunlun a number of months in the past resulting from a press launch from the corporate and Samsung stating that the silicon was making use of Interposer-Cube 2.5D packaging, in addition to HBM2, and packing 260 TOPs into 150 W.

07:32PM EDT – Baidu and Samsung construct the chip collectively

07:33PM EDT – Need a processor to cowl a diversified AI workflow

07:33PM EDT – NLP = Neural Language Processing

07:33PM EDT – All these techniques are precedence inside Baidu

07:34PM EDT – Traditional AI computing is carried out in Cloud, Datacenter, HPC, Smart Industry, Smart City

07:35PM EDT – High-end AI chips price loads to create

07:36PM EDT – Try to discover market quantity as a lot as doable

07:36PM EDT – The problem is the kind of compute

07:36PM EDT – Design and implementation

07:38PM EDT – Kunlun (Kun-loon)

07:38PM EDT – Need versatile, programmable, excessive efficiency

07:38PM EDT – Moved from FPGA to ASIC

07:39PM EDT – 256 TOPs in 2019

07:42PM EDT – (the presenter is a bit gradual fyi)

07:43PM EDT – Now some element

07:43PM EDT – Samsung Foundry 14nm

07:43PM EDT – Interposer package deal, 2 HBM, 512 GB/s

07:43PM EDT – PCIe 4.zero x8

07:43PM EDT – 150W / 256 TOPs

07:43PM EDT – PCIe card

07:44PM EDT – 256TOPs for INT8

07:44PM EDT – 16 GB HBM

07:44PM EDT – Passive cooling

07:45PM EDT – Same format as XPUv1 proven in HotChips 2017

07:45PM EDT – XPU cluster

07:45PM EDT – Software outlined neural community engine

07:45PM EDT – XPU-SDNN

07:46PM EDT – XPU-SDNN does tensor and vector

07:46PM EDT – XPU-Cluster does scalar and vector

07:46PM EDT – Each cluster has 16 tiny cores

07:46PM EDT – every unit has 16 MB on-chip reminiscence

07:47PM EDT – (what are the tiny cores?)

07:47PM EDT – Graph compiler

07:47PM EDT – helps PaddlePaddle, Tensorflow, pytorch

07:48PM EDT – XPU C/C++ for customized kernels

07:48PM EDT – 256 TOPs for 4096x4096x4096 GEMM INT8 inference

07:51PM EDT – These benchmarks are very odd

07:51PM EDT – massive edge = industrial

07:51PM EDT – Mask inspection

07:52PM EDT – Mask RCNN

07:52PM EDT – Available in Baidu Cloud

07:53PM EDT – Q&A time

07:54PM EDT – Q: {hardware} picture/video decode? A: No

07:55PM EDT – Q: INT4 throughput as INT8? A: INT4 identical as INT8, however INT4 and leverage extra of the capabilities

07:56PM EDT – Q: Size and BW of on-chip shared reminiscence? A: BW is 512 GB/s for every port every cluster (I do not assume that solutions the questions)

07:56PM EDT – Q: Static scheduling of sources? A: Yes

07:57PM EDT – Q:Power? A: Real Power 70-90W, virtually identical as T4, however TDP 150W

07:57PM EDT – That’s a wrap. Next discuss is Alibaba NPU

Source

Post Views: 455

Hot Chips 2020 Live Blog: Baidu Kunlun AI Processor (4:30pm…

LEAVE A REPLY Cancel reply

EVEN MORE NEWS

Samsung Launches Next-Gen Odyssey Gaming Monitors That

China Limits Private Use of Facial Recognition on ‘Individua…

Daphni secures $215M for its third fund

POPULAR CATEGORY

RELATED ARTICLESMORE FROM AUTHOR

Trump places CHIPS Act subsidies on maintain, trade braces…

AI Chips Inspired By The Human Brain Are On The Rise

Intel, Microchip instances expose CHIPS Act challenges