Wednesday, August 17, 2022

Intel Habana Gaudi2 Purportedly Outperforms Nvidia’s A100

fatima khan

Intel on Wednesday published performance results of its Habana Labs Gaudi2 deep learning processor in MLPerf, a leading DL benchmark. According to Intel, the second-generation Gaudi processor outperforms its main currently available competitor, Nvidia’s A100 compute GPU with 80GB of HBM2E memory, by up to 3 times in terms of time-to-train metrics. While Intel’s publication does not show how the Gaudi2 performs against Nvidia’s H100 GPU, it describes some of Intel’s own performance targets for the next generation of chips.

“For ResNet-50 Gaudi 2 shows a dramatic reduction in time-to-train of 36% vs. Nvidia’s submission for A100-80GB and 45% reduction compared to Dell’s submission cited for an A100-40GB 8-accelerator server that was submitted for both ResNet-50 and BERT results,” an Intel statement reads. 
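
To put those percentages in perspective, a reduction in time-to-train can be converted into a speedup factor. The snippet below is a minimal sketch of that arithmetic, not part of Intel's or MLPerf's tooling: the helper name `speedup_from_reduction` is hypothetical, and it assumes "reduction" means the fraction of the A100 system's training time that Gaudi2 saves.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in time-to-train into a speedup factor.

    Assumption: reduction = (t_baseline - t_new) / t_baseline, so
    t_new = t_baseline * (1 - reduction) and speedup = t_baseline / t_new.
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# The two ResNet-50 figures quoted in Intel's statement.
quoted = [
    ("vs. Nvidia's A100-80GB submission", 36),
    ("vs. Dell's A100-40GB submission", 45),
]

for label, pct in quoted:
    factor = speedup_from_reduction(pct)
    print(f"{label}: {pct}% less time is about {factor:.2f}x faster")
```

Under that assumption, a 36% reduction works out to roughly 1.56x and a 45% reduction to roughly 1.82x on ResNet-50, so the quoted ResNet-50 figures sit well below the "up to 3 times" headline number.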

3X Performance Uplift vs Gaudi

Before jumping to the performance results of Intel’s Habana Gaudi2, let us quickly recap what Gaudi actually is. The Gaudi processor is a heterogeneous system-on-chip that packs a Matrix Multiplication Engine (MME) and a cluster of programmable Tensor Processor Cores (TPCs; each core is essentially a 256-bit VLIW SIMD general-purpose processor) capable of processing data in FP32, TF32, BF16, FP16, and FP8 formats (FP8 is supported only on Gaudi2). In addition, Gaudi has its own media engines to process both video and audio data, which is crucially important for vision processing.

(Image credit: Intel)

While the original Habana Gaudi was made using one of TSMC’s N16 fabrication processes, the new Gaudi2 is produced on an N7 node, which allowed Intel to boost the number of TPCs from 8 to 24 as well as add support for the FP8 data format. The larger number of execution units and higher memory performance could triple performance compared to that of the original Gaudi, but there may be other sources for the increase. On the other hand, there may also be limiting factors (e.g., the thread dispatcher for the VLIW cores, memory subsystem bandwidth, and software scalability).
