NVIDIA has just posted the first real performance numbers for its Ampere A100 GPU, and the results are striking. The company has broken a total of 16 performance records in AI-specific benchmarks and has also beaten its main competitors in machine learning performance by a wide margin.
NVIDIA Ampere A100 GPU Breaks 16 AI World Records, Up To 4.2x Faster Than Volta V100
The results come from MLPerf, an industry benchmarking group formed in 2018 with a focus purely on machine learning performance. The benchmark suite consists of a total of eight tests, and NVIDIA has posted record training speeds in all of them.
This is the third consecutive and strongest showing for NVIDIA in training tests from MLPerf, an industry benchmarking group formed in May 2018. NVIDIA set six records in the first MLPerf training benchmarks in December 2018 and eight in July 2019.
NVIDIA was the only company to field commercially available products for all the tests. Most other submissions used the preview category for products that may not be available for several months or the research category for products not expected to be available for some time.
NVIDIA Blogs
NVIDIA also reported eight additional records with its DGX SuperPOD system, a massive cluster of DGX A100 HPC systems connected through HDR InfiniBand. The DGX SuperPOD consists of 140 DGX A100 systems with a total of 1,120 NVIDIA Ampere A100 GPUs, 170 Mellanox Quantum 200G InfiniBand switches, 4 PB of storage and 15 km of optical cable.
That’s around 7.7 million Ampere CUDA cores inside the DGX SuperPOD system, which is a staggering figure. The system is part of the DGX V expansion plan, adding nearly 700 petaflops of computing horsepower to the system that is currently deployed at NVIDIA’s HQ in Santa Clara, California.
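The CUDA core figure can be sanity-checked with a little arithmetic. The sketch below assumes 8 GPUs per DGX A100 system (consistent with 140 systems and 1,120 GPUs) and 6,912 CUDA cores per A100, a spec that is not quoted in this article; the function and constant names are purely illustrative.

```python
# Assumptions, not taken from the article text:
#   - each DGX A100 system holds 8 A100 GPUs
#   - each A100 exposes 6,912 FP32 CUDA cores
DGX_SYSTEMS = 140
GPUS_PER_DGX = 8
CUDA_CORES_PER_A100 = 6912


def superpod_totals(systems=DGX_SYSTEMS,
                    gpus_per_system=GPUS_PER_DGX,
                    cores_per_gpu=CUDA_CORES_PER_A100):
    """Return (total GPUs, total CUDA cores) for a cluster of this shape."""
    total_gpus = systems * gpus_per_system
    total_cores = total_gpus * cores_per_gpu
    return total_gpus, total_cores


gpus, cores = superpod_totals()
print(f"{gpus:,} GPUs, {cores:,} CUDA cores (~{cores / 1e6:.1f} million)")
```

Under those assumptions the total comes out at roughly 7.7 million cores, in line with the figure above.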
The AI Performance Benchmarks – Ampere vs Volta & More
NVIDIA has compared its Ampere A100 Tensor Core GPU accelerator to its predecessor, the Volta V100. The comparison also includes Google’s 3rd Generation TPU and Huawei’s Ascend HPC chips. MLPerf itself has more detailed benchmarks listed, including preview results for upcoming hardware such as Intel’s Cooper Lake-SP Xeon CPUs and Google’s 4th Gen TPU. With that said, let’s take a look at the benchmarks themselves.
According to MLPerf, its benchmark suite includes tests that target the workloads most relevant to machine learning and AI. The NVIDIA Ampere A100 comfortably outpaces the Volta V100, with a speedup of up to 2.5x. Even at its minimum lead, the Ampere A100 delivers a 50% boost over the Volta V100 GPU, which is impressive. The results here were normalized to a single chip to deliver a fair comparison between Ampere and Volta.
The Huawei Ascend chip was able to finish just one test in time, and even then with worse performance than the Volta V100, while Google’s TPU V3 only managed to complete two tests in time. In one test, the TPU V3 secured a 20% lead over the NVIDIA Volta V100, while in the second test it was 10% slower than the V100.
Compared to the Cooper Lake-SP 8-socket configuration, which completes the image classification test in 1104.53 minutes, a dual NVIDIA A100 system can complete the same test in just 33.37 minutes. NVIDIA also compares the performance of its Ampere A100 to the unreleased Google TPU V4, which is still in the research stage and at least a year away from availability.
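To put the two image classification times quoted above in perspective, the ratio can be worked out directly. This is only a rough sketch: it compares wall-clock time to train between two differently sized systems, not per-chip throughput, and the helper name `speedup` is an illustrative choice.

```python
def speedup(baseline_minutes, new_minutes):
    """Return how many times faster the new system finishes than the baseline."""
    if baseline_minutes <= 0 or new_minutes <= 0:
        raise ValueError("training times must be positive")
    return baseline_minutes / new_minutes


# Times to train quoted in the article for the image classification test.
cooper_lake_minutes = 1104.53  # 8-socket Cooper Lake-SP configuration
a100_minutes = 33.37           # dual NVIDIA A100 system

ratio = speedup(cooper_lake_minutes, a100_minutes)
print(f"The A100 system finishes about {ratio:.1f}x sooner")
```

With these numbers the A100 system completes the test roughly 33x sooner than the Cooper Lake-SP configuration.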
NVIDIA also shows how the performance of its GPU accelerators has improved over time with the latest full-stack innovations for AI. Compared to MLPerf 0.5 running on the Volta V100, the MLPerf 0.7 suite running on the Ampere A100 delivers a 4.2x performance gain.
This goes to show just how impressive a chip the NVIDIA Ampere A100 GPU is in real-world benchmarks, within a suite that is recognized by all major players in the AI community. The Ampere A100 was also recorded as the fastest GPU ever in another benchmark, where a Turing GPU with hardware-accelerated techniques enabled for better performance still couldn’t match the Ampere A100 and its huge performance output. All these benchmark results make us all the more excited to see Ampere in consumer form, which should be happening a few months from now.