NVIDIA has just announced its 2nd Generation DGX Station AI server based on the Ampere A100 Tensor Core GPUs. The DGX Station A100 comes in two configurations and features the updated A100 Tensor Core GPUs, which pack double the memory and multiple Petaflops of AI horsepower.
NVIDIA Unveils 2nd Generation DGX Station A100 AI Server – Now Packs Updated 80 GB A100 Tensor Core GPUs & Multi-Petaflops of Performance
The NVIDIA DGX Station A100 is aimed at the AI market, accelerating machine learning and data science performance for corporate offices, research facilities, labs, or home offices everywhere. According to NVIDIA, the DGX Station A100 is designed to be the fastest server in a box dedicated to AI research.
DGX Station Powers AI Innovation

Organizations around the world have adopted DGX Station to power AI and data science across industries such as education, financial services, government, healthcare, and retail. These AI leaders include:
- BMW Group Production is using NVIDIA DGX Stations to explore insights faster as they develop and deploy AI models that improve operations.
- DFKI, the German Research Center for Artificial Intelligence, is using DGX Station to build models that tackle critical challenges for society and industry, including computer vision systems that help emergency services respond rapidly to natural disasters.
- Lockheed Martin is using DGX Station to develop AI models that use sensor data and service logs to predict the need for maintenance to improve manufacturing uptime, increase safety for workers, and reduce operational costs.
- NTT Docomo, Japan’s leading mobile operator with over 79 million subscribers, uses DGX Station to develop innovative AI-driven services such as its image recognition solution.
- Pacific Northwest National Laboratory is using NVIDIA DGX Stations to conduct federally funded research in support of national security. Focused on technological innovation in energy resiliency and national security, PNNL is a leading U.S. HPC center for scientific discovery, energy resilience, chemistry, Earth science, and data analytics.
NVIDIA DGX Station A100 System Specifications
Coming to the specifications, the NVIDIA DGX Station A100 is powered by a total of four A100 Tensor Core GPUs. These aren't just any A100 GPUs, as NVIDIA has updated the original specs to accommodate twice the memory.
The NVIDIA A100 Tensor Core GPUs in the DGX Station A100 come packed with 80 GB of HBM2e memory each, twice the memory size of the original A100. This gives the DGX Station a total of 320 GB of available GPU memory, with full support for MIG (Multi-Instance GPU) and 3rd Gen NVLink, which offers 200 GB/s of bidirectional bandwidth between any GPU pair, three times faster than PCIe Gen 4. The rest of the specs for the A100 Tensor Core GPUs remain the same.
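For readers who want to see those capabilities from software, here is a minimal Python sketch (our illustration, not part of NVIDIA's announcement) that uses the nvidia-ml-py (pynvml) bindings to enumerate the installed GPUs, report their memory capacity, and check whether MIG mode is enabled. It assumes a recent NVIDIA driver; the MIG query is only meaningful on MIG-capable GPUs like the A100:

```python
# Minimal sketch: list GPUs, memory capacity, and MIG state via NVML.
# Requires the nvidia-ml-py package (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older bindings return bytes
            name = name.decode()
        total_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).total / 1024**3
        try:
            current, _pending = pynvml.nvmlDeviceGetMigMode(handle)
            mig = "enabled" if current == pynvml.NVML_DEVICE_MIG_ENABLE else "disabled"
        except pynvml.NVMLError:
            mig = "not supported"  # pre-Ampere GPUs land here
        print(f"GPU {i}: {name}, {total_gb:.0f} GB, MIG {mig}")
finally:
    pynvml.nvmlShutdown()
```

On a DGX Station A100, this should report four ~80 GB devices; partitioning them into MIG instances is then done through the driver tools rather than from Python.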
The system itself houses a 64-core AMD EPYC Rome CPU with full PCIe Gen 4 support, up to 512 GB of dedicated system memory, a 1.92 TB NVMe M.2 SSD for the OS, and up to 7.68 TB of NVMe U.2 SSD storage for the data cache. For connectivity, the system carries two 10 GbE LAN controllers and a single 1 GbE LAN port for remote management. Display output is provided through a discrete DGX Display Adapter card, which offers four DisplayPort outputs with up to 4K resolution support and features its own active cooling solution.
As for cooling, the DGX Station A100 houses the A100 GPUs at the rear of the chassis. All four GPUs and the CPU are cooled by a refrigerant-based cooling system that is whisper-quiet and maintenance-free, with the compressor located inside the DGX chassis.
NVIDIA Ampere GA100 GPU Based A100 Specs:
| NVIDIA Tesla Graphics Card | Tesla K40 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (SXM2) | Tesla V100S (PCIe) | NVIDIA A100 (SXM4) | NVIDIA A100 (PCIe4) |
|---|---|---|---|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GV100 (Volta) | GA100 (Ampere) | GA100 (Ampere) |
| Process Node | 28nm | 28nm | 16nm | 16nm | 12nm | 12nm | 7nm | 7nm |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 21.1 Billion | 54.2 Billion | 54.2 Billion |
| GPU Die Size | 551 mm2 | 601 mm2 | 610 mm2 | 610 mm2 | 815 mm2 | 815 mm2 | 826 mm2 | 826 mm2 |
| SMs | 15 | 24 | 56 | 56 | 80 | 80 | 108 | 108 |
| TPCs | 15 | 24 | 28 | 28 | 40 | 40 | 54 | 54 |
| FP32 CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64 | 64 |
| FP64 CUDA Cores Per SM | 64 | 4 | 32 | 32 | 32 | 32 | 32 | 32 |
| FP32 CUDA Cores | 2880 | 3072 | 3584 | 3584 | 5120 | 5120 | 6912 | 6912 |
| FP64 CUDA Cores | 960 | 96 | 1792 | 1792 | 2560 | 2560 | 3456 | 3456 |
| Tensor Cores | N/A | N/A | N/A | N/A | 640 | 640 | 432 | 432 |
| Texture Units | 240 | 192 | 224 | 224 | 320 | 320 | 432 | 432 |
| Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1530 MHz | 1601 MHz | 1410 MHz | 1410 MHz |
| TOPs (DNN/AI) | N/A | N/A | N/A | N/A | 125 TOPs | 130 TOPs | 1248 TOPs (2496 TOPs with Sparsity) | 1248 TOPs (2496 TOPs with Sparsity) |
| FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 30.4 TFLOPs | 32.8 TFLOPs | 312 TFLOPs (624 TFLOPs with Sparsity) | 312 TFLOPs (624 TFLOPs with Sparsity) |
| FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 15.7 TFLOPs | 16.4 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard) |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.80 TFLOPs | 8.2 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard) |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 6144-bit HBM2e | 6144-bit HBM2e |
| Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s / 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 1134 GB/s | 40 GB HBM2 @ 1.6 TB/s | Up To 80 GB HBM2 @ 1.6 TB/s |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 6144 KB | 40960 KB | 40960 KB |
| TDP | 235W | 250W | 250W | 300W | 300W | 250W | 400W | 250W |
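As a sanity check on the table, the "standard" (non-Tensor-Core) compute figures follow directly from the core counts and boost clocks, since each CUDA core retires one fused multiply-add (two FLOPs) per clock. A quick back-of-the-envelope calculation in Python, using the V100 and A100 numbers from the table above:

```python
# Peak FLOPs = 2 (an FMA counts as two ops) x cores x boost clock.
def peak_tflops(cores: int, boost_mhz: int) -> float:
    return 2 * cores * boost_mhz * 1e6 / 1e12

# (FP32 cores, FP64 cores, boost clock in MHz) per the table above
gpus = {
    "Tesla V100 (SXM2)": (5120, 2560, 1530),
    "NVIDIA A100":       (6912, 3456, 1410),
}
for name, (fp32, fp64, mhz) in gpus.items():
    print(f"{name}: FP32 {peak_tflops(fp32, mhz):.1f} TFLOPs, "
          f"FP64 {peak_tflops(fp64, mhz):.1f} TFLOPs")
# -> V100: 15.7 / 7.8 TFLOPs; A100: 19.5 / 9.7 TFLOPs. These are the
#    "standard" rates; the A100's 156 and 19.5 TFLOPs headline figures
#    come from its Tensor Cores instead.
```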
NVIDIA DGX Station A100 System Performance
As for performance, the DGX Station A100 delivers 2.5 Petaflops of AI training power and 5 PetaOPS of INT8 inferencing horsepower. The DGX Station A100 is also the only workstation of its kind to support MIG (Multi-Instance GPU), which lets users slice up each GPU into as many as seven isolated instances so that simultaneous workloads can be executed faster and more efficiently.
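Those headline numbers are simply the per-GPU Tensor Core peaks scaled by four. A quick illustration, assuming NVIDIA's published per-A100 rates of 624 TFLOPs FP16 and 1,248 TOPs INT8 (both sparsity-enabled figures):

```python
# Rough arithmetic behind the DGX Station A100's headline figures:
# four A100s running at their sparsity-enabled Tensor Core peaks.
NUM_GPUS = 4
FP16_TFLOPS_PER_GPU = 624   # FP16 Tensor Core peak, with sparsity
INT8_TOPS_PER_GPU = 1248    # INT8 Tensor Core peak, with sparsity

print(f"AI training:    {NUM_GPUS * FP16_TFLOPS_PER_GPU / 1000:.1f} PFLOPs")  # ~2.5
print(f"INT8 inference: {NUM_GPUS * INT8_TOPS_PER_GPU / 1000:.1f} PetaOPS")   # ~5.0
```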
Over the original DGX Station, the new version offers a 3.17x increase in training performance, a 4.35x increase in inference performance, and a 1.85x increase in HPC-oriented workloads. NVIDIA has also updated its DGX A100 system to feature the 80 GB A100 Tensor Core GPUs. These allow up to 3 times faster training performance than the standard 320 GB DGX A100 system, 25% faster inference performance, and 2 times faster data analytics performance.
NVIDIA DGX Station A100 System Availability
NVIDIA has announced that the DGX Station A100 and NVIDIA DGX A100 640 GB systems will be available this quarter through NVIDIA's partner network resellers worldwide. The company will also offer an upgrade option for DGX A100 320 GB system owners to move to the 640 GB DGX variant featuring eight 80 GB A100 Tensor Core GPUs. NVIDIA has not yet provided any information on pricing for the systems.