AMD EPYC Genoa & Radeon Instinct To Power El Capitan Supercomputer


AMD has just announced a major win in the HPC sector: its next-generation EPYC processors and Radeon Instinct accelerators will power El Capitan, the 2 exaflop supercomputer of the U.S. Department of Energy (DOE), which should be operational by 2023.

AMD's EPYC Genoa & Radeon Instinct HPC Accelerators To Drive 2 Exaflop 'El Capitan' Supercomputer

All three giants, Intel, AMD and NVIDIA, were competing to win the contract for the DOE's latest supercomputer, but it looks like AMD won on both the CPU and GPU fronts. El Capitan would be built by HPE's Cray supercomputing division, which would use the next-generation processors and accelerators from AMD to bring this exaflop monster to life by 2023. The supercomputer would be deployed at the Lawrence Livermore National Laboratory (LLNL) and would be able to perform up to 2 quintillion calculations per second.

"We expect when it's delivered to the laboratory in 2023, it will be the fastest supercomputer in the world," said Bill Goldstein, director of the Livermore lab.

The entire system would cost $600 million to assemble and would be at least 16 times faster than the Sierra (IBM Power 9 + NVIDIA Volta) supercomputer that is currently deployed at LLNL. As for the specifications of the system itself, we know that EPYC Genoa would power the CPU side while a next-generation Radeon Instinct accelerator would power the GPU side. The whole system would consume less than 40 MW when it becomes operational.
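To put those headline figures in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted above (2 exaflops, under 40 MW, at least 16 times Sierra) and treats them as peak values, which is an assumption; the constant and function names are illustrative and not taken from AMD or the DOE.

```python
# Headline figures quoted in the article, treated here as peak values (assumption).
PEAK_FLOPS = 2e18            # 2 exaflops
POWER_BUDGET_WATTS = 40e6    # "less than 40 MW", so this is an upper bound
SPEEDUP_OVER_SIERRA = 16     # "at least 16 times faster"


def gflops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency in GFLOPS per watt."""
    return flops / watts / 1e9


# A power draw below 40 MW makes this a lower bound on efficiency.
efficiency_floor = gflops_per_watt(PEAK_FLOPS, POWER_BUDGET_WATTS)

# "At least 16x" makes this an upper bound on the Sierra figure being compared.
implied_sierra_pflops = PEAK_FLOPS / SPEEDUP_OVER_SIERRA / 1e15

print(f"Efficiency floor: {efficiency_floor:.0f} GFLOPS/W")
print(f"Implied Sierra performance: at most {implied_sierra_pflops:.0f} PFLOPS")
```

In other words, hitting 2 exaflops inside a 40 MW envelope implies at least 50 GFLOPS per watt, and the 16x claim implies a Sierra baseline of no more than about 125 petaflops.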

The following is a list of AMD technologies to be included in the El Capitan supercomputer:

  • Next-generation AMD EPYC processors, codenamed “Genoa”, featuring the “Zen 4” processor core. These processors will support next-generation memory and I/O subsystems for AI and HPC workloads.
  • Next-generation Radeon Instinct GPUs based on a new compute-optimized architecture for workloads including HPC and AI. These GPUs will use next-generation high bandwidth memory and are designed for optimum deep learning performance.
  • The 3rd Gen AMD Infinity Architecture, which will provide a high-bandwidth, low-latency connection between the four Radeon Instinct GPUs and one AMD EPYC CPU included in each node of El Capitan. In addition, the 3rd Gen AMD Infinity Architecture includes unified memory across the CPU and GPU, easing programmer access to accelerated computing.
  • An enhanced version of the open-source ROCm heterogeneous programming environment, being developed to tap into the combined performance of AMD CPUs and GPUs, unlocking maximum performance.

AMD EPYC Genoa - Post-7nm Zen 4 Cores, SP5 Socket Platform, DDR5 Memory, PCIe 5.0 Protocol

The AMD EPYC Genoa processors based on the Zen 4 core architecture were a mystery until AMD officially unveiled them in its latest roadmap during the EPYC Rome launch. Currently in design with a planned launch for 2021, the Genoa lineup would bring a brand new set of features to the server landscape.

AMD announced that EPYC Genoa would be compatible with the new SP5 platform, which brings a new socket, so SP3 compatibility would only extend up to EPYC Milan. The EPYC Genoa processors would also feature support for new memory and new capabilities. It looks like AMD will be jumping on board the DDR5 bandwagon in 2021. The "new capabilities" mentioned for EPYC Genoa sound like a hint at the PCIe 5.0 protocol, which doubles the per-lane rate of PCIe 4.0 to 32 GT/s, offering roughly 128 GB/s of total bidirectional bandwidth (about 64 GB/s in each direction) across an x16 interface.
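The PCIe bandwidth arithmetic can be checked with a short Python sketch. It assumes the published per-lane signalling rates (16 GT/s for PCIe 4.0, 32 GT/s for PCIe 5.0) and the 128b/130b line encoding both generations use; it ignores packet and protocol overhead, so real-world throughput would be lower. The function and constant names are illustrative only.

```python
# Per-lane signalling rates in GT/s (gigatransfers per second).
TRANSFER_RATE_GT_S = {"4.0": 16.0, "5.0": 32.0}

# PCIe 3.0 and later carry 128 payload bits in every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gb_s(generation: str, lanes: int = 16) -> float:
    """Approximate one-direction bandwidth in GB/s, before protocol overhead."""
    raw_gbit_s = TRANSFER_RATE_GT_S[generation] * lanes
    return raw_gbit_s * ENCODING_EFFICIENCY / 8  # bits to bytes


gen4_x16 = pcie_bandwidth_gb_s("4.0")
gen5_x16 = pcie_bandwidth_gb_s("5.0")

print(f"PCIe 4.0 x16: {gen4_x16:.1f} GB/s per direction")
print(f"PCIe 5.0 x16: {gen5_x16:.1f} GB/s per direction, "
      f"{2 * gen5_x16:.1f} GB/s both ways")
```

Under these assumptions a PCIe 5.0 x16 link works out to roughly 63 GB/s in each direction, or about 126 GB/s counting both directions, which is the figure usually rounded to 128 GB/s.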

Summing everything up for EPYC Genoa, we are looking at the following main features:

  • Post-7nm Zen 4 cores
  • SP5 Platform With New Socket
  • PCIe 5.0 Support
  • DDR5 Memory Support
  • Launch in 2021

On the Radeon Instinct side, we are definitely looking at a much more powerful and possibly sub-7nm GPU-based accelerator. AMD is currently prepping to launch its Radeon Instinct MI100 accelerator, which is codenamed 'Arcturus' and reportedly features up to 8192 stream processors and 32 GB of HBM2e memory.

That GPU is definitely a beast on its own, but it is planned for a 2020 launch, so to remain future-proof the El Capitan supercomputer would definitely deploy something newer than the Radeon Instinct MI100. The exact accelerator has not been named, but the new GPU is stated to offer a brand new compute architecture with the following features:

  • Optimized for HPC and AI
  • Extensive Mixed Precision Ops for Optimized Deep Learning Performance
  • Next-Generation High Bandwidth Memory
  • Maximize Performance With Multi-GPU Scaling

Based on the feature set, we are definitely looking at something beyond HBM2e and PCIe Gen 4, both of which will be readily available by 2021, while El Capitan would become operational in 2023. The GPU is also said to be specifically designed for compute, AI and HPC workloads, which means that it would be a custom design for that segment and not a chip that you would get to see in the consumer space, much like NVIDIA's own HPC accelerators.

The third major feature of El Capitan would be that each AMD CPU and GPU accelerator would be equipped with the 3rd Generation Infinity Fabric interconnect. Referred to as Infinity Fabric 3.0, the new interconnect would provide a high-bandwidth, low-latency connection between the CPU and GPU and enable unified memory across both, while the coherent nature of the platform would improve overall performance and simplify programming.

In the slide posted by AMD, it looks like each node would have four Radeon accelerators directly linked to an AMD EPYC Genoa processor through Infinity Fabric 3.0. It would be used in addition to Cray's own Slingshot fabric, which currently pushes up to 200 Gb/s of bandwidth, although future versions could offer even more interconnect bandwidth to the El Capitan infrastructure. The difference here is that Slingshot is more of a node-to-node channel while Infinity Fabric is a closer CPU-GPU interconnect.

It's also mentioned that there would be cache coherency between the CPU and GPU, not just shared memory, which would be a big deal for future HPC platforms. A slide presented by AMD on the advantages of a heterogeneous platform was shared by Addison Snell on his Twitter feed, and it gives us a good idea of what to expect from future compute and acceleration platforms. With that said, AMD has its Financial Analyst Day tomorrow, and we can definitely expect more details on Zen 4 and these Radeon Instinct accelerators during the presentation.
