This year, LLNL prepares to launch world’s fastest supercomputer

In October 2023, El Capitan’s doors were installed for an Open House to show visitors what the supercomputer will look like when fully installed. Photo by Garry McLeod/LLNL.

Rows of towering black cabinets, laced with multicolored wires and tubes, lined the computing floor.

A transparent panel separated us from them. That, and a safety door with a warning about the noise beyond.

When the door opened, the buzzing was hard to miss.

It was the middle of January, and the second level of LLNL’s Computing Center, Building 453, was alive with sound.

Adding to the noise was El Capitan, LLNL’s upcoming supercomputer, which was undergoing equipment testing alongside the operational systems on the floor.

El Capitan is projected to be the world’s fastest supercomputer, or high-performance computing system, when it comes online in late summer or early fall this year.

The system will enable researchers from the National Nuclear Security Administration (NNSA) weapons design laboratories to build models and run simulations that were previously considered too difficult, time-consuming, or infeasible, in support of maintaining and modernizing the United States’ nuclear weapons stockpile.

This nuclear safety research is critical to ensuring the reliability of the stockpile and, ultimately, to maintaining nuclear deterrence in the absence of underground nuclear testing, according to lab officials.

In 2019, the U.S. Department of Energy’s NNSA signed a $600 million contract with Cray Inc. (since acquired by Hewlett Packard Enterprise), which designs, builds, and maintains supercomputers, to build the NNSA’s first exascale supercomputer (and the DOE’s third exascale-class supercomputer): El Capitan.

Delivery of El Capitan’s components began in summer 2023. Seen here are some of the first of the supercomputer’s cabinets to be installed, 2023. Photo by Garry McLeod/LLNL.

The supercomputer is expected to run at more than 2 exaFLOPS, or 2 quintillion (2 × 10^18) floating point operations per second, at peak performance, according to Jeremy Thomas, public information officer at LLNL. In other words, it is expected to perform calculations such as addition and multiplication at a rate of 2 billion billion per second.

To put that speed in context, an exascale machine runs roughly 1 million times faster (or more) than the average household computer, according to Thomas. Compared with other supercomputers, it is projected to be 10 to 15 times faster than LLNL’s current fastest supercomputer, Sierra, which runs at 125 petaFLOPS at peak performance (one petaFLOPS is 1 quadrillion, or 10^15, floating point operations per second), Thomas said.
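
As a rough check on these figures, the peak numbers can be compared directly. The short Python sketch below is illustrative only: the 1 teraFLOPS figure for a household computer is an assumption made here, not a number from LLNL, and El Capitan’s "more than 2 exaFLOPS" is treated as exactly 2.

```python
# Peak figures quoted in the article, in floating point operations per second.
EXA = 10**18
PETA = 10**15
TERA = 10**12

el_capitan_peak = 2 * EXA   # "more than 2 exaFLOPS", treated here as exactly 2
sierra_peak = 125 * PETA    # Sierra's quoted peak performance

# Assumption (not from the article): a household computer at about 1 teraFLOPS.
household_peak = 1 * TERA


def speedup(fast: int, slow: int) -> float:
    """Return how many times faster `fast` is than `slow`."""
    return fast / slow


print(f"El Capitan vs. Sierra:    {speedup(el_capitan_peak, sierra_peak):.0f}x")
print(f"El Capitan vs. household: {speedup(el_capitan_peak, household_peak):,.0f}x")
```

On these assumptions, the household ratio comes to 2 million, in line with "1 million times (or more)." The peak-to-peak ratio against Sierra works out to 16, slightly above the quoted range of 10 to 15, which may rest on a different basis of comparison than peak figures alone.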

Researchers from the NNSA weapons design laboratories (LLNL, Los Alamos National Laboratory, and Sandia National Laboratories) will use El Capitan’s power to design weapons, components, and delivery systems to meet the evolving needs of nuclear deterrence, said Rob Neely, who leads the advanced simulation and computing program at LLNL. Scientists will also be able to run simulations and calculations on existing weapons that are exceeding their intended lifespans, to ensure they remain safe and effective.

The supercomputer’s mission aligns with the NNSA’s Stockpile Stewardship and Management Program, which aims to maintain and modernize the nuclear stockpile, according to the NNSA’s website.

El Capitan will allow scientists to combine high resolution (precision), 3D modeling, and ensemble calculations (many runs with slight variations that help researchers understand a simulation’s sensitivity to uncertainties such as environmental conditions and modeling errors), according to Neely.
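
The idea behind an ensemble calculation can be sketched in a few lines of Python. This is a toy illustration, not LLNL’s method: `toy_simulation`, `run_ensemble`, and every number in it are hypothetical stand-ins, chosen only to show how slightly varied inputs reveal how sensitive an output is.

```python
import random
import statistics


def toy_simulation(temperature: float) -> float:
    """Hypothetical stand-in for an expensive simulation: one input, one output."""
    return 0.5 * temperature ** 2


def run_ensemble(nominal: float, spread: float, members: int, seed: int = 0) -> list[float]:
    """Run the toy model once per member, each with a slightly perturbed input."""
    rng = random.Random(seed)  # seeded so the ensemble is reproducible
    return [toy_simulation(rng.gauss(nominal, spread)) for _ in range(members)]


# Assumed values: a nominal input of 300 with an uncertainty (std. dev.) of 3.
results = run_ensemble(nominal=300.0, spread=3.0, members=1000)

print(f"mean output:   {statistics.mean(results):.1f}")
print(f"output spread: {statistics.stdev(results):.1f}")
```

The spread of the outputs, relative to the spread of the inputs, is what tells a researcher how much an uncertain condition matters. A real ensemble would replace the one-line model with a full simulation, which is why running many members at high resolution in 3D demands a machine of this scale.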

“Previous machines have allowed us to explore one or perhaps two of those dimensions at a time, which has been greatly valuable and aids us in continuing to advance the science. But El Capitan is where we intend to bring it all together for the initial time – the climax of 30 years of effort,” Neely wrote.

El Capitan is expected to reach unprecedented speed thanks to its AMD MI300A APUs (accelerated processing units) and high-speed network connections.

AMD’s MI300A APU architecture combines tightly coupled GPUs (graphics processing units) and CPUs (central processing units) that share memory. Sharing memory eliminates the need to transfer or copy data between processors, which speeds up processing, according to Neely.

These APUs sit on compute nodes, which work on different portions of a task at the same time. A high-performance network made up of miles of cabling links the nodes, allowing them to communicate at very high speeds, according to Neely.

A contractor works on one of El Capitan’s racks in October 2023. The racks will hold the compute blades, which contain the AMD MI300A APUs. Photo by Garry McLeod/LLNL.

El Capitan’s GPU-centric architecture will make it well suited to exploring AI methods, though it was not originally designed for that purpose, Neely said.

“AI is this novel and emerging approach to computers where the computers actually learn, not from a human sitting down and dictating exactly what to do, but instead by giving it copious data or examples and it leveraging these intricate networks to learn how to do something,” Neely said.

He expects early work on El Capitan to take a hybrid approach: using AI techniques to analyze the results of a traditional simulation and using machine learning (a type of AI) to improve models, a combination known as cognitive simulation. Over time, he expects the workload to rely increasingly on AI as its reliability is demonstrated.

Getting to this point has been a major undertaking, including a $100 million Exascale Computing Facility Modernization project at LLNL to increase the power and water supply to the computing center.

Once El Capitan comes online, it will spend five to six months in an open research phase for assembly, testing, and work such as fusion simulation before transitioning to a classified system, according to Neely.

Until then, El Capitan awaits delivery of compute blades (which house the AMD MI300A APUs and memory); installation of the remaining equipment; testing of hardware, software, and the system; and integration of the final system, said Terri Quinn, deputy associate director for high performance computing at LLNL.

The mid-January visit made it clear that the supercomputer’s arrival is imminent. From the rush of liquid coursing through its cooling system to the network cables threaded through its cabinets, El Capitan was well on its way to deployment.

LLNL officials said that after El Capitan is deployed later this year and undergoes assurance testing, dignitaries, local elected officials, and high-level representatives from DOE/NNSA, HPE, AMD, and others will celebrate the supercomputer at a dedication and ribbon-cutting ceremony.
