By Ravindranath C Kanakarajan, Senior Principal Engineer, Switch BU
Marvell has been actively involved with SONiC since its beginning, and many SONiC switches deployed at hyperscalers worldwide are powered by Marvell® ASICs. One of Marvell's goals has been to enhance SONiC to address common issues and optimize its performance for large-scale deployments.
The Challenge
Many hackathon projects have focused on improving the monitoring, troubleshooting, debuggability, and testing of SONiC. However, we believe one of the core roles of a network operating system (NOS) is to optimize the use of the hardware data plane (i.e., the NPUs and networking ASICs). As workloads become increasingly demanding, it becomes crucial to maximize the efficiency of the data plane. Commercial black-box NOSes are tailored to specific NPUs/ASICs to achieve optimal performance. SONiC, however, supports a diverse range of NPUs/ASICs, presenting a unique challenge.
We at Marvell have been contributing features to SONiC to ensure optimal use of the underlying networking ASIC resources. Over time, we’ve recognized the need to provide operators with flexibility in utilizing ASIC resources while reducing the platform-specific complexity gradually being introduced into SONiC’s core component, the Orchagent. This approach will help SONiC operators to maintain consistent device configurations even when using devices from different platform vendors.
BYOC
During the Hackathon, we developed a framework called “BYOC: Bring Your Own Configuration,” allowing networking ASIC vendors to expose their hardware capabilities in a file describing intent. A new agent transforms the user’s configuration into an optimal SONiC configuration based on the capabilities file. This approach allows ASIC vendors to ensure that user configurations are converted to optimal ASIC configurations. It also allows SONiC operators to fine-tune the hardware resources consumed based on the deployment needs. It further helps in optimally migrating configurations from vendor NOS to SONiC based on the SONiC platform’s capability.
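The flow described above (a vendor capabilities file, a user's intent, and an agent that maps one onto the other) can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the key names, the ACL example, and the `plan_acl_tables` helper are hypothetical and do not reflect the actual BYOC file format or any SONiC schema.

```python
from typing import Any

# Hypothetical capabilities file contents, as an ASIC vendor might publish them.
# Key names are invented for illustration; they are not a SONiC schema.
CAPABILITIES: dict[str, Any] = {
    "acl": {"max_tables": 2, "supports_table_sharing": True},
}

# Hypothetical operator intent: three ACL policies across two stages.
USER_INTENT: dict[str, Any] = {
    "acl_policies": [
        {"name": "mgmt", "stage": "ingress"},
        {"name": "tenant-a", "stage": "ingress"},
        {"name": "tenant-b", "stage": "egress"},
    ]
}


def plan_acl_tables(intent: dict[str, Any], caps: dict[str, Any]) -> dict[str, list[str]]:
    """Map user policies onto hardware tables within the ASIC's stated limits."""
    acl_caps = caps["acl"]
    tables: dict[str, list[str]] = {}
    for policy in intent["acl_policies"]:
        # If the ASIC can share a table between policies, group them by stage;
        # otherwise each policy needs a table of its own.
        key = policy["stage"] if acl_caps["supports_table_sharing"] else policy["name"]
        tables.setdefault(key, []).append(policy["name"])
    if len(tables) > acl_caps["max_tables"]:
        raise ValueError(
            f"intent needs {len(tables)} ACL tables, ASIC supports {acl_caps['max_tables']}"
        )
    return tables


plan = plan_acl_tables(USER_INTENT, CAPABILITIES)
```

In this sketch the same intent yields a two-table plan on an ASIC that supports table sharing and is rejected on one that does not, which is the kind of platform-specific decision the capabilities file is meant to move out of the core components.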
By Michael Kanellos, Head of Influencer Relations, Marvell
What happened in semis and accelerated infrastructure in 2024? Here is the recap:
1. Custom Controls the Future
Until relatively recently, computing performance was achieved by increasing transistor density à la Moore’s Law. In the future, it will be achieved through innovative design, and many of those innovative design ideas will come to market first, and mostly, through custom processors tailored to use cases, software environments and performance goals, thanks to a convergence of unusual and unstoppable forces [1] that quietly began years ago.
FB NIC on display at OFC
By Michael Kanellos, Head of Influencer Relations, Marvell
The idea of customizing high bandwidth memory (HBM) has only recently emerged, but expect to see it in the mainstream in just a few years.
“We strongly believe that custom HBM will be the majority portion of the market towards the ’27-28 time frame,” said In Dong Kim, vice president of product planning at Samsung Semiconductor, in a video interview with the Six Five at Marvell Analyst Day earlier this month. At the event, Marvell, Micron Technology, Samsung and SK hynix announced a collaboration to accelerate the development of custom HBM solutions.
Sunny Kang, vice president of DRAM technology at SK hynix, had a similar outlook. “Usually in the DRAM industry, when we launch a new product, it takes just one or two years to be mainstream,” he said. “That means along the ’29 timeframe, it is going to be a mainstream product in the HBM market. I’m pretty sure about that.”
By Michael Kanellos, Head of Influencer Relations, Marvell
Data infrastructure needs more: more capacity, speed, efficiency, bandwidth and, ultimately, more data centers. The number of data centers owned by the top four cloud operators has grown by 73% since 2020 [1], while total worldwide data center capacity is expected to double to 79 megawatts (MW) in the near future [2].
Aquila, the industry’s first O-band coherent DSP, marks a new chapter in optical technology. O-band optics lower the power consumption and complexity of optical modules for links ranging from two to 20 kilometers. O-band modules are longer in reach than PAM4-based optical modules used inside data centers and shorter than C-band and L-band coherent modules. They provide users with an optimized solution for the growing number of data center campuses emerging to manage the expected AI data traffic.
Take a deep dive into our O-band technology with Xi Wang’s blog, O-Band Coherent, An Idea Whose Time is (Nearly) Here, originally published in March, below:
O-Band Coherent: An Idea Whose Time Is (Nearly) Here
By Xi Wang, Vice President of Product Marketing of Optical Connectivity, Marvell
Over the last 20 years, data rates for optical technology have climbed 1000x while power per bit has declined by 100x, a stunning trajectory that in many ways paved the way for the cloud, mobile Internet and streaming media.
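Taken together, the two figures above imply that total power per optical link has still grown, just far more slowly than bandwidth. A back-of-the-envelope check, assuming both figures describe the same class of link over the same period (an assumption, since the text does not say so):

```python
# Figures quoted in the paragraph above.
data_rate_growth = 1000         # data rate is 1000x higher
power_per_bit_reduction = 100   # each bit costs 1/100th of the power

# Link power = data rate x power per bit, so the growth factors divide.
link_power_growth = data_rate_growth / power_per_bit_reduction
print(f"Implied power per link: {link_power_growth:.0f}x")
```

Under that assumption, a link carrying 1000x the data draws roughly 10x the power, which is why efficiency per bit remains a design priority.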
AI represents the next inflection point in bandwidth demand. Servers powered by AI accelerators and GPUs have far greater bandwidth needs than typical cloud servers: seven high-end GPUs alone can max out a switch that can ordinarily handle 500 two-processor cloud servers. Just as important, demand for AI services, and for higher-value AI services such as medical imaging or predictive maintenance, will further drive the need for more bandwidth. The AI market alone is expected to reach $407 billion by 2027.
By Michael Kanellos, Head of Influencer Relations, Marvell
How do you get more data to the processor faster?
That has been the central question for computing architects and chip designers since the dawn of the computer age. And it’s taken on even greater urgency with AI. The more data a processor can access, the more accurate and nuanced the algorithm’s answers will be. Adding more memory, however, can also add cost, latency, and power.
Marvell has pioneered an architecture for custom high-bandwidth memory (HBM) solutions for AI accelerators (XPUs) and will collaborate with Samsung, Micron and SK hynix to bring tailored memory solutions to market. (See comments from Micron, Samsung, SK hynix and Marvell here in the release.)
Customizing the HBM element of XPUs can, among other benefits, increase the amount of memory inside XPUs by 33%, reduce the power consumed by the memory I/O interfaces by over 70%, and free up to 25% of silicon area to add more compute logic, depending on the XPU design [1].
The shift, part of the overall trend toward custom XPUs, will have a fundamental and far-reaching impact on the performance, power consumption and design of XPUs. Invented in 2013, HBM consists of vertical stacks of high-speed DRAM sitting on a chip called the HBM base die that controls the I/O interfaces and manages the system. The base die and DRAM chips are connected by metal bumps.
Vertical stacking has effectively allowed chip designers to increase the amount of memory close to the processor for better performance. A scant few years ago, cutting-edge accelerators contained 80GB of HBM [2]. Next year, the high-water mark will reach 288GB.
Still, the desire for more memory will continue, putting pressure on designers to economize on space, power and cost. HBM currently can account for 25% of the available real estate inside an XPU and 40% of the total cost [3]. HBM4, the current cutting-edge standard, features an I/O that consists of 32 64-bit channels, an immense size that is already making some aspects of chip packaging extremely complex.
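To put the size of that interface in numbers, the short calculation below works out the total bus width from the channel figures in the text. The per-pin data rate is an assumption added for illustration (8 Gb/s is used as a round figure) and does not come from this article.

```python
# From the text: HBM4 I/O consists of 32 channels, each 64 bits wide.
channels = 32
bits_per_channel = 64
interface_width_bits = channels * bits_per_channel

# Assumption for illustration only: 8 Gb/s per data pin.
assumed_gbps_per_pin = 8
bandwidth_gigabytes_per_s = interface_width_bits * assumed_gbps_per_pin / 8

print(f"Interface width: {interface_width_bits} bits per stack")
print(f"Bandwidth at the assumed pin rate: {bandwidth_gigabytes_per_s:.0f} GB/s per stack")
```

A 2,048-bit data interface per stack, before counting command, address and power connections, helps explain why packaging is becoming difficult and why reducing the I/O footprint is a focus of custom designs.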
All About Optimizing XPU TCO
The Marvell custom HBM compute architecture involves optimizing the base HBM die and its interfaces, currently designed around standards from JEDEC, with solutions uniquely designed to dovetail with the design, characteristics and performance objectives of the host AI compute die.
Imagine that a hyperscaler wants an AI inference XPU for edge data centers squeezed into dense business districts or urban corridors. Cost and power consumption will be at a premium while absolute compute performance will likely be less important. A custom HBM solution might involve reducing the size of the AI compute die to economize on chip size and power above other considerations.
At the other end of the spectrum, an HBM subsystem for XPUs powering a massive AI training cluster might be tuned for capacity and high bandwidth. In this situation, the emphasis could be on reducing the size of the I/O interface. Reducing I/O size creates space for more interfaces on the so-called beachfront at the side of a chip and hence boosts total bandwidth.