

By Raghib Hussain, President, Products and Technologies
This article was originally published in VentureBeat.
Artificial intelligence is about to face some serious growing pains.
Demand for AI services is exploding globally. Unfortunately, so is the challenge of delivering those services in an economical and sustainable manner. AI power demand is forecast to grow by 44.7% annually, a surge that will double data center power consumption to 857 terawatt hours in 20281: as a nation today, that would make data centers the sixth largest consumer of electricity, right behind Japan’s2 consumption. It’s an imbalance that threatens the “smaller, cheaper, faster” mantra that has driven every major trend in technology for the last 50 years.
It also doesn’t have to happen. Custom silicon—unique silicon optimized for specific use cases—is already demonstrating how we can continue to increase performance while cutting power even as Moore’s Law fades into history. Custom may account for 25% of AI accelerators (XPUs) by 20283 and that’s just one category of chips going custom.
The Data Infrastructure is the Computer
Jensen Huang’s vision for AI factories is apt. These coming AI data centers will churn at an unrelenting pace 24/7. And, like manufacturing facilities, their ultimate success or failure for service providers will be determined by operational excellence, the two-word phrase that rules manufacturing. Are we consuming more, or less, energy per token than our competitor? Why is mean time to failure rising? What’s the current operational equipment effectiveness (OEE)? In oil and chemicals, the end products sold to customers are indistinguishable commodities. Where they differ is in process design, leveraging distinct combinations of technologies to squeeze out marginal gains.
The same will occur in AI. Cloud operators already are engaged in differentiating their backbone facilities. Some have adopted optical switching to reduce energy and latency. Others have been more aggressive at developing their own custom CPUs. In 2010, the main difference between a million-square-foot hyperscale data center and a data center inside a regional office was size. Both were built around the same core storage devices, servers and switches. Going forward, diversity will rule, and the operators with the lowest cost, least downtime and ability to roll out new differentiating services and applications will become the favorite of businesses and consumers.
The best infrastructure, in short, will win.
The Custom Concept
And the chief way to differentiate infrastructure will be through custom infrastructure that are enabled by custom semiconductors, i.e., chips containing unique IP or features for achieving leapfrog performance for an application. It’s a spectrum ranging from AI accelerators built around distinct, singular design to a merchant chip containing additional custom IP, cores and firmware to optimize it for a particular software environment. While the focus is now primarily on higher value chips such as AI accelerators, every chip will get customized: Meta, for example, recently unveiled a custom NIC, a relatively unsung chip that connects servers to networks, to reduce the impact of downtime.
By Ravindranath C Kanakarajan, Senior Principal Engineer, Switch BU
Marvell has been actively involved with SONiC since its beginning, with many SONiC switches powered by Marvell® ASICs at hyperscalers deployed worldwide. One of Marvell's goal has been to enhance SONiC to address common issues and optimize its performance for large-scale deployments.
The Challenge
Many hackathon projects have focused on improving the monitoring, troubleshooting, debuggability, and testing of SONiC. However, we believe one of the core roles of a network operating system (NOS) is to optimize the use of the hardware data plane (i.e., the NPUs and networking ASICs). As workloads become increasingly more demanding, it becomes crucial to maximize the efficiency of the data plane. Commercial black-box NOS are tailored to specific NPUs/ASICs to achieve optimal performance. SONiC, however, supports a diverse range of NPUs/ASICs, presenting a unique challenge.
We at Marvell have been contributing features to SONiC to ensure optimal use of the underlying networking ASIC resources. Over time, we’ve recognized the need to provide operators with flexibility in utilizing ASIC resources while reducing the platform-specific complexity gradually being introduced into SONiC’s core component, the Orchagent. This approach will help SONiC operators to maintain consistent device configurations even when using devices from different platform vendors.
BYOC
During the Hackathon, we developed a framework called “BYOC: Bring Your Own Configuration,” allowing networking ASIC vendors to expose their hardware capabilities in a file describing intent. A new agent transforms the user’s configuration into an optimal SONiC configuration based on the capabilities file. This approach allows ASIC vendors to ensure that user configurations are converted to optimal ASIC configurations. It also allows SONiC operators to fine-tune the hardware resources consumed based on the deployment needs. It further helps in optimally migrating configurations from vendor NOS to SONiC based on the SONiC platform’s capability.
By Michael Kanellos, Head of Influencer Relations, Marvell
With AI computing and cloud data centers requiring unprecedented levels of performance and power, Marvell is leading the way with transformative optical interconnect solutions for accelerated infrastructure to meet the rising demand for network bandwidth.
At the ECOC 2024 Exhibition Industry Awards event, Marvell received the Most Innovative Pluggable Transceiver/Co-Packaged Module Award for the マーベル® COLORZ® 800 family. Launched in 2020 for ECOC’s 25th anniversary, the ECOC Exhibition Industry Awards spotlight innovation in optical communications, transport, and photonic technologies. This recognition highlights the company’s innovations in ZR/ZR+ technology for accelerated infrastructure and demonstrates its critical role in driving cloud and AI workloads.
By Michael Kanellos, Head of Influencer Relations, Marvell
Coherent optical digital signal processors (DSPs) are the long-haul truckers of the communications world. The chips are essential ingredients in the 600+ subsea Internet cables that crisscross the oceans (see map here) and the extended geographic links weaving together telecommunications networks and clouds.
One of the most critical trends for long-distancer communications has been the shift from large, rack-scale transport equipment boxes running on embedded DSPs often from the same vendor to pluggable modules based on standardized form factors running DSPs from silicon suppliers tuned to the power limits of modules.
With the advent of 800G ZR/ZR+ modules, the market arrives at another turning point. Here’s what you need to know.
It’s the Magic of Modularity
PCs, smartphones, solar panels and other technologies that experienced rapid adoption had one thing in common: general agreement on the key ingredients. By building products around select components, accepted standards and modular form factors, an ecosystem of suppliers sprouted. And for customers that meant fewer shortages, lower prices and accelerated innovation.
The same holds true of pluggable coherent modules. 100 Gbps coherent modules based on the ZR specification debuted in 2017. The modules could deliver data approximately 80 kilometers and consumed approximately 4.5 watts per 100G of data delivered. Microsoft became an early adopter and used the modules to build a mesh of metro data centers1.
Flash forward to 2020. Power per 100G dropped to 4W and distance exploded: 120k connections became possible with modules based on the ZR standard and 400k with the ZR+ standard. (An organization called OIF maintains the ZR standard. ZR+ is controlled by OpenROADM. Module makers often make both varieties. The main difference between the two is the amplifier: the DSPs, number of channels and form factors are the same.) ®
The market responded. 400ZR/ZR+ became adopted more rapidly than any other technology in optical history, according to Cignal AI principal analyst Scott Wilkinson.
“It opened the floodgates to what you could do with coherent technology if you put it in the right form factor,” he said during a recent webinar.
By Suhas Nayak, Senior Director of Solutions Marketing, Marvell
In the world of artificial intelligence (AI), where compute performance often steals the spotlight, there's an unsung hero working tirelessly behind the scenes. It's something that connects the dots and propels AI platforms to new frontiers. Welcome to the realm of optical connectivity, where data transfer becomes lightning-fast and AI's true potential is unleashed. But wait, before you dismiss the idea of optical connectivity as just another technical detail, let's pause and reflect. Think about it: every breakthrough in AI, every mind-bending innovation, is built on the shoulders of data—massive amounts of it. And to keep up with the insatiable appetite of AI workloads, we need more than just raw compute power. We need a seamless, high-speed highway that allows data to flow freely, powering AI platforms to conquer new challenges.
In this post, I’ll explain the importance of optical connectivity, particularly the role of DSP-based optical connectivity, in driving scalable AI platforms in the cloud. So, buckle up, get ready to embark on a journey where we unlock the true power of AI together.