Tech titans team up to form optical interconnect alliance to solve the AI buildout's big data bottleneck — Nvidia, AMD, Broadcom & more set sights on building PHY to break through the limitations of copper
OCI MSA collaboration seeks to break past the copper wall
This week, AMD, Broadcom, Nvidia, OpenAI, Meta, and Microsoft announced plans to standardize a protocol-agnostic, scale-up interconnect for AI data centers. The Optical Compute Interconnect Multi-Source Agreement (OCI MSA) group is tasked with defining an open connectivity specification for optical interconnections in AI data centers. This would allow for larger scale-up domains and enable a multi-vendor supply chain for optical interconnects, both of which the ongoing AI infrastructure buildout demands.
The group's primary goal is to enable data centers to scale by using optical interconnections rather than relying solely on copper, which is hitting its physical limits on data transfer speed and power consumption. Copper is also facing significant supply chain constraints, and an industry-wide shift to optical interconnections would relieve some of that pressure. An optical interconnection would also raise data transfer speeds, which is crucial for large-scale AI workloads.
Pushing electrical signals through copper at high speeds results in signal degradation and unsustainable levels of power consumption. Copper is an inherently lossy, resistive medium, so sending data over distance at high speeds takes huge amounts of power. An optical physical layer (PHY) can sidestep this electrical resistance problem, allowing for higher-speed data transmission. The goal of the newly founded group is to develop a PHY capable of delivering up to 3.2Tb/s and beyond.
Given that power is an ongoing challenge in AI data center buildouts, a solution that not only stabilizes power usage but also increases interconnection speeds seems like an obvious choice.
Optical cables would let more systems be connected concurrently over greater distances, without many of the penalties that come with copper, improving scale-up domains. However, optics also come with their own downsides: failure rates, increased heat output, and higher overall costs. With the technology still nascent in this application, new standards are needed for it to mature.
Going platform-agnostic
"Optical cables and the silicon photonics technology already exist when it comes to connecting different switches as part of a pluggable transfer ecosystem," said Vivek Raghunathan, CEO and co-founder at Xscape Photonics, in a recent interview with 3DTested Premium. Indeed, TSMC's COUPE technology is a foundation for integrating optics and photonics into chips; where the new OCI MSA standard comes in is letting those chips effectively share the same lanes.
The open standard developed by the OCI MSA would allow multiple vendors within the optical supply chain to offer components built to a single, unified spec. In theory, it should drive down the cost of optics and optical interconnects at scale. Moreover, it reduces sole reliance on TSMC, as the standard would ensure interoperability between products and chips made with COUPE and those using alternative CPO packaging platforms, effectively de-risking optical supply chains in the process.
If co-packaged optics were introduced into data centers without an effective rulebook or guidelines to govern them, each vendor's design would be proprietary and siloed from the others. So for larger-scale data centers that run multiple ecosystems, the OCI MSA's new spec defines a standard that lets them drive on the same roads and use the same data pathways. This isn't about hooking systems together and getting them to understand each other via NVLink or UALink, but about offering a physical foundation that allows those protocols to travel over fibre connections.
In theory, the OCI MSA's newly defined standard would allow a single data center to run NVLink for Nvidia chips and UALink for AMD hardware, while making use of the same underlying optical infrastructure, at higher speeds than copper can realistically allow.
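As a rough illustration of what "protocol-agnostic" means here, the sketch below models a shared optical link that moves bytes without ever inspecting them. Everything in it is hypothetical: the `Frame` and `OpticalLink` names, the protocol labels, and the payloads are invented for illustration and are not taken from the OCI MSA specification, NVLink, or UALink.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frame:
    """An opaque payload labelled with the protocol that produced it."""
    protocol: str   # e.g. "NVLink" or "UALink"; just a label in this sketch
    payload: bytes


@dataclass
class OpticalLink:
    """Hypothetical stand-in for a shared optical physical layer.

    It carries bytes and never parses them, which is the sense of
    "protocol-agnostic" used in this sketch.
    """
    delivered: list = field(default_factory=list)

    def send(self, frame: Frame) -> None:
        # The link stores the frame as-is; the payload is not interpreted.
        self.delivered.append(frame)

    def receive(self, protocol: str) -> list:
        # Each protocol stack picks out only its own payloads at the far end.
        return [f.payload for f in self.delivered if f.protocol == protocol]


# Two different protocols share one link in this toy model.
link = OpticalLink()
link.send(Frame("NVLink", b"gpu-to-gpu traffic"))
link.send(Frame("UALink", b"accelerator traffic"))
```

In this toy model, adding a third protocol requires no change to `OpticalLink`, which is the property the alliance is aiming for at the physical layer.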
"Fundamentally, the current copper-based interconnects just cannot meet that bandwidth requirement," Raghunathan continued. The OCI MSA's standard supports a range of optical solutions, such as pluggable optical modules, on-board optics, and chips using co-packaged optics. This would effectively break through the barriers that copper currently imposes.
Getting in on the ground
The companies involved in the newly formed alliance are all of the usual suspects: Nvidia, AMD, Microsoft, OpenAI, Broadcom, and Meta all create their own AI accelerators in some way, shape or form. Letting them assist in defining the new PHY means their products will be supported by it from day one.
"Scale up focused optical technologies, protocols, and switch architectures are foundational to building scalable, multi rack, high performance AI compute domains. The OCI MSA advances this vision with a forward-looking physical layer specification setting the stage for open standards, differentiated implementations and systems architecture innovation," said Saurabh Dighe, Corporate VP of Azure and Architecture at Microsoft.
From early deployments at 200Gb/s all the way up to the 3.2Tb/s outlined by the alliance, it's becoming increasingly clear that to serve larger frontier models, or build toward vastly more complex AI data centers, optics is no longer an exploratory zone. If hyperscalers and AI model developers have their way, it will be rolled out directly into production.
"By equipping best-in-class compute with state-of-the-art optics, the OCI MSA can deliver the scale and performance required by the next era of super-intelligence," says Gilad Shainer, SVP of networking at Nvidia. With superintelligence on the mind, speed is non-negotiable, and the standards being drawn up reflect the rapid advancement of data center technology.
