At GTC 2026, Nvidia debuted BlueField-4 STX, a storage architecture for agentic AI
Eight cloud providers have committed to early adoption.
Nvidia announced BlueField-4 STX at GTC 2026 on March 16: a customizable storage blueprint intended to tackle the data-throughput constraints that bottleneck agentic AI inference.
Built around a storage-enhanced BlueField-4 DPU and a ConnectX-9 SuperNIC, the architecture targets the GPU underutilization that arises when AI agents running long sessions with ever-wider context windows outstrip the throughput of conventional storage paths. Nvidia claims STX delivers up to five times the token throughput, four times the power efficiency, and twice the page-ingest rate of conventional CPU-based storage architectures.
The specific problem STX targets is KV cache management. During transformer inference, the attention mechanism computes key-value (KV) pairs for every token in context, and these must be stored and retrieved at each subsequent generation step. As context windows grow into the hundreds of thousands of tokens, the KV cache outgrows GPU HBM capacity. The standard workaround is to spill to host DRAM or NVMe storage, but both paths traverse the CPU, adding latency that grows with context size and stalling the GPU during data movement.
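To see why the cache outgrows HBM, a back-of-the-envelope sizing helps. The sketch below assumes a hypothetical 80-layer transformer with 8 KV heads of dimension 128 and FP16 values; these parameters are illustrative, not those of any specific Nvidia or partner model.

```python
def kv_cache_bytes(tokens, layers=80, kv_heads=8, head_dim=128, dtype_bytes=2):
    """Approximate KV cache footprint for a given context length.

    The factor of 2 accounts for the separate key and value tensors
    stored per layer. All model parameters here are illustrative.
    """
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

GiB = 1024 ** 3
for ctx in (32_000, 128_000, 512_000):
    print(f"{ctx:>8} tokens -> {kv_cache_bytes(ctx) / GiB:.1f} GiB")
```

Under these assumptions each token costs 320 KiB of cache, so a half-million-token context needs on the order of 150 GiB of KV state, well beyond the HBM of a single GPU once weights and activations are accounted for.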
STX bypasses the host CPU by routing traffic through a dedicated accelerated storage tier using RDMA over Spectrum-X Ethernet. BlueField-4 manages NVMe SSDs directly and handles data integrity and encryption for the KV cache, keeping context resident at the storage processor rather than transiting the host. The full system runs on the Vera Rubin platform and incorporates the Vera CPU (also announced at GTC on March 16) alongside ConnectX-9, Spectrum-X Ethernet, DOCA software, and AI Enterprise software. The first rack-scale implementation built on STX is the Nvidia CMX context memory storage platform.
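The tiering idea can be sketched in miniature: serve KV blocks from GPU HBM when they are resident, otherwise fetch them from a storage tier without staging through host memory. The class and method names below are hypothetical, and plain dicts stand in for HBM and the DPU-managed NVMe tier; in the real system the miss path would be an RDMA read over Spectrum-X with BlueField-4 fronting the SSDs.

```python
class TieredKVCache:
    """Toy two-tier KV cache: GPU HBM backed by a storage tier.

    Both tiers are modeled as dicts purely for illustration.
    """

    def __init__(self):
        self.hbm = {}       # block_id -> KV block resident in GPU memory
        self.storage = {}   # block_id -> KV block held on the storage tier

    def evict_to_storage(self, block_id):
        # Spill a cold block out of HBM; in STX this would land on NVMe
        # behind the DPU rather than in host DRAM.
        self.storage[block_id] = self.hbm.pop(block_id)

    def get(self, block_id):
        if block_id in self.hbm:
            return self.hbm[block_id]
        # Miss: pull the block back from the storage tier. In the real
        # system this would be a CPU-bypassing RDMA read.
        block = self.storage.pop(block_id)
        self.hbm[block_id] = block
        return block

cache = TieredKVCache()
cache.hbm["ctx-0"] = b"kv-block-0"
cache.evict_to_storage("ctx-0")   # context spills out of HBM
print(cache.get("ctx-0"))          # fetched back without a host copy step
```

The design point the sketch illustrates is that the GPU's view of context stays uniform: a lookup either hits HBM or transparently repopulates it from the storage tier, so long-session agents never lose access to earlier context.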
Storage and infrastructure vendors co-designing systems on STX include DDN, Dell Technologies, HPE, IBM, NetApp, and VAST Data, alongside manufacturing partners AIC, Supermicro, and Quanta Cloud Technology. Meanwhile, eight cloud and AI providers, including CoreWeave, Lambda, Mistral AI, and Oracle Cloud Infrastructure, have committed to early adoption of context memory storage. Partner systems built on STX are expected in the second half of 2026.
"Agentic AI is redefining what software can do — and the computing infrastructure behind it must be reinvented to keep pace," Jensen Huang, founder and CEO of Nvidia, said at GTC. "AI systems that reason across massive context and continuously learn require a new class of storage."
