Meta unveils four new MTIA processors built for AI inference, set to debut at six-month intervals.
The chiplet-based accelerators are designed to run AI inference more efficiently than GPUs optimized for training workloads.
Meta today announced four successive generations of its in-house Meta Training and Inference Accelerator (MTIA) chips, all developed in partnership with Broadcom and scheduled for deployment within the next two years. The company says its strategy for MTIA centers on fast, iterative development, an inference-first focus, and seamless integration by building natively on industry standards.
The four new chips are MTIA 300, 400, 450, and 500. MTIA 300 is already in production for ranking and recommendation training, while MTIA 400 is in lab testing ahead of data center deployment. MTIA 450 and 500 are aimed at AI inference and are slated for large-scale rollout in early and late 2027, respectively. According to Meta's technical blog, HBM bandwidth increases 4.5x and compute FLOPs increase 25x from MTIA 300 to MTIA 500.
Meta asserts that MTIA 450 delivers twice the HBM bandwidth of MTIA 400, describing it as "much higher than that of existing leading commercial products," which effectively means Nvidia's H100 and H200. MTIA 500 then adds another 50% of HBM bandwidth on top of MTIA 450, along with up to 80% more HBM capacity. The emphasis makes sense: it is HBM bandwidth, not raw FLOPs, that is the primary constraint during the decoding stage of transformer inference, since every generated token requires streaming the model's weights from memory. GPUs, by contrast, are engineered to maximize FLOPs for large-scale pre-training, which Meta says carries cost and power overhead that inference workloads don't need.
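As a back-of-envelope illustration of why decode is bandwidth-bound: at batch size 1, every generated token streams all model weights from HBM, so bandwidth sets a hard ceiling on tokens per second. The sketch below is ours, not Meta's; the 70B-parameter FP8 model is hypothetical, the 18.4 TB/s figure comes from the spec table below, and ~4.8 TB/s is the H200's published bandwidth.

```python
# Roofline ceiling for single-stream (batch-1) transformer decode: each token
# must read every weight from HBM, so tokens/s <= bandwidth / bytes-per-token.
def decode_ceiling_tokens_per_sec(params_billion: float,
                                  bytes_per_param: float,
                                  hbm_tb_per_s: float) -> float:
    bytes_per_token = params_billion * 1e9 * bytes_per_param  # weights streamed per token
    return hbm_tb_per_s * 1e12 / bytes_per_token

# Hypothetical 70B-parameter model stored in FP8 (1 byte per parameter).
for name, bw in [("H200 (~4.8 TB/s)", 4.8), ("MTIA 450 (18.4 TB/s)", 18.4)]:
    print(f"{name}: ~{decode_ceiling_tokens_per_sec(70, 1.0, bw):.0f} tokens/s ceiling")
```

Batching amortizes those weight reads and shifts the bound back toward FLOPs, but for latency-sensitive decode the bandwidth ceiling is what matters, and that is the trade-off Meta is targeting.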
| Spec | MTIA 300 | MTIA 400 | MTIA 450 | MTIA 500 |
|---|---|---|---|---|
| Workload Focus | Ranking & Recommendation Training | General | AI Inference | AI Inference |
| Module TDP | 800 W | 1,200 W | 1,400 W | 1,700 W |
| HBM Bandwidth | 6.1 TB/s | 9.2 TB/s | 18.4 TB/s | 27.6 TB/s |
| HBM Capacity | 216 GB | 288 GB | 288 GB | 384-512 GB |
| MX4 Performance | - | 12 PFLOPS | 21 PFLOPS | 30 PFLOPS |
| FP8/MX8 Performance | 1.2 PFLOPS | 6 PFLOPS | 7 PFLOPS | 10 PFLOPS |
| BF16 Performance | 0.6 PFLOPS | 3 PFLOPS | 3.5 PFLOPS | 5 PFLOPS |
Meta's approach also includes hardware acceleration for FlashAttention and mixture-of-experts feed-forward computation, plus custom low-precision data types co-designed for inference. MTIA 450 supports MX4, delivering six times the FLOPs of its FP16/BF16 rate, with mixed low-precision computation that avoids the software overhead of data-type conversion.
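Meta hasn't published MTIA's exact MX4 encoding; the sketch below assumes it resembles the OCP Microscaling (MX) family, where a block of elements (32 in the spec) shares one power-of-two scale and each element is a 4-bit E2M1 float. Everything here, including the function name `quantize_mx4`, is illustrative rather than Meta's implementation.

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1 float (sign handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mx4(x: np.ndarray, block: int = 32) -> np.ndarray:
    """Round-trip a 1-D array through an MX4-style block format:
    one shared power-of-two scale per block, 4-bit elements."""
    x = x.reshape(-1, block)
    amax = np.abs(x).max(axis=1, keepdims=True)
    # Shared scale per the OCP MX recipe: 2^(floor(log2(amax)) - emax_elem),
    # where emax_elem = 2 for E2M1 (its max value is 6.0 = 1.5 * 2^2).
    scale = 2.0 ** (np.floor(np.log2(np.maximum(amax, 2.0**-126))) - 2)
    scaled = np.clip(x / scale, -6.0, 6.0)  # saturate to the E2M1 range
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    return (np.sign(scaled) * E2M1_GRID[idx] * scale).ravel()

x = np.random.randn(64).astype(np.float32)
print("max abs round-trip error:", np.abs(x - quantize_mx4(x)).max())
```

Because the shared scale is a power of two, dequantization is a single exponent shift, which is the kind of property that lets hardware mix MX4 matrix math with higher-precision accumulation without conversion kernels in software.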
On the deployment side, MTIA 400, 450, and 500 will share a common chassis, rack, and network design, so each new chip generation slots into the existing physical footprint for seamless swapping. It's this modularity, Meta says, that enables MTIA's roughly six-month chip cadence, far faster than the industry's typical one-to-two-year cycle.
The software stack runs natively on PyTorch, vLLM, and Triton, with support for torch.compile and torch.export, so production models can be deployed on GPUs and MTIA in parallel without MTIA-specific rewrites. Meta says it has already deployed hundreds of thousands of MTIA chips across its apps for inference on organic content and ads.
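Meta doesn't show sample code, but the no-rewrite claim maps onto standard PyTorch mechanics: capture or compile a device-agnostic model once, then pick the backend at runtime. A minimal sketch, assuming the public torch.mtia device backend and falling back to CPU when that runtime isn't present; `TinyRanker` is a made-up stand-in model:

```python
import torch

class TinyRanker(torch.nn.Module):
    """Stand-in ranking model; production workloads would be far larger."""
    def __init__(self):
        super().__init__()
        self.ff = torch.nn.Linear(128, 1)

    def forward(self, x):
        return torch.sigmoid(self.ff(x))

model = TinyRanker().eval()
example_args = (torch.randn(4, 128),)

# torch.export captures one device-agnostic graph that any backend can consume.
exported = torch.export.export(model, example_args)

# torch.compile then lowers the same module for whatever accelerator is available.
device = "mtia" if hasattr(torch, "mtia") and torch.mtia.is_available() else "cpu"
compiled = torch.compile(model.to(device))
print(compiled(example_args[0].to(device)))
```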
The announcement comes just two weeks after Meta revealed a multi-year, $100 billion AI infrastructure deal with AMD, suggesting a broader strategy is in motion to reduce reliance on Nvidia across segments of Meta's AI infrastructure while keeping MTIA central to inference.
Follow 3DTested on Google News, or add us as a preferred source, to get our latest news, analysis, and reviews in your feeds.
