Meta introduces four new MTIA processors engineered for AI inference, set to debut at six-month intervals.

Meta MTIA
(Image credit: Meta)

Meta today announced four successive generations of its in-house Meta Training and Inference Accelerator (MTIA) chips, all developed in partnership with Broadcom and scheduled for deployment within the next two years. “We’ve established a winning strategy for MTIA by emphasizing fast, iterative development, an inference-first focus, and seamless integration by building natively on industry standards,” Meta’s official announcement states.

| | MTIA 300 | MTIA 400 | MTIA 450 | MTIA 500 |
| --- | --- | --- | --- | --- |
| Workload Focus | R&R Training | General | AI Inference | AI Inference |
| Module TDP | 800 W | 1,200 W | 1,400 W | 1,700 W |
| HBM Bandwidth | 6.1 TB/s | 9.2 TB/s | 18.4 TB/s | 27.6 TB/s |
| HBM Capacity | 216 GB | 288 GB | 288 GB | 384-512 GB |
| MX4 Performance | - | 12 PFLOPS | 21 PFLOPS | 30 PFLOPS |
| FP8/MX8 Performance | 1.2 PFLOPS | 6 PFLOPS | 7 PFLOPS | 10 PFLOPS |
| BF16 Performance | 0.6 PFLOPS | 3 PFLOPS | 3.5 PFLOPS | 5 PFLOPS |

Meta's approach also includes hardware acceleration for FlashAttention and mixture-of-experts feed-forward network computation, plus custom low-precision data types co-designed for inference. MTIA 450 supports MX4, delivering six times the FLOPS of FP16/BF16 (21 vs. 3.5 PFLOPS), with mixed low-precision computation that avoids the software overhead of data-type conversion.
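To make the data-type idea concrete, here is a minimal sketch of block-scaled ("microscaling") quantization, the family MX4 belongs to: elements share one scale per small block, so values fit in a few bits without a separate per-tensor conversion pass. The block size, integer range, and helper names (`mx_quantize`, `mx_dequantize`, `BLOCK`, `QMAX`) are illustrative assumptions, not Meta's MTIA format.

```python
# Illustrative sketch of microscaling (MX-style) quantization.
# Assumptions, not Meta's implementation: 32-element blocks and a
# symmetric 4-bit-style integer range standing in for FP4 elements.
import torch

BLOCK = 32   # assumed elements per shared-scale block
QMAX = 7     # symmetric range [-7, 7] stands in for 4-bit elements

def mx_quantize(x: torch.Tensor):
    """Quantize a 1-D tensor to low-bit integers with one scale per block."""
    x = x.reshape(-1, BLOCK)
    scale = x.abs().amax(dim=1, keepdim=True) / QMAX  # one scale per block
    scale = torch.where(scale == 0, torch.ones_like(scale), scale)
    q = torch.clamp(torch.round(x / scale), -QMAX, QMAX)
    return q.to(torch.int8), scale

def mx_dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct approximate values from block-quantized integers."""
    return (q.float() * scale).reshape(-1)

x = torch.randn(1024)
q, s = mx_quantize(x)
x_hat = mx_dequantize(q, s)
print(f"max abs error: {(x - x_hat).abs().max():.4f}")
```

In a real MX format the elements are tiny floats and the per-block scale is a shared exponent, but the mechanics are the same: quantize against a block-local scale, then compute directly in low precision.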

As for deployment, MTIA 400, 450, and 500 will all share a common chassis, rack, and network design, so each subsequent chip slots into the existing physical footprint for a straightforward swap. It’s this modularity, Meta says, that enables MTIA’s roughly six-month chip cadence, far faster than the industry’s typical one-to-two-year cycle.

The software stack runs natively on PyTorch, vLLM, and Triton, with support for torch.compile and torch.export so that production models can be deployed on both GPUs and MTIA without MTIA-specific rewrites. Meta said it has already deployed hundreds of thousands of MTIA chips across its apps for inference on organic content and ads.
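As a hedged sketch of what that portability implies, the snippet below compiles an ordinary PyTorch model once and runs it on whatever accelerator is present. The `TinyRanker` model and the device-selection logic are stand-ins for illustration, not Meta's production stack.

```python
# Sketch of device-agnostic deployment via torch.compile / torch.export.
# TinyRanker is a hypothetical stand-in model, not one of Meta's.
import torch
import torch.nn as nn

class TinyRanker(nn.Module):
    """Hypothetical stand-in for a production ranking model."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Target whichever backend is available; the model code stays device-agnostic.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyRanker().to(device)

# One compile call covers every backend; no accelerator-specific rewrite.
compiled = torch.compile(model)
x = torch.randn(8, 256, device=device)
print(compiled(x).shape)  # torch.Size([8, 1])

# torch.export captures a portable graph a backend compiler can lower offline.
exported = torch.export.export(model, (x,))
```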

The announcement comes just two weeks after Meta revealed a multi-year, $100 billion AI infrastructure agreement with AMD, a sign of a broader strategy to reduce reliance on Nvidia across different segments of Meta’s AI infrastructure while keeping MTIA central to inference processing.



Luke James
Contributor