Meta unveils four new MTIA processors built for AI inference, set to debut at six-month intervals.
The chiplet-based accelerators are designed to run AI inference more efficiently than GPUs optimized for training workloads.
Meta today announced four successive generations of its in-house Meta Training and Inference Accelerator (MTIA) chips, all developed in partnership with Broadcom and scheduled for deployment within the next two years. The company says its strategy for MTIA centers on fast, iterative development, an inference-first focus, and seamless integration by building natively on industry standards.
The four new chips are MTIA 300, 400, 450, and 500. MTIA 300 is already in production for ranking and recommendation training, while MTIA 400 is in lab testing ahead of data center deployment. MTIA 450 and 500 are aimed at AI inference and are slated for large-scale rollout in early and late 2027, respectively. According to Meta's technical blog, HBM bandwidth increases 4.5x and compute FLOPs increase 25x from MTIA 300 to MTIA 500.
Meta asserts that MTIA 450 delivers twice the HBM bandwidth of MTIA 400, describing it as "much higher than that of existing leading commercial products," which effectively means Nvidia's H100 and H200. MTIA 500 then adds another 50% of HBM bandwidth on top of MTIA 450, along with up to 80% more HBM capacity. The emphasis makes sense: it is HBM bandwidth, not raw FLOPs, that is the primary constraint during the decoding stage of transformer inference, since every generated token requires streaming the model's weights from memory. GPUs, by contrast, are engineered to maximize FLOPs for large-scale pre-training, which Meta says carries cost and power overhead that inference workloads don't need.
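As a back-of-envelope illustration of why decode is bandwidth-bound: at batch size 1, every generated token streams all model weights from HBM, so bandwidth sets a hard ceiling on tokens per second. The sketch below is ours, not Meta's; the 70B-parameter FP8 model is hypothetical, the 18.4 TB/s figure comes from the spec table below, and ~4.8 TB/s is the H200's published bandwidth.

```python
# Roofline ceiling for single-stream (batch-1) transformer decode: each token
# must read every weight from HBM, so tokens/s <= bandwidth / bytes-per-token.
def decode_ceiling_tokens_per_sec(params_billion: float,
                                  bytes_per_param: float,
                                  hbm_tb_per_s: float) -> float:
    bytes_per_token = params_billion * 1e9 * bytes_per_param  # weights streamed per token
    return hbm_tb_per_s * 1e12 / bytes_per_token

# Hypothetical 70B-parameter model stored in FP8 (1 byte per parameter).
for name, bw in [("H200 (~4.8 TB/s)", 4.8), ("MTIA 450 (18.4 TB/s)", 18.4)]:
    print(f"{name}: ~{decode_ceiling_tokens_per_sec(70, 1.0, bw):.0f} tokens/s ceiling")
```

Batching amortizes those weight reads and shifts the bound back toward FLOPs, but for latency-sensitive decode the bandwidth ceiling is what matters, and that is the trade-off Meta is targeting.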
| Spec | MTIA 300 | MTIA 400 | MTIA 450 | MTIA 500 |
|---|---|---|---|---|
| Workload Focus | Ranking & Recommendation Training | General | AI Inference | AI Inference |
| Module TDP | 800 W | 1,200 W | 1,400 W | 1,700 W |
| HBM Bandwidth | 6.1 TB/s | 9.2 TB/s | 18.4 TB/s | 27.6 TB/s |
| HBM Capacity | 216 GB | 288 GB | 288 GB | 384-512 GB |
| MX4 Performance | - | 12 PFLOPS | 21 PFLOPS | 30 PFLOPS |
| FP8/MX8 Performance | 1.2 PFLOPS | 6 PFLOPS | 7 PFLOPS | 10 PFLOPS |
| BF16 Performance | 0.6 PFLOPS | 3 PFLOPS | 3.5 PFLOPS | 5 PFLOPS |
Meta's approach also includes hardware acceleration for FlashAttention and mixture-of-experts feed-forward computation, plus custom low-precision data types co-designed for inference. MTIA 450 supports MX4, delivering six times the FLOPs of its FP16/BF16 rate, with mixed low-precision computation that avoids the software overhead of data-type conversion.
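Meta hasn't published MTIA's exact MX4 encoding; the sketch below assumes it resembles the OCP Microscaling (MX) family, where a block of elements (32 in the spec) shares one power-of-two scale and each element is a 4-bit E2M1 float. Everything here, including the function name `quantize_mx4`, is illustrative rather than Meta's implementation.

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1 float (sign handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mx4(x: np.ndarray, block: int = 32) -> np.ndarray:
    """Round-trip a 1-D array through an MX4-style block format:
    one shared power-of-two scale per block, 4-bit elements."""
    x = x.reshape(-1, block)
    amax = np.abs(x).max(axis=1, keepdims=True)
    # Shared scale per the OCP MX recipe: 2^(floor(log2(amax)) - emax_elem),
    # where emax_elem = 2 for E2M1 (its max value is 6.0 = 1.5 * 2^2).
    scale = 2.0 ** (np.floor(np.log2(np.maximum(amax, 2.0**-126))) - 2)
    scaled = np.clip(x / scale, -6.0, 6.0)  # saturate to the E2M1 range
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    return (np.sign(scaled) * E2M1_GRID[idx] * scale).ravel()

x = np.random.randn(64).astype(np.float32)
print("max abs round-trip error:", np.abs(x - quantize_mx4(x)).max())
```

Because the shared scale is a power of two, dequantization is a single exponent shift, which is the kind of property that lets hardware mix MX4 matrix math with higher-precision accumulation without conversion kernels in software.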
On the deployment side, MTIA 400, 450, and 500 will share a common chassis, rack, and network design, so each new chip generation slots into the existing physical footprint for seamless swapping. It's this modularity, Meta says, that enables MTIA's roughly six-month chip cadence, far faster than the industry's typical one-to-two-year cycle.
The software stack runs natively on PyTorch, vLLM, and Triton, with support for torch.compile and torch.export, so production models can be deployed on GPUs and MTIA in parallel without MTIA-specific rewrites. Meta says it has already deployed hundreds of thousands of MTIA chips across its apps for inference on organic content and ads.
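Meta doesn't show sample code, but the no-rewrite claim maps onto standard PyTorch mechanics: capture or compile a device-agnostic model once, then pick the backend at runtime. A minimal sketch, assuming the public torch.mtia device backend and falling back to CPU when that runtime isn't present; `TinyRanker` is a made-up stand-in model:

```python
import torch

class TinyRanker(torch.nn.Module):
    """Stand-in ranking model; production workloads would be far larger."""
    def __init__(self):
        super().__init__()
        self.ff = torch.nn.Linear(128, 1)

    def forward(self, x):
        return torch.sigmoid(self.ff(x))

model = TinyRanker().eval()
example_args = (torch.randn(4, 128),)

# torch.export captures one device-agnostic graph that any backend can consume.
exported = torch.export.export(model, example_args)

# torch.compile then lowers the same module for whatever accelerator is available.
device = "mtia" if hasattr(torch, "mtia") and torch.mtia.is_available() else "cpu"
compiled = torch.compile(model.to(device))
print(compiled(example_args[0].to(device)))
```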
The announcement comes just two weeks after Meta revealed a multi-year, $100 billion AI infrastructure deal with AMD, suggesting a broader strategy is in motion to reduce reliance on Nvidia across segments of Meta's AI infrastructure while keeping MTIA central to inference.
Follow 3DTested on Google News, or add us as a preferred source, to get our latest news, analysis, and reviews in your feeds.
