The great Bench GPU retest begins — how we're testing for our GPU Hierarchy in 2026, and why upscaling and framegen are still out

A group of RTX 50-series and RX 9000-series GPU boxes
(Image credit: Future)

As we prepare to embark on a new round of testing for our GPU Hierarchy, we want to give 3DTested Premium subscribers a deep dive into our thinking and methods as results from this testing begin to feed into our Bench database, as well as a test plan that will show you what data to expect and when. This article will help you interpret our game testing results and understand why we test the way we do.

Our task for the first half of this year has sadly been made easier by the fact that neither Nvidia nor AMD nor Intel introduced new discrete gaming graphics cards at CES 2026. Historically, we would have expected an RTX 50 Super-series mid-cycle refresh from Nvidia at the very least, but the insatiable maw of AI demand has apparently dashed any launch plans for new consumer GPUs in favor of data center AI accelerators with incomparably higher margins.

Upscaling and framegen matter more than ever, but we’re leaving them out

The biggest question we had to wrestle with when devising our 2026 test plan was whether to include upscaling in the GPU Hierarchy by default. Upscalers are no longer a crutch that trades visual fidelity for a large performance boost, as they once were. Especially with the release of Nvidia’s DLSS 4.5, we are closer than ever to one of the few unconditional wins of the AI era: free performance, lower fixed resource usage, and better-than-native image quality.

For all that, we’ve still decided against enabling DLSS, FSR, and XeSS for our testing. We’re trying to exclude as many variables as possible (like CPU scaling) from what is meant to be a direct performance comparison between graphics cards. Not every upscaler produces the same output image quality, not every game implements every upscaler from every vendor, and not every card can run the same upscaling models.

Even as DLSS 4.5 generates impeccable output frames, AMD’s FSR 4 can’t match its image quality, and FSR 4 only officially runs on certain Radeons. Older cards can only take advantage of FSR 3.x and earlier, which are compatible with graphics cards from any vendor but don’t benefit from AI-accelerated upscaling models. Intel’s XeSS uses AI models of varying fidelity in both its Arc-optimized and cross-vendor paths, but its image quality also isn’t on par with DLSS, and it isn’t available in every game.

With all that in mind, even if we test Nvidia, AMD, and Intel graphics cards at the same input resolution before upscaling, we’re getting “Nvidia frames,” “AMD frames,” and “Intel frames” out the other end, which adds a layer of undesirable complexity to our analysis.

We want the GPU Hierarchy and Bench to be as clean and simple a representation of comparative performance between graphics cards as possible, so we’re excluding the variables introduced by upscaling from our data.

We are living in a new world for competitive analysis compared to years past, though. A low frame rate in our hierarchy no longer means that a card is irredeemably slow and that upgrading to a newer one is the only way forward. In the upscaling era, it might be possible to enable DLSS, FSR, or XeSS and boost a card’s performance to a playable level with minimal or even positive impacts on image quality.

That said, if a card has an extremely low baseline frame rate in the GPU Hierarchy, upscaling isn’t going to magically transform it into a speed demon. Doubling or tripling a low frame rate can still result in only a borderline level of performance on the other end. Really old or really slow cards might not even have enough spare compute resources to run an upscaler in addition to the basic render loop at all.

Frame generation is the other modern marvel of gaming performance, but we’re also excluding it from our hierarchy data. Unlike with upscaling, turning on framegen has real costs. It usually introduces a large input latency penalty, and if that penalty is large enough to exceed an acceptable threshold, it has to be compensated for elsewhere, whether through changing upscaling or quality settings, and that in turn can compromise image quality.

In short, just because a card is producing a large number of output frames with framegen enabled, it doesn’t mean it’s providing a playable or enjoyable experience. We view frame generation as a cherry on top of an already solid gaming experience, not a fundamental method of achieving good baseline performance, and so it has no place in our hierarchy testing.

Our benchmarking approach: eyes on monitor, hands on mouse and keyboard

With limited exceptions, we rely on our own custom benchmark sequences captured directly from gameplay using Nvidia’s FrameView utility rather than scripted benchmarks. Sitting back and watching a non-interactive, disembodied camera float through a scene at a fixed rate of motion might be perfectly repeatable, but that doesn’t capture how it “feels” to play a given game on a given graphics card and system. That’s a function of low input latency and smooth frame delivery. To meaningfully comment on those matters requires trained eyes on a monitor and hands on the mouse and keyboard, full stop.
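For readers who want to see how a raw capture turns into the summary numbers we publish, here’s a minimal sketch of the kind of post-processing a frame-time log goes through. It assumes a PresentMon-style CSV with a MsBetweenPresents column and a hypothetical file path; it’s an illustration of the arithmetic, not a transcript of our actual tooling.

```python
import csv
import statistics

def summarize_frametimes(csv_path: str) -> dict:
    """Summarize a frame-time capture into average FPS and 1% low FPS.

    Assumes a PresentMon-style CSV with a 'MsBetweenPresents' column;
    the column name and file layout are illustrative assumptions, not
    a guarantee of any particular tool's exact output format.
    """
    frametimes_ms = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                frametimes_ms.append(float(row["MsBetweenPresents"]))
            except (KeyError, ValueError):
                continue  # skip malformed rows

    if not frametimes_ms:
        raise ValueError("no frame-time samples found in capture")

    avg_fps = 1000.0 / statistics.mean(frametimes_ms)

    # "1% low" here: average FPS over the slowest 1% of frames.
    slowest = sorted(frametimes_ms, reverse=True)
    slowest_1pct = slowest[: max(1, len(slowest) // 100)]
    low_1pct_fps = 1000.0 / statistics.mean(slowest_1pct)

    return {"avg_fps": avg_fps, "one_pct_low_fps": low_1pct_fps}

# Example (hypothetical capture file):
# print(summarize_frametimes("captures/cyberpunk_4k_run1.csv"))
```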

Furthermore, a scripted benchmark might not even be representative of performance in a title’s core gameplay activity, whether that’s running across a battlefield and shooting bad guys, driving around the Nurburgring, or scrolling across a map in a 4X title. Those activities might be more boring than a free camera swooping through a scripted battle, but if that’s what the player is going to experience directly, that’s what we want to measure.

Limiting ourselves to games with built-in benchmarks also ties our hands in the event that a major title doesn’t have one. We don’t want to let that stand in the way of commenting on performance from a hit or influential title.

This is by far the most time- and labor-intensive way to benchmark gaming performance, but it means you can trust that all of the output of our cards under test has been evaluated by expert human eyes, not just generated blindly from an automated run, transferred from a log file into a spreadsheet, and regurgitated without further inquiry. When we say a graphics card is fast, smooth, and responsive, we know it and mean it.

We choose in-game benchmark sequences of about 60 seconds in length based on our years of experience as members of the media and as part of game testing labs at large GPU companies. We want a scene to show as many elements of a game’s eye candy as possible, from shadows and reflections to complex geometry to objects and terrain near and far. Blank walls occupying the entire viewport need not apply.

We try to spend enough time playing each game we choose to test to understand what constitutes a light, average, and demanding scene for performance, and choose scenes that are representative of the key experience a player is likely to see, rather than a worst-case scenario that might only represent a small portion of a game’s playtime.

In the event we find a performance or rendering issue with a popular game on certain hardware, we can also hold GPU vendors’ feet to the fire to make sure that it’s flagged and fixed. This used to be a rare occurrence, but as GPU vendors redirect more and more of the corporate resources formerly dedicated to gaming drivers and QA toward AI accelerators and their software, we want to keep an eagle eye out.

Picking the lineup

Choosing the games that make up the overall performance picture for our hierarchy involves a lot of trade-offs. We’d love to test every single game on the market on every graphics card that still works with modern PCs, but we only have so much time.

First and foremost, we want to make sure that we’re testing titles that gamers are actually playing right now and that would be likely to motivate a purchase or upgrade.

To guide our title choices, we first turn to publicly available statistics like Steam Charts to see which games have the largest player bases and which ones are sustaining their popularity over time. We also consider the general buzz from the games press and gaming community.

If a game is a technical tour-de-force that helps us exercise particular architectural features or resources of a graphics card, whether that’s a particularly demanding ray-tracing implementation or a hunger for VRAM, we might include it regardless of its relative popularity, but we try not to let those editors’ picks dominate our lineup.

Most of today’s games are built atop engines that support DirectX 12. A handful of popular titles still rely on DirectX 11 and Vulkan, but we don’t go out of our way to include a disproportionate number of those titles compared to how frequently game studios choose to target those APIs with their projects.

Similarly, more and more of today’s biggest games are built on Unreal Engine 5, but as long as player stats suggest it makes sense to do so, we try to include a diverse set of engines to see whether certain GPU architectures handle the demands of one engine better than another. Overall, Unreal Engine 5 games make up a little less than half of our test suite, and we feel like that’s a fair mix given the current state of the market.

We’re continuing to split our performance results between raster-only tests and those with RT enabled. The bulk of our data will continue to come from those raster-only tests, but we’ve already gotten a glimpse at some 2026 releases, such as Pragmata, that deploy RT to gorgeous effect, and we’ll likely rotate out some older RT titles and include new ones as the year progresses.

Our first-half 2026 results for the GPU Hierarchy will include data from the following raster games, at a minimum:

| Title | Engine | Graphics API | Why it's here |
| --- | --- | --- | --- |
| Counter-Strike 2 | Source 2 | DX11 | One of the world's most popular PC games, period |
| Apex Legends | Proprietary | DX11 | Another wildly popular esports title |
| Fortnite | Unreal Engine 5 | DX12 | A freemium cultural phenomenon |
| Marvel Rivals | Unreal Engine 5 | DX12 | Another popular freemium title |
| ARC Raiders | Unreal Engine 5 | DX12 | A breakout hit with a huge and loyal player base |
| Alan Wake II | Northlight | DX12 | A visual feast that eats graphics cards alive |
| Black Myth: Wukong | Unreal Engine 5 | DX12 | A beauty of a game that pushes hardware to the limits |
| Marvel's Spider-Man 2 | Proprietary | DX12 | A demanding PlayStation port with impressive RT effects on tap |
| Stalker 2 | Unreal Engine 5 | DX12 | A PC-crushing walk through the Chernobyl Exclusion Zone |
| Cyberpunk 2077 | REDengine | DX12 | One of the biggest PC games of all time and a technical proving ground |
| Clair Obscur: Expedition 33 | Unreal Engine 5 | DX12 | One of the most acclaimed games of all time |
| Microsoft Flight Simulator 2024 | Proprietary | DX12 | Beautiful visuals powered by an extraordinarily demanding engine |
| Assassin's Creed Shadows | Ubisoft Anvil | DX12 | The latest in a long line of breathtaking open-world adventures |

In addition, we will include the following games in our tests of ray-traced game performance at a minimum:

| Title | Engine | Graphics API | Why it's here |
| --- | --- | --- | --- |
| Grand Theft Auto V Enhanced | Proprietary | DX12 | An all-time classic with a fresh coat of RT-enhanced eye candy |
| Doom: The Dark Ages | id Tech | Vulkan | A thoroughly modern title that requires RT to run at all |
| Indiana Jones and the Great Circle | id Tech | Vulkan | Another modern title that requires an RT-capable GPU |
| Cyberpunk 2077 | REDengine | DX12 | |
| Marvel's Spider-Man 2 | Proprietary | DX12 | |
| Black Myth: Wukong | Unreal Engine 5 | DX12 | |
| Alan Wake II | Northlight | DX12 | |
| Assassin's Creed Shadows | Ubisoft Anvil | DX12 | |

Our test setup: still AMD, still X3D

Our test PC for 2026 continues to use AMD’s Ryzen 7 9800X3D CPU paired with 32GB of DDR5-6000 RAM as a foundation. This setup is widely considered to be the sweet spot for modern gaming performance, and the recent release of the slightly warmed-over Ryzen 7 9850X3D does little to change that.

| Component | 3DTested 2026 GPU Test Bench |
| --- | --- |
| CPU | AMD Ryzen 7 9800X3D |
| RAM | G.Skill Trident Z5 32GB (2x16GB) DDR5-6000 |
| Motherboard | Asus TUF Gaming X670E-Plus Wifi |
| SSD | Inland Performance Plus 4TB PCIe 4.0 NVMe SSD |
| CPU heatsink | Thermalright Phantom Spirit 120 SE |
| Power supply | MSI MPG Ai1600TS |

AMD’s 3D V-Cache parts continue to smoke any competing CPU for gaming, and as a means of reducing CPU bottlenecks to the greatest extent possible, the 9800X3D will continue to be our platform of choice until a demonstrably superior alternative arrives.

Since our last round of GPU reviews and hierarchy testing, we’ve upgraded our system’s power supply to MSI’s MPG Ai1600TS 1600W.

MSI GeForce RTX 5090 Lightning Z

(Image credit: 3DTested)

This 80 Plus Titanium unit offers a fully digital topology, two 12V-2x6 connectors, and enough output to comfortably power both our test system and any graphics card attached to it, up to and including MSI’s own RTX 5090 Lightning Z with its up-to-1000W TGP.

The MPG Ai1600TS can also measure per-pin current on each of its 12V-2x6 connectors and report those amperages to monitoring software to warn of any possible imbalances that could threaten a power connector meltdown. It’s a beast of a PSU that’s up to the job of powering practically any consumer graphics card we can hook up to it.

As a bonus, thanks to its massive capacity, this PSU frequently operates in fanless mode even under gaming loads, meaning that its fan won’t contaminate our graphics card noise measurements when we do need to take them.

On the operating system and software side, we test games using a frozen version of Windows 11 and standardize on one graphics driver version from each vendor to ensure as little variance as possible over the course of our test plan, which spans over a month.

We can’t stop games from automatically updating during that time, but not every update causes major changes in performance. In the event a title does receive an update, we’ll spot-check our existing results to make sure that subsequent data isn’t drastically different from earlier runs. We’ll conduct retesting as necessary if we see major performance changes between updates that would materially affect our conclusions.
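As a rough sketch of what that spot check boils down to, the comparison below flags a title for retesting when a post-patch result drifts too far from its pre-patch baseline. The 5% tolerance and the function name are illustrative assumptions rather than our formal retest criteria.

```python
def needs_retest(baseline_fps: float, post_patch_fps: float,
                 tolerance: float = 0.05) -> bool:
    """Flag a result for retesting if a post-patch spot check deviates
    from the pre-patch baseline by more than `tolerance` (fractional)."""
    if baseline_fps <= 0:
        raise ValueError("baseline FPS must be positive")
    return abs(post_patch_fps - baseline_fps) / baseline_fps > tolerance

# Example: 112 FPS before a patch vs. 104 FPS after is a ~7% drop,
# so the title would go back on the retest pile.
print(needs_retest(112.0, 104.0))  # True
```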

Nvidia PCAT

(Image credit: Future)

In addition to raw performance, we measure per-game power consumption using Nvidia’s Power Capture and Analysis Tool (PCAT). The PCAT monitors PCI Express six- or eight-pin and 12V-2x6 power connector current, as well as PCI Express slot power, and integrates with Nvidia’s FrameView performance measurement utility to provide fine-grained power usage and efficiency information alongside each captured frame’s worth of data.

The PCAT is important because power usage differs across games, and even within a game, the settings used can affect its power consumption drastically (with ray tracing or path tracing on versus off, for just one example).

Directly measuring power usage with the PCAT lets us present real-world usage and efficiency results, not just a vendor’s worst-case board power rating.
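For the curious, here’s a minimal sketch of the arithmetic that turns matched per-frame frame times and board-power samples into the efficiency figures we report. The assumption of one power sample per frame is for illustration only; it’s not a description of FrameView’s or PCAT’s exact output.

```python
def efficiency_summary(frametimes_ms: list[float],
                       board_power_w: list[float]) -> dict:
    """Turn aligned per-frame frame times (ms) and board power samples (W)
    into average FPS, average board power, and FPS per watt."""
    if not frametimes_ms or len(frametimes_ms) != len(board_power_w):
        raise ValueError("need exactly one power sample per frame")

    total_time_s = sum(frametimes_ms) / 1000.0
    # Energy for each frame = power during the frame * frame duration.
    total_energy_j = sum(w * (ms / 1000.0)
                         for w, ms in zip(board_power_w, frametimes_ms))

    avg_fps = len(frametimes_ms) / total_time_s
    avg_power_w = total_energy_j / total_time_s

    return {
        "avg_fps": avg_fps,
        "avg_board_power_w": avg_power_w,
        "fps_per_watt": avg_fps / avg_power_w,
    }

# Example: 60 seconds of steady 10 ms frames at 300 W works out to
# 100 FPS, 300 W, and roughly 0.33 FPS per watt.
```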

We don’t usually perform noise or thermal camera analysis as part of our hierarchy testing, but we are fully equipped to perform both of these tests when necessary.

Our test plan

For 2026, our GPU Hierarchy will initially include the current generation of products from each vendor, followed by two preceding generations. We’ll be testing these cards in waves, and data will be made available in the sequence outlined in the table below. We’ll mark each card with a ✅ emoji as it’s added to Bench.

| Wave 1 | Wave 2 | Wave 3 |
| --- | --- | --- |
| Nvidia | Nvidia | Nvidia |
| RTX 5090 | RTX 4090 | RTX 3090 Ti |
| RTX 5080 | RTX 4080 Super | RTX 3090 |
| RTX 5070 Ti | RTX 4080 | RTX 3080 Ti |
| RTX 5070 | RTX 4070 Ti Super | RTX 3080 |
| RTX 5060 Ti 16GB | RTX 4070 Ti | RTX 3070 Ti |
| RTX 5060 Ti 8GB | RTX 4070 Super | RTX 3070 |
| RTX 5060 | RTX 4070 | RTX 3060 Ti |
| RTX 5050 | RTX 4060 Ti 16GB | RTX 3060 12GB |
| | RTX 4060 Ti 8GB | RTX 3050 |
| AMD | RTX 4060 | |
| RX 9070 XT | | AMD |
| RX 9070 | AMD | RX 6950 XT |
| RX 9060 XT 16GB | RX 7900 XTX | RX 6900 XT |
| RX 9060 | RX 7900 XT | RX 6800 XT |
| | RX 7800 XT | RX 6800 |
| Intel | RX 7700 XT | RX 6750 XT |
| Arc B580 | RX 7600 XT | RX 6700 XT |
| Arc B570 | RX 7600 | RX 6650 XT |
| | | RX 6600 XT |
| | Intel | RX 6600 |
| | Arc A770 16GB | |
| | Arc A770 8GB | |
| | Arc A750 | |
| | Arc A580 | |
| | Arc A380 | |

With only a couple of exceptions, we’re using “reference” versions of these cards that hew closely to manufacturer-specified power and frequency targets, which is in keeping with our mission to provide a baseline rather than a ceiling.

Testing hefty quad-slot versions of these cards with all of the thermal headroom that board partners can throw at them is a fun post-launch activity, but it’s not what we want for establishing a performance baseline.

What's next

With all that out of the way, it’s time to get down to testing. All told, the waves of graphics cards above represent over a month of dedicated benchmarking time and untold thousands of data points to be collected. As we collect that data, it’ll begin appearing in Bench for your reference. Keep checking back on the table above to monitor our progress, and enjoy the results in Bench as part of your 3DTested Premium subscription. And do let us know if you have any questions or comments regarding our testing methods—we’ll be happy to answer them.

Jeffrey Kampman
Senior Analyst, Graphics