AMD's upcoming RDNA 5 GPUs might improve dual-issue…

AMD RDNA 3 GPU Architecture Deep Dive — (Image credit: AMD)

The next generation of Radeon GPUs from AMD is expected to be a significant upgrade over RDNA 4, and one of the issues Team Red seems to be tackling is dual-issue execution. That's the GPU's ability to execute two instructions in the same cycle. AMD's cards have had this feature since RDNA 3, but strict pairing rules meant that compilers couldn't always take advantage of it, so real workloads often fell well short of the theoretical peak performance. A new LLVM patch now suggests that AMD will address this with RDNA 5.

Coelacanth's Dream, a Linux-focused outlet, examined the new changes and found that they reference gfx13, which is derived from gfx130, aka RDNA 5. AMD is apparently adding a new instruction format called "VOPD3" that is designed to work better with the dual-issue VALU (Vector Arithmetic Logic Unit, i.e., the shader unit). Its pairing rules should be more lenient, making it easier for the compiler to use dual-issue execution.

On a technical level, the existing format, known as VOPD, largely only worked with simpler two-operand instructions, which made it harder for compilers to find compatible instruction pairs to schedule. VOPD3 expands this to three-operand instructions, so it should be able to support operations like fused multiply-add (FMA). In fact, this very pull request adds V_FMA_F32, and since the patch targets gfx13, we can infer it'll be on RDNA 5.
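
To illustrate why allowing three-operand instructions matters, here is a minimal toy model, not AMD's actual pairing logic. It assumes a single hypothetical rule: both instructions must fit the encoding's source-operand limit (`max_srcs`, 2 standing in for VOPD and 3 for VOPD3) and must not depend on each other. The real rules also restrict which opcodes can pair and impose register-bank constraints. `VInstr`, `can_pair`, and `issue_cycles` are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass

# Toy model only: the real VOPD/VOPD3 rules also restrict which opcodes may
# pair, impose register-bank constraints, and more. max_srcs is a stand-in.

@dataclass(frozen=True)
class VInstr:
    name: str
    srcs: tuple  # source operand registers, e.g. ("v0", "v1")
    dst: str     # destination register


def can_pair(a: VInstr, b: VInstr, max_srcs: int) -> bool:
    """Hypothetical rule: both ops must fit the dual-issue encoding's
    source-operand limit, and the second op must not read or overwrite
    the first op's result (they issue in the same cycle)."""
    if len(a.srcs) > max_srcs or len(b.srcs) > max_srcs:
        return False
    if a.dst in b.srcs or a.dst == b.dst:
        return False
    return True


def issue_cycles(instrs: list, max_srcs: int) -> int:
    """Greedily pair adjacent instructions in program order and return
    the number of issue cycles (one per single op or per pair)."""
    cycles, i = 0, 0
    while i < len(instrs):
        if i + 1 < len(instrs) and can_pair(instrs[i], instrs[i + 1], max_srcs):
            i += 2  # co-issued as a dual pair
        else:
            i += 1  # issued alone
        cycles += 1
    return cycles


# Four independent three-source FMAs (v_fma_f32 style: d = a * b + c).
fmas = [
    VInstr("v_fma_f32", (f"v{3*k}", f"v{3*k + 1}", f"v{3*k + 2}"), f"v{100 + k}")
    for k in range(4)
]

print(issue_cycles(fmas, max_srcs=2))  # 4: no FMA fits a two-source format
print(issue_cycles(fmas, max_srcs=3))  # 2: two dual-issued pairs
```

In this simplified picture, the same FMA stream takes half as many issue cycles once the format accepts three source operands, which is the kind of gain a more lenient pairing scheme is aiming for.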

This would allow dual-issue execution to happen more often, which in some cases could lead to a potentially massive increase in effective FP32 throughput. Shader units would co-issue two operations per cycle more often instead of issuing them one at a time, getting more work done per clock. This could help in demanding workloads such as rendering, and it would give game engine developers more room to optimize for the dual-issue VALU.
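
A back-of-the-envelope sketch shows how the pairing rate feeds into FP32 throughput. It assumes that a paired instruction costs half an issue cycle and an unpaired one a full cycle, that every instruction is an FMA (2 FLOPs per lane), and that nothing else (memory, stalls) limits issue. The GPU configuration (64 CUs, 64 FP32 lanes per CU, 2.5 GHz) is entirely hypothetical and is not an RDNA 5 spec.

```python
# Back-of-the-envelope model; all hardware numbers below are hypothetical.

def dual_issue_speedup(pair_fraction: float) -> float:
    """Throughput gain over single issue when `pair_fraction` of VALU
    instructions are co-issued with a partner. Each paired instruction
    costs half an issue cycle; each unpaired one costs a full cycle."""
    if not 0.0 <= pair_fraction <= 1.0:
        raise ValueError("pair_fraction must be between 0 and 1")
    cycles_per_instr = (1.0 - pair_fraction) + pair_fraction / 2.0
    return 1.0 / cycles_per_instr


def effective_fp32_tflops(cus: int, lanes_per_cu: int, clock_ghz: float,
                          pair_fraction: float) -> float:
    """Effective FP32 TFLOPS, assuming every instruction is an FMA
    (2 FLOPs per lane) and nothing else limits instruction issue."""
    single_issue_flops = cus * lanes_per_cu * 2 * clock_ghz * 1e9
    return single_issue_flops * dual_issue_speedup(pair_fraction) / 1e12


# Hypothetical GPU: 64 CUs, 64 FP32 lanes per CU, 2.5 GHz.
for frac in (0.0, 0.5, 1.0):
    print(frac, round(effective_fp32_tflops(64, 64, 2.5, frac), 2))
# 0.0 20.48
# 0.5 27.31
# 1.0 40.96
```

Under these assumptions, the advertised dual-issue peak is only reached when every instruction finds a partner; pairing half of them yields about a third more throughput than single issue, which is why loosening the pairing rules matters more than the peak figure itself.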

Reducing the number of cases where pairing fails due to restrictions is a key step toward making the hardware more efficient without brute-forcing gains through extra silicon. FMA instructions are also important for neural rendering, so features like upscaling and frame generation could get a boost here too, even if the hardware itself is not any more powerful, since better dual-issue utilization extracts more work from the same shader units.

You can check out the Coelacanth's Dream article linked above if you're interested in more specifics, but be warned that it's very dense. Also keep in mind that RDNA 5 is still a ways out, and more consumer-facing upgrades like higher core counts will certainly be easier to market. Still, seeing a GPU reach its advertised FP32 throughput more easily and more consistently is a big architectural win.

Follow 3DTested on Google News, or add us as a preferred source, to get our latest news, analysis, & reviews in your feeds.

TOPICS