Hypothesis: 💡
When IOMMU is enabled, GPU DMA memory transfers pass through an address translation layer.
Those translations are cached in the I/O TLB.
At very high frame rates (~300 FPS), the GPU may issue a very large number of DMA transactions per second.
If the I/O TLB starts missing, the extra lookups could add latency to memory operations and reduce effective throughput.
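
To make the reasoning above concrete, here is a toy model of how I/O TLB misses could eat into throughput. It is only a sketch: it assumes transfers are fully latency-bound and serialized, which real hardware is not (requests are pipelined), and every number passed in is a hypothetical placeholder, not a measurement. The function names and parameters are my own.

```python
def avg_translation_overhead_ns(miss_rate: float, walk_penalty_ns: float) -> float:
    """Expected extra latency per DMA transaction caused by I/O TLB misses.

    miss_rate       -- fraction of transactions that miss the I/O TLB (0..1), assumed
    walk_penalty_ns -- assumed cost of one page-table walk, in nanoseconds
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    if walk_penalty_ns < 0:
        raise ValueError("walk_penalty_ns must be non-negative")
    return miss_rate * walk_penalty_ns


def effective_throughput_fraction(
    base_latency_ns: float, miss_rate: float, walk_penalty_ns: float
) -> float:
    """Fraction of the no-IOMMU throughput that remains in this toy model.

    Assumes throughput is inversely proportional to per-transaction latency.
    """
    if base_latency_ns <= 0:
        raise ValueError("base_latency_ns must be positive")
    overhead = avg_translation_overhead_ns(miss_rate, walk_penalty_ns)
    return base_latency_ns / (base_latency_ns + overhead)


if __name__ == "__main__":
    # Placeholder inputs chosen only to show the shape of the curve.
    for miss_rate in (0.0, 0.01, 0.05, 0.20):
        frac = effective_throughput_fraction(100.0, miss_rate, 200.0)
        print(f"miss rate {miss_rate:.2f}: {frac:.3f} of baseline throughput")
```

The point of the model is just that the penalty scales with the miss rate, so a workload that stays inside the I/O TLB would see almost no cost while one that thrashes it could see a noticeable drop.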
To test this, run the same benchmark with the GPU forced to PCIe Gen 4 instead of Gen 5.
If the theory is correct, I’d still expect IOMMU Disabled to be faster at both link speeds.
But if the higher PCIe bandwidth is amplifying the translation overhead, the gap between IOMMU Enabled and Disabled should be smaller at Gen 4 than at Gen 5.
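
Once the four runs are done (Gen 4 and Gen 5, each with IOMMU on and off), the comparison could be scored with something like the sketch below. The helper names and the 1% tolerance are my own assumptions, and the FPS figures in the usage section are made-up placeholders, not results.

```python
def relative_gap(fps_iommu_off: float, fps_iommu_on: float) -> float:
    """Fraction of FPS lost with IOMMU enabled, relative to IOMMU disabled."""
    if fps_iommu_off <= 0:
        raise ValueError("fps_iommu_off must be positive")
    return (fps_iommu_off - fps_iommu_on) / fps_iommu_off


def interpret(gap_gen4: float, gap_gen5: float, tolerance: float = 0.01) -> str:
    """Compare the IOMMU penalty at Gen 4 and Gen 5.

    tolerance is an assumed noise floor (as a fraction); set it from the
    run-to-run variance of the actual benchmark.
    """
    if gap_gen5 - gap_gen4 > tolerance:
        return "gap larger at Gen 5: consistent with bandwidth amplifying the overhead"
    if gap_gen4 - gap_gen5 > tolerance:
        return "gap larger at Gen 4: not what the hypothesis predicts"
    return "gap similar at both speeds: no sign of bandwidth amplification"


if __name__ == "__main__":
    # Placeholder FPS values for illustration only; replace with real averages.
    gen4 = relative_gap(fps_iommu_off=300.0, fps_iommu_on=291.0)
    gen5 = relative_gap(fps_iommu_off=310.0, fps_iommu_on=285.0)
    print(f"Gen 4 gap: {gen4:.1%}, Gen 5 gap: {gen5:.1%}")
    print(interpret(gen4, gen5))
```

Using the relative gap rather than the raw FPS difference keeps the comparison fair if the baseline frame rate itself changes between Gen 4 and Gen 5.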