Emulator Issues #8161
Hardware Bounding Box gets exponentially more demanding as IR increases
Paper Mario: Thousand Year Door
What's the problem? Describe what went wrong in few words.
The game runs nearly flawlessly at all times, except when Mario turns sideways by pressing [R], or when he curls up to go down a pipe, or under other circumstances that transform the Mario sprite, at which point the frame rate immediately drops to 30-40 FPS. This only happens on the OpenGL backend.
What did you expect to happen instead?
The FPS /not/ to drop to 30-ish.
What steps will reproduce the problem?
1. At any time after Mario gains the ability to pass between pipes by turning sideways, press and hold down [R]. The FPS will immediately halve itself.
Which versions of Dolphin did you test on?
Does using an older version of Dolphin solve your issue? If yes, which
versions of Dolphin used to work?
4.0-4217 works flawlessly.
It would appear 4.0-4219 (OGL: implement bounding box support with ssbo (PR #1550 from degasus)) introduced this bug.
What are your PC specifications? (including, but not limited to: Operating
System, CPU and GPU)
Windows 7 x64
Intel i7-4770 CPU @ 3.40GHz
NVIDIA GeForce GTX 980 (I also noticed this issue on my previous setup, which was two GTX 760s in SLI)
Driver Version 347.25
Is there any other relevant information? (e.g. logs, screenshots,
#2 Updated by Sonicadvance1 over 5 years ago
- Status changed from New to Questionable
Yes, it's using bounding boxes which require atomics on the GPU.
The higher your internal resolution, the more of an impact this will have on you.
You should probably also make sure your GPU is running at its highest clock speed while Dolphin is running during these drops, so that the work completes as quickly as possible.
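To make the internal-resolution point concrete, here is a minimal Python sketch. It is not Dolphin's shader code. It assumes every covered pixel issues four min/max updates (left, right, top, bottom) against one shared bounding box, standing in for the GPU atomics, and that a sprite covering w x h pixels at 1x IR covers (w*IR) x (h*IR) pixels at higher IR. The function name, sprite size and origin are made up for illustration.

```python
def bounding_box_cost(sprite_w, sprite_h, ir, origin=(100, 50)):
    """Simulate per-pixel bounding-box updates for a sprite drawn at a given IR.

    Assumption: each covered pixel performs 4 min/max updates, standing in
    for the GPU atomics. This is an illustration, not Dolphin's real code.
    """
    x0, y0 = origin[0] * ir, origin[1] * ir
    left, top = float("inf"), float("inf")
    right, bottom = float("-inf"), float("-inf")
    updates = 0
    for y in range(y0, y0 + sprite_h * ir):
        for x in range(x0, x0 + sprite_w * ir):
            left, right = min(left, x), max(right, x)
            top, bottom = min(top, y), max(bottom, y)
            updates += 4  # one update each for left, right, top, bottom
    return (left, top, right, bottom), updates


box_1x, cost_1x = bounding_box_cost(32, 48, ir=1)
box_4x, cost_4x = bounding_box_cost(32, 48, ir=4)
print(box_1x, cost_1x)
print(box_4x, cost_4x)
print(cost_4x / cost_1x)
```

Under these assumptions the number of updates grows with the square of the IR (16 times the work at 4x IR), because the covered pixel count scales in both dimensions.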
#3 Updated by england-reece over 5 years ago
These are the results of the testing I just did: http://i.imgur.com/ls8BWa0.png
During the peaks of GPU usage, when it's at its max state, the FPS is /exactly/ 30FPS. I think that's an interesting data point. Adding on to that, the GPU is indeed at its highest clock frequency during that state (about 1290 MHz).
During these periods, the title bar reports an FPS of exactly 30, and a VPS also exactly equal to 30. When I release the trigger and Mario finishes returning to normal, the sound returns to normal, the FPS and VPS immediately go back up to 60, and the game plays normally again.
Let me know any other information I could gather.
#5 Updated by england-reece over 5 years ago
Another test with IR set to 1x: http://i.imgur.com/bwId95C.png
The effect is less pronounced (FPS drops to 44, VPS drops to 53) but it's still noticeable, and as you can see from the GPU log, it's still pushing the GPU to a very high level. You can see the dip in the middle when I released [R] just as a control.
#7 Updated by england-reece over 5 years ago
150 FPS? Uhh, /how/? I only get >60FPS if I turn off the frame limit (which also causes the game to run at 350FPS/VPS at about 5-6x speed).
I also got the same problem with an uncapped frame limit. ~250FPS/350VPS when not holding down [R], ~33FPS and ~57VPS when holding down [R].
JMC, are you using the OpenGL Backend or the Direct3D backend? Because the Direct3D backend doesn't exhibit these problems, it runs a nearly perfectly smooth 60FPS/VPS at all times.
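As a rough back-of-the-envelope check on the uncapped numbers above, here is a small Python sketch that converts the reported frame rates into per-frame times. It assumes the whole difference between the two cases is attributable to the rotated state, which is only an approximation; the helper name is hypothetical.

```python
def frame_time_ms(fps):
    """Convert a frame rate into the time spent on one frame, in milliseconds."""
    return 1000.0 / fps


normal_ms = frame_time_ms(250)    # ~250 FPS reported without [R] held
rotated_ms = frame_time_ms(33)    # ~33 FPS reported with [R] held
extra_ms = rotated_ms - normal_ms # added cost per frame while rotated
budget_60_ms = frame_time_ms(60)  # time available per frame at 60 FPS

print(round(normal_ms, 1), round(rotated_ms, 1))
print(round(extra_ms, 1), round(budget_60_ms, 1))
```

By this estimate the rotated state adds roughly 26 ms per frame, which on its own is more than the roughly 16.7 ms a 60 FPS frame allows.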
#8 Updated by JMC4789 over 5 years ago
I almost always use OpenGL, and I was using OpenGL during this test. I'm one of the people who did performance testing on the Hardware Bounding Box stuff, and my GTX 760 was the primary card I tested on, along with a Radeon HD 5850 and a Radeon R9 285.
The HD 5850 was very slow, but it's an old card, so that was to be expected. Both the GTX 760 and R9 285 performed well over 60 FPS up to 4x IR when I tested Paper Mario's bounding box.
#9 Updated by england-reece over 5 years ago
So there's gotta be something with the 980 that doesn't like the algorithm that's being used. Or possibly something with my configuration: you keep saying your FPS values are much greater than 60, yet mine never /ever/ go above 60 (unless I disable the CPU frame limit), and for the record, I have VSync turned off and it still caps at 60. So what enables you to get such high FPS values? That might be a place to start.
#11 Updated by england-reece over 5 years ago
So what do you make of the absurdly high GPU usage in my graphs associated with the bounding box code: is there a better profiling tool I could use that would give better data? Are there any tests I could perform that would help the dev team?
#13 Updated by england-reece over 5 years ago
Alright, these are the tests I performed, all at 1xIR:
VSync Enabled, Normal movement: http://i.imgur.com/Jlm5qZr.png
VSync Enabled, Rotated movement: http://i.imgur.com/JWq42pB.png
VSync Disabled, Normal movement: http://i.imgur.com/yJ5BoEG.png
VSync Disabled, Rotated movement: http://i.imgur.com/uyDHRUs.png
As you can see, VSync doesn't seem to make a difference.
For the record, my original graphs were with VSync disabled.
#20 Updated by degasus about 5 years ago
The high GPU load with the hardware bbox code is well known. Older GPUs (especially AMD ones) are known to perform very badly with atomics.
As only some games are known to use bbox at all, we've merged a new option to force-disable bbox. So almost all games (except rare ones like this) should run fine again.
#22 Updated by england-reece over 4 years ago
Update: I'm using the same computer in Windows 10, with Dolphin 4.0-8231 and am still experiencing this issue with OpenGL only.
I did some digging through the source code, and it looks like the GLSL shader code is generated by post-translating HLSL code into GLSL code. Is it possible that the way Atomics are being used in GLSL is less optimized than intended?
My other theory is that the Maxwell cards may be misreporting their support for Atomics (or rather, misreporting that they don't have support for Atomics). If so, would there be a way to know that?
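One way to test the second theory is to look at what the driver actually advertises. Below is a minimal Python sketch, assuming the hardware bounding box path depends on shader storage buffer objects (as the "ssbo" in the PR title above suggests) and that the extension list has been dumped to text by some external tool. The helper name and the sample strings are hypothetical, not taken from Dolphin.

```python
SSBO_EXTENSION = "GL_ARB_shader_storage_buffer_object"


def advertises_ssbo(extension_dump, gl_version=(0, 0)):
    """Return True if the context should offer shader storage buffer objects.

    SSBOs are core in OpenGL 4.3; on older contexts they are exposed through
    the GL_ARB_shader_storage_buffer_object extension.
    `extension_dump` is a whitespace-separated extension list (assumed input).
    """
    if gl_version >= (4, 3):
        return True
    return SSBO_EXTENSION in extension_dump.split()


# Hypothetical sample dumps, for illustration only.
with_ssbo = "GL_ARB_multitexture GL_ARB_shader_storage_buffer_object"
without_ssbo = "GL_ARB_multitexture GL_ARB_texture_float"

print(advertises_ssbo(with_ssbo, gl_version=(4, 2)))
print(advertises_ssbo(without_ssbo, gl_version=(4, 2)))
print(advertises_ssbo(without_ssbo, gl_version=(4, 5)))
```

This only shows whether the feature is reported, not how fast the driver runs it, so it could rule out misreporting but would not explain a slow path by itself.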
#23 Updated by phire over 4 years ago
You know, in my ubershaders branch, I've run into the problem that DirectX on my GTX 960 is about 2-3x faster than OpenGL.
The OpenGL backend does 3x IR at full speed, while the DirectX backend can easily do 5x IR.
And I have no idea why. It's running almost identical code on both backends (the DirectX code is theoretically slightly worse), and it doesn't use atomics (yet).
#24 Updated by seapancake over 4 years ago
I can replicate this consistently on my GeForce GTX 970 (359.09) in OpenGL and can confirm the issue doesn't happen in D3D. When entering/exiting the pipe, the frame rate drops to about 30 FPS and the pipe sound stutters.
Testing done at 1x IR, Windows 10, i7-6700K, 16GB RAM
Save state for testing: https://mega.nz/#!IRck2bLZ
#26 Updated by england-reece over 4 years ago
Higher IR is going to be slower because there is more data to process.
The issue has nothing to do with IR, it has to do with Atomics. Someone changed the name of the issue a while back, and I'm too green to confidently contest their change.
#28 Updated by JMC4789 about 4 years ago
- Status changed from Questionable to Won't fix
D3D12 quells my need for this issue to be open. It's possible the same optimization can work on OpenGL as well.
The original issue, that it gets more demanding at higher IRs, is invalid. The speed loss is wontfix/fixed. So I just kind of put this in between.