Emulator Issues #8161
closed
Hardware Bounding Box gets exponentially more demanding as IR increases
Description
Game Name?
Paper Mario: Thousand Year Door
Game ID?
G8ME01
What's the problem? Describe what went wrong in few words.
The game runs nearly flawlessly at all times, except when Mario turns sideways by pressing [R], or when he curls up to go down a pipe, or under other circumstances that transform the Mario Sprite, at which point the FPS falls down immediately to 30-40FPS. Only happens on the OpenGL Backend.
What did you expect to happen instead?
The FPS /not/ to drop to 30-ish.
What steps will reproduce the problem?
[Don't assume we have ever played the game and know any level names. Be as
specific as possible.]
- At any time after Mario gains the ability to pass between pipes by turning sideways, press and hold down [R]. The FPS will immediately halve itself.
Dolphin 3.5 and 3.5-367 are old versions of Dolphin that have
known issues and bugs, so don't report issues about them and test the
latest Dolphin version first.
Which versions of Dolphin did you test on?
4.0.5328
Does using an older version of Dolphin solve your issue? If yes, which
versions of Dolphin used to work?
4.0-4217 works flawlessly.
It would appear 4.0-4219 (OGL: implement bounding box support with ssbo (PR #1550 from degasus)) introduced this bug.
What are your PC specifications? (including, but not limited to: Operating
System, CPU and GPU)
Windows 7 x64
Intel i7-4770 CPU @ 3.40GHz
NVIDIA GeForce GTX 980 (I also noticed this issue on my previous setup, which was two GTX 760s SLI'd together)
Driver Version 347.25
Is there any other relevant information? (e.g. logs, screenshots,
configuration files)
Not really.
Updated by england-reece almost 10 years ago
Based on the code difference between 4217 and 4219, it would appear that the new Bounding Box code is to blame. Some of the crucial code implementing Bounding Boxes in 4219 is behaving very poorly with Paper Mario.
Updated by Sonicadvance1 almost 10 years ago
- Status changed from New to Questionable
Yes, it's using bounding boxes which require atomics on the GPU.
The higher your internal resolution, the more of an impact this will have on you.
Should probably also make sure your GPU is running at its highest clock speed when running Dolphin as well during these to make sure it completes as quickly as possible.
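To illustrate why internal resolution matters here, below is a minimal Python model of a per-fragment bounding box update. It is a sketch, not Dolphin's shader code: the function names are hypothetical, and the assumption that every fragment issues four atomic min/max operations is a simplification. Only the 640x528 native EFB size is a hardware fact. The model shows that the resulting box is the same at every IR, while the number of atomic operations grows with the square of the IR (quadratic rather than exponential growth).

```python
# Illustrative model only; names and the "4 atomics per fragment" cost are assumptions.
EFB_WIDTH, EFB_HEIGHT = 640, 528  # native GameCube/Wii embedded framebuffer size


def update_bbox(bbox, x, y):
    """Fold one fragment into the box.

    On a GPU, each of these four min/max updates would be an atomic
    operation on a value shared by all fragments.
    """
    left, top, right, bottom = bbox
    return (min(left, x), min(top, y), max(right, x), max(bottom, y))


def bbox_for_rect(x0, y0, width, height, ir):
    """Rasterise a native-resolution rectangle at internal resolution `ir`.

    Returns the bounding box in native coordinates and the number of
    atomic operations a naive per-fragment implementation would issue.
    """
    bbox = (EFB_WIDTH, EFB_HEIGHT, 0, 0)  # "empty" box: min > max
    atomic_ops = 0
    for sy in range(y0 * ir, (y0 + height) * ir):
        for sx in range(x0 * ir, (x0 + width) * ir):
            # Scale the fragment position back to native coordinates.
            bbox = update_bbox(bbox, sx // ir, sy // ir)
            atomic_ops += 4
    return bbox, atomic_ops


if __name__ == "__main__":
    for ir in (1, 2, 3, 4):
        box, ops = bbox_for_rect(100, 50, 40, 60, ir)
        print(f"{ir}x IR: bbox={box}, atomic ops={ops}")
```

In this model, going from 1x to 4x IR multiplies the atomic operation count by 16 for the same on-screen sprite, which is consistent with the slowdown being much milder at 1x IR.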
Updated by england-reece almost 10 years ago
These are the results of the testing I just did: http://i.imgur.com/ls8BWa0.png
During the peaks of the GPU, when it's at its max state, the FPS is /exactly/ 30FPS. I think that's an interesting data point. Adding on to that, it is indeed at its highest clock frequency during that state (about 1290MHz).
During these periods, the Title bar reports an FPS of exactly 30, and a VPS also exactly equal to 30. When I release the trigger and when Mario finishes returning to normal, the sound returns to normal and the FPS and VPS immediately drive back up to 60 and the game plays normally again.
Let me know any other information I could gather.
Updated by england-reece almost 10 years ago
If it's not clear, those moments happen /only/ when Bounding Boxes are being used (when I hold down [R]).
Updated by england-reece almost 10 years ago
Another test with IR set to 1x: http://i.imgur.com/bwId95C.png
The effect is less pronounced (FPS drops to 44, VPS drops to 53) but it's still noticeable, and as you can see from the GPU log, it's still pushing the GPU to a very high level. You can see the dip in the middle when I released [R] just as a control.
Updated by JMC4789 almost 10 years ago
I hold R in Paper Mario: Thousand Year Door on my GTX 760 and I get 150 fps at 2x IR.
Updated by england-reece almost 10 years ago
150 FPS? Uhh, /how/? I only get >60FPS if I turn off the frame limit (which also causes the game to run at 350FPS/VPS at about 5-6x speed).
I also got the same problem with an uncapped frame limit. ~250FPS/350VPS when not holding down [R], ~33FPS and ~57VPS when holding down [R].
JMC, are you using the OpenGL Backend or the Direct3D backend? Because the Direct3D backend doesn't exhibit these problems, it runs a nearly perfectly smooth 60FPS/VPS at all times.
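The uncapped figures above can be turned into per-frame times, which makes the size of the slowdown easier to compare against the 60 FPS budget. This is a rough worked calculation using the approximate FPS values reported in this comment; the helper names are made up for illustration.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def extra_cost_ms(fps_without, fps_with):
    """Additional time per frame when the slow path is active."""
    return frame_time_ms(fps_with) - frame_time_ms(fps_without)


if __name__ == "__main__":
    budget = frame_time_ms(60)          # time available per frame at full speed
    extra = extra_cost_ms(250, 33)      # ~250 FPS normally, ~33 FPS holding [R]
    print(f"budget at 60 FPS: {budget:.1f} ms")
    print(f"extra cost while holding [R]: {extra:.1f} ms")
```

With these numbers, the extra cost while [R] is held is roughly 26 ms per frame, which on its own is larger than the roughly 16.7 ms available per frame at 60 FPS.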
Updated by JMC4789 almost 10 years ago
I almost always use OpenGL; I was using OpenGL during this test. I'm one of the people who did performance testing on the Hardware Bounding Box stuff; and my GTX 760 was the primary card I tested on, along with a Radeon HD 5850 and a Radeon r9 285.
The HD 5850 was very slow, but it's an old card, so that was to be expected. Both the GTX 760 and R9 285 performed well over 60 fps up to 4x IR when I tested Paper Mario's bounding box.
Updated by england-reece almost 10 years ago
So there's gotta be something with the 980 that doesn't like the algorithm that's being used. Or possibly something with my configuration: you keep saying your FPS values are much greater than 60, yet mine never /ever/ go above 60 (unless I disable the CPU frame limit), and for the record, I have vsync turned off; it still caps at 60. So what enables you to get such high FPS values? That might be a place to start.
Updated by JMC4789 almost 10 years ago
I was disabling the frame-limiter. This is a 60 fps game. For performance testing I always have the framelimiter disabled.
Updated by england-reece almost 10 years ago
Ah, alright.
So what do you make of the absurdly high GPU usage in my graphs associated with the bounding box code: is there a better profiling tool I could use that would give better data? Are there any tests I could perform that would help the dev team?
Updated by crudelios almost 10 years ago
Do you have vsync enabled? If so, please turn it off and test again.
Updated by england-reece almost 10 years ago
Alright, these are the tests I performed, all at 1xIR:
VSync Enabled, Normal movement: http://i.imgur.com/Jlm5qZr.png
VSync Enabled, Rotated movement: http://i.imgur.com/JWq42pB.png
VSync Disabled, Normal movement: http://i.imgur.com/yJ5BoEG.png
VSync Disabled, Rotated movement: http://i.imgur.com/uyDHRUs.png
As you can see, VSync doesn't seem to make a difference.
For the record, my original graphs were with VSync disabled.
Updated by JMC4789 almost 10 years ago
I don't see this issue on my GTX 760 still.
Updated by england-reece almost 10 years ago
I'd say it's very likely that this issue is particular to 900 series cards, or the 980 in particular. It would be nice if anyone with the same card as me could do testing on this issue.
Updated by gamedevistator almost 10 years ago
I have this issue as well. I have a Radeon HD 6870, but if I set the IR to 1x then the game runs fine. I agree that something with the BBox is causing an issue with the IR.
Updated by JMC4789 over 9 years ago
- Status changed from Questionable to Accepted
- Relates to performance set to Yes
This isn't the duplicate issue. Accepting until degasus says something.
Updated by degasus over 9 years ago
The high GPU load with the hardware bbox code is well known. Older (especially AMD) GPUs are known to perform very badly with atomics.
As only some games are known to use bbox at all, we've merged a new option to force disable bbox. So almost all games (except the rare ones like this) should run fine again.
Updated by JMC4789 over 9 years ago
Issue 8877 has been merged into this issue.
Updated by england-reece almost 9 years ago
Update: I'm using the same computer in Windows 10, with Dolphin 4.0-8231 and am still experiencing this issue with OpenGL only.
I did some digging through the source code, and it looks like the GLSL shader code is generated by post-translating HLSL code into GLSL code. Is it possible that the way Atomics are being used in GLSL is less optimized than intended?
My other theory is that the Maxwell cards may be misreporting their support for Atomics (or rather, misreporting that they don't have support for Atomics). If so, would there be a way to know that?
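One way to check the second theory is to look at what the driver actually reports. The bounding box change named earlier in this thread uses SSBOs, which are core in OpenGL 4.3 and otherwise exposed through the GL_ARB_shader_storage_buffer_object extension. The sketch below is a hypothetical helper, not Dolphin's detection code, and whether Dolphin performs this exact check is not stated in this thread. It only shows the decision on values that would come from the driver.

```python
SSBO_EXTENSION = "GL_ARB_shader_storage_buffer_object"


def supports_ssbo(gl_version, extensions):
    """Return True if the reported GL version or extension list includes SSBO support.

    gl_version: (major, minor) tuple, e.g. parsed from glGetString(GL_VERSION).
    extensions: iterable of extension names, e.g. collected with
                glGetStringi(GL_EXTENSIONS, i) on a live context.
    """
    return tuple(gl_version) >= (4, 3) or SSBO_EXTENSION in set(extensions)


if __name__ == "__main__":
    # Made-up inputs for illustration; real values come from the driver.
    print(supports_ssbo((4, 5), []))
    print(supports_ssbo((4, 2), [SSBO_EXTENSION]))
    print(supports_ssbo((3, 3), ["GL_ARB_vertex_array_object"]))
```

If the driver reports support here and the slowdown still appears, misreporting is unlikely to be the cause, which points back at how the atomics perform rather than whether they are detected.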
Updated by phire almost 9 years ago
You know, in my ubershaders branch, I've run into the problem that DirectX on my GTX 960 is about 2-3x faster than OpenGL.
The OpenGL backend does 3x IR at full speed, while the DirectX backend can easily do 5x IR.
And I have no idea why. It's running almost identical code on both backends (the DirectX code is theoretically slightly worse), and it doesn't use atomics (yet).
Updated by seapancake almost 9 years ago
I can replicate this on my GeForce GTX 970 (359.09) consistently in OpenGL and can confirm the issue doesn't happen in D3D. When entering/exiting the pipe the fps drops to about 30fps and the pipe sound stutters.
Testing done at 1x IR, Windows 10, i7-6700K, 16GB RAM
Save state for testing: https://mega.nz/#!IRck2bLZ
Updated by skidau almost 9 years ago
- Status changed from Accepted to Questionable
- Regression changed from Yes to No
Higher IR is going to be slower because there is more data to process.
Updated by england-reece almost 9 years ago
skidau wrote:
Higher IR is going to be slower because there is more data to process.
The issue has nothing to do with IR, it has to do with Atomics. Someone changed the name of the issue a while back, and I'm too green to confidently contest their change.
Updated by JosJuice almost 9 years ago
We are not going to go back to the old software bounding box emulation that didn't use atomics - see issue 8931. Therefore, making this issue report be about atomics would result in it being closed.
Updated by JMC4789 over 8 years ago
- Status changed from Questionable to Won't fix
D3D12 quells my need for this issue to be open. It's possible the same optimization can work on OpenGL as well.
The original issue, that it gets more demanding at higher IRs, is invalid. The speed loss is wontfix/fixed. So I just kind of put this somewhere in between.