Emulator Issues #8161

Hardware Bounding Box gets exponentially more demanding as IR increases

Added by england-reece over 5 years ago. Updated about 4 years ago.

Status: Won't fix
Priority: Normal
Assignee: -
% Done: 0%
Operating system: N/A
Issue type: Bug
Milestone:
Regression: No
Relates to usability: No
Relates to performance: Yes
Easy: No
Relates to maintainability: No
Regression start:
Fixed in:

Description

Game Name?
Paper Mario: Thousand Year Door

Game ID?
G8ME01

What's the problem? Describe what went wrong in a few words.
The game runs nearly flawlessly at all times, except when Mario turns sideways by pressing [R], curls up to go down a pipe, or otherwise has his sprite transformed. At that point the frame rate immediately drops to 30-40 FPS. This only happens on the OpenGL backend.

What did you expect to happen instead?
The FPS /not/ to drop to 30-ish.

What steps will reproduce the problem?
[Don't assume we have ever played the game and know any level names. Be as
specific as possible.]
1. At any time after Mario gains the ability to pass between pipes by turning sideways, press and hold down [R]. The FPS will immediately halve itself.

Dolphin 3.5 and 3.5-367 are old versions of Dolphin that have
known issues and bugs, so don't report issues about them and test the
latest Dolphin version first.
Which versions of Dolphin did you test on?
4.0.5328

Does using an older version of Dolphin solve your issue? If yes, which
versions of Dolphin used to work?
4.0-4217 works flawlessly.

It would appear 4.0-4219 (OGL: implement bounding box support with ssbo (PR #1550 from degasus)) introduced this bug.

What are your PC specifications? (including, but not limited to: Operating
System, CPU and GPU)
Windows 7 x64
Intel i7-4770 CPU @ 3.40 GHz
NVIDIA GeForce GTX 980 (I also noticed this issue on my previous setup, which was two GTX 760s in SLI)
Driver Version 347.25

Is there any other relevant information? (e.g. logs, screenshots,
configuration files)
Not really.


Related issues

Has duplicate: Emulator Issues #8877: Wii Super Paper Mario Massive Slowdown in Elevators (8 FPS) (Duplicate)

History

#1 Updated by england-reece over 5 years ago

Based on the code difference between 4217 and 4219, it would appear that the new Bounding Box code is to blame. Some of the crucial code implementing Bounding Boxes in 4219 is behaving very poorly with Paper Mario.

#2 Updated by Sonicadvance1 over 5 years ago

  • Status changed from New to Questionable

Yes, it's using bounding boxes which require atomics on the GPU.
The higher your internal resolution, the more of an impact this will have on you.
You should probably also make sure your GPU is running at its highest clock speed while Dolphin is running, so that these passes complete as quickly as possible.
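As a rough illustration of why bounding boxes with atomics get costlier at higher internal resolution: assuming the hardware bounding box works by having every shaded fragment update a shared four-value box (left, right, top, bottom) with atomic min/max operations, the Python sketch below counts those updates for a single quad. It is not Dolphin's shader code; the function name `bbox_for_quad`, the quad size, and tracking the box in native (1x) coordinates are illustrative assumptions. On a GPU the updates would be `atomicMin`/`atomicMax` calls contending for the same few addresses; here they are plain `min`/`max`.

```python
# Illustrative sketch only, not Dolphin's implementation.
# Assumption: each shaded fragment updates a shared bounding box
# [left, right, top, bottom] that is tracked in native (1x) coordinates.
INT_MAX = 2**31 - 1
INT_MIN = -(2**31)


def bbox_for_quad(x0, y0, w, h, ir):
    """Shade a w*h quad (native coords) at internal resolution `ir`.

    Returns the resulting bounding box and the number of atomic-style
    updates a GPU would have issued to build it.
    """
    left, top = INT_MAX, INT_MAX
    right, bottom = INT_MIN, INT_MIN
    atomic_ops = 0
    # At IR `ir`, each native pixel is covered by ir*ir rendered fragments.
    for py in range(y0 * ir, (y0 + h) * ir):
        for px in range(x0 * ir, (x0 + w) * ir):
            nx, ny = px // ir, py // ir  # scale back to native coordinates
            # On a GPU these would be atomicMin/atomicMax on a shared
            # buffer; here they are plain min/max on local variables.
            left, right = min(left, nx), max(right, nx)
            top, bottom = min(top, ny), max(bottom, ny)
            atomic_ops += 4
    return (left, right, top, bottom), atomic_ops


if __name__ == "__main__":
    for ir in (1, 2, 3, 4):
        box, ops = bbox_for_quad(10, 20, 32, 16, ir)
        print(f"{ir}x IR: bbox={box}, atomic updates={ops}")
```

The resulting box is (10, 41, 20, 35) at every IR, but the update count is 2048, 8192, 18432 and 32768 for 1x to 4x, growing with the square of the IR. How much that costs in practice depends on how a given GPU and driver handle contended atomics.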

#3 Updated by england-reece over 5 years ago

These are the results of the testing I just did: http://i.imgur.com/ls8BWa0.png

During the GPU usage peaks, when it's at its maximum, the FPS is /exactly/ 30 FPS. I think that's an interesting data point. In addition, the GPU is indeed at its highest clock frequency during that state (about 1290 MHz).

During these periods, the title bar reports exactly 30 FPS and exactly 30 VPS. When I release the trigger and Mario finishes returning to normal, the sound returns to normal, FPS and VPS immediately climb back to 60, and the game plays normally again.

Let me know if there's any other information I could gather.

#4 Updated by england-reece over 5 years ago

If it's not clear, those moments happen /only/ when Bounding Boxes are being used (when I hold down [R]).

#5 Updated by england-reece over 5 years ago

Another test with IR set to 1x: http://i.imgur.com/bwId95C.png

The effect is less pronounced (FPS drops to 44, VPS drops to 53), but it's still noticeable, and as you can see from the GPU log, it's still pushing the GPU very hard. The dip in the middle is where I released [R], as a control.

#6 Updated by JMC4789 over 5 years ago

I hold R in Paper Mario: Thousand Year Door on my GTX 760 and I get 150 fps at 2x IR.

#7 Updated by england-reece over 5 years ago

150 FPS? Uhh, /how/? I only get >60FPS if I turn off the frame limit (which also causes the game to run at 350FPS/VPS at about 5-6x speed).

I also get the same problem with the frame limit disabled: ~250 FPS/350 VPS when not holding down [R], ~33 FPS and ~57 VPS when holding it.

JMC, are you using the OpenGL Backend or the Direct3D backend? Because the Direct3D backend doesn't exhibit these problems, it runs a nearly perfectly smooth 60FPS/VPS at all times.

#8 Updated by JMC4789 over 5 years ago

I almost always use OpenGL; I was using OpenGL during this test. I'm one of the people who did performance testing on the Hardware Bounding Box stuff; and my GTX 760 was the primary card I tested on, along with a Radeon HD 5850 and a Radeon r9 285.

The HD 5850 was very slow, but it was an old card, so that was to be expected. Both the GTX 760 and R9 285 performed well over 60 fps up to 4x IR when I tested Paper Mario's bounding box.

#9 Updated by england-reece over 5 years ago

So there's gotta be something about the 980 that doesn't like the algorithm being used, or possibly something about my configuration. You keep saying your FPS values are much greater than 60, yet mine never /ever/ go above 60 unless I disable the CPU frame limit. For the record, I have VSync turned off; it still caps at 60. So what enables you to get such high FPS values? That might be a place to start.

#10 Updated by JMC4789 over 5 years ago

I was disabling the frame limiter. This is a 60 fps game. For performance testing I always have the frame limiter disabled.

#11 Updated by england-reece over 5 years ago

Ah, alright.

So what do you make of the absurdly high GPU usage in my graphs associated with the bounding box code? Is there a better profiling tool I could use that would give better data? Are there any tests I could perform that would help the dev team?

#12 Updated by crudelios over 5 years ago

Do you have vsync enabled? If so, please turn it off and test again.

#13 Updated by england-reece over 5 years ago

Alright, these are the tests I performed, all at 1x IR:

VSync Enabled, Normal movement: http://i.imgur.com/Jlm5qZr.png
VSync Enabled, Rotated movement: http://i.imgur.com/JWq42pB.png
VSync Disabled, Normal movement: http://i.imgur.com/yJ5BoEG.png
VSync Disabled, Rotated movement: http://i.imgur.com/uyDHRUs.png

As you can see, VSync doesn't seem to make a difference.

For the record, my original graphs were with VSync disabled.

#14 Updated by Buddybenj over 5 years ago

  • Regression set to Yes

#15 Updated by JMC4789 over 5 years ago

I don't see this issue on my GTX 760 still.

#16 Updated by england-reece over 5 years ago

I'd say it's very likely that this issue is particular to 900 series cards, or the 980 in particular. It would be nice if anyone with the same card as me could do testing on this issue.

#17 Updated by gamedevistator over 5 years ago

I have this issue as well. I have a Radeon HD 6870, but if I set the IR to 1x, the game runs fine. I agree that something with the BBox is causing an issue with the IR.

#18 Updated by dr-m over 5 years ago

I also have this issue, GTX 970 here.

#19 Updated by JMC4789 about 5 years ago

  • Status changed from Questionable to Accepted
  • Relates to performance set to Yes

This isn't the duplicate issue. Accepting until degasus says something.

#20 Updated by degasus about 5 years ago

The high GPU load of the hardware bbox code is well known. Older GPUs (especially AMD ones) are known to perform very badly with atomics.
As only a few games are known to use bbox at all, we've merged a new option to force-disable bbox. So almost all games (except this rare one) should run fine again.

#21 Updated by JMC4789 almost 5 years ago

issue 8877 has been merged into this issue.

#22 Updated by england-reece over 4 years ago

Update: I'm using the same computer in Windows 10, with Dolphin 4.0-8231 and am still experiencing this issue with OpenGL only.

I did some digging through the source code, and it looks like the GLSL shader code is generated by post-translating HLSL code into GLSL code. Is it possible that the way Atomics are being used in GLSL is less optimized than intended?

My other theory is that the Maxwell cards may be misreporting their support for Atomics (or rather, misreporting that they don't have support for Atomics). If so, would there be a way to know that?

#23 Updated by phire over 4 years ago

You know, in my ubershaders branch, I've run into the problem that DirectX on my GTX 960 is about 2-3x faster than OpenGL.

The OpenGL backend does 3x IR at full speed, while the DirectX backend can easily do 5x IR.

And I have no idea why. It's running almost identical code on both backends (the DirectX code is theoretically slightly worse), and it doesn't use atomics (yet).

#24 Updated by seapancake over 4 years ago

I can replicate this consistently on my GeForce GTX 970 (359.09) in OpenGL, and can confirm the issue doesn't happen in D3D. When entering or exiting the pipe, the FPS drops to ~30 and the pipe sound stutters.

Testing done at 1x IR, Windows 10, i7-6700K, 16 GB RAM.

Save state for testing: https://mega.nz/#!IRck2bLZ

#25 Updated by skidau over 4 years ago

  • Status changed from Accepted to Questionable
  • Regression changed from Yes to No

Higher IR is going to be slower because there is more data to process.
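To put numbers on "more data to process": assuming a full-screen pass over Dolphin's 640x528 native EFB (real scenes usually cover less of the screen, but scale the same way), the number of shaded fragments grows with the square of the IR factor. The constant and function names below are illustrative, not taken from Dolphin's source.

```python
# Illustrative arithmetic only. Assumption: a full-screen pass over the
# 640x528 native EFB; each IR step multiplies both dimensions by `ir`.
EFB_WIDTH, EFB_HEIGHT = 640, 528


def fragments_per_pass(ir):
    """Number of fragments shaded for one full-screen pass at integer IR."""
    return (EFB_WIDTH * ir) * (EFB_HEIGHT * ir)


if __name__ == "__main__":
    base = fragments_per_pass(1)
    for ir in range(1, 7):
        n = fragments_per_pass(ir)
        print(f"{ir}x IR: {n:>10,} fragments ({n // base}x the 1x workload)")
```

By this count 2x IR is 4x the fragments and 4x IR is 16x. That accounts for cost rising with IR, but it does not by itself explain the drops reported above at 1x IR, which point at how individual GPUs and drivers handle the atomics.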

#26 Updated by england-reece over 4 years ago

skidau wrote:

Higher IR is going to be slower because there is more data to process.

The issue has nothing to do with IR, it has to do with Atomics. Someone changed the name of the issue a while back, and I'm too green to confidently contest their change.

#27 Updated by JosJuice over 4 years ago

We are not going to go back to the old software bounding box emulation that didn't use atomics - see issue 8931. Therefore, making this issue report be about atomics would result in it being closed.

#28 Updated by JMC4789 about 4 years ago

  • Status changed from Questionable to Won't fix

D3D12 quells my need for this issue to be open. It's possible the same optimization can work on OpenGL as well.

The original claim, that it gets more demanding at higher IRs, is invalid. The speed loss is wontfix/fixed. So I just kind of put this in between.
