Emulator Issues #3081

closed

GPU emulation causes slowdowns with some games

Added by Xtreme2damax over 13 years ago.

Status:
Won't fix
Priority:
Normal
Assignee:
-
Category:
GFX
% Done:

0%

Operating system:
N/A
Issue type:
Feature request
Milestone:
Regression:
No
Relates to usability:
No
Relates to performance:
Yes
Easy:
No
Relates to maintainability:
No
Regression start:
Fixed in:

Description

What steps will reproduce the problem?

  1. Start a profiler such as Vtune, select Dolphin as the executable.

  2. Run specific games with known slowdown issues. I tested Super Mario Galaxy and Super Mario Galaxy 2; even Zelda Twilight Princess exhibits the same issue in Hyrule Field.

  3. Observe the results (I've linked to a screenshot of the profiler analysis)

http://i40.photobucket.com/albums/e204/muffin2121/vtune.jpg

What is the expected output? What do you see instead?

I expected the games to run at normal speed. Instead, some games exhibit slowdowns for no apparent reason, even when there is minimal geometry to render, such as on the space ship in Super Mario Galaxy 2 and in Beat Block Galaxy. All of these slowdowns point to the current Fifo implementation, with the Fifo_EnterLoop function being the culprit.

Dolphin version with the problem? Other Dolphin version without the problem?

Almost every Dolphin revision exhibits these slowdowns.

32-bit or 64-bit and any other build parameters?

I always use the 64-bit version of Dolphin. My builds are compiled with the Intel C++ compiler with SSE4.1 optimization enabled.

OS version and versions of tools/libraries used?

I use Windows 7 x64 as my primary operating system. I used Intel Vtune to benchmark the games and analyze what is responsible for the slowdowns. Fixing the issue with Hyrule Field slowdowns will most likely fix other games affected by these weird slowdown issues. So far geometry doesn't seem to be responsible for the slowdowns; it seems to be some other underlying issue with the emulator.

Please provide any additional information below.

In addition to the Fifo_EnterLoop function, and unlike in the benchmarks with Zelda Twilight Princess, there now seems to be an additional function responsible for slowdowns, in Super Mario Galaxy 2 at least: the function referred to as "set" in the Vtune benchmark results. The Fifo_EnterLoop function also seems to be consuming a lot of resources, as it did with Hyrule Field in Zelda Twilight Princess.

Either the emulator doesn't support a feature that exists on the actual hardware, or the current Fifo implementation, specifically the functions outlined above, is inefficient.

Actions #1

Updated by Xtreme2damax over 13 years ago

The other hotspots of interest point to the threading and syncing functions of the emulator, such as spin/waits and timing. The Fifo and the Fifo_EnterLoop function are likely tied into these, which causes the slowdowns.

Actions #2

Updated by skidau over 13 years ago

MarcosV is looking into this.

Actions #3

Updated by Xtreme2damax over 13 years ago

The other hotspots in the screenshot should be noted as well; the top two are just the most intensive. It's probably a combination of timing, syncing, fifo, and rendering issues that causes the slowdowns. I'm not sure if spin/waits are part of the timing, but they also seem to be consuming resources.

Bottlenecks in graphics emulation should also be factored in as part of the issue, since slowdowns with the DX11 and OpenGL plugins are more severe than with DX9. A couple of Ayuanx's fifo and timing commits also significantly reduced performance in these games.

Here are updated results as well as some additional results from the benchmarks/profiling:

http://i40.photobucket.com/albums/e204/muffin2121/dolphin-process.jpg

http://i40.photobucket.com/albums/e204/muffin2121/dolphin-threads.jpg

http://i40.photobucket.com/albums/e204/muffin2121/dolphin-thread70-hotspots.jpg

http://i40.photobucket.com/albums/e204/muffin2121/thread70-pluginvideodx9-hotspots.jpg

http://i40.photobucket.com/albums/e204/muffin2121/thread122-hotspots.jpg

http://i40.photobucket.com/albums/e204/muffin2121/thread122-hotspots-dolphinexe.jpg

http://i40.photobucket.com/albums/e204/muffin2121/thread122-hotspots-pluginvideodx9.jpg

Actions #4

Updated by Xtreme2damax over 13 years ago

Developers who have a profiler such as Vtune can view the problem code by clicking on the hotspots. The actual code should be viewable, but you will need the .pdb files to view the hotspots and code; you don't need to use the debug build. I recommend that developers grab the free 30-day trial of Vtune and test this themselves, since they will be able to see what is going on.

Actions #5

Updated by Xtreme2damax over 13 years ago

Developers may also want to run the games with overlay statistics enabled to see what is being done/loaded during these slowdowns.

Actions #6

Updated by marcosvitali over 13 years ago

  • Status changed from New to Work started
  • Issue type set to Feature request

Xtreme2damax: I've rewritten the FIFO twice in different projects. One of them adds another thread, but at the moment they are only experiments. I need more time; this past weekend I worked 16 hours on that. These games are graphics heavy and use breakpoints in the FIFO for parallel processing (CPU/GPU), so they are very sensitive to FIFO/GP emulation. I am interested in whether SMG1/SMG2 has a lot of FIFO_RESET when you have many slowdowns. Could you report that to me? You can log it in ProcessorInterface.cpp, because I can improve AbortFrame.

Actions #8

Updated by marcosvitali over 13 years ago

Xtreme: I want to gather this statistic. How long is Decoding per frame in SMG1/2. With this indicator you know if we need to improve FIFO and other stuff or Decoding for this game.

Actions #9

Updated by Xtreme2damax over 13 years ago

"How long is Decoding per frame in SMG1/2. With this indicator you know if we need to improve FIFO and other stuff or Decoding for this game."

How do I find out that information?

I realize that these games may be heavier than other games, although it is apparent that there is a serious bottleneck somewhere. I remember in the past Spin/Waits along with what you just mentioned being used to fix performance in some games. It would seem that something was messed up somewhere or not implemented correctly. A few of Ayuanx's commits also decreased performance; I know you are already aware of that.

What's weird is that less graphics-heavy galaxies, such as Spaceship Mario and Beat Block Galaxy, have slowdowns, while some more intensive galaxies run faster or at full speed. It seems that quite a few games have these weird or unexplainable slowdowns that can likely be traced back to what is mentioned in this issue.

Actions #10

Updated by Xtreme2damax over 13 years ago

"I am interested in whether SMG1/SMG2 has a lot of FIFO_RESET when you have many slowdowns. Could you report that to me? You can log it in ProcessorInterface.cpp, because I can improve AbortFrame."

Will I need to use the Debug build for that or can the normal build with logging be used?

Actions #11

Updated by marcosvitali over 13 years ago

It is a WARN_LOG in ProcessorInterface. This is the line:
WARN_LOG(PROCESSORINTERFACE, "Fifo reset (%08x)", _uValue);

Actions #12

Updated by marcosvitali over 13 years ago

"How long is Decoding per frame in SMG1/2. With this indicator you know if we need to improve FIFO and other stuff or Decoding for this game."
I need to program this metric. I need to measure the time spent in the call OpcodeDecoder_Run(g_bSkipCurrentFrame); in the FIFO loop, and reset the counter when one frame is drawn. Tonight I will program this metric and post the results later.
This function processes the memory in the video buffer. If it consumes a lot of time, then the problem is for Rodolfo :D
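
A minimal sketch of the metric described above, assuming a small accumulator wrapped around the call to OpcodeDecoder_Run. The DecodeTimer class and its method names are hypothetical and are not part of Dolphin; only OpcodeDecoder_Run and g_bSkipCurrentFrame come from this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not Dolphin code): accumulates the time spent inside a
// measured region and hands back the total when the caller asks for it.
class DecodeTimer
{
public:
    using Clock = std::chrono::steady_clock;

    // Call right before the measured region.
    void Begin()
    {
        m_start = Clock::now();
        m_running = true;
    }

    // Call right after the measured region; adds the elapsed time to the total.
    void End()
    {
        assert(m_running);
        m_total += std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() - m_start);
        m_running = false;
    }

    // Adds an already measured duration (handy for testing the bookkeeping).
    void Add(std::chrono::nanoseconds elapsed) { m_total += elapsed; }

    // Returns the accumulated time in whole milliseconds and resets the counter.
    std::int64_t TakeMilliseconds()
    {
        const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(m_total).count();
        m_total = std::chrono::nanoseconds::zero();
        return ms;
    }

private:
    Clock::time_point m_start{};
    std::chrono::nanoseconds m_total{0};
    bool m_running = false;
};
```

In the FIFO loop this would presumably be used as Begin(), then OpcodeDecoder_Run(g_bSkipCurrentFrame), then End(), with TakeMilliseconds() read and logged whenever a frame or a reporting interval completes.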

Actions #13

Updated by marcosvitali over 13 years ago

Xtreme: I think that in the release build you can configure that log level in "Log Windows".

Actions #14

Updated by Xtreme2damax over 13 years ago

I couldn't get it to generate any Fifo Reset warnings, so it appears no fifo resets are being done. Here is the output that was generated for the whole game, not just areas that are running slow. I've also included the video interface output. This is from the Debug_Log, as nothing else was generating anything:

03:05:986 .\Src\Hw\VideoInterface.cpp:777 D[VI]: (VI->BeginField): addr: 10F52B60 | FieldSteps 40 | FbSteps 40 | ACV 456 | Field Progressive
03:05:986 .\Src\Hw\VideoInterface.cpp:413 D[VI]: (w16): 0x1001, 0xcc002034
03:05:986 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_VI (clear)
03:05:986 .\Src\Hw\VideoInterface.cpp:413 D[VI]: (w16): 0x1c8c, 0xcc002000
03:05:986 .\Src\Hw\VideoInterface.cpp:413 D[VI]: (w16): 0x001e, 0xcc00200c
03:05:986 .\Src\Hw\VideoInterface.cpp:413 D[VI]: (w16): 0x0048, 0xcc00200e
03:05:986 .\Src\Hw\VideoInterface.cpp:413 D[VI]: (w16): 0x001e, 0xcc002010
03:05:986 .\Src\Hw\VideoInterface.cpp:413 D[VI]: (w16): 0x0048, 0xcc002012
03:05:986 .\Src\Hw\VideoInterface.cpp:413 D[VI]: (w16): 0x1087, 0xcc00201c
03:05:986 .\Src\Hw\VideoInterface.cpp:413 D[VI]: (w16): 0xfe0c, 0xcc00201e
03:05:986 .\Src\Hw\VideoInterface.cpp:413 D[VI]: (w16): 0x0087, 0xcc002024
03:05:986 .\Src\Hw\VideoInterface.cpp:413 D[VI]: (w16): 0xfe0c, 0xcc002026
03:05:986 .\Src\Hw\VideoInterface.cpp:408 D[VI]: (r16): 0x0001, 0xcc00206c
03:05:987 .\Src\Hw\VideoInterface.cpp:408 D[VI]: (r16): 0x0045, 0xcc00206e
03:05:991 .\Src\HW\ProcessorInterface.cpp:129 D[PI]: read writepointer, value = 00af6240
03:05:991 .\Src\HW\ProcessorInterface.cpp:129 D[PI]: read writepointer, value = 00af62a0
03:05:992 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_DSP (set)
03:05:992 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_DSP (clear)
03:05:992 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_DSP (set)
03:05:992 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_DSP (clear)
03:05:992 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_DSP (set)
03:05:992 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_DSP (clear)
03:05:993 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_DSP (set)
03:05:993 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_DSP (clear)
03:05:993 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_DSP (set)
03:05:993 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_DSP (clear)
03:05:993 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_DSP (set)
03:05:993 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_DSP (clear)
03:05:995 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_DSP (set)
03:05:995 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_DSP (clear)
03:05:995 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_DSP (set)
03:05:995 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_DSP (clear)
03:05:995 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (set)
03:05:996 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (clear)
03:05:996 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (set)
03:05:996 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (clear)
03:05:997 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (set)
03:05:997 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (clear)
03:05:998 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (set)
03:05:998 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (clear)
03:06:015 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_PE_TOKEN (set)
03:06:015 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_CP (set)
03:06:016 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_PE_TOKEN (clear)
03:06:016 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_CP (clear)
03:06:016 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_PE_TOKEN (set)
03:06:016 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_PE_TOKEN (clear)
03:06:029 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_PE_FINISH (set)
03:06:029 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_PE_FINISH (clear)
03:06:029 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (set)
03:06:029 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (clear)
03:06:029 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (set)
03:06:029 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (clear)
03:06:029 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (set)
03:06:029 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (clear)
03:06:029 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (set)
03:06:029 .\Src\HW\ProcessorInterface.cpp:246 D[PI]: Setting Interrupt INT_CAUSE_WII_IPC (clear)
03:06:029 .\Src\HW\ProcessorInterface.cpp:241 D[PI]: Setting Interrupt INT_CAUSE_VI (set)

Actions #15

Updated by marcosvitali over 13 years ago

Perfect! :D Tonight I will cronometer OpcodeDecoder_Run(g_bSkipCurrentFrame) and I will post the result.

Actions #16

Updated by Xtreme2damax over 13 years ago

What do you mean by cronometer? It appears OpcodeDecoder_Run(g_bSkipCurrentFrame); already exists in fifo.cpp.

Tell me what to do and I can test it, providing the change is simple enough.

Actions #17

Updated by marcosvitali over 13 years ago

SMG - Observatory - E7400 - Nvidia GTX 260 Core 216.
FPS: Frames per second
Decoding MS: OpcodeDecoder_Run consuming time per frame

59:40:830 .\Src\Render.cpp:1280 N[FileMon]: FPS: 39 - Decoding MS: 896
59:41:838 .\Src\Render.cpp:1280 N[FileMon]: FPS: 39 - Decoding MS: 904
59:42:844 .\Src\Render.cpp:1280 N[FileMon]: FPS: 39 - Decoding MS: 912
59:43:858 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 911
59:44:870 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 906
59:45:892 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 920
59:46:900 .\Src\Render.cpp:1280 N[FileMon]: FPS: 39 - Decoding MS: 906
59:47:900 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 904
59:48:900 .\Src\Render.cpp:1280 N[FileMon]: FPS: 39 - Decoding MS: 896
59:49:906 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 909
59:50:906 .\Src\Render.cpp:1280 N[FileMon]: FPS: 39 - Decoding MS: 902
59:51:925 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 908
59:52:938 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 919
59:53:947 .\Src\Render.cpp:1280 N[FileMon]: FPS: 42 - Decoding MS: 910
59:54:951 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 900
59:55:957 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 906
59:56:974 .\Src\Render.cpp:1280 N[FileMon]: FPS: 39 - Decoding MS: 855
59:57:996 .\Src\Render.cpp:1280 N[FileMon]: FPS: 43 - Decoding MS: 901
59:59:012 .\Src\Render.cpp:1280 N[FileMon]: FPS: 45 - Decoding MS: 906
00:00:030 .\Src\Render.cpp:1280 N[FileMon]: FPS: 43 - Decoding MS: 906
00:01:032 .\Src\Render.cpp:1280 N[FileMon]: FPS: 41 - Decoding MS: 887
00:02:055 .\Src\Render.cpp:1280 N[FileMon]: FPS: 41 - Decoding MS: 925
00:03:060 .\Src\Render.cpp:1280 N[FileMon]: FPS: 39 - Decoding MS: 903
00:04:065 .\Src\Render.cpp:1280 N[FileMon]: FPS: 38 - Decoding MS: 883
00:05:080 .\Src\Render.cpp:1280 N[FileMon]: FPS: 39 - Decoding MS: 889
00:06:097 .\Src\Render.cpp:1280 N[FileMon]: FPS: 39 - Decoding MS: 902
00:07:121 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 926
00:08:122 .\Src\Render.cpp:1280 N[FileMon]: FPS: 39 - Decoding MS: 907
00:09:137 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 916
00:10:146 .\Src\Render.cpp:1280 N[FileMon]: FPS: 44 - Decoding MS: 884
00:11:158 .\Src\Render.cpp:1280 N[FileMon]: FPS: 57 - Decoding MS: 850
00:12:170 .\Src\Render.cpp:1280 N[FileMon]: FPS: 61 - Decoding MS: 874
00:13:173 .\Src\Render.cpp:1280 N[FileMon]: FPS: 60 - Decoding MS: 855
00:14:174 .\Src\Render.cpp:1280 N[FileMon]: FPS: 60 - Decoding MS: 823
00:15:189 .\Src\Render.cpp:1280 N[FileMon]: FPS: 61 - Decoding MS: 794
00:16:190 .\Src\Render.cpp:1280 N[FileMon]: FPS: 60 - Decoding MS: 792
00:17:191 .\Src\Render.cpp:1280 N[FileMon]: FPS: 60 - Decoding MS: 812
00:18:201 .\Src\Render.cpp:1280 N[FileMon]: FPS: 60 - Decoding MS: 856

Xtreme2damax: This log demonstrates that in SMG all the time is consumed by the OpcodeDecoder_Run function, so there are no problems related to FIFO, SPIN, WAITS, etc.

Actions #18

Updated by marcosvitali over 13 years ago

I've eliminated CommandProcessor::isFifoBusy = true; from the fifo loop to produce a VI desync on purpose. This is the same log as before, with that change:

36:35:929 .\Src\Render.cpp:1280 N[FileMon]: FPS: 31 - Decoding MS: 648
36:36:954 .\Src\Render.cpp:1280 N[FileMon]: FPS: 32 - Decoding MS: 655
36:37:982 .\Src\Render.cpp:1280 N[FileMon]: FPS: 33 - Decoding MS: 669
36:39:016 .\Src\Render.cpp:1280 N[FileMon]: FPS: 33 - Decoding MS: 641
36:40:021 .\Src\Render.cpp:1280 N[FileMon]: FPS: 32 - Decoding MS: 634
36:41:038 .\Src\Render.cpp:1280 N[FileMon]: FPS: 32 - Decoding MS: 618
36:42:043 .\Src\Render.cpp:1280 N[FileMon]: FPS: 33 - Decoding MS: 636
36:43:077 .\Src\Render.cpp:1280 N[FileMon]: FPS: 37 - Decoding MS: 716
36:44:103 .\Src\Render.cpp:1280 N[FileMon]: FPS: 31 - Decoding MS: 631
36:45:131 .\Src\Render.cpp:1280 N[FileMon]: FPS: 31 - Decoding MS: 666
36:46:131 .\Src\Render.cpp:1280 N[FileMon]: FPS: 33 - Decoding MS: 721
36:47:131 .\Src\Render.cpp:1280 N[FileMon]: FPS: 35 - Decoding MS: 745
36:48:148 .\Src\Render.cpp:1280 N[FileMon]: FPS: 32 - Decoding MS: 690
36:49:181 .\Src\Render.cpp:1280 N[FileMon]: FPS: 35 - Decoding MS: 757
36:50:215 .\Src\Render.cpp:1280 N[FileMon]: FPS: 33 - Decoding MS: 712
36:51:230 .\Src\Render.cpp:1280 N[FileMon]: FPS: 31 - Decoding MS: 666
36:52:236 .\Src\Render.cpp:1280 N[FileMon]: FPS: 33 - Decoding MS: 732
36:53:243 .\Src\Render.cpp:1280 N[FileMon]: FPS: 34 - Decoding MS: 750
36:54:248 .\Src\Render.cpp:1280 N[FileMon]: FPS: 36 - Decoding MS: 681
36:55:248 .\Src\Render.cpp:1280 N[FileMon]: FPS: 55 - Decoding MS: 739
36:56:265 .\Src\Render.cpp:1280 N[FileMon]: FPS: 59 - Decoding MS: 790
36:57:266 .\Src\Render.cpp:1280 N[FileMon]: FPS: 55 - Decoding MS: 708
36:58:282 .\Src\Render.cpp:1280 N[FileMon]: FPS: 55 - Decoding MS: 695

Xtreme2damax: You can see that there is a sync problem, produced intentionally by me, and OpcodeDecoder_Run does not consume all the time.

Actions #19

Updated by marcosvitali over 13 years ago

Xtreme2damax: OpcodeDecoder_Run is responsible for reading the memory buffer, decoding the graphics commands and executing them. Besides that, it receives a boolean parameter, g_bSkipCurrentFrame, which is passed as TRUE when you activate frame skip in Dolphin. I hope that answers your questions.

Actions #20

Updated by marcosvitali over 13 years ago

*Sorry, it is Decoding MS: OpcodeDecoder_Run consuming time per second
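
With that correction, each log line can be read as "milliseconds spent in OpcodeDecoder_Run during roughly one second of wall time". A small sketch of the arithmetic, assuming the reporting interval is exactly 1000 ms; the DecodeSample struct and both function names are made up for illustration and do not exist in Dolphin.

```cpp
#include <cassert>

// One line of the log: frames drawn and milliseconds spent in
// OpcodeDecoder_Run during a reporting interval of roughly one second.
struct DecodeSample
{
    int fps;
    int decoding_ms;
};

// Fraction of the interval the GP thread spent decoding.
// Assumption: the interval is exactly 1000 ms long.
double BusyFraction(const DecodeSample& s)
{
    return s.decoding_ms / 1000.0;
}

// Average decoding cost of a single frame, in milliseconds.
double DecodeMsPerFrame(const DecodeSample& s)
{
    assert(s.fps > 0);
    return static_cast<double>(s.decoding_ms) / s.fps;
}
```

For the line "FPS: 40 - Decoding MS: 906" this gives a busy fraction of about 0.906 and about 22.65 ms of decoding per frame, which is more than the 16.67 ms a frame may take at 60 FPS. For "FPS: 60 - Decoding MS: 855" it gives about 14.25 ms per frame.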

Actions #21

Updated by marcosvitali over 13 years ago

Xtreme: Sorry for my English, "cronometer" does not exist. I meant "How long is..".

Actions #22

Updated by Xtreme2damax over 13 years ago

What is the problem that is causing these slowdowns? Even non-graphics-heavy levels in SMG2, such as the spaceship and Beat Block Galaxy, have slowdowns, while other levels that are heavier have little to no slowdowns. Fifo_EnterLoop seems to be the most intensive according to the profiler results, so I thought that maybe there is a bottleneck in the fifo or graphics emulation somewhere. Hyrule Field in ZTP runs at full speed until the Fifo_EnterLoop function is used, which happens when visiting any part of Hyrule Field other than Hyrule Field SouthWest in the Gamecube version. Once Fifo_EnterLoop kicks in, there are slowdowns that are at times severe.

What can be done to fix or alleviate the issue?

Actions #23

Updated by marcosvitali over 13 years ago

Xtreme: By improving the GP emulation part. For example, Rodolfo told me that SMG has a lot of polygons, and that some games like ZTP were programmed badly: the states change all the time, so we need to draw all the time. Tonight I will take metrics of the graphics parts, for example how long VertexManager::Flush() takes per second in SMG, etc.
Conclusion: The title of this issue should be changed to "Improve GP emulation for SMG, ZTP, etc".

Actions #24

Updated by marcosvitali over 13 years ago

I forgot something: OpcodeDecoder_Run is called in the FIFO loop when the memory is copied from the FIFO to the VideoBuffer. That's why you see it in the profiler. Besides, the number of polygons is important, but so are the state changes. Anyway, I will take different metrics of the processes called by OpcodeDecoder_Run and post the results.

Actions #25

Updated by Xtreme2damax over 13 years ago

That still doesn't explain why certain galaxies and areas, such as the space ship and Beat Block Galaxy, run slower than, or just as slow as, other galaxies that have more to render/process. :P

Another thing is that certain games that are graphics heavy run faster than certain games that are not graphics heavy.

It seems to only be when the Fifo_EnterLoop/OpcodeDecoder_Run kicks in that there are these slowdowns.

The Twilight Princess Hyrule Field speedup hack doesn't work with other games so that is pretty much useless for everything else except Zelda Twilight Princess.

Actions #26

Updated by marcosvitali over 13 years ago

I am not an expert in the graphics part, but the "Hyrule Field speedup hack" skips some state changes. As I told you, some games change states all the time, and you need to draw a lot of times. That's why the slowdown in ZTP is not due to the quantity of polygons. I will research which graphics process is consuming time in the SMG slowdowns. I think you need to understand the graphics process emulation first. I am not an expert, but one thing is clear here: the problem is GP emulation. You can ask Rodolfo about this.

Actions #27

Updated by Xtreme2damax over 13 years ago

What about also CC'ing the issue to rodolfo?

I believe Kiesel or another coder that worked on the ZTP speedup hack mentioned that something should be done to speed up pipeline flushes or reduce the intensity of the pipeline flushes. There are likely other bottlenecks in graphics emulation; hopefully rodolfo or other developers can look into these issues.

Btw, did you have any success with what you were going to try last night?

Actions #28

Updated by marcosvitali over 13 years ago

Yes, last night I made the log for you ;P It was a misunderstanding.

Actions #29

Updated by marcosvitali over 13 years ago

  • Status changed from Work started to Questionable

Actions #30

Updated by Xtreme2damax over 13 years ago

Why the "Questionable" status, is this not fixable?

Anyways, I'm glad it's being looked into and I'm trying to be as much help as I can.

In order to test the same thing you did, do I just remove the following line from fifo.cpp?

CommandProcessor::isFifoBusy = true;

Actions #31

Updated by marcosvitali over 13 years ago

Yes, I removed this line only to produce a VI desync on purpose and demonstrate my theory, because in this case you have less FPS and less Decoding.

With isFifoBusy:

59:53:947 .\Src\Render.cpp:1280 N[FileMon]: FPS: 42 - Decoding MS: 910
59:54:951 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 900
59:55:957 .\Src\Render.cpp:1280 N[FileMon]: FPS: 40 - Decoding MS: 906

Without isFifoBusy:

36:44:103 .\Src\Render.cpp:1280 N[FileMon]: FPS: 31 - Decoding MS: 631
36:45:131 .\Src\Render.cpp:1280 N[FileMon]: FPS: 31 - Decoding MS: 666
36:46:131 .\Src\Render.cpp:1280 N[FileMon]: FPS: 33 - Decoding MS: 721

Questionable: Undecided whether to work on this or not. (Google Code definition)

Because that is an improvement.

Actions #32

Updated by Xtreme2damax over 13 years ago

I thought questionable meant that others confirm it as an issue but developers aren't able to reproduce it exactly as it was reported.

I would really hope this would be worked on considering that it affects performance in numerous games. Another thing, what about the performance issues Ayuanx's fifo and timing commits caused? His commits decreased performance in quite a few games such as SMG/SMG2 and Resident Evil Code Veronica X with the frame limiter and audio throttle.

Actions #33

Updated by marcosvitali over 13 years ago

I don't agree with you. The FIFO works almost perfectly now in SMG, as you can see in the log: 90% of the time the GP thread is decoding. There are no FIFO issues affecting performance in any game. The remaining FIFO issues are bugs related to an invalid ReadWriteDistance that produce hangs/crashes. The VI timing issues are pending, but they do not affect performance in many games because my patch r5777 solves most problems in DC. I'd like to solve this VI issue 2571 first. Ayuanx did a good job on the FIFO and on the Frame Limiter. I have been working on the FIFO for the last 3 months and I appreciate his work.

Actions #34

Updated by Xtreme2damax over 13 years ago

I'm just reminding you of the issues that were reported by other Dolphin users; some games have gotten slower after Ayuanx's commits. In those commits Ayuanx modified timing and fifo, which is why I mentioned those besides what was profiled with Vtune.

I'm referring to commits of Ayuanx's a while back such as the 4xxx revisions and later.

But these don't seem to be the issue with SMG, SMG2 or ZTP so I'll let it be for now. I am just hoping whatever is causing performance issues will be fixed someday, especially since these games are near perfect now.

Thanks for looking into it, at least I learned something.

Actions #35

Updated by marcosvitali over 13 years ago

Xtreme2damax: When ayuanx came back, he implemented accurate BP in dual-core mode, implemented the readIdle and cmdIdle FIFO states, and removed the hacks in CommandProcessor for ReadPointer, WritePointer and ReadWriteDistance. That was AWESOMEEE!!! Maybe I appreciate that because I understand the code. When he removed the ReadWriteDistance hack, real bugs like the GatherPipe overflow were discovered. That was good progress. I'd prefer to advance the emulation instead of relying on nasty hacks like the ReadWriteDistance hack, although some games like REO crash now.

Actions #36

Updated by Xtreme2damax over 13 years ago

I also appreciate what developers are doing, but I don't get what those have to do with performance. Are you trying to say that accuracy decreased performance? If so, is there any way to optimize these commits of Ayuanx's?

Actions #37

Updated by marcosvitali over 13 years ago

IMO the ayuanx FIFO commits are very good, and not related to performance. If not, can you show me which Ayuanx FIFO commit lost performance, please?

Actions #38

Updated by Xtreme2damax over 13 years ago

http://code.google.com/p/dolphin-emu/source/detail?r=4798

"Mario Kart Wii suffers from this commit... now the gamespeed is very slow :/"

"What about the gamespeed?? I also checked Super Mario Galaxy. It's absolutely unplayable now."

Since then or around that time SMG has lost about 10 - 15 FPS in the observatory, and speed in some levels was also lost. I'm not sure if that commit was responsible, I just saw the users reporting a loss of speed.

Actions #39

Updated by marcosvitali over 13 years ago

That is not a FIFO commit. Only VI timing is modified in this commit. Mario Kart Wii on my PC runs at 55-60 FPS, the Observatory at 50 FPS. I fixed the VI timing in r5777. I worked 3 weeks to solve the "Since then or around that time SMG has lost about 10 - 15 FPS in the observatory" issue. Xtreme, it is easy to understand if you see that SMG in the logs uses 900 milliseconds for decoding. THERE ARE NO TIMING PROBLEMS BECAUSE THE GP THREAD IS PROCESSING GRAPHICS COMMANDS ALL THE TIME. The only real problem in SMG now is that there are a lot of HARD VertexManager::Flush() calls. So we need to improve GP emulation, not timings, not the FIFO. Everything else is OK, because the bottleneck now is the graphics opcode decoding, which represents 90% of the work when you have slowdowns.

Actions #40

Updated by marcosvitali over 13 years ago

Imagine this scenario: you have the CPU and the GPU. The CPU generates the graphics commands and the GPU processes them. The FIFO is the channel that sends these commands from the CPU to the GPU. So if the GPU is processing commands 90% of the time, that means there are no timing problems, because the GPU is always working. That is really good!! But the problem is that the GPU is really slow.
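
A toy model of that picture, assuming the two emulation threads overlap ideally, with no sync stalls. The FrameCost struct and the function names are hypothetical and are not Dolphin code. It only illustrates why a GP thread that is busy almost all the time points at decoding cost rather than at timing or FIFO problems.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-frame costs, in milliseconds, for the two threads.
struct FrameCost
{
    double cpu_ms;  // emulated CPU: generates the graphics commands
    double gp_ms;   // emulated GP: decodes and executes them
};

// With both threads running in parallel, the slower one sets the frame time.
double FrameTimeMs(const FrameCost& c)
{
    assert(c.cpu_ms > 0.0 && c.gp_ms > 0.0);
    return std::max(c.cpu_ms, c.gp_ms);
}

// Share of the frame time during which the GP thread has work to do.
double GpBusyFraction(const FrameCost& c)
{
    return c.gp_ms / FrameTimeMs(c);
}
```

In this model a busy fraction close to 1.0 means the GP side is the bottleneck, so only making the decoding itself cheaper raises the frame rate. A fraction well below 1.0 would mean the GP thread is waiting on the CPU side instead.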

Actions #41

Updated by Xtreme2damax over 13 years ago

I know that, I thought timing may be tied in with the fifo somehow since timing issues or issues with spin/waits caused weird slowdowns in the past.

I realize that this is not the issue now so thanks for explaining.

I did lose 10 - 15 FPS in the observatory of SMG in the past. The observatory used to run at 45 FPS - 48 FPS but now I get 30 FPS - 35 FPS, so hopefully it can be figured out what is wrong with GP emulation.

Sorry if I annoyed you somehow. :/

Actions #42

Updated by rodolfoosvaldobogado over 13 years ago

I have one correction to make: it is not the GPU that is slowing things down, it is the amount of CPU consumed by decoding and calling the commands.

Actions #43

Updated by marcosvitali over 13 years ago

Thanks for the correction, rodolfo :D I was speaking about the GPU emulation, not the real PC GPU. Xtreme: I get 50 FPS in the observatory now, and did in the past too. That is very strange. What is your hardware and your configuration?

Actions #44

Updated by skidau over 13 years ago

  • Status changed from Questionable to Accepted
  • Category set to gfx
  • Relates to performance set to Yes
  • Operating system N/A added

Maybe we should JIT/ASM the Graphics Opcode Decoder if that is what is causing slowdown in SMG. Or multi-thread it. Or both.

Actions #45

Updated by Xtreme2damax over 13 years ago

I made a mistake, I get about 40 FPS - 45 FPS in the observatory with SMG1 but it fluctuates wildly and dips. Some levels in SMG2 run at 30 FPS - 40 FPS, with slowdowns on the spaceship where there isn't as much to render as the actual galaxies. Slowdowns are more severe with OGL and DX11 than DX9.

I apologize if I am repeating myself..

Actions #46

Updated by samljer over 13 years ago

I've noticed similar things in a debugging program I use at work. It seems that even with IDLE skipping enabled, Dolphin sure as hell still "idles", lots. It's probably related to sync issues, however.

But I've noticed they say "video card isn't that important, it's all CPU", which is BULLSH*T, as there's so much idle time with a slow video card that a fast one increases CPU cycles in the program.

It's probably necessary for proper function, but still get the best video card you can, no doubt.

Actions #47

Updated by chicuong.hua over 13 years ago

I don't think you understand. Core 1 handles the gc/wii cpu via jit/jitil/interpreter, while core 2 handles the gc/wii gpu. Your graphics card doesn't need to be spectacular to handle whatever core 2 spits out. What the devs are attempting to do is to JIT the opcode decoding for the Wii GPU on the second core, according to what I read.

Actions #48

Updated by Xtreme2damax over 13 years ago

We are referring to the emulated GPU for the GC and Wii, not the PC GPU. The PC GPU will make very little difference to performance, unless you want to play at high resolutions with SSAA and EFB scale, in which case a better GPU would be needed.

Currently OpcodeDecoder_Run uses a lot of time/resources to decode and execute commands, which decreases performance in some games; that is what is being looked into right now.

Actions #49

Updated by Xtreme2damax over 13 years ago

I know VideoCommon has some optimizations for SSSE3 and SSE4, but what about further optimizing VideoCommon as well as optimizing the plugins for SSSE3 and SSE4, a la what PCSX2 does for GSdx? ICC may already optimize for SSE4, but it isn't true optimization, and hand-coded optimizations usually net much better performance and results than compiler-specific optimization generated during the compile process.

I was thinking that JIT'ing these functions as well as optimizing them for SSSE3 and SSE4 as long as the processor supports these instruction sets might yield a decent increase of performance.

I'm also interested to hear if anything else has been discussed about this issue, any other ideas how to solve these performance issues?

Actions #50

Updated by rhyviolin over 13 years ago

This is still an issue as of r6566.

Actions #51

Updated by daws72 over 13 years ago

I think my issue 2811 is similar to the problem reported here.

Actions #52

Updated by skidau about 13 years ago

  • Status changed from Accepted to Won't fix

I ran some profiling on a few of the slowest games like Super Mario Galaxy, Metroid Prime, Mario Kart: Double Dash and Mario Kart: Wii, focusing on the graphics emulation. Each of those games came to the same conclusion - it is the RunVertices() function that is the slowest. The function is slow due to the sheer number of vertices (polygons) being rendered on the screen.

As the vertices are streamed to the GPU, I could not think of a way to speed it up (using a cache or multi-threading will not work afaict). The only ones that can be sped up are the display lists and the DLCache code already takes care of that by JIT'ing the code.

Closing this issue as there is not much we can do about it at this point in time.
