Emulator Issues #484
closed
Text and Texture Decoding Cause Major Slowdown
Added by tinctorius almost 16 years ago.
Description
What steps will reproduce the problem?
- Start Metroid Prime.
- Press start when you're asked to do so.
- If needed, skip the "format your memory card" screen.
What is the expected output? What do you see instead?
You should enter the pre-game menu, which has no apparent "hard" features
to render: just some text with alpha blending, AFAICS. I expect it to run
at almost the same speed as the "format memory card" screen, but it runs
very, very slowly.
What version of the product are you using? On what operating system?
r1862, Linux x86-64
Please provide any additional information below.
I guess there is something wrong in the video plugin causing slowness with
much alpha blending, but I'm not sure and I don't know where to look.
And yes, this may be 'just' a game issue, but it might also be a deeper
problem causing slowness in other games.
Updated by XTra.KrazzY almost 16 years ago
Profile Dolphin on all of its threads and report back the results. I'll personally let
you know where the bottleneck is.
Updated by tinctorius almost 16 years ago
I'm pretty new to opcontrol/opreport. I have no idea which switches you'd like.
Oh, also, I recompiled with flavor=prof, and suddenly MP1 seems to run with
noticeably higher FPS rates. Weird.
Updated by magumagu9 almost 16 years ago
- Start the game in a build with flavor=prof, and go to a scene you want to profile.
- Run "sudo opcontrol --reset"
- Run "sudo opcontrol --start"
- Wait about 20 seconds
- Run "sudo opcontrol --stop"
- Run "opreport -l -t .2"
- Paste the results here.
Perhaps this should go on the Wiki somewhere.
Updated by tinctorius almost 16 years ago
tinctorius@grass:Linux-x86_64-prof$ opreport -l -t .1 -m tgid
CPU: AMD64 processors, speed 2210.06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples %        image name                                     app name        symbol name
712638  30.8325  anon (tgid:13628 range:0x41683000-0x42783000)  Dolphin         (no symbols)
522216  22.5938  libPlugin_VideoOGL.so                          Dolphin         TexDecoder_Decode_real(unsigned char*, unsigned char const*, int, int, int, int, int)
412063  17.8280  Dolphin                                        Dolphin         CommandProcessor::WaitForFrameFinish()
111059  4.8050   libGLcore.so.177.82                            Dolphin         (no symbols)
110188  4.7673   libc-2.7.so                                    Dolphin         memcpy
89436   3.8695   anon (tgid:13628 range:0x4074a000-0x4074c000)  Dolphin         (no symbols)
81950   3.5456   no-vmlinux                                     no-vmlinux      (no symbols)
33266   1.4393   Dolphin                                        Dolphin         CommandProcessor::GatherPipeBursted()
19105   0.8266   anon (tgid:13628 range:0x40493000-0x40593000)  Dolphin         (no symbols)
17820   0.7710   libmozjs.so.1d                                 xulrunner-stub  (no symbols)
15031   0.6503   libxul.so                                      xulrunner-stub  (no symbols)
9470    0.4097   libpthread-2.7.so                              Dolphin         pthread_mutex_lock
9222    0.3990   oprofiled                                      oprofiled       (no symbols)
7497    0.3244   Dolphin                                        Dolphin         CoreTiming::Advance()
7321    0.3167   Dolphin                                        Dolphin         Interpreter::subfcx(UGeckoInstruction)
7138    0.3088   libGL.so.177.82                                Dolphin         (no symbols)
6300    0.2726   libasound.so.2.0.0                             Dolphin         (no symbols)
6205    0.2685   Dolphin                                        Dolphin         SystemTimers::AdvanceCallback(int)
4849    0.2098   Dolphin                                        Dolphin         Interpreter::addic_rc(UGeckoInstruction)
Updated by memberTwo.mb2 almost 16 years ago
- Status changed from New to Accepted
- Priority set to Low
- Category set to gfx
- Relates to performance set to Yes
- Operating system N/A added
Wow cool tool :)
For information: CommandProcessor::WaitForFrameFinish() is just a way to lock the
emulated PPC until the emulated GPU catches up and finishes a frame in DC mode. This
should happen only if the GPU thread becomes a real bottleneck, which is clearly the
case here :p
If you profiled only in front of the "Main Menu" (with "new game", etc.),
TexDecoder_Decode_real really needs a kick after all.
In this particular area of MP1, textures depend heavily on TLUTs (fonts), so I guess
we have to speed up GX_TF_C4, GX_TF_C8 and GX_TF_C14X2 format decoding. Shouldn't be
too hard in SSE. First, though, someone has to check the texture sizes, because if
they're too small the gain is near zero.
But I wonder if those textures are properly cached, though. It seems weird that font
textures, which are obviously quite redundant, have to be decoded again and again.
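For reference, a minimal scalar sketch of what decoding a TLUT-dependent 4-bit texture involves. It assumes the usual C4 layout (8x8 texel tiles, two texels per byte, left texel in the high nibble) and a palette already converted to 32-bit colour. DecodeC4 is a made-up name for illustration, not the code inside TexDecoder_Decode_real.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not Dolphin's decoder.
// Decodes a 4-bit paletted texture stored as 8x8 texel tiles (32 bytes per tile),
// two texels per byte with the left texel in the high nibble.
// `palette` holds 16 entries that are assumed to be converted to 32-bit colour already.
std::vector<std::uint32_t> DecodeC4(const std::uint8_t* src, int width, int height,
                                    const std::uint32_t (&palette)[16])
{
    // The sketch only handles sizes that are whole multiples of the tile size.
    assert(width > 0 && height > 0 && width % 8 == 0 && height % 8 == 0);

    std::vector<std::uint32_t> dst(static_cast<std::size_t>(width) * height);
    for (int ty = 0; ty < height; ty += 8)          // tile rows
        for (int tx = 0; tx < width; tx += 8)       // tiles within a row
            for (int y = 0; y < 8; ++y)             // texel rows inside the tile
                for (int x = 0; x < 8; x += 2)      // one source byte = two texels
                {
                    const std::uint8_t packed = *src++;
                    std::uint32_t* out =
                        &dst[static_cast<std::size_t>(ty + y) * width + tx + x];
                    out[0] = palette[packed >> 4];
                    out[1] = palette[packed & 0x0F];
                }
    return dst;
}
```

Every texel costs a palette lookup and a scattered store, which is why small glyph textures gain little from vectorising while repeated decoding of the same glyphs hurts a lot.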
Updated by tinctorius almost 16 years ago
I profiled just before the "main menu", and kept it running for about twenty seconds
I guess. I didn't count, but I tried to listen to the beautifully stuttering music
for a while. Then I stopped the profiler, posted the report and tried playing the
game again (seriously, how is a profiled version faster than a production version?).
Probably unrelated to the real issue here (as mb2 explained), but I'd still like to
mention it: aren't busy waits evil? :P
Updated by XTra.KrazzY almost 16 years ago
I agree with mb2, font textures should be cached and cropped per letter.
How do we go on and do that? :)
Updated by XTra.KrazzY almost 16 years ago
And also, it won't be hard to convert the cases in the switch to SSE...
One question though: What's with the SSSE3 pshufb comment? How can it replace stuff
to be faster?
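On the pshufb question: the instruction treats one register as a 16-entry byte table and another as 16 indices, so a single instruction performs 16 table lookups at once. Below is a scalar model of that behaviour applied to 4-bit intensity texels. It assumes I4 values are widened to 8 bits by nibble replication (n * 0x11); ShuffleBytes and ExpandI4 are hypothetical names, and a real SSSE3 path would call the _mm_shuffle_epi8 intrinsic instead of looping.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of what one 128-bit pshufb does in each byte lane:
// a lane becomes 0 if bit 7 of its index is set, else table[index & 0x0F].
Bytes16 ShuffleBytes(const Bytes16& table, const Bytes16& index)
{
    Bytes16 out{};
    for (int i = 0; i < 16; ++i)
        out[i] = (index[i] & 0x80) ? 0 : table[index[i] & 0x0F];
    return out;
}

// Expands 8 packed bytes (16 four-bit intensities, high nibble first)
// into 16 eight-bit intensities using one table shuffle.
Bytes16 ExpandI4(const std::uint8_t (&packed)[8])
{
    // Assumed conversion: replicate the nibble, so 0x0 -> 0x00 and 0xF -> 0xFF.
    Bytes16 table{};
    for (int n = 0; n < 16; ++n)
        table[n] = static_cast<std::uint8_t>(n * 0x11);
    assert(table[15] == 0xFF);

    // Split every byte into two nibble indices, left texel first.
    Bytes16 index{};
    for (int i = 0; i < 8; ++i)
    {
        index[2 * i] = static_cast<std::uint8_t>(packed[i] >> 4);
        index[2 * i + 1] = static_cast<std::uint8_t>(packed[i] & 0x0F);
    }
    return ShuffleBytes(table, index);
}
```

With the real instruction the nibble split is a couple of shifts and masks on the whole register, and the lookup loop collapses into one shuffle, so 16 texels are converted without any per-texel branch or memory access.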
Updated by memberTwo.mb2 almost 16 years ago
tinctorius, "I profiled just before the "main menu"": that explains why you have tons
of TexDecoder_Decode_real() calls. When fonts appear, textures are decoded and then
cached. This slowdown is quite normal. But afterwards, when we see the whole text in
the "Main menu", it stays slow: less slow, but still slow. This is more annoying imo.
Can you re-profile there plz? And it would be nice to resolve Dolphin's other
symbols, maybe by using DEBUGFAST for cores AND plugins. That's how we get all the
symbol info on Windows.
"aren't busy waits evil?" depends on what you want to do and whether you have a quad or not :p
But in that case, I don't know why I did that in the first place, since
WaitForFrameFinish() is just a DC safety func. It will be useless (maybe reverted)
once the emulated GPU is always fast enough.
XK, each letter is individually cached iirc. Not sure, but I think it has become a
GL/driver/GPU speed issue due to tons of heavy ARGB8888 textures (you know more about
gx stuff than me, it's just a guess). And then, as you said, we have to change the way
we store textures in GPU memory. It would be nice, for some formats at least, to
decode textures on the GPU, which isn't very loaded in Dolphin so far.
Updated by tinctorius almost 16 years ago
Recompiled with flavor=devel (for DEBUGFAST and -g, though -O3 is gone now =/), reran
emulation with DC enabled. Even before the Nintendo logo it crashed with things about
POSIX threads. Tried again, and then it spammed my console with
Enter: pthread_mutex_lock(0x7fc8d5be3600) failed: Invalid argument
Leave: pthread_mutex_unlock(0x7fc8d5be3600) failed: Invalid argument
ad infinitum. I get the same messages sometimes when the game crashes on other points.
So I tried flavor=prof (r1866 now; and I just noticed "Can't build prof without
oprofile, disabling"... where's "opagent" in Debian? :P), and profiled ONLY when the
"main menu" was 'stabilized' (after the text appeared). Got this:
CPU: AMD64 processors, speed 2210.06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples %        image name                                     app name        symbol name
535753  34.5074  anon (tgid:10524 range:0x415ef000-0x427ef000)  Dolphin         (no symbols)
483765  31.1589  libPlugin_VideoOGL.so                          Dolphin         TexDecoder_Decode_real(unsigned char*, unsigned char const*, int, int, int, int, int)
111101  7.1559   Dolphin                                        Dolphin         CommandProcessor::GatherPipeBursted()
98911   6.3708   libc-2.7.so                                    Dolphin         memcpy
86436   5.5673   libGLcore.so.177.82                            Dolphin         (no symbols)
67002   4.3155   anon (tgid:10524 range:0x427ef000-0x427f2000)  Dolphin         (no symbols)
46088   2.9685   no-vmlinux                                     no-vmlinux      (no symbols)
6863    0.4420   libpthread-2.7.so                              Dolphin         pthread_mutex_lock
5769    0.3716   libmozjs.so.1d                                 xulrunner-stub  (no symbols)
5445    0.3507   libGL.so.177.82                                Dolphin         (no symbols)
5337    0.3438   Dolphin                                        Dolphin         Interpreter::subfcx(UGeckoInstruction)
5245    0.3378   Dolphin                                        Dolphin         CoreTiming::Advance()
5195    0.3346   oprofiled                                      oprofiled       (no symbols)
4765    0.3069   Dolphin                                        Dolphin         SystemTimers::AdvanceCallback(int)
4355    0.2805   libxul.so                                      xulrunner-stub  (no symbols)
3855    0.2483   libasound.so.2.0.0                             Dolphin         (no symbols)
3765    0.2425   Dolphin                                        Dolphin         Interpreter::addic_rc(UGeckoInstruction)
3299    0.2125   libpthread-2.7.so                              Dolphin         __pthread_mutex_unlock_usercnt
Updated by XTra.KrazzY almost 16 years ago
Implemented SSSE3 version of I4decode
The rest seem easy to do as well. Not sure when I'll have time.
The best thing you can do to help is list the important decoders in descending
order of the time they take...
SSE/SSE2 versions can also be implemented.
Updated by XTra.KrazzY almost 16 years ago
- Status changed from Accepted to Work started
Updated by tinctorius almost 16 years ago
Further investigation as suggested by mb2 is blocked by issue 487.
Updated by XTra.KrazzY almost 16 years ago
This issue will be done when I finish the Streaming SIMD implementations of the
texture decoders. That should solve this one.
Updated by memberTwo.mb2 almost 16 years ago
XK, not entirely. I have just found a texture caching bug that invalidates good cached
TLUT-dependent textures, which then have to be re-decoded later. This should have a
big impact too, like several FPS I guess. I'm on this one.
Anyway, nice work on texture decoding, keep going :)
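To illustrate why needless invalidation is so costly for TLUT-dependent textures, here is a rough sketch of a decoded-texture cache whose key includes a hash of the palette. It is an assumption for illustration only and not the fix that went into Dolphin; TexKey, DecodedTextureCache and the field names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical cache key; names are illustrative, not Dolphin's.
struct TexKey
{
    std::uint32_t address;    // where the texture lives in emulated memory
    std::uint32_t data_hash;  // hash of the encoded texel data
    std::uint32_t tlut_hash;  // hash of the palette the texture was decoded with

    bool operator<(const TexKey& o) const
    {
        return std::tie(address, data_hash, tlut_hash) <
               std::tie(o.address, o.data_hash, o.tlut_hash);
    }
};

class DecodedTextureCache
{
public:
    // Returns the cached pixels for `key`, calling `decode` only on a miss.
    template <typename DecodeFn>
    const std::vector<std::uint32_t>& Get(const TexKey& key, DecodeFn decode)
    {
        auto it = m_entries.find(key);
        if (it == m_entries.end())
        {
            ++m_decodes;  // count how often the expensive path runs
            it = m_entries.emplace(key, decode()).first;
        }
        assert(it != m_entries.end());
        return it->second;
    }

    int DecodeCount() const { return m_decodes; }

private:
    std::map<TexKey, std::vector<std::uint32_t>> m_entries;
    int m_decodes = 0;
};
```

Because the palette is part of the key, switching a glyph to another palette and back costs one extra decode in total rather than one per switch. The trade-off is memory: each palette variant keeps its own decoded copy.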
Updated by XTra.KrazzY almost 16 years ago
Issue 488 has been merged into this issue.
Updated by memberTwo.mb2 almost 16 years ago
I fixed a texture caching bug.
Can you update and profile? It should change a lot.
Updated by XTra.KrazzY almost 16 years ago
Yeah, good work on that mb2, turns out that our hash function skips bytes... It's
hash collisions with the font textures that made it not work fast, right?
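A small sketch of how a hash that skips bytes can collide on textures that differ only in the skipped bytes. This uses FNV-1a as a stand-in and a made-up SampledHash function; it is not Dolphin's actual hash, and whether collisions were the real cause here is XK's guess, not something this sketch confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in hash (32-bit FNV-1a) that only looks at every `stride`-th byte.
// stride == 1 hashes the whole buffer; larger strides are faster but blind
// to changes in the bytes they skip.
std::uint32_t SampledHash(const std::uint8_t* data, std::size_t size, std::size_t stride)
{
    assert(stride > 0);
    std::uint32_t h = 2166136261u;      // FNV offset basis
    for (std::size_t i = 0; i < size; i += stride)
    {
        h ^= data[i];
        h *= 16777619u;                 // FNV prime
    }
    return h;
}
```

Two glyph textures that are identical except at a byte the stride never visits get the same hash, so a cache keyed on it cannot tell them apart. Hashing every byte avoids that at the cost of touching the whole texture each time.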
Updated by slink_3_ almost 16 years ago
Safe texture cache (also font fix for MP1 PAL and other games) is broken since r1871 :/
Updated by tinctorius almost 16 years ago
Safe texture cache, including the 'font fix' effect and the apparent lack of JIT <->
tex cache collisions, works on r1877 for me (MP1 PAL).
Updated by tinctorius almost 16 years ago
Also, mb2's fix did indeed seem to change a lot. The menu is still extremely slow,
but much less than before.
Updated by slink_3_ almost 16 years ago
whoops, my bad, it DOES work if i enable it before launching the game.
Enabling it while the game is running is resulting in a green flickering screen :p
nice boost too, i had about 35fps on the menu before, now it's 50fps (PAL full speed) :)
Updated by XTra.KrazzY almost 16 years ago
This seems fixed to me then.
Report your speeds tinctorius.
Updated by slink_3_ almost 16 years ago
at least on windows, it is !
much faster than before :)
Updated by XTra.KrazzY almost 16 years ago
- Status changed from Work started to Fixed
Fixed unless someone else says otherwise
Updated by SlicerSV over 15 years ago
On the 32bit Windows version with really old hardware this never got fixed.
I can't really test Metroid Prime, because I don't own it, but in Tales of Symphonia
GC, which was merged into this, I still slow down to less than half a frame per
second after battles, and during battle if I try to go into the battle menu I slow
down to about 1-2fps. In normal conditions I get ~30fps, and in battle I
normally get ~15fps.
My computer specs: AMD Athlon64 3200+ s939, 1gig ram, nVidia Geforce 7950GT
Dolphin settings: D3D9 plugin with the lowest possible settings, OGL Plugin won't run
with my card, Under the general tab: Idle skipping enabled in basic settings, all
settings enabled in Advanced settings, Using D3D9, DSP-HLE, X360pad plugins, sound is
fully enabled.
Updated by XTra.KrazzY over 15 years ago
This doesn't have anything to do with your graphics, it's your processor. Buy a
faster one.
Updated by tinctorius over 15 years ago
I can't concur. I recently built Dolphin and tried MP1 again. The menu runs WAY too
fast (but still smooth), but whenever the main menu pops up, things slow down
horribly and the audio starts to sound choppy.
Updated by XTra.KrazzY over 15 years ago
This has nothing to do with the graphics card but the processing power to decode the
vertices and the textures each frame. Believe me, I've monitored my gfx card's
activity during active emulation.
Please don't bother getting into subjects that you don't fully know.
Updated by SlicerSV over 15 years ago
OK, so I have Windows 7 Beta (64 bit version) that I'm beta testing on the exact same
machine, I run a multi-boot machine with several operating systems for testing
purposes... Anyways, I ran a little app on here for kicks to find out if my card's
supported any better than in Windows XP, seeing as my card is -supposed- to be opengl
2.1.2 capable, and it's got full support listed, only missing one feature and a
handful of post-2.1.2 extensions. So I downloaded 64-bit dolphin, set it up
similarly to under XP, but using opengl plugin instead (btw, I can't seem to get the
d3d one to work under Windows 7 ATM, but I'll figure that out later)
It's definitely -NOT- my hardware, I run as expected, keeping 25-40fps outside
battle, slowing to ~15fps in battle, with NO additional slow-down from going into the
battle menu or from post-battle stats. So it's not my processor, nor is it my
graphics card, something about my 32-bit configuration is just borked, making it so I
can't run the opengl plugin, and the d3d plugin is just not fixed at all...
I'll have to run further tests, for one thing, running on my Linux install, and
secondly, figure out why I can't use the d3d plugin in my Windows 7 install, but I
think it's safe to say that this is fixed in opengl, but NOT fixed in d3d.
... unrelated to dolphin, but I'm going to need to do some nagging over at nvidia to
find out why the Windows XP drivers are missing so many features that they really
-should- have. I should probably also install Windows XP-64 and find out if those
drivers are more akin to the XP ones or the Win7 ones before I go do my nagging.
Updated by YuugiSan almost 15 years ago
I would like to bump this because the issue came back.
I ran Mega Man X Collection, and the menu screen shows the same thing.
It's even a 2D game, and it's just a memory card screen with a black background and 2 words.