Emulator Issues #6168

Paper Mario: The Thousand Year Door Glitched Punies

Added by JMC4789 about 8 years ago.



I searched for this and found something on the Wiki, but no open issues.

Game: Paper Mario: The Thousand Year Door - G8ME01

The Punies (some weird group that lives in a tree) use some kind of fancy rendering technique, like the penguin minigame in Mario Party 5. While that game works with EFB2Ram, these guys don't. They also look slightly fuzzy when running full screen at a higher IR (at 1x IR, native resolution, they look fine), but that's probably not the emulator's fault; they're doing a trick to render tons of them.

Anyway, instead of rendering normally, every second or two they spazz out and look glitchy. That's all there is to it, nothing special, no hard work involved. On EFB2Texture the effect is even worse, with the punies rendering as complete garbage.

This glitch has been in Dolphin forever; I don't have a build where this works. The Wiki entry alerted me to it, but, honestly, anyone playing the game would easily see that they're broken.

Due to the severe flickering, missing opcodes, matrix indices not lining up, and other problems, I could not tell for sure whether this glitch happens in software rendering. It did not appear to, though, meaning D3D9/11 and OpenGL are all afflicted but maybe not the software renderer.


EFB2Tex - Constantly broken like this: garbage, with colors changing, warping, etc., as if it's trying to load something.
EFB2Ram - As you can see, some have flickered into existence, and some were not so lucky.


Savefile -
Fifo-log -


#1 Updated by diegojp955 about 8 years ago

Bounding Box option checked?

#2 Updated by JMC4789 about 8 years ago

Bounding box option is checked as per the ini for the game.

#3 Updated by Billiard26 about 8 years ago

  • Issue type set to Bug

#4 Updated by electrokinesis2010 about 8 years ago

Well... I checked it out, and it seems the software renderer is even worse: the punies turn out as small green garbage at the top of the screen.

#5 Updated by MayImilae almost 8 years ago

  • Status changed from New to Accepted
  • Category set to gfx

Reproduced in 3.5-1394. Definitely annoying.

Tested on:
Core i5 3750k @ 4.7ghz
nvidia GTX 275
Windows 7 x64

#6 Updated by gamedevistator almost 8 years ago

Yeah, but I found out that if you use DX9, the glitching happens less and sometimes they look normal, but when a new scene starts or you go into another part of the tree, they glitch up again, or vice versa. And I'm using 4x IR, so I don't think the IR has anything to do with it.

#7 Updated by crudelios over 7 years ago

I'm trying to figure out this one, but so far, the bug has been very elusive.

From what I can tell, the efb copy goes well and the proper RAM addresses and texture mappings are used. In fact, if I save the texture to disk immediately after the EFB copy, the images are fine.

However, upon reloading the textures (running TextureCacheBase::Load() and the backend-specific TCacheEntry::Load() or TCacheEntry::Bind()) for use in the game, the textures are resized and get garbled in the process. Saving the textures to a file at this point results in garbled images.

Someone correct me if I'm wrong, but from what I can see, when an EFB copy to RAM is triggered (with EFB copy to RAM active in the config), the requested rectangle (EFBRect) is saved to VRAM (in RGBA8 format, I guess), and then encoded, in the proper GC format, to the texture memory, in RAM.

Upon loading the textures (in TextureCache::Load()), the textures already stored in VRAM are "deleted" and the ones sitting in RAM are decoded back into VRAM and then resized. At this point, the textures in VRAM get garbled.

Therefore, my most recent hunch is that the bug may lie in the texture encoder.

However, since I'm going a bit by trial and error (and getting to know the inner workings of both Dolphin and GC/Wii hardware in the process), I may be completely wrong or missing something obvious.
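
The encode-then-decode path described above can be checked in isolation with a round-trip test. The following is a minimal sketch, not Dolphin's encoder: it assumes a 16-bit texture stored as 4x4-texel tiles with big-endian texels (my understanding of the GameCube RGB565 layout), and the names `TiledOffset`, `EncodeTiled` and `DecodeTiled` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: 4x4-texel tiles, tiles stored left to right then top to
// bottom, each texel a big-endian 16-bit value. Hypothetical helper names.
constexpr int kTile = 4;

// Byte offset of texel (x, y) inside the tiled buffer.
std::size_t TiledOffset(int x, int y, int width)
{
  const int tiles_per_row = width / kTile;
  const int tile_index = (y / kTile) * tiles_per_row + (x / kTile);
  const int in_tile = (y % kTile) * kTile + (x % kTile);
  return (static_cast<std::size_t>(tile_index) * kTile * kTile + in_tile) * 2;
}

// Linear 16-bit texels -> tiled big-endian bytes (the "encode to RAM" step).
std::vector<std::uint8_t> EncodeTiled(const std::vector<std::uint16_t>& linear,
                                      int width, int height)
{
  assert(width % kTile == 0 && height % kTile == 0);
  assert(linear.size() == static_cast<std::size_t>(width) * height);
  std::vector<std::uint8_t> tiled(linear.size() * 2);
  for (int y = 0; y < height; ++y)
  {
    for (int x = 0; x < width; ++x)
    {
      const std::uint16_t texel = linear[static_cast<std::size_t>(y) * width + x];
      const std::size_t off = TiledOffset(x, y, width);
      tiled[off] = static_cast<std::uint8_t>(texel >> 8);
      tiled[off + 1] = static_cast<std::uint8_t>(texel & 0xFF);
    }
  }
  return tiled;
}

// Tiled big-endian bytes -> linear 16-bit texels (the "load from RAM" step).
std::vector<std::uint16_t> DecodeTiled(const std::vector<std::uint8_t>& tiled,
                                       int width, int height)
{
  assert(width % kTile == 0 && height % kTile == 0);
  assert(tiled.size() == static_cast<std::size_t>(width) * height * 2);
  std::vector<std::uint16_t> linear(static_cast<std::size_t>(width) * height);
  for (int y = 0; y < height; ++y)
  {
    for (int x = 0; x < width; ++x)
    {
      const std::size_t off = TiledOffset(x, y, width);
      linear[static_cast<std::size_t>(y) * width + x] =
          static_cast<std::uint16_t>((tiled[off] << 8) | tiled[off + 1]);
    }
  }
  return linear;
}
```

If `DecodeTiled(EncodeTiled(img, w, h), w, h)` returns `img` for the sizes the game actually uses, the encoder and decoder agree with each other, and the garbling has to come from somewhere else, such as the dimensions passed in.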

#8 Updated by JMC4789 over 7 years ago

Using a texture pack with the broken frames replaced by custom frames works, which means... well... I don't know if that means anything. Just figured I'd say that in case you didn't know about it.

#9 Updated by crudelios over 7 years ago

Yes I know about that, custom or hires textures do load without any problems. And hires textures are not encoded/decoded to/from ram...

But until I can pinpoint the bug (that is, if I even manage to do that), I can only speculate.

#10 Updated by crudelios over 7 years ago

Upon further investigation I can say that, despite what I was thinking, this does not seem to be an encoding bug. I checked what the textures looked like if they were encoded to ram and decoded back, and they look fine.

However, this bug does not happen in the software renderer. The punies appear fine in that backend.

So I've hit a roadblock.

I have found a few interesting things, though: If you look at the glitched textures, you can see their colors are similar to the ones of the punies. Upon closer inspection I found out they are indeed the punies, but they are distorted.

This distortion seems to be caused by the image not aligning horizontally properly. Since it's hard for me to describe, here are two comparison pictures. The first one is the garbled texture, and the second one is the same texture, after I edited it in gimp and rearranged the pixels:
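
As a rough illustration of how this kind of horizontal misalignment can arise (an assumption on my part, not a confirmed diagnosis): if an image is written with one row width and read back with another, every row starts a little further off than the last, and the picture shears sideways while keeping its colors. The helper `SourceOf` below is hypothetical and only does the index arithmetic.

```cpp
#include <cassert>

struct Coord
{
  int x;
  int y;
};

// The writer stored the image with write_width texels per row, but the reader
// assumes read_width texels per row. Returns the coordinate in the writer's
// image that the reader actually sees at (x, y).
Coord SourceOf(int x, int y, int write_width, int read_width)
{
  assert(write_width > 0 && read_width > 0);
  const int index = y * read_width + x;  // linear position the reader fetches
  return Coord{index % write_width, index / write_width};
}
```

With matching widths, `SourceOf(0, 1, 8, 8)` is texel (0, 1), as expected. With a reader that assumes 6 texels per row over an 8-wide image, row 1 starts at source texel (6, 0) and row 2 at (4, 1), so each row drifts by two texels. Rearranging the pixels by hand, as in the edited picture, amounts to undoing that drift.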

#11 Updated by NeoBrainX over 7 years ago

By the way, does the current bbox code round to two? The GC/Wii GPU renders in quads and hence I'd expect the bbox values to be multiples of two. So you might try ceiling or rounding your results.

Concerning your issue, maybe you can check the parameters given upon texture loading? I'd guess there's some edge case which software handles correctly but hw backends don't. Maybe you can also try changing the texcache accuracy to Safe (or hacking the texcache to be disabled completely).

#12 Updated by crudelios over 7 years ago

Hacking the cache to be disabled (calling TextureCache::Invalidate() at the beginning of TextureCache::Load()) made half the textures either flicker like mad or completely disappear (no idea why, as they should have been loaded from RAM), but the garbled punies still appeared anyway.

I thought bbox should return an even value on top/left and an odd one on bottom/right. The old bbox code did that, but the new one doesn't. I removed that because the software renderer returns even values on bottom/right as well.

Should it always return an even value for top/left/bottom/right? I can check for that...
Also, the GC/Wii GPU renders in blocks of 2x2 pixels, correct?

#13 Updated by NeoBrainX over 7 years ago

Yeah, the GC/Wii GPU renders in 2x2 pixel blocks (that's what I somewhat sloppily meant with "renders in quads"). And yeah, ignore what I said about "multiple of two" - just checked and the top/left registers should return even numbers (i.e. the top/left of each 2x2 block) and bottom/right should return odd numbers (bottom/right of each 2x2 block). The software renderer should be fixed to do this, too.
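
The even/odd rule above can be sketched as a small snapping step. This is a minimal illustration, assuming non-negative, inclusive pixel coordinates; `BBox` and `SnapTo2x2` are hypothetical names, not Dolphin's actual code.

```cpp
#include <cassert>

// Inclusive pixel bounds; assumed to be non-negative.
struct BBox
{
  int left;
  int top;
  int right;
  int bottom;
};

// Expand the box to whole 2x2 pixel blocks: top/left snap down to the even
// coordinate that starts the block, bottom/right snap up to the odd
// coordinate that ends it.
BBox SnapTo2x2(BBox box)
{
  assert(box.left >= 0 && box.top >= 0 && box.right >= 0 && box.bottom >= 0);
  box.left &= ~1;
  box.top &= ~1;
  box.right |= 1;
  box.bottom |= 1;
  return box;
}
```

For example, a box of (3, 5) to (10, 12) becomes (2, 4) to (11, 13), and a box that is already aligned is returned unchanged.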

Fwiw, instead of calling Invalidate at the beginning of Load(), try replacing the contents of the if block
"TCacheEntryBase *entry = textures[texID];
if (entry) { .... }"
with
"TCacheEntryBase *entry = textures[texID];
if (entry) { delete entry; entry = NULL; }". Hopefully that gives more reliable results.

#14 Updated by crudelios over 7 years ago

Fixed the even/odd issue in the bbox, but the punies are still glitching...

Oh, before I tried calling Invalidate() I used the code you posted, and the effect was the same. I'll try commenting out half of TextureCache::Load() so it just loads the textures and doesn't check for hashes, hi-res textures, etc, to see if it works any better.

#15 Updated by crudelios over 7 years ago

Finally figured it out, it's a BBox issue after all. Fiddling with the return values eventually makes the punies not glitch. Now I just need to find out what's wrong with the current BBox code...

#16 Updated by JMC4789 over 7 years ago

Thank you so much for your dedication to these issues. I never thought it would be fixed!

#17 Updated by gamedevistator over 7 years ago

So does this mean that this will also fix other graphics issues, like the crowd in the Glitz Pit and the Smorgs?

#18 Updated by JMC4789 over 7 years ago

Crowd in the Glitz Pit works fine for me in master as it is.

#19 Updated by shellashock over 7 years ago

Great to see! As JMC said, I never thought this would be fixed. Guess we are lucky to have two long standing bugs found/fixed in such a small time frame.

#20 Updated by crudelios over 7 years ago

Thanks for the support!

However, even though I found the cause of the problem, I haven't yet managed to fix it.

I think I'll get it, but the fix is rather complex and processor-intensive. It seems like I'll need to partially simulate the software rendering of the vertexes in order to figure out which pixels should be drawn and which should be discarded. And we all know how fast the software renderer is...

Anyway, I'll try to fix it and check the performance. If it isn't too bad, since BBox is rarely used, I'll probably leave it like that. If it's too slow, eventually I may try to figure out a way to calculate the BBox in hardware, which should be much faster.

#21 Updated by NeoBrainX over 7 years ago

Oh, so you mean only pixels which pass both the alpha and depth tests actually "contribute" to the bbox? That would be fairly horrible :|

#22 Updated by crudelios over 7 years ago

Yeah, I think so!

And to be fully accurate, both scissoring and point and line width calculation would also have to be implemented. At least no known games use BBox with those activated, so there's no need to fully emulate that...
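
To make the idea concrete, here is a minimal sketch of a bounding box that only counts pixels passing an alpha test. It is a simplification under stated assumptions: the primitive is an axis-aligned textured quad, the test is "alpha greater than a reference value", and depth testing and scissoring are ignored. `AlphaTestedBBox` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BBox
{
  int left;
  int top;
  int right;
  int bottom;
};

// alpha holds one value per texel of a width x height quad whose top-left
// corner lands on screen pixel (x0, y0). Only texels with alpha > ref
// contribute to the box. Returns false if every texel is discarded.
bool AlphaTestedBBox(const std::vector<std::uint8_t>& alpha, int width, int height,
                     int x0, int y0, std::uint8_t ref, BBox* out)
{
  assert(out != nullptr);
  assert(alpha.size() == static_cast<std::size_t>(width) * height);
  bool any = false;
  for (int y = 0; y < height; ++y)
  {
    for (int x = 0; x < width; ++x)
    {
      if (alpha[static_cast<std::size_t>(y) * width + x] <= ref)
        continue;  // fails the assumed "alpha > ref" test, so it is discarded
      const int px = x0 + x;
      const int py = y0 + y;
      if (!any)
      {
        *out = BBox{px, py, px, py};
        any = true;
      }
      else
      {
        out->left = std::min(out->left, px);
        out->top = std::min(out->top, py);
        out->right = std::max(out->right, px);
        out->bottom = std::max(out->bottom, py);
      }
    }
  }
  return any;
}
```

A 4x4 quad placed at (10, 20) whose only opaque texels are (1, 1) and (2, 2) yields a box of (11, 21) to (12, 22), whereas a box computed from the quad's vertices alone would be (10, 20) to (13, 23). That gap is the difference a vertex-only calculation cannot see.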

#23 Updated by phire over 7 years ago

Exactly how much of software rendering do you need?

The vertex part of video software is already pretty fast and the tev stuff can be made faster, especially if you only need the result of the alpha/depth tests.

#24 Updated by crudelios over 7 years ago

I think at least the alpha test is needed. As far as I can tell, no games require depth test in BBox emulation, but I may be wrong.

However, for the alpha test to work, I think the texture must be bound to the primitive. Currently the BBox code doesn't do that, and it wasn't designed to do so.

#25 Updated by NeoBrainX over 7 years ago

@ PhireN:
the tev stuff cannot really be made substantially faster. Even if you only care about the result of the alpha channel, you need to perform the proper color calculations for lower stages too since they might be used as input for the alpha combiners.

That said, one might still be able to optimize quite a bit. It's really nontrivial to do correctly then, however.

#26 Updated by phire over 7 years ago

the optimizations for software tev basically involve writing a jit that outputs optimized SSE code for a given tev configuration. Such a jit would be able to extract only the equations needed for alpha through dead code removal.

But yes, non-trivial.

#27 Updated by phire over 7 years ago

As discussed on IRC, the correct solution is probably going to be doing it on the GPU.

Modify each pixel shader to also write into an auxiliary buffer the size of the native EFB representing modified pixels. A bounding box reset will clear the entire buffer. A bounding box read will transfer this buffer back to the CPU where the actual bounding box will be calculated.

Updating the bounding box buffer will be basically free. Only reading will be expensive as it will require the GPU to be synced, but reading the bounding box on native hardware also requires a GPU sync.
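
The scheme can be modelled on the CPU to show the three operations involved. This is only a sketch of the data flow, assuming one flag per EFB pixel; in a real backend `MarkWritten` would be a write from the pixel shader and `Read` would involve a GPU readback. The class name `BBoxBuffer` and its methods are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BBox
{
  int left;
  int top;
  int right;
  int bottom;
};

// CPU-side model of the auxiliary "modified pixels" buffer.
class BBoxBuffer
{
public:
  BBoxBuffer(int width, int height)
      : m_width(width), m_height(height),
        m_touched(static_cast<std::size_t>(width) * height, 0)
  {
    assert(width > 0 && height > 0);
  }

  // Bounding box reset: clear the entire buffer.
  void Reset() { std::fill(m_touched.begin(), m_touched.end(), std::uint8_t{0}); }

  // Stand-in for the extra write each pixel shader invocation would perform.
  void MarkWritten(int x, int y)
  {
    if (x < 0 || y < 0 || x >= m_width || y >= m_height)
      return;
    m_touched[static_cast<std::size_t>(y) * m_width + x] = 1;
  }

  // Bounding box read: scan the buffer and compute the inclusive extents.
  // Returns false if no pixel was written since the last reset.
  bool Read(BBox* out) const
  {
    assert(out != nullptr);
    bool any = false;
    for (int y = 0; y < m_height; ++y)
    {
      for (int x = 0; x < m_width; ++x)
      {
        if (!m_touched[static_cast<std::size_t>(y) * m_width + x])
          continue;
        if (!any)
        {
          *out = BBox{x, y, x, y};
          any = true;
        }
        else
        {
          out->left = std::min(out->left, x);
          out->top = std::min(out->top, y);
          out->right = std::max(out->right, x);
          out->bottom = std::max(out->bottom, y);
        }
      }
    }
    return any;
  }

private:
  int m_width;
  int m_height;
  std::vector<std::uint8_t> m_touched;
};
```

Usage would look like `BBoxBuffer buf(640, 528);` (the native EFB size), followed by `MarkWritten` calls during drawing and a `Read` when the game polls the registers. Because only pixels that actually get written are marked, discarded pixels never widen the box, and the only expensive step is the scan at read time.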

#28 Updated by crudelios over 7 years ago

I agree, doing it on the GPU is the way to go.

In my tests, I found that when the bounding box registers are used, they are reset/read at most three times per frame, and usually only once, so the GPU syncing won't be too heavy on performance.

In the punies chapter, the bounding box is only read when loading the level (and a whopping 112 times at that, though it's done across multiple frames), and not when the punies are being shown. In most other games, the bounding box registers are not even used at all.

So, I don't really think performance will be an issue if the GPU is used.

#29 Updated by JMC4789 over 7 years ago

Paper Mario: The Thousand Year Door is also a comparatively lightweight game. I don't see a performance loss for the sake of better emulation causing many problems for people.

#30 Updated by phire over 7 years ago

Perhaps the best idea is to correct Bounding Box emulation in the software renderer first.

#31 Updated by crudelios over 7 years ago

I'll do it, but it may take a while.

#32 Updated by crudelios about 7 years ago

I think I managed to fix the punies glitch, and didn't need to implement the alpha pass after all.

Please check the punies with this build (x64 version). Also, if possible check for regressions in other aspects of the bounding box calculation:

#33 Updated by JMC4789 about 7 years ago

Confirmed as fixed using that build. Good stuff. I'll play through the area and see if I see any other problems.

#34 Updated by a41pizza about 7 years ago

Tested it on OpenGL and D3D. Works beautifully. Good job :)

#35 Updated by gamedevistator about 7 years ago

I downloaded the build, but I may be missing something. The punies are fine, but sometimes when I enter another room or the scene changes, they're glitched again. If I change the EFB copy setting from RAM to Texture, it fixes the glitch. Is the build supposed to be just an exe?

#36 Updated by crudelios about 7 years ago

Yes, the punies do get glitched eventually, but at least it happens much less than before.

Still, a full alpha pass implementation is required both to fix the punies and to get past the hang in Super Paper Mario's world 6-1.

I've been working on it, but it's slow and buggy right now. And as my SSD drive decided to die on me, it will still take a little while to get things done.

#37 Updated by gamedevistator about 7 years ago

Does this issue also have to do with the glitches when you break something to unlock a secret, or when you use Flurrie to blow and unveil a secret on the wall? All you see are rainbow squares.

#38 Updated by JMC4789 about 7 years ago

  • Status changed from Accepted to Work started

I just wanted to note that Punies are still completely flickering on JITIL. I don't think it's related, but figured it should be mentioned. Will likely be a different issue report when this is completely fixed.

Changing to started as it should have been months ago.

#39 Updated by crudelios about 7 years ago

gamedevistator: are you using EFB to RAM?

I got a new SSD and have been fooling around with the full alpha pass implementation. It's still very buggy and very slow, as it means implementing almost all of the rasterization/TEV software. Also, I can't quite understand why it's not working.

I have found a more or less hackish way of getting things to work. It's not 100% perfect, but it is fast, doesn't require 1,000 lines of code and seems to fully fix the punies and Super Paper Mario's hang.

I'll be testing it for regressions and if it works I'll commit it.

#40 Updated by gamedevistator about 7 years ago

I'm using EFB to Texture because the punies glitch less often than with EFB to RAM, at least for me.

#41 Updated by NeoBrainX about 7 years ago

crudelios: Just a quick note, chances are that we will not merge any code which tries to reimplement the TEV pipeline in software from scratch instead of re-using the existing software renderer bits (which should have been done for the existing code already).

Also, I'd really appreciate it if unit tests were written for the bbox functionality.

#42 Updated by crudelios about 7 years ago

gamedevistator: You must use efb to ram for effects to work properly.

@NeoBrainX: Reimplementing the TEV is indeed ugly. I can use the software renderer as soon as xfregs and swxfregs are merged, it should be relatively easy to do, albeit very very slow.

As far as I can see there is no way to get bbox values using DirectX/OpenGL, so the software renderer will always need to be used.
The problem is, fully accurate bbox emulation requires both depth and alpha testing to work, which is very slow.

I have a hackish/inaccurate yet fast and "good enough" implementation right now (no visual glitches as far as I can tell), which only requires using a small part of the rasterizer.

Since I know Dolphin's current goal is accuracy over speed, would a hackish implementation of bbox be merged?

#43 Updated by NeoBrainX about 7 years ago

Such an implementation would likely not be merged*. Hence I was asking for hwtests, then at least we could verify that your code is correct. Also, anyone who would want to come up with a proper implementation would not have to waste his time performing the same tests as you did again.

* Of course, that's just my opinion.

#44 Updated by JMC4789 over 6 years ago

  • Status changed from Work started to Fixed
