Emulator Issues #12578
Metroid Prime Series - Map Rendering Performance Issue Documentation
Description
This is an archive of research on the Metroid Prime series map slowdown. When you open the map and look at it, there is a significant performance slowdown that rivals the most demanding points of almost any other game. The main offender is Metroid Prime Trilogy, and more specifically Metroid Prime 2 in Metroid Prime Trilogy, but every Metroid Prime game's map is relatively difficult to emulate at full speed.
We were using the attached fifolog for testing, as it was a worst case scenario in Metroid Prime 2's Trilogy version. The problem is that the Flipper/Broadway is alternating between rendering filled triangles and lines. Because it is constantly switching what it's doing, it's actually causing a huge batching issue. In this fifolog, there are almost 80K D3D calls and 10.8K draw calls.
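To illustrate why the alternation hurts, here is a minimal sketch of a batcher that may only merge consecutive primitives of the same kind, since draw order has to be preserved while blending is on. The names PrimClass and CountBatches are hypothetical and are not taken from Dolphin's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classification of a primitive; not Dolphin's actual enum.
enum class PrimClass
{
  Triangles,
  Lines
};

// Counts the draw calls issued by a batcher that can only merge
// consecutive primitives of the same class. Order is preserved, which
// is the constraint imposed by drawing with blending enabled.
std::size_t CountBatches(const std::vector<PrimClass>& prims)
{
  std::size_t batches = 0;
  for (std::size_t i = 0; i < prims.size(); ++i)
  {
    // A new batch starts at the first primitive and at every switch.
    if (i == 0 || prims[i] != prims[i - 1])
      ++batches;
  }
  return batches;
}
```

With this model, a strictly alternating stream of N primitives costs N draw calls, while the same primitives grouped by kind would cost only two. Grouping is exactly the reordering that blending rules out.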
There aren't many options to fix this either. Reordering will break rendering because it's drawing with blending on. Potentially, we could transform the vertices in software, but that would require going through the whole T&L process. Not only would this be difficult, but it would also be fairly slow. Stenzek's intuition says that it would be faster than the 80K D3D calls, though. The most realistic solution would likely be a Twilight Princess style "patch" to reduce the draw call overhead while breaking rendering in a way that doesn't impact gameplay.
Updated by JMC4789 over 3 years ago
- Status changed from New to Accepted
This is known to be demanding, so accepting it. We may need a TP hack style solution, though.
Updated by JosJuice over 3 years ago
- Relates to performance changed from No to Yes
Updated by eureka about 3 years ago
While playing through Metroid Prime 1 on Android recently, I noticed that the map performance with Vulkan was quite good compared to OpenGL (consistently 30FPS+ with Vulkan, 8-10FPS at best with OpenGL). With OpenGL, the map is especially slow in the Chozo Ruins and Phendrana Drifts regions, but renders at a usable speed in other regions.
Updated by JosJuice about 3 years ago
That could probably be because your device doesn't support geometry shaders in Vulkan. Without geometry shaders, it runs a lot faster but the lines are too thin.
Updated by pokechu22 about 2 years ago
I somewhat understand why the game does this. For most of the lines, it's changing one of the constant (or "konstant") colors the TEV has access to, but nothing else. The function that normally does this, GX_SetTevKColor, writes red + alpha, then blue + green, then blue + green again, then blue + green a third time. Libogc says "this two calls should obviously flush the Write Gather Pipe", but what seems important to me is that each write is 5 bytes of data (0x61 to indicate it's a BP command, the command number (0xe1), and then 3 bytes of data), so the 2 extra blue + green writes are 10 bytes. (The write gather pipe stores groups of 32 bytes, so I don't think this is actually a flush, but it's doing something to work around a hardware quirk.) Metroid Prime's triangles end up being 11 bytes: 0x98 to start a triangle strip, then 00 08 for the length (8 vertices), and then 00 00 00 00 00 00 00 00 for the actual vertices (which is just the same position 8 times). So this is probably just a way of working around the same issue.
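The byte counts above can be checked with a small sketch that builds both command encodings. MakeBPWrite and MakeTriangleStrip are hypothetical helpers written for this illustration. The sketch assumes the 24-bit BP value is stored most significant byte first, and that each vertex consists of a single 8-bit position index, which matches the 8 bytes of vertex data described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One BP register write: opcode 0x61, the register number, then a
// 24-bit value (assumed big-endian here), for 5 bytes in total.
std::array<std::uint8_t, 5> MakeBPWrite(std::uint8_t reg, std::uint32_t value24)
{
  return {{0x61, reg,
           static_cast<std::uint8_t>((value24 >> 16) & 0xFF),
           static_cast<std::uint8_t>((value24 >> 8) & 0xFF),
           static_cast<std::uint8_t>(value24 & 0xFF)}};
}

// A triangle strip draw command: opcode 0x98, a 16-bit vertex count,
// then one byte per vertex (assumption: each vertex is only an 8-bit
// position index, as in the fifolog described above).
std::vector<std::uint8_t> MakeTriangleStrip(std::uint16_t count, std::uint8_t index)
{
  std::vector<std::uint8_t> out{0x98,
                                static_cast<std::uint8_t>(count >> 8),
                                static_cast<std::uint8_t>(count & 0xFF)};
  // Repeat the same position index for every vertex.
  out.insert(out.end(), static_cast<std::size_t>(count), index);
  return out;
}
```

Under these assumptions, two extra BP writes take 2 × 5 = 10 bytes and an 8-vertex strip of index 0 takes 1 + 2 + 8 = 11 bytes, so the two kinds of padding are nearly the same length.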
However, they also send through similar triangles after changing only the line-width configuration, which normally doesn't have a similar flush. I don't know why that might be.
As far as I can tell, the value 0 here doesn't have any special meaning: it really is just referring to index 0, which may be on screen, so culling off-screen triangles (like in PR 11208) won't necessarily help. We could skip degenerate triangles, but I don't know how much of a gain that would be, as changing the point size or line size would still break batching, to my understanding.
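For reference, detecting the specific degenerate case seen here could look like the sketch below. IsDegenerateStrip is a hypothetical helper, not an existing Dolphin function. It only covers the narrow case where every vertex uses the same position index, and it assumes the indices have already been decoded into a vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true when a triangle strip cannot produce any visible
// triangle: either it has fewer than 3 vertices, or every vertex
// references the same position index, so each triangle has zero area.
bool IsDegenerateStrip(const std::vector<std::uint8_t>& position_indices)
{
  if (position_indices.size() < 3)
    return true;

  const std::uint8_t first = position_indices.front();
  return std::all_of(position_indices.begin(), position_indices.end(),
                     [first](std::uint8_t index) { return index == first; });
}
```

A strip such as the 8-vertex one with index 0 repeated would be reported as degenerate. Skipping it would save the draw itself, but not the batch break caused by the state change that precedes it.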
Actually, I can verify this by skipping triangle strips entirely (by adding if (primitive == OpcodeDecoder::Primitive::GX_DRAW_TRIANGLE_STRIP) return size; to VertexLoaderManager::RunVertices); this obviously breaks things, but would be faster than the best case that could be achieved by skipping unwanted triangle strips (since it just skips them all). And that does seem to be the case in practice: performance does increase a fair bit, but there are still 4451 draw calls. I'm not entirely sure how to count D3D calls, but in RenderDoc with D3D11, there are 67K EIDs with this change, and 167K without, and in Vulkan, there are 20K EIDs with this change, and 56K without. (I'm not really sure what to make of these numbers, since of course it is skipping a lot of actually important rendering.)
It does seem to be the case that a lot of stuff that's getting drawn (both lines and triangles) is off screen, though, so culling may still help in that way.