Emulator Issues #8053
Shader Generation Slowdown/Framedrops/Stuttering
This affects most titles. Notably, in F-Zero GX, Metroid Prime 2/3 and several others, actual defects can be caused by shader cache stuttering in dual-core mode. Other games just suffer from an inconsistent framerate that can be annoying to users as they play.
All modern graphics processors are flexible, programmable processors that run shaders as their programs. Dolphin uses shaders to emulate the entire fixed-function pipeline of the GameCube/Wii GPU.
Updating the state of the GPU is exceedingly cheap on the original hardware, so games do it freely. Each new state combination needs its own shader in Dolphin, meaning a lot of shaders can be generated in short order.
It takes a noticeable amount of time to generate our shaders (with some maxing out at 30ms according to my tests in F-Zero GX), which guarantees that Dolphin cannot generate such a shader within one frame. A second problem is that even when the shaders take less time to generate, there are often 30 - 40 shaders being generated in short order. Even at 1 - 2ms each, that would still lead to stuttering.
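To put rough numbers on that, here is a minimal arithmetic sketch. It assumes a 60 fps target (about 16.67ms per frame) and that shaders are generated synchronously, one after another; the function name is made up for illustration.

```python
# Assumption: the game targets 60 fps, so one frame has about 16.67 ms.
FRAME_BUDGET_MS = 1000.0 / 60.0


def frames_consumed(shader_count, ms_per_shader, budget_ms=FRAME_BUDGET_MS):
    """Return how many frame budgets a burst of synchronous shader
    generation uses up (a value above 1.0 means at least one missed frame)."""
    total_ms = shader_count * ms_per_shader
    return total_ms / budget_ms


if __name__ == "__main__":
    # One slow shader, and the two ends of the 30 - 40 shader burst.
    for count, cost in [(1, 30.0), (30, 1.0), (40, 2.0)]:
        print(count, "shader(s) at", cost, "ms:",
              round(frames_consumed(count, cost), 2), "frame budgets")
```

Under these assumptions every case in the loop comes out above one frame budget, so the emulated game misses frames no matter what else is going on.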
To partially quell this problem, Dolphin caches the shaders after they are generated, so on subsequent uses they don't have to be generated again. This does not work on all hardware configurations or drivers, though, and doesn't really solve the problem. Currently, the cached shaders also have to be thrown out every time Dolphin updates.
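The caching idea, including the throw-away-on-update behaviour, can be sketched roughly like this. This is not Dolphin's actual cache code: the class, the `stored` dictionary layout and the version strings are all hypothetical, and `compile_fn` stands in for the real shader generator.

```python
class ShaderCache:
    """Toy shader cache keyed by GPU state configuration (hypothetical)."""

    def __init__(self, version, compile_fn, stored=None):
        self.version = version
        self.compile_fn = compile_fn
        self.misses = 0
        # 'stored' mimics what a previous run saved to disk. Entries are
        # only reused if they were written by the same emulator version.
        if stored is not None and stored["version"] == version:
            self.entries = dict(stored["entries"])
        else:
            self.entries = {}

    def get(self, config):
        """Return the shader for 'config', compiling it only on a miss."""
        if config not in self.entries:
            self.misses += 1
            self.entries[config] = self.compile_fn(config)
        return self.entries[config]

    def dump(self):
        """Return the data a real cache would write to disk."""
        return {"version": self.version, "entries": dict(self.entries)}


def fake_compile(config):
    # Stand-in for the expensive shader generation step.
    return "shader for " + repr(config)


if __name__ == "__main__":
    cache = ShaderCache("build-1", fake_compile)
    cache.get(("fog", 2))
    cache.get(("fog", 2))          # second use hits the cache
    print("misses:", cache.misses)
```

The first time a configuration shows up it still costs a full compile, which is why the cache only partially helps.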
#1 Updated by magumagu9 about 5 years ago
The fact that F-Zero GX etc. crashes is more because our dual-core synchronization is junk rather than anything about the shader cache (essentially, we should pause the CPU thread if the GPU thread is taking longer than expected to execute a draw call, no matter what the cause). That said, slow shader generation certainly makes the issue much easier to reproduce.
#3 Updated by magumagu9 about 5 years ago
One approach to solve this is to compile a more general shader, which depends on fewer parameters. (We can't build a single shader which handles every possible configuration, but we can get closer.) This would mean compiling fewer programs at the expense of increasing the GPU workload. This could be combined with some sort of shader recompilation on a separate thread to reduce the GPU workload.
Another approach is to speed up shader compilation. OpenGL allows linking together GLSL Objects, which could be faster than compiling a whole shader from scratch. D3D11 has a feature called dynamic linking. Probably not enough to solve this completely, but it could help.
Another approach is to build a database of the shader configurations a game needs, and compile them all at startup. This would be easy to implement, but building the database is a logistical nightmare, so it's probably a bad idea. :)
#4 Updated by JMC4789 about 5 years ago
Something interesting Ishiiruka is working on is making generic shaders that are slower, but pre-generated to fit every single situation in a game. Then they use their asynchronous shader feature to generate the faster, specialized shaders and dynamically switch over to them as the game runs.
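A rough sketch of that hand-over, to make the idea concrete. This is not Ishiiruka's code; the class and method names are invented, shaders are plain strings, and it assumes only the render thread ever calls `shader_for`.

```python
from concurrent.futures import ThreadPoolExecutor


class HybridShaderSource:
    """Serve a pre-built generic shader until the specialized one is ready."""

    def __init__(self, generic_shader, compile_fn, workers=1):
        self.generic = generic_shader
        self.compile_fn = compile_fn
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.pending = {}  # config -> Future of the specialized shader

    def shader_for(self, config):
        """Never blocks: returns the generic shader while compiling."""
        future = self.pending.get(config)
        if future is None:
            # First sighting: start the slow compile in the background.
            self.pending[config] = self.pool.submit(self.compile_fn, config)
            return self.generic
        if future.done():
            return future.result()
        return self.generic

    def wait_all(self):
        """Block until every background compile has finished."""
        for future in list(self.pending.values()):
            future.result()

    def close(self):
        self.pool.shutdown(wait=True)


def slow_compile(config):
    # Stand-in for generating and compiling a specialized shader.
    return "specialized:" + str(config)


if __name__ == "__main__":
    source = HybridShaderSource("generic", slow_compile)
    print(source.shader_for("fog+alpha"))   # generic, compile just started
    source.wait_all()
    print(source.shader_for("fog+alpha"))   # specialized version
    source.close()
```

The specialized shader is still built from scratch; the point is only that the render thread never waits for it.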
#6 Updated by ZephyrSurfer about 5 years ago
How exactly would we do recompilation faster? It would still be built from scratch, right, rather than by manipulating the general one on recompilation?
Or is it that the general shader would catch most cases and only a minority would have to be recompiled, and that's why it's faster?
#9 Updated by rodolfoosvaldobogado about 5 years ago
Shader compilation time depends on a lot of factors: API, CPU, GPU and platform.
On Windows D3D, which is where I have the most experience, compilation has three stages. The first goes from source code to bytecode using the Microsoft HLSL compiler, which generates driver-independent bytecode. The second stage is the first step of the real compilation: once you load the bytecode, it is compiled to GPU-dependent code that is not yet optimized. That first code is used immediately to keep rendering and avoid stalls. Meanwhile a third stage starts, in which the code is recompiled and optimized inside the driver on a different thread; once it finishes, it replaces the first, unoptimized version.
Of these three stages, the first is the most critical. It can take anywhere from 1 millisecond for really simple shaders to more than a second (always speaking about Dolphin shaders; in reality you could write a shader that takes more than 10 minutes to compile).
The second stage always takes less than 5 ms, and the third stage can take more than 20 ms.
The second and third stages are not controllable from our perspective; they increase frame latency but still give a tolerable framerate. The first stage is the critical one for Dolphin.
All the times here were measured with a 3 GHz AMD CPU as a reference.
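The three stages can be laid out on a simple timeline. This is only a toy model of the description above, not a measurement: it assumes stage 3 starts the moment stage 2 finishes, and the function name and state labels are made up.

```python
def shader_state_at(t_ms, stage1_ms, stage2_ms, stage3_ms):
    """Return which version of a shader is available t_ms after it was
    requested, given the duration of each compilation stage.

    stage1: HLSL source -> driver-independent bytecode (blocks rendering)
    stage2: bytecode -> unoptimized GPU code (blocks rendering)
    stage3: driver-side optimization on another thread (does not block)
    """
    usable_at = stage1_ms + stage2_ms
    # Assumption: the driver begins optimizing right after stage 2.
    optimized_at = usable_at + stage3_ms
    if t_ms < usable_at:
        return "blocked"
    if t_ms < optimized_at:
        return "unoptimized"
    return "optimized"


if __name__ == "__main__":
    # A simple shader (1 ms stage 1) versus a heavy one (1000 ms stage 1),
    # both with a 5 ms stage 2 and a 20 ms stage 3.
    for stage1 in (1, 1000):
        states = [shader_state_at(t, stage1, 5, 20) for t in (3, 10, 100)]
        print("stage 1 =", stage1, "ms:", states)
```

Only stages 1 and 2 keep the draw call waiting in this model, and stage 2 is bounded, so the length of stage 1 decides whether the stall is visible.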
#14 Updated by Armada about 5 years ago
Dynamic linking is probably just not enabled in PSTextureEncoder.cpp, but that's just the texture encoding shaders which are very simple shaders which are not cached. The large shaders used for actual rendering are in the shader caches and they are dynamically linked.
#16 Updated by Armada about 5 years ago
rodolfoosvaldobogado, are you sure about that? I've worked with the shaders a lot recently, introducing a geometry shader stage. In the DX11 backend, shaders are cached separately and only linked at runtime. I thought that is what dynamic linking means in the context of shaders?
#18 Updated by rodolfoosvaldobogado about 5 years ago
Dynamic linking in DX11 is a technique for reusing functionality blocks across different shaders, avoiding the classic need to recompile every instance of a shader when a small part changes. It is based on declaring common interfaces that are used inside the “base shader”. Once the base shader is compiled, you get references to those interfaces (using reflection) and then set the implementing classes at runtime. For Dolphin to be able to do this, the different parts of the shader first need to be abstracted using interfaces, and then every possible combination of implementation classes needs to be coded. That is in my future plans for Dolphin but is far from my current schedule, as it is a really big refactoring of the shader generation code.
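As a loose analogy only (this is plain Python, not HLSL or the D3D11 API, and every name in it is invented), the pattern looks something like this: the base shader is built once with open interface slots, and implementations are swapped at runtime without rebuilding it.

```python
class BaseShader:
    """Stands in for a shader compiled once with open interface slots."""

    def __init__(self, slots):
        self.slots = tuple(slots)   # order in which the stages run
        self.bound = {}             # slot name -> implementation

    def bind(self, slot, implementation):
        """Swap the implementation behind one interface at runtime."""
        if slot not in self.slots:
            raise KeyError(slot)
        self.bound[slot] = implementation

    def run(self, color):
        """Apply every bound stage to a single colour value."""
        for slot in self.slots:
            color = self.bound[slot](color)
        return color


# Hypothetical implementations of the two interfaces.
def half_tint(color):
    return color * 0.5


def add_fog(color):
    return color + 0.25


def no_fog(color):
    return color


if __name__ == "__main__":
    shader = BaseShader(["tint", "fog"])   # the one-time "compile"
    shader.bind("tint", half_tint)
    shader.bind("fog", add_fog)
    print(shader.run(1.0))
    shader.bind("fog", no_fog)             # state change, no rebuild
    print(shader.run(1.0))
```

The cost moves from compiling per configuration to writing every implementation up front, which is the refactoring effort mentioned above.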
For reference, see here: