Emulator Issues #10904
Mesa/radeonsi hangs during glClientWaitSync() (closed)
Description
Flashy click bait title to call your attention!
So... I installed the latest Dolphin on my computer. It always freezes within the first 60 seconds of the game.
First of all, this looks like a race condition, so I should post my HW & SW settings: it seems I'm unlucky enough to have the combination that triggers this bug, because I literally could not find anything related to this problem.
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
16GB RAM (1 stick)
GPU: Radeon RX 560 Series (POLARIS11 / DRM 3.19.0 / 4.14.11, LLVM 6.0.0)
Mesa 18.1.0-devel (git-183ce5e629)
Xubuntu 17.10
Kernel 4.14.11
I downloaded the source and recompiled Dolphin in full debug to get a better sense of what's going on.
Dolphin freezes inside the GL driver. Callstack:
1 syscall syscall.S 38 0x7f5548e36a49
2 sys_futex futex.h 38 0x7f550450a57c
3 futex_wait futex.h 50 0x7f550450a57c
4 do_futex_fence_wait u_queue.c 114 0x7f550450a57c
5 _util_queue_fence_wait u_queue.c 129 0x7f550450aa79
6 util_queue_fence_wait u_queue.h 160 0x7f55046e637d
7 si_fence_finish si_fence.c 195 0x7f55046e637d
8 st_client_wait_sync st_cb_syncobj.c 115 0x7f550441dbfe
9 client_wait_sync syncobj.c 355 0x7f550438cfe3
10 OGL::StreamBuffer::AllocMemory StreamBuffer.cpp 153 0xca8fd6
11 OGL::BufferStorage::Map StreamBuffer.cpp 277 0xcab35f
12 OGL::StreamBuffer::Map StreamBuffer.h 40 0xc962c2
13 OGL::VertexManager::ResetBuffer VertexManager.cpp 98 0xcb2e58
14 VertexManagerBase::PrepareForAdditionalData VertexManagerBase.cpp 136 0xc5b5d8
15 VertexLoaderManager::RunVertices VertexLoaderManager.cpp 280 0xc4723c
16 OpcodeDecoder::Run<false> OpcodeDecoding.cpp 224 0xc1e103
17 OpcodeDecoder::InterpretDisplayList OpcodeDecoding.cpp 56 0xc1cd81
18 OpcodeDecoder::Run<false> OpcodeDecoding.cpp 180 0xc1deb9
19 Fifo::RunGpuLoop()::$_0::operator()() const Fifo.cpp 351 0xc0caa2
20 Common::BlockingLoop::Run<Fifo::RunGpuLoop()::$_0>(Fifo::RunGpuLoop()::$_0, long) BlockingLoop.h 135 0xc0bc18
It's stalled at:
    // if buffer is full
    if (m_iterator + size >= m_size)
    {
      //...
      // wait for space at the start
      for (int i = 0; i <= Slot(m_iterator + size); i++)
      {
        glClientWaitSync(m_fences[i], GL_SYNC_FLUSH_COMMANDS_BIT, GL_TIMEOUT_IGNORED); //---> here
        glDeleteSync(m_fences[i]);
      }
    }
So I began researching. My suspects were:
- CPU bug. I disabled HyperThreading. Bug still remained.
- AMD_pinned_memory. I disabled it (uses BufferStorage instead).
- Mesa driver bug. I updated to latest. This commit sounds horribly suspicious: https://github.com/mesa3d/mesa/commit/50b06cbc10dbca1dfee89b529ba9b564cc4ea6f6 It's possible there's still some dependency going on.
- Dolphin corrupting memory. I could not test this theory.
- glXSwapBuffers behavior. I've seen Mesa act funny when it comes to glXSwapBuffers, and I saw the behavior change depending on whether it was rendered in wxWidgets or in its own window, so I toggled the option.
- glClientWaitSync being called from multiple threads. Doesn't appear to be the case but I'm not 100% certain.
- glClientWaitSync being used on a dangling fence. I tested this theory, there's no bug about this.
- StreamBuffer leaking too many fences. I tested this theory. IT IS LEAKING. But I've worked around the issue and the hangs still remain.
- GPU hardware bug. I could not test this theory.
- Some bizarre behavior in StreamBuffer. It is hard to know what's going on (yes, I know it's a ring buffer, but it's complex).
So, I'm attaching a patch to test my findings, which is by no means meant to be included in the repo (it's just debug code, doesn't comply with coding conventions, etc).
By applying the patch I provided, we can see that:
    void StreamBuffer::AllocMemory(u32 size)
    {
      // insert waiting slots for used memory
      for (int i = Slot(m_used_iterator); i < Slot(m_iterator); i++)
      {
        assert( !m_fences_check[i] ); //---> this assert triggers
This means that the code is leaking fences. The constructor populates the array with fences. After enough map/unmap cycles, Slot(m_iterator) becomes 1 for the first time while Slot(m_used_iterator) is still 0. The snippet above then creates another fence in slot 0 without deleting the old one, so the old fence is leaked.
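To make the leak concrete, here is a minimal, GL-free model of that bookkeeping. It is only a sketch: `FakeFenceTable`, `kSyncPoints` (assumed to be 16) and the counters are hypothetical stand-ins for `m_fences`, `SYNC_POINTS` and the driver's sync objects, and it mimics nothing beyond "the constructor fills every slot, then a slot is overwritten without deleting the old fence".

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the m_fences array: it only tracks which slots
// hold a live fence and how many handles were overwritten while still live.
struct FakeFenceTable {
    static constexpr std::size_t kSyncPoints = 16;  // assumed value
    std::array<bool, kSyncPoints> occupied{};
    int live = 0;    // fences created and not yet deleted (leaked ones included)
    int leaked = 0;  // fences whose handle was overwritten before deletion

    // Mimics the constructor behaviour described in the report.
    void PopulateAll() {
        for (std::size_t i = 0; i < kSyncPoints; ++i) Create(i);
    }

    // Mimics "m_fences[i] = <new fence>" with no prior delete.
    void Create(std::size_t i) {
        assert(i < kSyncPoints);
        if (occupied[i]) ++leaked;  // the old handle is lost here
        occupied[i] = true;
        ++live;
    }

    void Delete(std::size_t i) {
        assert(i < kSyncPoints && occupied[i]);
        occupied[i] = false;
        --live;
    }

    // Leak-free variant: delete whatever is in the slot before reusing it.
    void CreateSafe(std::size_t i) {
        assert(i < kSyncPoints);
        if (occupied[i]) Delete(i);
        Create(i);
    }
};

// Replays the first transition Slot(m_used_iterator) == 0 -> Slot(m_iterator) == 1
// and returns how many fences were leaked.
inline int LeaksAfterFirstTransition(bool safe) {
    FakeFenceTable table;
    table.PopulateAll();
    for (std::size_t i = 0; i < 1; ++i) {  // the loop body runs for slot 0 only
        if (safe) table.CreateSafe(i); else table.Create(i);
    }
    return table.leaked;
}
```

In this model `LeaksAfterFirstTransition(false)` reports one lost fence and `LeaksAfterFirstTransition(true)` reports none, which matches what the assert in the patch catches.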
We can also check that, after playing for a while, hitting the Stop button triggers yet another one:
    for (int i = 0; i < SYNC_POINTS; i++)
      assert( !m_fences_check[i] );
This means StreamBuffer::DeleteFences is not deleting all the fences. This is a leak.
But this does not explain my hangs. I changed the code (included in the patch) so that it doesn't leak (unless I missed something), but it still hangs.
A few things I noticed:
- glClientWaitSync ALWAYS hangs on m_fences[7].
- The snippet before the waits (the one that creates more fences when the buffer is full) has (m_used_iterator >> 21u) == 8. Always.
- Printing the number of times we hit the potentially problematic code doesn't show an exact value. Sometimes it hangs at 49, another time at 52, sometimes at 99.
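For reference, the arithmetic behind those two numbers can be sketched as follows. This assumes that the shift of 21 is the number of bits per slot and that fence i guards slot i; `kBitsPerSlot`, `SlotRange` and `RangeOfSlot` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the ">> 21u" seen while debugging is the slot shift.
constexpr std::uint32_t kBitsPerSlot = 21;
constexpr std::uint32_t kSlotBytes = 1u << kBitsPerSlot;  // 2 MiB per slot

// Slot index that a byte offset into the ring buffer falls in.
constexpr std::uint32_t Slot(std::uint32_t offset) {
    return offset >> kBitsPerSlot;
}

// Byte range [begin, end) covered by one slot.
struct SlotRange {
    std::uint32_t begin;
    std::uint32_t end;
};

inline SlotRange RangeOfSlot(std::uint32_t slot) {
    // Keep (slot + 1) << kBitsPerSlot representable in 32 bits.
    assert(slot + 1 < (1u << (32 - kBitsPerSlot)));
    return SlotRange{slot << kBitsPerSlot, (slot + 1) << kBitsPerSlot};
}

static_assert(kSlotBytes == 2u * 1024u * 1024u, "a slot is 2 MiB");
static_assert(Slot(16u * 1024u * 1024u) == 8, "16 MiB is the start of slot 8");
```

Under those assumptions, (m_used_iterator >> 21u) == 8 means m_used_iterator lies in [16 MiB, 18 MiB), and m_fences[7] guards the 2 MiB just below that, [14 MiB, 16 MiB).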
So I figured this is probably a driver SW or HW timing bug.
Adding this snippet:
    glFlush();
between the creation of more fences and the waits fixed my hangs.
Or at least, I no longer get hangs 60 seconds into the game. Maybe now I'll get one hang per hour? I haven't played long enough to find out.
While this reeks of a driver bug, the behavior of StreamBuffer is sketchy, to say the least. It leaks fences and calls glClientWaitSync too frequently (if the wait on fence n+1 has returned, we can assume fence n is already signaled!). Honestly I would rewrite the whole class and see if the hangs are gone.
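The "wait less often" idea could look roughly like this. It is a sketch, not a patch for StreamBuffer: `WaitNewestThenDeleteAll` is a hypothetical helper, the wait and destroy operations are passed in as callables so the example runs without a GL context, and it relies on the assumption stated above that fences inserted in order are signaled in order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: waits once, on the newest fence of [first, last],
// then deletes every fence in that range. Returns the number of waits issued.
// Assumes a fence inserted later cannot signal before an earlier one.
template <typename Fence, typename WaitFn, typename DestroyFn>
int WaitNewestThenDeleteAll(const std::vector<Fence>& fences,
                            std::size_t first, std::size_t last,
                            WaitFn wait, DestroyFn destroy) {
    assert(first <= last && last < fences.size());
    int waits = 0;
    wait(fences[last]);  // one blocking wait covers the whole range
    ++waits;
    for (std::size_t i = first; i <= last; ++i) {
        destroy(fences[i]);  // the earlier fences are assumed already signaled
    }
    return waits;
}
```

With real sync objects, `wait` would wrap glClientWaitSync and `destroy` would wrap glDeleteSync; the loop quoted earlier issues one wait per slot instead.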
It wouldn't hurt to put some padding around the buffers to check whether they have been overrun (by prefilling the boundaries with magic words).
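A minimal sketch of that padding idea, on a plain CPU-side allocation. `GuardedBuffer`, `kMagic` and `kGuardWords` are hypothetical names and values chosen for illustration, not anything from Dolphin; a mapped GL buffer would need the same guard bands placed inside the mapped range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical wrapper: payload surrounded by guard bands of magic words.
class GuardedBuffer {
public:
    static constexpr std::uint32_t kMagic = 0xDEADBEEFu;  // arbitrary pattern
    static constexpr std::size_t kGuardWords = 4;
    static constexpr std::size_t kGuardBytes = kGuardWords * sizeof(std::uint32_t);

    explicit GuardedBuffer(std::size_t payload_bytes)
        : m_payload_bytes(payload_bytes),
          m_storage(payload_bytes + 2 * kGuardBytes) {
        FillGuard(0);                              // guard before the payload
        FillGuard(kGuardBytes + m_payload_bytes);  // guard after the payload
    }

    std::uint8_t* Data() { return m_storage.data() + kGuardBytes; }
    std::size_t Size() const { return m_payload_bytes; }

    // True while both guard bands still hold the magic pattern.
    bool GuardsIntact() const {
        return CheckGuard(0) && CheckGuard(kGuardBytes + m_payload_bytes);
    }

private:
    void FillGuard(std::size_t offset) {
        const std::uint32_t magic = kMagic;
        for (std::size_t w = 0; w < kGuardWords; ++w) {
            std::memcpy(m_storage.data() + offset + w * sizeof(magic), &magic, sizeof(magic));
        }
    }

    bool CheckGuard(std::size_t offset) const {
        for (std::size_t w = 0; w < kGuardWords; ++w) {
            std::uint32_t word = 0;
            std::memcpy(&word, m_storage.data() + offset + w * sizeof(word), sizeof(word));
            if (word != kMagic) return false;
        }
        return true;
    }

    std::size_t m_payload_bytes;
    std::vector<std::uint8_t> m_storage;
};
```

Checking `GuardsIntact()` after each unmap (or only in debug builds) would show whether a write ever ran past the end of the payload.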
I'm a heavy user of glClientWaitSync and I've never experienced hangs on my machine, which is what strikes me as odd. I also tried making a stress test to see if I could reproduce a freeze by quickly creating fences and consuming them, sequentially and out of order, but no luck :(
For the moment, my workaround fixes the leaks and hangs, and I can play.
But you may understand now why I chose the title I wrote.
I'm using dolphin from source fca9c28f38e46b78bf65c3b427a21560e5fb42b7