Emulator Issues #4357

closed

SEGFAULT under linux with r7424

Added by beistin about 13 years ago.

Status:
Fixed
Priority:
Normal
Assignee:
-
Category:
JIT
% Done:

0%

Operating system:
N/A
Issue type:
Other
Milestone:
Regression:
No
Relates to usability:
No
Relates to performance:
No
Easy:
No
Relates to maintainability:
No
Regression start:
Fixed in:

Description

What's the problem?

Starting with r7424, Dolphin segfaults when I start a game.

(optional) Dolphin version that does not have the problem:

Operating system and version: linux Fedora 15

32-bit or 64-bit:64-bit

CPU: intel Q9400 @ 2.66GHz

Game ID (as it appears in game properties, Ex.: "GZ2P01" or "RSBE01"):
All

Build command-line (not on Windows):
cmake -DCMAKE_INSTALL_PREFIX=/opt/DOLPHIN/ -DOPENMP=OFF ..

Was the ISO a plain dump from disc, compressed and/or scrubbed?
All compressed

Please provide any additional information below.
If I switch to interpreter mode, the game starts, but it is soooo slow :P
According to the stack trace, Dolphin crashes as soon as it tries to run code generated by the JIT.

Actions #1

Updated by Billiard26 about 13 years ago

  • Status changed from New to Questionable

Fixed in r7429 ?

Actions #2

Updated by beistin about 13 years ago

Nope.

As I mentioned in the title, it does not work. (OK, I should also have written it in the body of the defect :)

It is certainly another defect triggered by this change, because it works if I use the interpreter.

Actions #3

Updated by beistin about 13 years ago

Here is the stack I get, even with the latest revision.
Thread [4] 29687 (Suspended : Signal : SIGSEGV:Segmentation fault)
0x40a00699
0x42a00087
0x3
0x3
0x0

Here are the states of the threads just before the crash:
/opt/DOLPHIN/bin/dolphin-emu [29735]
Thread [4] 29855 (Suspended : Step)
Jit64::Run() at Jit.cpp:352 0x592ec0
PowerPC::RunLoop() at PowerPC.cpp:246 0x57c36b
CCPU::Run() at CPU.cpp:59 0x53a0f7
Run() at StdThread.h:242 0x4f7e36
std::thread::RunAndDelete<std::thread::Func<void () at StdThread.h:267 0x4f7e36
start_thread() at 0x325f006ccb
clone() at 0x325e8e0c2d
Thread [3] 29766 (Suspended : Container)
nanosleep() at 0x325e8aa87d
usleep() at 0x325e8d97f4
XEventThread() at GLUtil.cpp:310 0x688ff1
Run() at StdThread.h:242 0x4f7e36
std::thread::RunAndDelete<std::thread::Func<void () at StdThread.h:267 0x4f7e36
start_thread() at 0x325f006ccb
clone() at 0x325e8e0c2d
Thread [2] 29744 (Suspended : Container)
__lll_unlock_wake() at 0x325f00dd3a
_L_unlock_1112() at 0x325f00a888
pthread_mutex_unlock() at 0x325f00a7df
RunGpuLoop() at Fifo.cpp:140 0x6c0f76
Core::EmuThread() at Core.cpp:418 0x4f79e3
Run() at StdThread.h:242 0x4f7e36
std::thread::RunAndDelete<std::thread::Func<void () at StdThread.h:267 0x4f7e36
start_thread() at 0x325f006ccb
clone() at 0x325e8e0c2d
Thread [1] 29735 (Suspended : Container)
poll() at 0x325e8d7283
0x38493d2ce4
0x3260c42374
g_main_loop_run() at 0x3260c42c82
gtk_main() at 0x384554b0b7
wxEventLoop::Run() at 0x38493e74f8
wxAppBase::MainLoop() at 0x384946499b
wxEntry() at 0x326ee972ca
main() at Main.cpp:52 0x4b7e52

Actions #4

Updated by hatarumoroboshi about 13 years ago

You said from r7424 but I think you meant r7425...

Actions #5

Updated by beistin about 13 years ago

Oops, you are right: the last working revision is r7424 and the segfault starts with r7425.

Actions #6

Updated by beistin about 13 years ago

While reading my last comment, I realised that the stack trace I supplied included one of my changes.

So here is a new stack trace from the latest head version:

/opt/DOLPHIN/bin/dolphin-emu [13148]
Thread [6] 13187 (Suspended : Signal : SIGSEGV:Segmentation fault)
0x40a00699
0x42a00087
0x3
0x3
0x0
Thread [5] 13186 (Suspended : Container)
sem_wait() at 0x325f00d480
0x7fffd747539e
0x7fffd6ef5e5d
0x7fffd74767c9
start_thread() at 0x325f006ccb
clone() at 0x325e8e0c2d
Thread [4] 13185 (Suspended : Container)
sem_wait() at 0x325f00d480
0x7fffd747539e
0x7fffd6ef5e5d
0x7fffd74767c9
start_thread() at 0x325f006ccb
clone() at 0x325e8e0c2d
Thread [3] 13178 (Suspended : Container)
nanosleep() at 0x325e8aa87d
usleep() at 0x325e8d97f4
XEventThread() at 0x67ff91
void* std::thread::RunAndDelete<std::thread::Func<void () at 0x4f6906
start_thread() at 0x325f006ccb
clone() at 0x325e8e0c2d
Thread [2] 13156 (Suspended : Container)
CommandProcessor::SetCpStatus() at 0x6b3852
RunGpuLoop() at 0x6b7dee
Core::EmuThread() at 0x4f64b3
void* std::thread::RunAndDelete<std::thread::Func<void () at 0x4f6906
start_thread() at 0x325f006ccb
clone() at 0x325e8e0c2d
Thread [1] 13148 (Suspended : Container)
poll() at 0x325e8d7283
0x38493d2ce4
0x3260c42374
g_main_loop_run() at 0x3260c42c82
gtk_main() at 0x384554b0b7
wxEventLoop::Run() at 0x38493e74f8
wxAppBase::MainLoop() at 0x384946499b
wxEntry() at 0x326ee972ca
main() at 0x4b74f2

Actions #7

Updated by tenebrarum about 13 years ago

Just confirming that this still occurs in r7436, with -std=c++0x or without, with gcc 4.6 and with gcc 4.5.2. r7424 definitely works.

C{,XX}FLAGS: -march=core2 -O2 -pipe
binutils: 2.21
cmake: 2.8.4
glibc: 2.13
libsdl: 1.2.14
libtool: 2.4
Linux: 2.6.38.2
make: 3.82
wxGTK: 2.9.1.1

Actions #8

Updated by tenebrarum about 13 years ago

Hm, it crashes with one game (ZTP, GC version, GZ2E01) . . . just tried the only other game I own (ZTP, Wii version, RZDE01 :P) and that doesn't crash.

Actions #9

Updated by beistin about 13 years ago

OK, I found the reason why it crashed.
Starting with r7425, the address of asm_routines seems to be really high in memory.
As a consequence, a 32-bit offset is not enough to call singleStoreQuantized, pairedStoreQuantized, etc.

I updated the code generation to use two registers instead of base register + offset.
With this patch the game starts again, but a lot of things are missing on the screen.

Well, if anyone who really knows x86 asm could check this patch and make the same correction in JITIL, I think we could say that this bug is fixed.

Actions #10

Updated by glennricster about 13 years ago

Just out of curiosity, do you have display list caching enabled? If so, does the crash still occur with that not enabled?

Actions #11

Updated by beistin about 13 years ago

No, I did not have display list caching enabled.

The bug really is in the JIT code when you have a 64-bit system.
Some functions are allocated really high in memory, and the generated code cannot handle an offset that exceeds 32 bits.

The real question is: why are those functions no longer allocated in memory the way they used to be?
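
As a rough illustration of the constraint described in this comment (not code from Dolphin): x86-64 encodes most displacements as a signed 32-bit value, so a generated instruction can only reach a target within roughly ±2 GiB of its reference point. The helper name `FitsInSigned32` and any addresses passed to it are assumptions made up for this sketch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true if the distance from `from` to `to`
// can be encoded as a signed 32-bit displacement.
bool FitsInSigned32(std::uint64_t from, std::uint64_t to)
{
    // Unsigned subtraction wraps modulo 2^64; reinterpreting the result as
    // signed yields the (possibly negative) distance on mainstream compilers.
    const std::int64_t distance = static_cast<std::int64_t>(to - from);
    return distance >= std::numeric_limits<std::int32_t>::min() &&
           distance <= std::numeric_limits<std::int32_t>::max();
}
```

A code generator could use a check like this to decide whether a short displacement form is usable or whether the full 64-bit address has to be loaded into a register first.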

Actions #12

Updated by glennricster about 13 years ago

The reason I ask is that there are similar issues with the display list caching. Also, with that option disabled I do not get these crashes with any of the games that I have. I am also on a 64-bit system.

Actions #13

Updated by skidau about 13 years ago

  • Status changed from Questionable to New
  • Issue type set to Other
  • Category set to jit

Actions #14

Updated by skidau about 13 years ago

The patch seems to cause games to black screen in Windows x64. I'm guessing it might be an alignment issue, because I can't see what else could be wrong with the patch.

Actions #15

Updated by beistin about 13 years ago

I don't think that this is caused by the patch. I would rather say that it is another side effect of the "bad memory mapping", one that we could not see before because Dolphin crashed.

Because the order of initialization was changed, the memory seems to be allocated much higher, and a lot of other access operations based on a 32-bit offset certainly do not work properly either.

But to see the defect you must have a 64-bit OS.

Actions #16

Updated by skidau about 13 years ago

Yeah, it has to be the patch, because if I revert it, Starfox Adventures works. I tested it a few times to make sure it was not a random occurrence. JITIL continues to work with and without the patch.

Actions #17

Updated by pierre about 13 years ago

The asm_routines.* should always be below the 4G barrier. If they are not, something is awfully wrong, and assumptions about accessing memory break. (Actually, they should be below 2G, but I am not sure we do the right kind of magic to guarantee that.)

Actions #18

Updated by beistin about 13 years ago

With the new init order, the distance between the asm_routines.* and the calling point is above 2G. As the offset is a signed 32-bit value, we have to use another addressing mode to call the asm_routines.*, and this is the only modification I made.

I can think of only one thing to explain why the patch does not work with all games: the additional register I use is not saved/restored properly in one of the functions I changed.

Actions #19

Updated by pierre about 13 years ago

The difference is not important, since this is not RIP-relative addressing (it's RDX-relative). So, are your asm_routines.* located between 2G and 4G? If that is the case, what I said about memory accesses may still be problematic, since some of those are RIP-relative. Debug builds will assert if there is a problem with that.

Regarding your patch: you can't use EAX or ECX there -- both are used as parameters. Using additional registers is a pretty bad idea for 32-bit code generation, since we may have used them all at that point.
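
The 2G versus 4G distinction discussed in this thread can be illustrated with a small model of how x86-64 forms an effective address from a base register plus a 32-bit displacement: the displacement is sign-extended to 64 bits before the addition. This is only a sketch, not Dolphin's emitter; `EffectiveAddress` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical model of [base + disp32] address formation in 64-bit mode.
// The CPU sign-extends the encoded 32-bit displacement before adding it.
std::uint64_t EffectiveAddress(std::uint64_t base, std::uint32_t encoded_disp32)
{
    // Reinterpret the encoded bits as signed, then widen (sign extension).
    const std::int64_t disp = static_cast<std::int32_t>(encoded_disp32);
    return base + static_cast<std::uint64_t>(disp);
}
```

With a zero base, a displacement encoding 0x7fff0000 reaches that address unchanged, while one encoding 0x80000000 lands at 0xffffffff80000000. Under this model, an absolute address placed in the displacement field only works if it is below 2G, even though it would still fit in 32 unsigned bits up to 4G.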

Actions #20

Updated by beistin about 13 years ago

Thanks for your advice. As you may have understood, I am not used to x86 asm.
I used to code in 68000 assembly a long time ago, and I am afraid I have lost all my knowledge. This is a good opportunity to learn.

I will reproduce the bug to give you the address. I can't remember exactly, but I think it was something like 0x3ffffxxxx, so above 2G.

I totally agree with you that this patch does not solve all the problems, because other portions of the code must have memory access issues. This is what I was trying to explain when I said there were still some side effects because of the "new" memory layout. But I guess I was not clear enough, and when I reread my message I realize how poor my English is ;)

By the way, in issue 4397 someone submitted a patch that reorders the initialization of the functions and solves this crash. So you are right when you say that there is certainly an issue in the way the memory is mapped.

Actions #21

Updated by pierre about 13 years ago

So I was playing around with the allocation code, and accidentally reproduced your problem. The problematic memory is not the emitted code. It's the tables holding pointers to the code. So, either we put the tables under the 2G barrier, or we apply my earlier patch. The patch attached here (mis)uses the code blocks to store the tables.
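
To make the condition on the tables concrete, here is a minimal sketch (hypothetical names, not taken from the attached patch) of a check that a whole table of 8-byte pointers lies below the 2G boundary, so that every entry could be addressed with an absolute signed 32-bit displacement.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical check: does a table of `entry_count` 8-byte pointers starting
// at `table_base` lie entirely below 2 GiB (0x80000000)?
bool TableBelow2G(std::uint64_t table_base, std::size_t entry_count)
{
    constexpr std::uint64_t kLimit = 0x80000000ULL;
    const std::uint64_t size =
        static_cast<std::uint64_t>(entry_count) * sizeof(std::uint64_t);
    // Compare against the remaining room instead of computing base + size,
    // so the sum itself can never overflow.
    return table_base < kLimit && size <= kLimit - table_base;
}
```

An allocator could assert something like this right after reserving the table memory, which would turn a later crash in generated code into an immediate, explainable failure.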

Actions #22

Updated by beistin about 13 years ago

I tried both patches and they both fix the issue.
If all the code is designed to work with addresses under 2G, wouldn't it be safer to enforce that all the code, tables, and routines really are under 2G?

Actions #23

Updated by pierre about 13 years ago

  • Status changed from New to Fixed

This issue was closed by revision r7465.
