Project

General

Profile

Actions

Emulator Issues #13144

closed

RS2 crashes randomly

Added by pokechu22 almost 2 years ago. Updated over 1 year ago.

Status:
Fixed
Priority:
Normal
Assignee:
-
% Done:

0%

Operating system:
N/A
Issue type:
Bug
Milestone:
Regression:
No
Relates to usability:
No
Relates to performance:
No
Easy:
No
Relates to maintainability:
No
Regression start:
Fixed in:
5.0-19652

Description

Game Name?

Star Wars Rogue Squadron II: Rogue Leader

Game ID?

GSWE64

MD5 Hash?

0bf391bee90da09d6042016d23e7a9b1

What's the problem? Describe what went wrong.

The game randomly crashes. More precisely, it seems to crash during transitions, e.g. from the stormtroopers making the lucas art logo to the main menu, or from the attract mode cutscene to the main menu.

What steps will reproduce the problem?

Fortunately, I was able to make an input recording that consistently triggers the problem. All the input recording does is skip the logo cutscene, then idle on the title screen until the attract mode cutscene plays, then presses a button to return to the title screen; it then crashes. Note that save data is required to skip the logo cutscene; I've included my save file (as a raw memcard) along with the movie, but probably any save file will sync (I have not tested this). The movie was recorded in single core with MMU enabled, but these settings should automatically apply.

Is the issue present in the latest development version? For future reference, please also write down the version number of the latest development version.

Tested both 5.0-17995 and 5.0-18186.

Is the issue present in the latest stable version?

Not tested. I don't think my movie would sync, so I don't want to test this right now.

What are your PC specifications?

  • CPU: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz, 2592 Mhz, 6 Core(s), 12 Logical Processor(s)
  • GPU: NVIDIA GeForce GTX 1650 Ti
  • Operating System: Windows 10.0.19044.2364

Is there anything else that can help developers narrow down the issue?

Here's the OSReport output for the crash:

41:16:365 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: Unhandled Exception 2
41:16:366 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ------------------------- Context 0x8008f388 -------------------------
41:16:366 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r0   = 0x00000000 (             0)  r16  = 0x00000000 (             0)
41:16:367 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r1   = 0x800ac050 (   -2146779056)  r17  = 0x7fde8254 (    2145288788)
41:16:367 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r2   = 0x8009ea20 (   -2146833888)  r18  = 0x7fde8230 (    2145288752)
41:16:367 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r3   = 0x00000001 (             1)  r19  = 0x7fde8270 (    2145288816)
41:16:367 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r4   = 0x8140d38c (   -2126457972)  r20  = 0x7fde8271 (    2145288817)
41:16:367 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r5   = 0x00000000 (             0)  r21  = 0x7fde825c (    2145288796)
41:16:367 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r6   = 0x00000128 (           296)  r22  = 0x7fde8258 (    2145288792)
41:16:367 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r7   = 0x804873c0 (   -2142735424)  r23  = 0x7fde822c (    2145288748)
41:16:367 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r8   = 0x7fde9ad0 (    2145295056)  r24  = 0x7fde821c (    2145288732)
41:16:367 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r9   = 0x00000000 (             0)  r25  = 0x7fde8219 (    2145288729)
41:16:367 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r10  = 0x00000000 (             0)  r26  = 0x7fdcdb40 (    2145180480)
41:16:368 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r11  = 0x00000000 (             0)  r27  = 0x7fde8000 (    2145288192)
41:16:368 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r12  = 0x00000000 (             0)  r28  = 0x00010000 (         65536)
41:16:368 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r13  = 0x8009ada0 (   -2146849376)  r29  = 0x00000001 (             1)
41:16:368 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r14  = 0x00000000 (             0)  r30  = 0x00000000 (             0)
41:16:368 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: r15  = 0x00000000 (             0)  r31  = 0x8140d260 (   -2126458272)
41:16:368 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: LR   = 0x7fd8bf58                   CR   = 0x44900244
41:16:368 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: SRR0 = 0x7fd8bf84                   SRR1 = 0x0000b032
41:16:368 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 
41:16:369 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: GQRs----------
41:16:369 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: gqr0 = 0x00000000 	 gqr4 = 0x00060006
41:16:369 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: gqr1 = 0x00000000 	 gqr5 = 0x00070007
41:16:369 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: gqr2 = 0x00040004 	 gqr6 = 0x3d043d04
41:16:369 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: gqr3 = 0x00050005 	 gqr7 = 0x00000e05
41:16:369 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 
41:16:369 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 
41:16:369 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: FPRs----------
41:16:369 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr0 	= 0 	 fr1 	= 0
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr2 	= 4088 	 fr3 	= 6636
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr4 	= 273 	 fr5 	= 2
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr6 	= 0 	 fr7 	= 32
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr8 	= 0 	 fr9 	= 0
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr10 	= 3970 	 fr11 	= 0
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr12 	= 3320 	 fr13 	= 1
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr14 	= 0 	 fr15 	= 0
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr16 	= 0 	 fr17 	= 0
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr18 	= 0 	 fr19 	= 0
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr20 	= 0 	 fr21 	= 0
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr22 	= 0 	 fr23 	= 0
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr24 	= 0 	 fr25 	= 0
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr26 	= 0 	 fr27 	= 0
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr28 	= 0 	 fr29 	= 0
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: fr30 	= 0 	 fr31 	= 0
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: PSFs----------
41:16:370 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps0 	= 0x0 	 ps1 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps2 	= 0xffffffff 	 ps3 	= 0xffffffff
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps4 	= 0xffffffff 	 ps5 	= 0x2
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps6 	= 0x0 	 ps7 	= 0xffffffff
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps8 	= 0x0 	 ps9 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps10 	= 0xffffffff 	 ps11 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps12 	= 0xffffffff 	 ps13 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps14 	= 0x0 	 ps15 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps16 	= 0x0 	 ps17 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps18 	= 0x0 	 ps19 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps20 	= 0x0 	 ps21 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps22 	= 0x0 	 ps23 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps24 	= 0x0 	 ps25 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps26 	= 0x0 	 ps27 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps28 	= 0x0 	 ps29 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: ps30 	= 0x0 	 ps31 	= 0x0
41:16:371 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 
41:16:372 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: Address:      Back Chain    LR Save
41:16:372 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 0x800ac050:   0x800ac0c8    0x7fd8bf58
41:16:372 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 0x800ac0c8:   0x800ac120    0x7fd8c980
41:16:372 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 0x800ac120:   0x800ac130    0x7fc02bf0
41:16:372 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 0x800ac130:   0x800ac178    0x7fc003a8
41:16:372 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 0x800ac178:   0x800ac1b0    0x7fc002a8
41:16:372 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 0x800ac1b0:   0x800ac1d0    0x80006ac8
41:16:372 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 0x800ac1d0:   0xffffffff    0x800031ec
41:16:372 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 
41:16:372 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: DSISR= 0x42000000                   DAR  = 0xffff8119
41:16:373 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: 
41:16:373 Core\HW\EXI\EXI_DeviceIPL.cpp:305 N[OSREPORT]: Instruction at 0x7fd8bf84 (read from SRR0) attempted to access invalid address 0xffff8119 (read from DAR)

The crash does not happen if a breakpoint is set at 0x7fd8bf84 (a stfsu f1, -0x7EE8 (r3) instruction). r3 is 0x7fe00000, so the written address should be 0x7fdf8118. That breakpoint will be hit 3 times: once when the attract mode cutscene starts, and twice when exiting it (the first time, nothing is mapped at 0x7fdf8118, the second time memory is mapped to it). r3 isn't 0x7fe00000 in the crash dump, so I think it's getting corrupted somehow?


Files

RS2Crash.7z (24.3 KB) RS2Crash.7z pokechu22, 01/09/2023 07:54 PM
Actions #1

Updated by pokechu22 almost 2 years ago

  • Relates to usability changed from No to Yes

OK, I've got a basic idea of what's happening based on some basic investigation:

  • The issue does not happen if I turn JIT LoadStore Floating Point off. This makes sense since the exception is happening at a stfsu instruction.
  • The issue does not happen if I have a breakpoint anywhere from 7fd8bf84 to 7fd8bf74.
  • The issue still happens if fastmem is disabled.
  • The issue still happens if a breakpoint is at 7fd8bf70 or earlier.
  • If I have a memory breakpoint at 0x7fdf8118, it gets hit when memory is not mapped to that address, but then does not get hit the second time (a MBP at 0xffff8119 does get hit the second time).
  • If I replace the stfsu instruction at 7fd8bf84 (encoded d4238118) with an stwu instruction (encoded 94638118) while at a breakpoint, behavior changes slightly:
    ** If the breakpoint is at 7fd8bf74, the game proceeds normally (at least to the title screen; presumably storing the wrong value has some negative side effect)
    ** If the breakpoint is at 7fd8bf70, then Dolphin crashes
  • If I replace the stfsu instruction with an stfs instruction (encoded d0238118), and also replace the later two stfs f1, 4(r3) and stfs f1, 8(r3) instructions with stfs f1, -0x7EE4(r3) (d023811c) and stfs f1, -0x7EE0(r3) (d0238120), things behave properly regardless of the location of the breakpoint.

Since things still break with stwu instead of stfsu, this is probably not a float-specific bug (though the dolphin crash instead of a MMU crash is worrying).

The different breakpoint placements probably matter because 7fd8bf74 is an lis r3, 0x7FE0 instruction. Between 7fd8bf74 and 7fd8bf84 nothing uses r3.

Specifically, lis r3, 0x7FE0 sets r3 to 0x7FE00000, and then stfsu f1, -0x7EE8(r3) _st_‍ores the _f_‍loating-point _s_‍ingle f1 to r3 - 0x7EE8 and then _u_‍pdates r3 to r3 - 0x7EE8. Since PowerPC lacks a way to load a 32-bit value in a single instruction, this kind of thing is common, although you'll often see addi instead of using an updating store. But this is the simplest way to store a value to a fixed address (in this case 0x7FE00000 - 0x7EE8 = 0x7fdf8118). The later two store instructions use offsets from that, as presumably 0x7fdf8118 is a 3-float array and they're storing to array[0], array[1], and array[2], but it's entirely equivalent to use combined offsets directly instead of using stfsu to compute the start of the array and then use offsets from that.

My guess as to what's going on based on looking at PPC vs host view depending on where I've put a breakpoint is that if both the lis and stfsu instructions are in the same JIT block, things break. Because x64 does have the ability to load a 32-bit value in a single instruction, Dolphin tries to detect those 2-instruction calculations and merge them into one. If I put a breakpoint at 7fd8bf74, a new JIT block is forced to start at 7fd8bf78, which does not include the lis instruction. I think this is the code that's generated for the stfsu instruction in that case:

mov r14d, dword ptr ss:[rbp-92]
movaps xmm0, xmm6
call .+114471634 (0x000003f05d165749)
lea edx, dword ptr ds:[r14-32488]
jmp .+81229229 (0x000003f05b1b1a3c)
mov r14d, edx
movaps dqword ptr ss:[rbp+128], xmm6

(unfortunately I don't know how to see what's generated for lis in this case, as the JIT block containing 7fd8bf74 actually still covers 7fd8bf58-7fd8bfd4 and more even with the breakpoint, so I can't easily move the breakpoint to see what changed.)

If the breakpoint is at 7fd8bf70, a new JIT block starts at 7fd8bf74, which includes both the lis instruction and the stfsu instruction. Here's what's generated instead:

movaps xmm0, xmm6
call .+114471626 (0x000003f05d165751)
mov dword ptr ss:[rbp-128], 0x7fd8bf84
sub rsp, 0x0000000000000020
mov ecx, eax
mov edx, 0x7fdf8118
mov rax, 0x00007ff6c8c45440
call rax
add rsp, 0x0000000000000020
test dword ptr ss:[rbp+96], 0x00000008
jnz .+35373698 (0x000003f0585f6766)
mov r14d, 0x7fdf8118
movaps dqword ptr ss:[rbp+128], xmm6

0x00007ff6c8c45440 is (based on copying that address from dolphin, pasting it into notepad++, copying from notepad++, and then pasting (void*)0x00007ff6c8c45440 into visual studio's immediate window - pasting directly causes a hang for some reason) Dolphin.exe!PowerPC::Write_U32(unsigned int, unsigned int). So in this case the JIT generated a call to PowerPC::Write_U32(f1, 0x7fdf8118), and then after that call directly sets r14d to 0x7fdf8118. The JIT doesn't generate similar calls for the two addresses after, even though it theoretically could (would that be faster, though? I feel like generating calls like this is slower than using fastmem, but I don't fully understand how that works).

Unfortunately this approach breaks if Write_U32 raises an exception, since the intermediate value of 0x7FE00000 never got written to r3. Then, when the game leaves the exception handler, it re-runs the stfsu instruction on its own, but r3 has the wrong value. Basically, it looks like it's eliminating the lis instruction, which would be fine if stfus didn't raise an exception, but causes problems when it does.

Actions #2

Updated by pokechu22 almost 2 years ago

  • Relates to usability changed from Yes to No
Actions #3

Updated by pokechu22 almost 2 years ago

This same issue also applies to memory breakpoints, which makes it a bit easier to reproduce. Here's a set of examples in Super Mario Sunshine (which has debug symbols).

Function Symbol Instruction Memory Symbol
80028e1c Camera::__sinit_cameralib_cpp stfsu f1, -0x298(r3) 803efd68 Camera::CLBConstUpVec
800230fc Camera::CPolarSubCamera::perform lfsu f0, -0x298(r3) 803efd68 Camera::CLBConstUpVec
8031dd9c JASystem::TrackMgr::initRegistTrack stwu r0, -0x1fc8(r3) 803fe038 JASystem::TrackMgr::sTrackList
802e0808 JSystem::J3DSys::setTexCacheRegion lwzu r3, -0x6050(r5) 803a9fb0 JSystem::@838[2]

All of these are hit fairly early on (either while loading, or on the title screen). Enable boot to pause; then load the game; then set an instruction breakpoint at the function's address; then unpause; then when the breakpoint is hit, set a read/write memory breakpoint at the memory address; then resume again. Interestingly, the load instructions behave just fine; it's just the store instructions that break. You can see the incorrect value in r3 in the registers widget when the memory breakpoint is hit; it ends up containing garbage.

The same thing probably also affects psq_stu (and maybe psq_lu), but I couldn't find any examples in Super Mario Sunshine where those use a constant address.

My PR fixes both the stfsu and stwu cases here.

Actions #4

Updated by JosJuice over 1 year ago

The PR was merged in 5.0-19652. Can you confirm that this fixes the issue? Or were you able to reproduce the issue using lfsu/lwzu?

Actions #5

Updated by pokechu22 over 1 year ago

  • Status changed from New to Fixed
  • Fixed in set to 5.0-19652

I can confirm that the crash reproduced via my movie is fixed in 5.0-19652 (and still present in 5.0-19649). Note that 5.0-19654 broke booting RS2 (with an invalid instruction 00000000), but still, everything is fine in 5.0-19652 which is what this issue is about.

The movie only syncs with "skip main menu" enabled (and oddly it seems like that setting isn't saved in the movie).

Actions

Also available in: Atom PDF