Emulator Issues #13032

open

Unrecoverable game freezes on Phantasy Star Online

Added by notaloop over 1 year ago. Updated 4 months ago.

Status: Accepted
Priority: Normal
Assignee: -
% Done: 0%
Operating system: N/A
Issue type: Bug
Milestone:
Regression: No
Relates to usability: No
Relates to performance: No
Easy: No
Relates to maintainability: No
Regression start:
Fixed in:

Description

Game Name?

Phantasy Star Online I & II Plus

Game ID? (right click the game in the game list, Properties, Info tab)

GPOE8P

MD5 Hash? (right click the game in the game list, Properties, Verify tab, Verify Integrity button)

36a7f90ad904975b745df9294a06baea

What's the problem? Describe what went wrong.

When playing online the game will have an unrecoverable freeze, requiring a restart. Progress since the last save is lost.

What steps will reproduce the problem?

There isn't a consistent way to trigger them. While keeping up with the latest dev versions, I've generally been getting a freeze every day.

Is the issue present in the latest development version? For future reference, please also write down the version number of the latest development version.

17271 is the latest build I've tested, and it is unstable.

Is the issue present in the latest stable version?

The latest stable build I've used is 17245: 20+ hours of gameplay without freezing.

If the issue isn't present in the latest stable version, which is the first broken version? (You can find the first broken version by bisecting. Windows users can use the tool https://forums.dolphin-emu.org/Thread-green-notice-development-thread-unofficial-dolphin-bisection-tool-for-finding-broken-builds and anyone who is building Dolphin on their own can use git bisect.)

Somewhere between 17245 and 17271.

If your issue is a graphical issue, please attach screenshots and record a three frame fifolog of the issue if possible. Screenshots showing what it is supposed to look like from either console or older builds of Dolphin will help too. For more information on how to use the fifoplayer, please check here: https://wiki.dolphin-emu.org/index.php?title=FifoPlayer

[Attach any fifologs if possible, write a description of fifologs and screenshots here to assist people unfamiliar with the game.]

What are your PC specifications? (CPU, GPU, Operating System, more)

Windows 10 Pro x64
Ryzen R5 3600
RTX 3060 Ti
16 GB RAM
512 GB SSD
Using GC adapter and GC controller

Is there anything else that can help developers narrow down the issue? (e.g. logs, screenshots, configuration files, savefiles, savestates)

These freezes are generally suspected to be caused by network changes. PRs 10985/10920 may have changed network behavior in a way that increases freeze frequency. If back-to-back testing with logging is possible, that might help confirm whether these PRs affect network behavior in PSO.
Alternatively, 17271 could be re-tested without those two PRs.


Files

GPOE8P_2022-09-29_19-12-50.png (2.2 MB), notaloop, 09/30/2022 02:33 AM
GPOE8P_2022-09-29_19-08-45.png (3.08 MB), notaloop, 09/30/2022 02:33 AM
17253 freeze log.txt (20.2 KB), notaloop, 10/01/2022 02:49 PM
17245 log.txt (24.9 KB): Dolphin error log, notaloop, 10/17/2022 12:28 AM
16838 Crash.txt (14.4 KB): log file, notaloop, 10/26/2022 03:47 AM
Perfect Tapless log.txt (47 KB), notaloop, 10/29/2022 07:30 PM
16290 pre CPU + Patch Rom.txt (12 KB): minimal-detail unhandled exception 3, notaloop, 10/30/2022 03:35 PM
16290 Pre-CPU and Stock ROM.txt (14.8 KB): more detailed unhandled exception 3, notaloop, 10/30/2022 03:35 PM
16788 log.txt (50 KB), notaloop, 11/03/2022 05:15 AM
main.cpp (2.35 KB): homebrew used to send packets, sepalani, 11/15/2023 06:39 PM
00_bba_errors.pcap (9.81 KB): chunk containing the corrupted packets, sepalani, 11/15/2023 06:47 PM
01_bba_partial_fix.pcap (9.83 KB): corrupted packets split into valid ones, sepalani, 11/15/2023 06:51 PM
02_bba_partial_fix_reorder.pcap (9.81 KB): 16 corrupted bytes moved to the end, sepalani, 11/15/2023 06:52 PM
03_bba_full_fix.pcap (9.81 KB): corrupted bytes replaced by the expected data, sepalani, 11/15/2023 06:53 PM

Actions #1

Updated by JMC4789 over 1 year ago

If you're the one who can reproduce the issue, bisect it to an exact build. Asking devs who have never played the game to go after a bug that takes an indeterminate amount of time is a good way to get the issue back burnered. If we have an exact build where the issue started, we can potentially provide more information on what to do and ways to debug it.

Actions #2

Updated by pokechu22 over 1 year ago

Also, are you using single core or dual core?

Actions #3

Updated by AdmiralCurtiss over 1 year ago

If you can get a process dump of Dolphin in the frozen state (Task Manager -> Details -> Dolphin.exe -> Create dump file), that is also likely to be helpful.

Edit: Wait, is the emulated game freezing, or the whole Dolphin process? If it's just the emulated game, a process dump might not be too useful...

Actions #4

Updated by JMC4789 over 1 year ago

Based on what I remember from the early HLE BBA testing, this is the emulated game freezing.

Actions #5

Updated by AdmiralCurtiss over 1 year ago

If it's the emulated game freezing a savestate may be useful?

Actions #6

Updated by notaloop over 1 year ago

I can try getting a dump file and save state during a freeze. Bisecting will continue; I've narrowed it down to ~11 PRs.

Actions #7

Updated by JMC4789 over 1 year ago

Great :)

If it is one of the more complicated BBA HLE pull requests, it's possible we can narrow it down to the exact commit through compiling the individual commits of the pull request too.

Actions #8

Updated by notaloop over 1 year ago

Using single core

Actions #9

Updated by notaloop over 1 year ago

Got a freeze on 17260, process dump is here:
https://drive.google.com/file/d/1MCSvBKRD8YJV-vQrG4PT5KmyMBXOaIuj/view?usp=sharing

I was unable to get a save state or FIFO log, as this freeze crashed the entire emulator.

Will try 17253 next.

Actions #10

Updated by notaloop over 1 year ago

I'll start testing PR 11083 to confirm it fixes the bug. I'm not sure further bisection will be productive.

From my bisection testing:
17245 (20 hrs) and 17253 (27 hrs) appear stable.
17260 (15 hrs) and 17271 (10 hrs) have produced freezes.

The two builds between 17253 and 17260 don't seem to change anything that would cause this sporadic freezing issue.

Actions #11

Updated by AdmiralCurtiss over 1 year ago

Well, it's impossible to prove that a sporadic bug doesn't exist in any given revision; you can only prove that it does exist. So I'm guessing that 17253 has the bug and you just got lucky, and that the bug was introduced in either 17249 or 17251 (because those builds touch network code).
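To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes (the thread doesn't establish this) that freezes arrive independently at a constant rate, i.e. as a Poisson process, so the chance of seeing no freeze in T hours of play with a mean time between freezes (MTBF) of m hours is exp(-T/m). `probNoFreeze` and `hoursForConfidence` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: probability of zero freezes in `hours` of play, assuming
// freezes follow a Poisson process with mean time between freezes `mtbf_hours`.
double probNoFreeze(double hours, double mtbf_hours) {
  const double rate = 1.0 / mtbf_hours;  // expected freezes per hour
  return std::exp(-rate * hours);
}

// Hypothetical helper: freeze-free hours needed before a clean run has
// probability <= p, i.e. solve exp(-hours / mtbf_hours) = p for hours.
double hoursForConfidence(double mtbf_hours, double p) {
  return -mtbf_hours * std::log(p);
}
```

With a hypothetical MTBF of 10 hours, `probNoFreeze(27, 10)` is exp(-2.7) ≈ 0.067, so a 27-hour clean run on a buggy build would still happen about 7% of the time, and `hoursForConfidence(10, 0.05)` gives roughly 30 freeze-free hours before that chance drops to 5%.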

Actions #12

Updated by notaloop over 1 year ago

I got another freeze. I'd been back on the latest dev build (post-latest BBA fix) (172445). The save state and memory dump are here:
https://drive.google.com/file/d/1ducyt2UG0IhNemlFFMux6pzPUOiW7eOk/view?usp=sharing

Actions #13

Updated by sepalani over 1 year ago

If this issue didn't occur with the PRs in which I touched the network code, it could mean that was pure luck, or that the bug was already there before. It might be caused by situational network issues as well.

@notaloop

  1. Do you know if the game produces interesting network log messages or maybe debug OSReport messages?
  2. Do you know if a crash happens when you have network issues?
  3. What happens when you're online and disconnect from your Internet access point (by disconnecting Wi-Fi or removing the Ethernet cable)? Does it freeze/crash?
  4. What happens when you're online, disconnect, and reconnect to a different access point (i.e. switch from Ethernet to Wi-Fi, or from Wi-Fi to a 3G/4G hotspot)?

BTW, I can't access your files from your Google drive link.

Actions #14

Updated by notaloop over 1 year ago

@sepalani I’ve updated sharing on the files. You should have access now.

  1. Where would I look for this? Currently the game basically freezes with no error messages.

  2. I can specifically check this next time. My network is usually reliable. The 17260 crash was on Wi-Fi, the 172445 crash on Ethernet. Internet is fiber, 600 Mbps, and all my hardware is modern (<3 years old).

  3. I can try this and get back to you.

  4. I can also try this. Start on Wi-Fi then plug in the Ethernet cable.

Actions #15

Updated by notaloop over 1 year ago

After some testing, the outcome of 3 and 4 is the same: the game throws a 100 error almost immediately. The game stops accepting movement/action inputs, but does not freeze. I've attached a couple of sample screenshots.

Actions #16

Updated by notaloop over 1 year ago

I managed to get a freeze on 17253, link to save state, log, and memory dump is here:
https://drive.google.com/file/d/15iy0yY6gwO0F3TnNyEKXP351ORCGrxap/view?usp=sharing
I'm also attaching a copy of the log here for convenience.

Actions #17

Updated by sepalani over 1 year ago

This looks like some kind of memory corruption that results in a crash. Do you know if the bug happens with builds 5.0-17249 and 5.0-17245? I see that you're using a modded game, so do you know if such a crash can happen on real hardware? Does this bug occur during relatively short sessions (1~2 hours)?

I suspect that my PRs allow more packets to be handled and increase the sequence number more quickly, which then triggers a bug I haven't identified. My assumption might be incorrect, but that's the only thing I can see that might make the issue occur more frequently.
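One classic class of bug that faster-advancing sequence numbers can expose is mishandling 32-bit TCP sequence-number wraparound; since initial sequence numbers are chosen unpredictably, a wrap can happen at any point in a session. The sketch below is purely illustrative: nothing in this thread shows that the BBA HLE code compares sequence numbers naively, and `seqBefore` / `seqBeforeNaive` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: naive ordering of TCP sequence numbers, which gets the
// order backwards once the 32-bit counter wraps past 2^32.
bool seqBeforeNaive(uint32_t a, uint32_t b) { return a < b; }

// Hypothetical helper: wraparound-safe ordering (RFC 1982-style serial number
// arithmetic, the same idea as the BSD SEQ_LT macro): a precedes b when the
// signed 32-bit distance is negative. Only meaningful when a and b are less
// than 2^31 apart.
bool seqBefore(uint32_t a, uint32_t b) {
  return static_cast<int32_t>(a - b) < 0;
}
```

For example, 0xFFFFFFF0 precedes 0x00000010 once the counter has wrapped, which `seqBefore` reports correctly and `seqBeforeNaive` does not.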

Actions #18

Updated by notaloop over 1 year ago

I put like 20 hours into 17245 and was not able to get the crash. I can test it some more and report back.

People can also crash on real hardware, but it's generally regarded as a lot more stable than Dolphin. People play on hardmodded GameCubes or softmodded Wiis so they can patch their ISO. The default behavior on an improper disconnect or crash is to delete all unequipped items, so people patch that out.

I can't say if it's more common in short or long sessions. The 17253 crash I shared above was 35 minutes into a session. It's kinda random? Dolphin definitely loads quests (<100 KB) faster than real hardware now, though the crashes tend to happen during a quest, not during quest loading.

Actions #19

Updated by notaloop over 1 year ago

Well that didn't take long.
Playtime was around 30 minutes into that session. The 17245 save state, memory dump, and log are here (the log is also attached to this issue):
https://drive.google.com/file/d/1pUSdx5veCL3C2sfjDlvT1q9Kdgq_Hmc4/view?usp=sharing

So at this point I'm not sure what would be the most insightful thing to test to help figure out what's going on. Is there a particular build you want me to try? 16838 was the first one with the BBA HLE added, so that's as far back as I'd be able to test.

Actions #20

Updated by sepalani over 1 year ago

Testing https://dolp.in/v5.0-16838 is a good idea. If that version is working properly then https://dolp.in/v5.0-10871 might be the PR causing it. However, if the original PR still has the issue, you might want to try with the other BBA backend to see if the issue is still happening.

If it still is, then the issue might not be strictly network related. It could be a timing issue or a crash happening on real hardware. Do you happen to notice anything in common for all the crashes you had?

Actions #21

Updated by sepalani over 1 year ago

I meant https://dolp.in/v5.0-16871 not 10871.

Actions #22

Updated by notaloop over 1 year ago

Crashed on 16838. Here's a link to the crash dump, save state, and error log:
https://drive.google.com/file/d/1eWWlREvShDqksJRTX1SQth7f6-_oQnES/view?usp=sharing

I had started a solo boss rush and was probably <20 mins into that session.

Actions #23

Updated by JMC4789 over 1 year ago

  • Status changed from New to Accepted

Tentatively accepting as this issue has active user + dev.

Actions #24

Updated by notaloop over 1 year ago

Here's a dump from an early build of the BBA adapter, from around May 9th. The usual files (dump, log, save state) are here:
https://drive.google.com/file/d/1rPmS-06Tjd4QWEmcrjVNrYiFdh-fUBS1/view?usp=share_link
I've attached the log as well. By the way, this one was using D3D11 as the backend; the other crashes were a mix of Vulkan and D3D12.

I've been given another early build from late April that I'll try next.

Updated by notaloop over 1 year ago

Okay, here are a couple more dumps from an even earlier commit of the BBA tapless adapter. The build is 16290; I don't know the exact commit, but it was labelled "pre-CPU usage lower" on 4/29.

I did something a little different here. Once I got it to crash and collected the usual dump files, I also tried the stock ROM and was able to get a crash with a lot more detail than what is usually seen in the log. Hopefully that's helpful!

16290 with patched ROM
https://drive.google.com/file/d/1Kx1Fi4bbFOKcB6WGT8KQgeSNrTMeUvi4/view?usp=share_link

16290 with stock ROM (with a lot more detail in the log):
https://drive.google.com/file/d/1jVLUXnSaaXwzhJCAQbrjjZs2poQPXM-A/view?usp=share_link

I'll also attach both sets of logs to this.

PS - I also played for about an hour in offline mode with the stock ROM and was not able to get another crash. The two crashes above happened within a few minutes of playing in online mode.

@sepalani
I see there's network dump settings under the debugging menu. Would logging/dumping be helpful for figuring this out? What settings and build should I use for that purpose?

Actions #27

Updated by sepalani over 1 year ago

Thanks for the additional information, and sorry for the late reply. Do you recall a Dolphin build where this issue never happened? Can it occur while using another BBA adapter (TAP or XLink BBA)? A network dump might help if it's a network emulation issue; however, it can also be a hardware/EXI emulation issue. AFAICT from the log, the crash is definitely EXI-related.

Based on my signature database, I was able to find some known symbols (they might be false positives, though I doubt it):

Address:      Back Chain    LR Save
0x81295760:   0x812958a8    0x8042aa6c -> ??? (after EXIUnlock)
0x812958a8:   0x81295ba0    0x8043005c -> EXIIntrruptHandler
0x81295ba0:   0x81295bc8    0x80372a00 -> __OSDispatchInterrupt
0x81295bc8:   0x81295bd0    0x8042f600 -> ExiDma (after OSRestoreInterrupts)
0x81295bd0:   0x81295c10    0x8042f5c4 -> ExiDma (after __OSUnmaskInterrupts)
0x81295c10:   0x81295c28    0x8042b424
0x81295c28:   0x81295c30    0x8042b360
0x81295c30:   0x81295c50    0x80425d60
0x81295c50:   0x81295c58    0x803f9ce0
0x81295c58:   0x00000000    0x803755b8
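The table above follows the PowerPC stack-frame layout, in which each frame begins with a back-chain word pointing at the caller's frame. A minimal sketch of walking such a chain from a memory dump, assuming (as the table's columns suggest) that the LR save word sits 4 bytes after the back chain, and using a hypothetical `Memory` map of already byte-swapped 32-bit words rather than Dolphin's real memory accessors; `Frame` and `walkBackChain` are likewise hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical row type: frame address, back chain, LR save word.
struct Frame {
  uint32_t address;
  uint32_t back_chain;
  uint32_t lr_save;
};

// Hypothetical view of a memory dump: guest address -> 32-bit word, already
// converted from the GameCube's big-endian byte order.
using Memory = std::unordered_map<uint32_t, uint32_t>;

// Follows the back chain from `sp`, assuming the back chain is at frame+0 and
// the LR save word at frame+4. Stops at a null back chain (outermost frame),
// at an address missing from the dump, or after `max_frames` to avoid loops.
std::vector<Frame> walkBackChain(const Memory& mem, uint32_t sp,
                                 std::size_t max_frames = 64) {
  std::vector<Frame> frames;
  uint32_t addr = sp;
  while (frames.size() < max_frames) {
    const auto chain = mem.find(addr);
    const auto lr = mem.find(addr + 4);
    if (chain == mem.end() || lr == mem.end()) break;
    frames.push_back({addr, chain->second, lr->second});
    if (chain->second == 0) break;
    addr = chain->second;
  }
  return frames;
}
```

Naming each `lr_save` value is then a lookup of the nearest preceding function start in a symbol or signature database, presumably the step that produced the names in the table.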

When I have more time, I'll try to investigate it in more detail. It's unfortunate that there's no easy way to replicate this issue. Regarding testing, you should definitely use the stock ROM since its logs aren't clobbered; either your 16290 build or the original build (https://dolphin-emu.org/download/dev/master/5.0-16838/) should be fine.

Actions #28

Updated by shoegazer 5 months ago

I too have encountered this issue multiple times while playing online using the HLE BBA (haven't tested with TAP or XLink BBA). Within about an hour of gameplay (but often sooner), Dolphin suddenly freezes while emitting a buzzing noise that doesn't stop until you force-close the emulator. One difference in my case: I do not get the in-game message "The line was disconnected" as seen in screenshots above, as the entire emulator freezes. On restart, you re-enter the game to find that all of your character's unequipped items are gone (apparently the freeze somehow corrupts the memory card save). I imagine this can be quite serious if the character has items from thousands of hours of gameplay, though in my case it has only been a few.

@sepalani As @notaloop may have moved on, please let me know if I can assist further. I have not tested any other builds or BBAs against this issue but I can try if you still think it can help. I'm surprised others have not encountered this issue as well, as quite a few people play this game online using Dolphin.

Test system:

  • OS: Linux Mint 21.2 / kernel 6.2
  • CPU: Intel i7-10870H
  • GPU: Nvidia RTX 3070 / driver 535.129
  • Build tested: 5.0-20290

Actions #29

Updated by sepalani 5 months ago

Interesting to see that this issue also occurs on Linux.

@shoegazer
Do you mind testing the TAP and XLink BBA implementations to see if the issue is also occurring with them?

Actions #30

Updated by shoegazer 5 months ago

I hadn't realized this until I read further, but I don't currently have the means to test either of those BBA implementations unfortunately. I'm happy to provide any other information you may need though. If there are any differences with Dolphin logging facilities in Linux that might provide further clues just let me know and I'll get them for you.

Also on a related note for other testers: Dolphin's default IP address for the Schthack server, as found in the HLE BBA settings, is outdated and no longer works. The new address is 3.18.217.27, and is what I've been using in my tests.

Actions #31

Updated by sepalani 5 months ago

Are you able to check if the issue also happens on a real Wii with both Nintendont and Devolution?

Actions #32

Updated by sepalani 5 months ago

Since I don't have the PSO games or the BBA, I wrote a small GC homebrew (-lbba needs to be added to the Makefile) to try to reproduce the issue in the emulator. The IP address/port is hardcoded and needs to be changed. I used a netcat-like server to receive the data and keep the connection alive. I managed to spot a weird behaviour where the LogBBA function saved invalid packets in the PCAP file (corrupting it). This homebrew generated a very big PCAP file (over 4 GB), and the corrupted packet prevented Wireshark from processing the file past the corrupted part (I used pcapfix to fix it). I split the 4 GB PCAP file into smaller PCAP files so I can share the relevant parts here.

NB: I still need to confirm with a hex editor that pcapfix did its job correctly, without altering the packets in a way that could be misleading.

The 00_bba_errors.pcap file contains the part where the issue happened (I dunno if that issue is the one making PSO crash). Packets 6 and 7 are the ones we're interested in: packet 6 begins with 16 (corrupted?) bytes, followed by several valid TCP packets merged into a single packet. I split the corrupted packets into valid ones in 01_bba_partial_fix.pcap, reordered them in 02_bba_partial_fix_reorder.pcap to address the packet size issue, and then noticed that fixing the 16 corrupted bytes was the missing piece to match the TCP checksum in 03_bba_full_fix.pcap.
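pcapfix's recovery logic isn't described in the thread, so the following is only a sketch of the kind of sanity check involved, not pcapfix's algorithm. It walks a classic libpcap file and flags (1) a record whose captured length is impossible, after which no later record boundary can be trusted, and (2) an Ethernet/IPv4 record carrying more bytes than its IPv4 total length accounts for, which is how several TCP segments merged into one record, like packet 6 in 00_bba_errors.pcap, would show up. `scanPcap` and `PcapIssue` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical report entry for one suspicious record.
struct PcapIssue {
  std::size_t record_index;  // 0-based record number
  std::string reason;
};

// Reads a 32-bit field in the file's byte order.
static uint32_t readU32(const uint8_t* p, bool big_endian) {
  if (big_endian)
    return uint32_t(p[0]) << 24 | uint32_t(p[1]) << 16 | uint32_t(p[2]) << 8 | p[3];
  return uint32_t(p[3]) << 24 | uint32_t(p[2]) << 16 | uint32_t(p[1]) << 8 | p[0];
}

// Hypothetical scanner for classic libpcap files (24-byte global header,
// 16-byte record headers). pcapng, VLAN tags and nanosecond-magic files are
// not handled in this sketch.
std::vector<PcapIssue> scanPcap(const std::vector<uint8_t>& file) {
  std::vector<PcapIssue> issues;
  if (file.size() < 24) {
    issues.push_back({0, "file too short for a global header"});
    return issues;
  }
  const bool little = file[0] == 0xd4 && file[1] == 0xc3 && file[2] == 0xb2 && file[3] == 0xa1;
  const bool big = file[0] == 0xa1 && file[1] == 0xb2 && file[2] == 0xc3 && file[3] == 0xd4;
  if (!little && !big) {
    issues.push_back({0, "unknown magic number"});
    return issues;
  }
  const uint32_t snaplen = readU32(&file[16], big);
  const uint32_t linktype = readU32(&file[20], big);  // 1 = Ethernet

  std::size_t offset = 24;
  for (std::size_t index = 0; offset + 16 <= file.size(); ++index) {
    const uint32_t incl_len = readU32(&file[offset + 8], big);
    if (incl_len > snaplen || incl_len > file.size() - offset - 16) {
      issues.push_back({index, "captured length exceeds snaplen or file size"});
      break;  // later record boundaries can no longer be trusted
    }
    const uint8_t* frame = file.data() + offset + 16;
    if (linktype == 1 && incl_len >= 18 && frame[12] == 0x08 && frame[13] == 0x00) {
      const std::size_t ip_total = std::size_t(frame[16]) << 8 | frame[17];
      // Frames shorter than 60 bytes are padded on the wire, so allow that.
      if (incl_len > std::max<std::size_t>(60, 14 + ip_total))
        issues.push_back({index, "frame longer than its IPv4 total length"});
    }
    offset += 16 + std::size_t(incl_len);
  }
  return issues;
}
```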

The homebrew only sends data and never receives any, and as far as I could tell, the server didn't receive weird data. Moreover, the SendFrame method, which is called before logging the packet, has many safeguards to prevent corrupted packets from being processed, and they weren't triggered, as I can't find their messages in my Dolphin log. This means there is a very high chance that these corrupted packets were generated by the BBA HLE implementation rather than sent by the game itself. The TCP and IP checksums of these corrupted packets are valid once the packets are restored to their original form, which suggests that the packets were most likely corrupted after being properly generated. Then, they were logged by the RecvHandlePacket method.

In sum, these are my conclusions after analysing the packets. I still need to confirm that my assumptions are correct and, if so, figure out where the issue comes from.
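The checksum argument above can be reproduced independently. The TCP checksum (RFC 793, computed as described in RFC 1071) is the one's-complement sum over an IPv4 pseudo-header (source and destination addresses, protocol, TCP length) plus the whole TCP segment, so a segment whose checksum field is correct sums to 0xFFFF. A minimal sketch, assuming a buffer that starts at the IPv4 header; `tcpFoldedSum` and `tcpChecksumValid` are hypothetical names, and this is not necessarily the tooling used on the packets above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Adds big-endian 16-bit words to a running one's-complement sum (RFC 1071).
static uint32_t addWords(const uint8_t* data, std::size_t len, uint32_t sum) {
  for (std::size_t i = 0; i + 1 < len; i += 2) sum += uint32_t(data[i]) << 8 | data[i + 1];
  if (len % 2 != 0) sum += uint32_t(data[len - 1]) << 8;  // odd byte padded with zero
  return sum;
}

// Folds carries back into the low 16 bits (end-around carry).
static uint16_t fold(uint32_t sum) {
  while (sum >> 16) sum = (sum & 0xFFFF) + (sum >> 16);
  return static_cast<uint16_t>(sum);
}

// Hypothetical helper: one's-complement sum over the IPv4 pseudo-header plus
// the TCP segment. Returns nullopt if the buffer isn't a well-formed IPv4/TCP packet.
std::optional<uint16_t> tcpFoldedSum(const std::vector<uint8_t>& ip) {
  if (ip.size() < 20) return std::nullopt;
  const std::size_t ihl = std::size_t(ip[0] & 0x0F) * 4;
  const std::size_t total = std::size_t(ip[2]) << 8 | ip[3];
  if (ip[9] != 6 || ihl < 20 || total < ihl + 20 || total > ip.size()) return std::nullopt;
  const std::size_t tcp_len = total - ihl;
  uint32_t sum = addWords(&ip[12], 8, 0);  // source and destination addresses
  sum += 6;                                // zero byte + protocol number (TCP)
  sum += static_cast<uint32_t>(tcp_len);   // TCP length
  return fold(addWords(&ip[ihl], tcp_len, sum));
}

// Hypothetical helper: a segment with a correct checksum field sums to 0xFFFF.
// To compute a checksum instead, zero the field (TCP offset 16) and store ~sum.
bool tcpChecksumValid(const std::vector<uint8_t>& ip) {
  const auto sum = tcpFoldedSum(ip);
  return sum.has_value() && *sum == 0xFFFF;
}
```

Restoring 16 displaced bytes and seeing this check pass is strong evidence the restoration is right, since a random 16-byte substitution would almost never still sum to 0xFFFF.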

Actions #33

Updated by sepalani 5 months ago

I managed to trigger that packet corruption multiple times. AFAICT, it's a concurrency issue in the packet logging and shouldn't impact the emulation: https://github.com/dolphin-emu/dolphin/pull/12304

So it's most likely unrelated to the issue. I did notice some TCP window warnings on Wireshark, I'll have a look at it.
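The actual fix is in the linked PR and isn't reproduced here. As a generic illustration of this class of bug: if a send path and a receive path append records to the same capture without a shared lock, one thread's 16-byte record header or payload can land inside another thread's record. (A classic PCAP record header is also 16 bytes, the same size as the corrupted run in packet 6, though the thread doesn't say that is what happened.) `PcapLogger`, `logFrame` and `demoConcurrentLogging` are hypothetical names, not Dolphin's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical PCAP record writer shared by several threads. (Writing the
// 24-byte global header is left out of this sketch.)
class PcapLogger {
 public:
  explicit PcapLogger(std::ostream& out) : out_(out) {}

  void logFrame(const std::vector<uint8_t>& frame, uint32_t ts_sec, uint32_t ts_usec) {
    uint8_t header[16];
    putLE32(header + 0, ts_sec);
    putLE32(header + 4, ts_usec);
    putLE32(header + 8, static_cast<uint32_t>(frame.size()));   // incl_len
    putLE32(header + 12, static_cast<uint32_t>(frame.size()));  // orig_len
    // One lock around header + payload: another thread's record can't land
    // between them, and the stream is never written concurrently.
    std::lock_guard<std::mutex> lock(mutex_);
    out_.write(reinterpret_cast<const char*>(header), sizeof(header));
    out_.write(reinterpret_cast<const char*>(frame.data()),
               static_cast<std::streamsize>(frame.size()));
  }

 private:
  static void putLE32(uint8_t* p, uint32_t v) {
    for (int i = 0; i < 4; ++i) p[i] = static_cast<uint8_t>(v >> (8 * i));
  }

  std::ostream& out_;
  std::mutex mutex_;
};

// Hypothetical usage: two threads (standing in for send and receive paths)
// log frames of different sizes and fill bytes at the same time.
std::string demoConcurrentLogging(int frames_per_thread) {
  std::ostringstream out;
  PcapLogger logger(out);
  auto worker = [&](std::size_t size, uint8_t fill) {
    for (int i = 0; i < frames_per_thread; ++i)
      logger.logFrame(std::vector<uint8_t>(size, fill), 0, 0);
  };
  std::thread rx(worker, 100, 0xAA);
  std::thread tx(worker, 37, 0xBB);
  rx.join();
  tx.join();
  return out.str();
}
```

Because every record in the output stays intact, it can be parsed back header by header; removing the lock turns the two writes into a data race where records can interleave.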

Actions #34

Updated by shoegazer 4 months ago

My apologies for missing this earlier - for some reason I'm not getting notifications of changes to the thread even though it's on in my profile. They're not in junk either. Ah well, I'll try to check more frequently.

My GC hasn't worked in nearly two years now sadly, though perhaps someone else with a BBA-equipped GC can help to confirm.

Interesting packet analysis, though it's too bad it turned out to be a red herring. This is such a tricky issue to catch in the game itself since there's no apparent way to trigger it; and because you can play for many hours without it happening, and then it can happen multiple times within an hour. I hope there's some way to trap the condition outside the game as you've attempted. At least it seems you're getting closer, and hopefully the TCP window warnings provide some clues. Thanks for working through this.

As a side-note, my PR to update the Schthack server IP was merged, so my above post about that is no longer relevant now for anyone testing.
