Emulator Issues #10821

The Daring Game for Girls Hang

Added by kolano about 6 years ago. Updated over 3 years ago.

Status:
Accepted
Priority:
Normal
Assignee:
-
% Done:

0%

Operating system:
N/A
Issue type:
Bug
Milestone:
Regression:
No
Relates to usability:
No
Relates to performance:
No
Easy:
No
Relates to maintainability:
No
Regression start:
Fixed in:

Description

Game Name?

The Daring Game for Girls

Game ID? (right click the game in the game list, properties, info tab)

SDAE5G

What's the problem? Describe what went wrong.

Two invalid read errors occur after exiting the Wii Remote safety warning and the game hangs on a white screen.

What steps will reproduce the problem?

Start the title

Is the issue present in the latest development version? For future reference, please also write down the version number of the latest development version.

5.0-6004

Is the issue present in the latest stable version?

Yes

What are your PC specifications? (CPU, GPU, Operating System, more)

i7-6700K, GeForce 970, Windows 10

Actions #1

Updated by kolano about 6 years ago

Tested with OpenGL; also occurs in 5.0-6208 with DirectX.

Actions #2

Updated by flacs over 5 years ago

You can get past the error by temporarily overriding the CPU clock (I only tried 6%) during startup. After that you can disable the override and the game works fine.

Actions #3

Updated by flacs over 5 years ago

The error is a null pointer dereference nested two function calls inside the __AID_Callback, so I'm guessing the crash is related to CPU<>DSP timing.

Actions #4

Updated by hthh almost 4 years ago

Yeah, looks like a CPU<>DSP timing issue. An AX callback is registered, but it crashes if it's called before a global variable is set.

The game behaviour is as follows:

void crashing_callback()
{
  g_sound_system->vtable->func_C(g_sound_system);
}

XSoundSystemWii *XSoundSystemWii::XSoundSystemWii(XSoundSystemWii *this)
{
  int sound_mode;

  XSoundSystem::XSoundSystem(this);
  this->vtable = &XSoundSystemWii::vtable;
  this->dword38 = 0;
  this->dword3C = 0;
  this->dword40 = 0;
  this->dword44 = 0;
  this->dword48 = 0;
  this->dword4C = 0;
  AIInit(0);
  AXInit();
  AXRegisterCallback(crashing_callback);
  MIXInit();
  sound_mode = SCGetSoundMode();
  if ( sound_mode )
  {
    if ( sound_mode == 1 )
    {
      AXSetMode(0);
      MIXSetSoundMode(1);
    }
    else if ( sound_mode == 2 )
    {
      AXSetMode(1);
      MIXSetSoundMode(2);
    }
  }
  else
  {
    AXSetMode(0);
    MIXSetSoundMode(0);
  }
  g_sound_system = this;
  // ...
}
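To make the ordering problem concrete, here is a minimal sketch of the same sequence with stand-in types. Every name in it (FakeSoundSystem, GuardedCallback, FireInterrupt, RunConstructorOrdering) is hypothetical and not taken from the game or the SDK. The null check exists only so the sketch can report the bad case instead of crashing; the real callback has no such guard.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these names come from the game or the SDK.
struct FakeSoundSystem { int ticks = 0; };

static FakeSoundSystem* g_fake_sound_system = nullptr;  // mirrors g_sound_system
using Callback = bool (*)();
static Callback g_registered = nullptr;                 // mirrors the AX callback slot

// Returns false where the real callback would dereference a null pointer.
static bool GuardedCallback()
{
  if (g_fake_sound_system == nullptr)
    return false;
  ++g_fake_sound_system->ticks;
  return true;
}

enum class Result { NoCallbackYet, WouldCrash, Ok };

// Simulates the audio interrupt arriving at the current point in time.
static Result FireInterrupt()
{
  if (g_registered == nullptr)
    return Result::NoCallbackYet;
  return g_registered() ? Result::Ok : Result::WouldCrash;
}

// What an interrupt would do at three points of the constructor.
struct Trace { Result before_register, after_register, after_global; };

static Trace RunConstructorOrdering()
{
  static FakeSoundSystem instance;
  g_registered = nullptr;
  g_fake_sound_system = nullptr;

  Trace t{};
  t.before_register = FireInterrupt();  // AXInit done, no callback yet
  g_registered = GuardedCallback;       // AXRegisterCallback(crashing_callback)
  assert(g_fake_sound_system == nullptr);
  t.after_register = FireInterrupt();   // the window in which the game crashes
  g_fake_sound_system = &instance;      // g_sound_system = this;
  t.after_global = FireInterrupt();
  return t;
}
```

Calling RunConstructorOrdering() reports NoCallbackYet, WouldCrash, Ok for the three points, so only an interrupt landing between the registration and the assignment of the global is a problem.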

The relevant timing code in DSP.cpp is as follows:

// TODO: need hardware tests for the timing of this interrupt.
// Sky Crawlers crashes at boot if this is scheduled less than 87 cycles in the future.
// Other Namco games crash too, see issue 9509. For now we will just push it to 200 cycles
CoreTiming::ScheduleEvent(200, s_et_GenerateDSPInterrupt, INT_AID);

Upping the delay to 8169 ticks or more works in the JIT. I didn't manage to test the interpreter, so I'd maybe round that up to 8200 to be safe. Dropping the delay to 21 or lower also works (as the interrupt then runs before the callback is registered), although that seems unlikely to be correct. In this case the DMA size is 12 blocks. As the comment notes, this needs hardware testing. (Values tested on 5.0-12245.)
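As a rough summary of those numbers, the sketch below classifies a delay value against the two thresholds. The names (ClassifyDelay, BootResult, the two constants) are hypothetical, and it assumes that every delay strictly between the thresholds hangs, which I only checked at a few values. It describes the JIT on 5.0-12245 for this one game, not hardware behaviour.

```cpp
#include <cassert>

// Thresholds observed with the JIT on 5.0-12245; treated as exact for illustration.
constexpr int kMaxEarlyDelay = 21;    // interrupt runs before AXRegisterCallback
constexpr int kMinLateDelay = 8169;   // interrupt runs after g_sound_system is set

enum class BootResult { WorksEarly, Hangs, WorksLate };

// Classifies a CoreTiming delay (in ticks) for this game.
// Assumption: all values between the two thresholds hang.
BootResult ClassifyDelay(int delay_ticks)
{
  assert(delay_ticks >= 0);
  if (delay_ticks <= kMaxEarlyDelay)
    return BootResult::WorksEarly;
  if (delay_ticks >= kMinLateDelay)
    return BootResult::WorksLate;
  return BootResult::Hangs;
}
```

With these thresholds, ClassifyDelay(200) (the current value in DSP.cpp) gives Hangs, while ClassifyDelay(8200) gives WorksLate.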

Actions #5

Updated by hthh almost 4 years ago

Rough timings are in, and it's not going to be as easy to fix as I'd hoped... I'm just using libogc + wiiload and PMC3 to count cycles (messy code at https://gist.github.com/hthh/ad17a78d95abd5cdac21954b3b4486f4).

Dolphin (deterministic):

size 180: cycles = 256, i = 47

Wii:

size 180: cycles = 949, i = 4
size 180: cycles = 498, i = 29
size 180: cycles = 494, i = 28
size 180: cycles = 503, i = 28
size 180: cycles = 517, i = 29
size 180: cycles = 500, i = 29
size 180: cycles = 497, i = 29
size 180: cycles = 497, i = 29
size 180: cycles = 497, i = 29
size 180: cycles = 497, i = 29
size 180: cycles = 497, i = 29
size 180: cycles = 497, i = 29
size 180: cycles = 494, i = 28

size 20: cycles = 889, i = 3
size 20: cycles = 511, i = 3
size 40: cycles = 500, i = 29
size 40: cycles = 509, i = 28
size 60: cycles = 509, i = 25
size 60: cycles = 500, i = 30
size 80: cycles = 497, i = 29
size 80: cycles = 494, i = 28
size a0: cycles = 497, i = 30
size a0: cycles = 497, i = 30
size c0: cycles = 497, i = 30
size c0: cycles = 497, i = 30
size e0: cycles = 494, i = 28
size e0: cycles = 497, i = 30
size 100: cycles = 497, i = 30
size 100: cycles = 494, i = 28
size 120: cycles = 494, i = 28
size 120: cycles = 497, i = 30
size 140: cycles = 494, i = 28
size 140: cycles = 497, i = 30
size 160: cycles = 494, i = 28
size 160: cycles = 494, i = 28
size 180: cycles = 494, i = 28
size 180: cycles = 497, i = 30
size 1a0: cycles = 497, i = 30
size 1a0: cycles = 497, i = 30
size 1c0: cycles = 497, i = 30
size 1c0: cycles = 494, i = 28
size 1e0: cycles = 494, i = 28
size 1e0: cycles = 494, i = 28

The high first result in each run is presumably due to a cold cache. My methodology isn't too precise, but I'd guess we could up the ticks to 400 and be strictly more accurate. Even when I accidentally included too much libogc overhead in the timing, the biggest cold-cache result I saw was under 2000, so I think increasing the delay to anything like 8000 is the wrong answer.

The "i" number is produced by a loop of "while (!done) i++;" that waits for the callback to be invoked. It shows that real hardware executes fewer iterations even in around twice as many cycles, which is a bit surprising.
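To put a number on that, the sketch below divides cycles by iterations for the Dolphin result and for a typical warm Wii result from the list above. The helper and constant names are hypothetical. The cycle counts include measurement overhead, so treat the quotient as a rough indicator rather than the true cost of one loop iteration.

```cpp
#include <cassert>

// One measurement: elapsed cycles and the value of "i" when the callback fired.
struct Sample { int cycles; int iterations; };

// Average cycles spent per iteration of "while (!done) i++;".
double CyclesPerIteration(Sample s)
{
  assert(s.iterations > 0);
  return static_cast<double>(s.cycles) / s.iterations;
}

// Values copied from the measurements above (size 180).
const Sample kDolphin{256, 47};
const Sample kWiiTypical{497, 29};
```

CyclesPerIteration(kDolphin) comes out at roughly 5.4 and CyclesPerIteration(kWiiTypical) at roughly 17.1, so by this crude measure each iteration takes about three times as many cycles on hardware as in Dolphin.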

I'm a fair bit more confused than when I began... If we imagine the 400 cycles is exact, we're exactly in the middle of the numbers that work (logarithmically). Make the CPU 20x slower and we'd run before the callback is registered, or make the CPU 20x faster and it'd initialise the global before we run. The latter seems impossible, and the former merely improbable.
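The "middle, logarithmically" claim is just the geometric mean of the two working thresholds (21 and 8169 from my earlier comment). A small sketch, with a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>

// Midpoint of [low, high] on a logarithmic scale, i.e. the geometric mean.
double GeometricMidpoint(double low, double high)
{
  assert(low > 0 && high > low);
  return std::sqrt(low * high);
}
```

GeometricMidpoint(21, 8169) is about 414, which is close to the measured 400. Equivalently, 400 / 21 is about 19 and 8169 / 400 is about 20.4, which is where the "20x" figures come from.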

Could it be cache? Looking at the 32 instructions in the window, the only likely miss would be the call to AXRegisterCallback. Everything else should have been in cache recently (although we do cross a cache line with linear execution in AXStartDMA).

Could it be unexpectedly slow instructions? The final mfmsr in OSDisableInterrupts might be a bit slower than we account for? There are three BLRs and two BLs, I guess?

Maybe these 32 instructions run just slow enough that another interrupt always lands in that window, allowing the 400 cycles to pass before the interrupt we care about?

Maybe some other DSP related configuration I've overlooked changes the speed of the interrupt?

Maybe there's a mistake in my test and 400 cycles is completely wrong?

Anyway, sorry about the ramble - I really hoped this'd be easy after hardware testing.

Actions #6

Updated by JMC4789 over 3 years ago

  • Status changed from New to Accepted