Emulator Issues #6936

PPC_FP Merge Master Issue

Added by JMC4789 over 7 years ago.



This issue is mainly to keep track of all of the things broken by the PPC_FP merge. Phire and I both thought this would be a good idea to have for when it (inevitably) gets fixed.

Related issues

Blocked by Emulator - Emulator Issues #6824: TMNT reboots when trying to load a save game (Fixed)

Blocked by Emulator - Emulator Issues #6950: WWE 13 collision bug (Fixed)

Blocked by Emulator - Emulator Issues #6968: Goblin Commander: Unleash the Horde Models Askew (Fixed)

Blocked by Emulator - Emulator Issues #6975: Zelda SS crash after defeating the imprisoned for the third time... (Fixed)


#1 Updated by MayImilae over 7 years ago

  • Status changed from New to Accepted
  • Milestone set to Current
  • Priority set to High
  • Category set to ppc

#2 Updated by a41pizza over 7 years ago

Sonic Colors has these white graphics defects that show up on D3D and OpenGL (can't see any menus on OpenGL because it is broken with this:

#3 Updated by JMC4789 over 7 years ago


#4 Updated by JMC4789 over 7 years ago

Err, that's a separate issue. Whoops. Lrn2read, I'm up too late, blah blah, excuses

#6 Updated by JMC4789 over 7 years ago

Thank you, I almost forgot to add it.

#7 Updated by a41pizza over 7 years ago

The no-menu bug for Sonic Colors (and the other Sonic games) on OpenGL is a separate issue, yes, but I was just giving a heads up so that you don't think it's a part of the ppc_merge break.

#8 Updated by JMC4789 over 7 years ago

Yeah, the lrn2read was for me, you obviously didn't say it was, I was just blindly reading issue numbers.

#10 Updated by phire over 7 years ago

OK, the problem is the lfs (load float single) and stfs (store float single) instructions.

The PowerPC (Gekko) only has double precision registers, so it converts singles to doubles when they are loaded into the FP register file. But singles don't gain any extra precision; the single precision operations still only use around 32 bits of the register. (This is not entirely true, there is some special handling for denormal singles.)

Dolphin emulates this behaviour by converting all singles to doubles using the CVTSS2SD instruction, then after each single precision op (actually implemented as a double op) it executes a pair of CVTSD2SS/CVTSS2SD instructions to keep the precision of the float in the double register at only single precision.

This gives Dolphin and the Gekko mostly the same behaviour.
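To make the rounding step concrete, here is a minimal C++ sketch of the idea, not Dolphin's actual JIT output. The helper names `round_to_single` and `single_add` are made up for illustration; the narrowing and widening casts stand in for the CVTSD2SS/CVTSS2SD pair.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: stands in for the CVTSD2SS/CVTSS2SD pair by
// narrowing to float and widening back to double.
double round_to_single(double value) {
    return static_cast<double>(static_cast<float>(value));
}

// Hypothetical model of a single precision add that is carried out as a
// double op and then rounded back down to single precision.
double single_add(double a, double b) {
    return round_to_single(a + b);
}

// Small demonstration: 2^-30 is below half a single precision ulp at 1.0,
// so it survives a double add but is lost once the result is rounded.
void demo_round_to_single() {
    const double tiny = std::ldexp(1.0, -30);
    assert(1.0 + tiny != 1.0);             // visible in double precision
    assert(single_add(1.0, tiny) == 1.0);  // gone at single precision
}
```

The register still holds a double, but after the round trip its value is always one that a single can represent exactly.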

The problem arises when the "denormals are zero" (DAZ) flag is enabled.
Because single <--> double conversions happen automatically on the Gekko when singles are loaded in and out of memory, there is no need for explicit conversion instructions, while Dolphin on the x86 has extra explicit single <--> double conversion instructions all over the place, including in the lfs and stfs instructions.

Because the x86's conversions are explicit, they count as operations and denormals are changed to zero, while the Gekko's conversions are implicit and don't trigger a denormal-to-zero conversion.

This means on the Gekko you can safely copy a 32 bit value from one place in memory to another with just a lfs/stfs pair, while Dolphin with its new DAZ code will mutilate these values.
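The effect can be sketched in portable C++ by modelling DAZ in software rather than setting the real MXCSR flag. All function names below are hypothetical, and the `daz` parameter is only a stand-in for the hardware flag; this is an illustration of the failure mode, not Dolphin code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static float float_from_bits(uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

static uint32_t bits_from_float(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// A single is denormal when its exponent field is zero and its fraction is not.
static bool is_denormal_single(uint32_t bits) {
    return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}

// Hypothetical model of an explicit single -> double conversion. With the
// software "daz" switch on, a denormal input is treated as a signed zero.
static double load_single_explicit(uint32_t bits, bool daz) {
    if (daz && is_denormal_single(bits))
        bits &= 0x80000000u;  // keep only the sign bit
    return static_cast<double>(float_from_bits(bits));
}

// Hypothetical model of the explicit double -> single conversion on store.
static uint32_t store_single_explicit(double value) {
    return bits_from_float(static_cast<float>(value));
}

// An lfs/stfs style copy that goes through both explicit conversions.
uint32_t copy_via_fp(uint32_t bits, bool daz) {
    return store_single_explicit(load_single_explicit(bits, daz));
}

void demo_copy_via_fp() {
    const uint32_t smallest_denormal = 0x00000001u;
    assert(copy_via_fp(smallest_denormal, false) == smallest_denormal);
    assert(copy_via_fp(smallest_denormal, true) == 0u);  // value is lost
}
```

Any 32 bit payload that happens to look like a denormal single, such as a small integer moved through the FP registers, gets zeroed by the modelled DAZ path while normal values pass through unchanged.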

#11 Updated by phire over 7 years ago

But why does JitIL still work?

JitIL has an optimisation which folds duplicate IL opcodes into each other and this optimisation (correctly?) assumes that a SingleToDouble followed by a DoubleToSingle is a nop and optimises it out.

So JitIL does the correct thing when a lfs/stfs pair is inside the same jit block; however, the mutilated SingleToDouble value is still calculated and written to the register file.

If a game does this trick over multiple jit blocks, JitIL will still generate an incorrect result.
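A toy version of that folding, assuming a made-up four-opcode IL (the `Op` enum and `fold_conversions` are illustrative names, not JitIL's real ones), shows why the block boundary matters. It is a simplification: it drops the conversion pair entirely, whereas the real JitIL reportedly still computes the converted value for the register file.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified IL opcodes.
enum class Op { LoadSingle, SingleToDouble, DoubleToSingle, StoreSingle };

// Peephole pass over ONE block: a SingleToDouble immediately followed by a
// DoubleToSingle is treated as a no-op and both are dropped.
std::vector<Op> fold_conversions(const std::vector<Op>& block) {
    std::vector<Op> out;
    for (Op op : block) {
        if (op == Op::DoubleToSingle && !out.empty() &&
            out.back() == Op::SingleToDouble) {
            out.pop_back();  // cancel the matching SingleToDouble
            continue;
        }
        out.push_back(op);
    }
    return out;
}

void demo_fold_conversions() {
    // lfs and stfs in the same block: the conversions cancel out.
    const std::vector<Op> whole = {Op::LoadSingle, Op::SingleToDouble,
                                   Op::DoubleToSingle, Op::StoreSingle};
    assert(fold_conversions(whole).size() == 2);

    // The same sequence split across two blocks: nothing can be folded,
    // because each block is optimised on its own.
    const std::vector<Op> first = {Op::LoadSingle, Op::SingleToDouble};
    const std::vector<Op> second = {Op::DoubleToSingle, Op::StoreSingle};
    assert(fold_conversions(first).size() == 2);
    assert(fold_conversions(second).size() == 2);
}
```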

#12 Updated by JMC4789 over 7 years ago

Well, that explains that? I think? Either way, good job on tracking it down. Will the PPC_FP merge work if this is fixed?

#13 Updated by degasus over 7 years ago

phire: imo we have to check what happens on real hw when we load a denormalized single and use it as a double. Does it normalize them on the conversion, also in non-ieee mode?

Maybe there is a fast way to correctly load a single without ctz (but a branch if ieee-mode is enabled)

#14 Updated by phire over 7 years ago

  • Status changed from Accepted to Work started

degasus: The docs say that singles are normalized into the double registers and denormalized on the way back out. The docs (PowerPCProgEnv.pdf) also list the exact algorithm used.
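For reference, a single to double widening that normalizes denormals can be written with integer operations only, so no FP control flag can interfere. The sketch below is written from the general shape of that kind of algorithm rather than copied from the manual, and `single_bits_to_double_bits` and `native_double_bits` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Integer-only widening of a single's bit pattern to a double's bit pattern.
// Denormal singles are normalized, since every one of them is a normal double.
uint64_t single_bits_to_double_bits(uint32_t word) {
    const uint64_t sign = static_cast<uint64_t>(word >> 31) << 63;
    const uint32_t exponent = (word >> 23) & 0xFFu;
    uint64_t fraction = word & 0x007FFFFFu;

    if (exponent == 0xFFu)  // infinity or NaN: all-ones exponent, widen fraction
        return sign | (0x7FFull << 52) | (fraction << 29);

    if (exponent == 0) {
        if (fraction == 0)
            return sign;  // signed zero
        // Denormal: shift left until the implicit leading one appears.
        int unbiased = -126;
        while ((fraction & 0x00800000u) == 0) {
            fraction <<= 1;
            --unbiased;
        }
        fraction &= 0x007FFFFFu;  // drop the now-implicit leading one
        return sign | (static_cast<uint64_t>(unbiased + 1023) << 52) |
               (fraction << 29);
    }

    // Normal number: rebias the exponent from 127 to 1023 (difference 896).
    return sign | (static_cast<uint64_t>(exponent + 896u) << 52) |
           (fraction << 29);
}

// Reference result from the host's own float -> double conversion,
// used only to cross-check the integer version.
uint64_t native_double_bits(uint32_t word) {
    float f;
    std::memcpy(&f, &word, sizeof f);
    const double d = static_cast<double>(f);
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

With DAZ off on the host, the two functions should agree for ordinary inputs; the integer version simply cannot be affected by the flag.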

flacs was suggesting that we use the x87 hardware to do float to double conversions, as it is just sitting around doing nothing.

There is also the possibility that x86 processors will correctly pipeline changes to the mxcsr register. I've seen no evidence to the contrary, but I'd want to do a test program/microbenchmark before going down that path.
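The pattern that would need benchmarking looks roughly like the sketch below: save MXCSR, clear the DAZ bit (bit 6), do the conversion, restore. It assumes the `_mm_getcsr`/`_mm_setcsr` intrinsics from `<xmmintrin.h>` on x86 and falls back to a plain conversion elsewhere. The function name and the `HAVE_MXCSR` macro are made up, and the sketch says nothing about how cheaply the processor handles the two MXCSR writes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_MXCSR 1
#else
#define HAVE_MXCSR 0
#endif

// DAZ is bit 6 of MXCSR.
constexpr unsigned int kMxcsrDaz = 0x0040u;

// Hypothetical helper: widen a single to a double with DAZ temporarily
// cleared, so a denormal input is not flushed to zero.
double widen_with_daz_cleared(uint32_t single_bits) {
    float value;
    std::memcpy(&value, &single_bits, sizeof value);
    // volatile keeps the conversion between the two MXCSR writes.
    volatile float input = value;

#if HAVE_MXCSR
    const unsigned int saved = _mm_getcsr();
    _mm_setcsr(saved & ~kMxcsrDaz);
#endif

    volatile double output = static_cast<double>(input);

#if HAVE_MXCSR
    _mm_setcsr(saved);  // restore whatever mode the caller was running in
#endif
    return output;
}

void demo_widen_with_daz_cleared() {
    // The smallest denormal single is 2^-149 and must survive the widening.
    assert(widen_with_daz_cleared(0x00000001u) == std::ldexp(1.0, -149));
}
```

A microbenchmark would time this against the plain conversion in a tight loop to see what the extra MXCSR traffic costs.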

#15 Updated by JMC4789 over 7 years ago

This looks like more fallout of the PPC_FP merge, this time in the revert.

#16 Updated by flacs over 7 years ago

Here is a minimal failing test case that passes on real hardware:

#17 Updated by degasus over 7 years ago

#18 Updated by JMC4789 about 7 years ago

This should all be fixed by 311caef094c434cea6770f8230a7c39318a64def. I can't confirm all the issues, unfortunately.

#19 Updated by flacs about 7 years ago

  • Status changed from Work started to Fixed

The branch with the fixes was merged in 311caef094c434cea6770f8230a7c39318a64def. I tested most of the issues and they're gone now. Let's close this for now.
