Emulator Issues #6936
closed
PPC_FP Merge Master Issue
Description
This issue is mainly to keep track of all of the things broken by the PPC_FP merge. Phire and I both thought this would be a good idea to have for when it (inevitably) gets fixed.
Updated by MayImilae about 11 years ago
- Status changed from New to Accepted
- Milestone set to Current
- Priority set to High
- Category set to ppc
Updated by a41pizza about 11 years ago
Sonic Colors has these white graphics defects that show up on D3D and OpenGL (can't see any menus on OpenGL because it is broken with this: https://code.google.com/p/dolphin-emu/issues/detail?id=6914)
https://code.google.com/p/dolphin-emu/issues/detail?id=6837
Updated by JMC4789 about 11 years ago
Err, that's a separate issue. Whoops. Lrn2read, I'm up too late, blah blah, excuses
Updated by AlbertWesker1988 about 11 years ago
WWE 13 collision Bug issue
http://code.google.com/p/dolphin-emu/issues/detail?id=6950
Updated by a41pizza about 11 years ago
The no-menu bug for Sonic Colors (and the other Sonic games) on OpenGL is a separate issue, yes, but I was just giving a heads up so that you don't think it's a part of the ppc_merge break.
Updated by JMC4789 about 11 years ago
Yeah, the lrn2read was for me, you obviously didn't say it was, I was just blindly reading issue numbers.
Updated by phire about 11 years ago
Ok the problem is the lfs (load float single) and stfs (store float single) instructions.
The PowerPC (Gekko) only has double precision registers, so it converts singles to doubles when they are loaded into the FP register file. But singles don't gain any extra precision; the single precision operations still only use around 32 bits of the register. (This is not entirely true: there is some special handling for denormal singles.)
Dolphin emulates this behaviour by converting all singles to doubles using the CVTSS2SD instruction, then after each single precision op (actually implemented as a double) it executes a pair of CVTSD2SS/CVTSS2SD instructions to keep the precision of the float in the double register at only single precision.
This gives dolphin and Gekko mostly the same behaviour.
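To make the round trip concrete, here is a rough Python sketch of the idea (not Dolphin's actual JIT code). `round_to_single` and `fadds` are hypothetical helper names; `struct` is only used to stand in for the CVTSD2SS/CVTSS2SD pair.

```python
import struct

def round_to_single(x: float) -> float:
    """Stand-in for a CVTSD2SS/CVTSS2SD pair: round to single, widen back to double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fadds(a: float, b: float) -> float:
    """Single precision add, done as a double add and then rounded (hypothetical helper)."""
    return round_to_single(a + b)

a = round_to_single(0.1)   # what an lfs of 0.1f would leave in the register
b = round_to_single(0.2)
full = a + b               # the double result keeps extra mantissa bits
single = fadds(a, b)       # the emulated single precision result drops them
print(full, single)
```

The value in the "register" is always a Python double here, but after `fadds` it only ever holds something a single could represent.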
The problem arises when the "denormal as zero" flag is enabled.
Because single <--> double conversions happen automatically on the Gekko when singles are loaded in and out of memory, there is no need for explicit conversion instructions, while dolphin on the x86 has extra explicit single <--> double conversion instructions all over the place, including the lfs and stfs instructions.
Because the x86's conversions are explicit, they count as operations and denormals are changed to zero, while Gekko's conversions are implicit and don't trigger a denormal as zero conversion.
This means on the Gekko you can safely copy a 32 bit value from one place in memory to another with just an lfs/stfs pair, while dolphin with its new DAZ code will mutilate these values.
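A small Python simulation of that copy, as a sketch only: the `daz` parameter models the "denormal as zero" flag in software, and the function names (`cvtss2sd`, `cvtsd2ss`, `copy_via_fpr`) are just labels borrowed for illustration, not real bindings to the x86 instructions.

```python
import struct

def is_denormal_single(bits: int) -> bool:
    """True if the 32 bit pattern encodes a denormal single (exponent 0, mantissa != 0)."""
    return (bits >> 23) & 0xFF == 0 and bits & 0x7FFFFF != 0

def cvtss2sd(bits: int, daz: bool) -> float:
    """Widen single bits to a double; with daz set, denormal inputs become signed zero."""
    if daz and is_denormal_single(bits):
        bits &= 0x80000000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def cvtsd2ss(value: float) -> int:
    """Narrow a double back to the 32 bit pattern of a single."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def copy_via_fpr(bits: int, daz: bool) -> int:
    """Model an lfs/stfs pair that goes through explicit conversions."""
    return cvtsd2ss(cvtss2sd(bits, daz))

payload = 0x00000123  # arbitrary 32 bit data that happens to look like a denormal
print(hex(copy_via_fpr(payload, daz=False)))  # bit pattern survives
print(hex(copy_via_fpr(payload, daz=True)))   # flushed to zero
```

Normal values such as 0x3F800000 (1.0f) come through unchanged either way; only patterns with a zero exponent field get mangled.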
Updated by phire about 11 years ago
But why does JitIL still work?
JitIL has an optimisation which folds duplicate IL opcodes into each other and this optimisation (correctly?) assumes that a SingleToDouble followed by a DoubleToSingle is a nop and optimises it out.
So JitIL does the correct thing when an lfs/stfs pair is inside the same jit block; however, the mutilated SingleToDouble value is still calculated and written to the register file.
If a game does this trick over multiple jit blocks, JitIL will still generate an incorrect result.
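Roughly, the folding can be pictured like this. This is a toy Python sketch, not JitIL's real data structures: IL expressions are modelled as nested tuples, and `Load32`, `LoadFReg` and `fold` are made-up names. Only `SingleToDouble` and `DoubleToSingle` come from the description above.

```python
def fold(expr):
    """Rewrite DoubleToSingle(SingleToDouble(x)) to x; leave everything else alone."""
    if (isinstance(expr, tuple) and expr[0] == "DoubleToSingle"
            and isinstance(expr[1], tuple) and expr[1][0] == "SingleToDouble"):
        return expr[1][1]
    return expr

# lfs and stfs in the same block: the store's source folds back to the raw load.
same_block = ("DoubleToSingle", ("SingleToDouble", ("Load32", "src")))

# stfs in a later block: it only sees a register read, so nothing can be folded.
next_block = ("DoubleToSingle", ("LoadFReg", "f1"))

print(fold(same_block))  # ('Load32', 'src')
print(fold(next_block))  # unchanged
```

In the second case the register already holds whatever the earlier SingleToDouble produced, so the damage is done before the store runs.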
Updated by JMC4789 about 11 years ago
Well, that explains that? I think? Either way, good job on tracking it down. Will the PPC_FP merge work if this is fixed?
Updated by degasus about 11 years ago
phire: imo we have to check what happens on real hw when we load a denormalized single and use it as a double. Does it normalize them on the conversion, also in non-ieee mode?
Maybe there is a fast way to correctly load a single without ctz (but a branch if ieee-mode is enabled)
Updated by phire about 11 years ago
- Status changed from Accepted to Work started
degasus: The docs say that singles are normalized into the double registers and denormalized on the way back out. The docs (PowerPCProgEnv.pdf) also list the exact algorithm used.
flacs was suggesting that we use the x87 hardware to do float to double conversions, as it is just sitting around doing nothing.
There is also the possibility that x86 processors will correctly pipeline changes to the mxcsr register. I've seen no evidence to the contrary, but I'd want to do a test program/microbenchmark before going down that path.
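For reference, the single to double normalization mentioned above can be done with integer operations only, so no floating point flags are involved. The following Python sketch is written from the IEEE 754 bit layouts, not copied from PowerPCProgEnv.pdf, and `single_bits_to_double_bits` is a hypothetical name.

```python
import struct

def single_bits_to_double_bits(w: int) -> int:
    """Convert a 32 bit single pattern to the equivalent 64 bit double pattern."""
    sign = (w >> 31) & 1
    exp = (w >> 23) & 0xFF
    frac = w & 0x7FFFFF
    if exp == 0xFF:                      # infinity or NaN: max exponent, keep payload
        dexp = 0x7FF
    elif exp != 0:                       # normal number: rebias the exponent
        dexp = exp - 127 + 1023
    elif frac == 0:                      # signed zero
        dexp = 0
    else:                                # denormal single: normalize it
        shift = 24 - frac.bit_length()   # shifts needed to move the leading 1 to bit 23
        frac = (frac << shift) & 0x7FFFFF  # drop the now implicit leading 1
        dexp = 1023 - 126 - shift
    return (sign << 63) | (dexp << 52) | (frac << 29)

def reference(w: int) -> int:
    """Same conversion done by the host FPU via struct, for comparison."""
    as_float = struct.unpack("<f", struct.pack("<I", w))[0]
    return struct.unpack("<Q", struct.pack("<d", as_float))[0]

for w in (0x00000001, 0x00000123, 0x007FFFFF, 0x3F800000):
    print(hex(w), hex(single_bits_to_double_bits(w)))
```

Every denormal single is a normal double, which is why the register file never needs to hold a denormal for single data.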
Updated by JMC4789 almost 11 years ago
This looks like more fallout of the PPC_FP merge, this time in the revert.
Updated by flacs almost 11 years ago
Here is a minimal failing test case that passes on real hardware: https://github.com/degasus/gekkotest
Updated by degasus almost 11 years ago
explicit https://github.com/degasus/gekkotest/commit/239d34744baed2292e5f1b2a0ab5848a63185d0d
the next commits are likely not "minimal" ;)
Updated by JMC4789 almost 11 years ago
This should all be fixed by 311caef094c434cea6770f8230a7c39318a64def. I can't confirm all the issues unfortunately.
Updated by flacs almost 11 years ago
- Status changed from Work started to Fixed
The branch with the fixes was merged in 311caef094c434cea6770f8230a7c39318a64def. I tested most of the issues and they're gone now. Let's close this for now.