Emulator Issues #6167

closed

Direct3D 9 TODO list

Added by rodolfoosvaldobogado about 11 years ago.

Status:
Won't fix
Priority:
Normal
Category:
GFX
% Done:

0%

Operating system:
Windows
Issue type:
Task
Milestone:
Regression:
No
Relates to usability:
No
Relates to performance:
No
Easy:
No
Relates to maintainability:
Yes
Regression start:
Fixed in:

Description

  • Improve primitive management to avoid unneeded work
  • Improve the vertex loader
  • Improve texture decoding
  • Dual source alpha
  • Improve VideoCommon to minimize DX9 dependencies
Actions #1

Updated by NeoBrainX about 11 years ago

  • Status changed from New to Accepted
  • Category changed from gfx to dx9
  • Relates to maintainability set to Yes
  • Operating system Windows added

"Improve" is quite a general term, what about outlining what specifically should be done (and how)?

Actions #2

Updated by delroth about 11 years ago

  • Find a way to support all future features that are compatible with DX11 and GL but not DX9 (for example, batching rendering calls with same shader and different uniforms by using indexed access to the uniform values).
Actions #3

Updated by degasus about 11 years ago

ok, so let's start with the first point: Improve Primitive management
I want to join our primitives into a single triangle_strip instead of triangles to reduce index size. The best way to split two strips is primitive restart. If it isn't available, I think generating degenerate triangles would be the best option (just add the first and the last vertex twice). Do you know a better way to do this?
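A minimal sketch of the degenerate-triangle fallback, assuming 16-bit indices and a hypothetical `AppendStripDegenerate` helper (not Dolphin's actual IndexGenerator code). Besides repeating the last index of the previous strip and the first index of the next one, it adds one extra repeat when the merged strip has odd length, so the appended strip starts on an even position and keeps its winding order for culling.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends `strip` to `merged` so both render in one triangle-strip draw call.
// The bridge consists of repeated indices, which only produce zero-area
// (degenerate) triangles that the GPU rejects.
void AppendStripDegenerate(std::vector<uint16_t>& merged,
                           const std::vector<uint16_t>& strip)
{
  assert(strip.size() >= 3);  // a strip needs at least one triangle
  if (!merged.empty())
  {
    // Odd length means the new strip would start on an odd triangle
    // position, flipping its winding; one extra repeat fixes the parity.
    const bool odd_length = merged.size() % 2 == 1;
    const uint16_t last = merged.back();
    merged.push_back(last);            // repeat last index of previous strip
    if (odd_length)
      merged.push_back(last);          // parity fix
    merged.push_back(strip.front());   // repeat first index of new strip
  }
  merged.insert(merged.end(), strip.begin(), strip.end());
}
```

For the strips {0,1,2} and {3,4,5} this produces 0 1 2 2 2 3 3 4 5: the real triangle (3,4,5) starts at position 6, so it keeps its original winding.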

I don't want you to implement workarounds, but as I don't know much about DX9, I need help with the design. I'd be glad to see you, e.g. on IRC (smaller response delay, so better for discussions) :-)

Actions #4

Updated by rodolfoosvaldobogado about 11 years ago

No problem, the workaround for DX9 is really easy, but I'm worried about this:
http://hacksoflife.blogspot.com.ar/2010/01/to-strip-or-not-to-strip.html
The analysis there is valid, and what worries me is that we are in the scenario where we have a lot of small strips. There the benefit of using strips is lost, and I think that in our case this will cause performance drops.

Actions #5

Updated by degasus about 11 years ago

The analysis isn't that valid:
"What about primitive restart? Well, it's nvidia only"
Primitive restart is in OpenGL 3.1 core, so all current GPUs can do it.

"so you get no benefit"
Here they are right: we wouldn't get any GPU improvements, but this wasn't my goal. I want to get CPU improvements, as merging triangle strips is faster than converting them to triangle lists.

BTW: I've done the first step for OGL in the primitive_restart branch. I haven't started implementing my suggested workaround yet. Which workaround would you prefer?
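For comparison, a sketch of merging with primitive restart, assuming 16-bit indices with 0xFFFF reserved as the restart value (in OpenGL 3.1 the value is chosen with `glPrimitiveRestartIndex`; D3D11 strip topologies always cut on the all-ones index). `AppendStripRestart` is a hypothetical name. Only one extra index is needed per boundary and no winding fix-up, which is where the CPU saving comes from.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed restart value for 16-bit index buffers.
constexpr uint16_t kRestartIndex = 0xFFFF;

// Appends `strip` to `merged`; the restart index ends the previous strip
// and starts a new one inside the same draw call.
void AppendStripRestart(std::vector<uint16_t>& merged,
                        const std::vector<uint16_t>& strip)
{
  assert(strip.size() >= 3);  // a strip needs at least one triangle
  if (!merged.empty())
    merged.push_back(kRestartIndex);  // one index per strip boundary
  merged.insert(merged.end(), strip.begin(), strip.end());
}
```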

Actions #6

Updated by rodolfoosvaldobogado about 11 years ago

The analysis is old, but valid in the sense that on the GPU side you lose the advantage; on the CPU side, though, you are right.
I think that for a backend that does not support primitive restart, including a degenerate triangle to connect the primitives is a valid way to go. I think a good idea could be to keep both paths, because in games that use a lot of plain triangles, you will generate a lot of degenerate triangles.
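A rough index count backs up this concern. It assumes each of n unconnected triangles becomes its own 3-vertex strip, and that the degenerate bridge preserves winding (last index, one parity repeat, first index; the merged strip always has odd length here, so the parity repeat is always needed):

```latex
\begin{aligned}
\text{triangle list:} &\quad 3n \\
\text{strips + primitive restart:} &\quad 3n + (n-1) = 4n - 1 \\
\text{strips + degenerate bridges:} &\quad 3 + 6(n-1) = 6n - 3
\end{aligned}
```

For n = 100 that is 300, 399 and 597 indices. For games dominated by plain triangles, the list path remains the cheapest, which supports keeping both paths.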
I'm going to start with the vertex translator. What do you think about including a bitfield in the backend configuration to declare the supported formats? Then, when the translator initializes, if the backend supports a format it fills the function array with a simple copy; if not, it selects the next smallest supported type to determine the valid transformation function.
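A minimal sketch of that idea. The `CompFormat` enum, the mask layout and `ChooseOutputFormat` are hypothetical names. The candidate order (the smallest type that holds every source value, with float last) is an assumption, and normalized formats are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical component formats of GameCube/Wii vertex attributes.
enum class CompFormat : uint8_t { U8, S8, U16, S16, Float };

// Backend capability bitfield: bit i set = format i can be read natively.
using FormatMask = uint32_t;
constexpr FormatMask Bit(CompFormat f)
{
  return 1u << static_cast<uint32_t>(f);
}

// Returns the first supported format among `candidates` (smallest first).
static CompFormat FirstSupported(std::initializer_list<CompFormat> candidates,
                                 FormatMask supported)
{
  for (CompFormat f : candidates)
    if (supported & Bit(f))
      return f;
  return CompFormat::Float;
}

// Format the vertex translator writes for `src`: `src` itself if supported
// (plain copy), otherwise the next smallest type that can represent every
// value of `src` without loss.
CompFormat ChooseOutputFormat(CompFormat src, FormatMask supported)
{
  assert(supported & Bit(CompFormat::Float));  // float assumed always available
  switch (src)
  {
  case CompFormat::U8:
    return FirstSupported({CompFormat::U8, CompFormat::U16, CompFormat::S16, CompFormat::Float}, supported);
  case CompFormat::S8:
    return FirstSupported({CompFormat::S8, CompFormat::S16, CompFormat::Float}, supported);
  case CompFormat::U16:
    return FirstSupported({CompFormat::U16, CompFormat::Float}, supported);
  case CompFormat::S16:
    return FirstSupported({CompFormat::S16, CompFormat::Float}, supported);
  case CompFormat::Float:
    break;
  }
  return CompFormat::Float;
}
```

At init, the translator would call this once per attribute format and store either a copy routine or the matching conversion routine in its function array.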

Actions #7

Updated by degasus about 11 years ago

So, my next proof of concept is ready: vertex_loader_improvements

I've started with the position attribute. Shifting is now done on the GPU, which needs an additional component, but now we only upload integers, so we don't have to convert everything to floats.
For DX9, I added a template parameter for the conversion, so we convert s8 into s16. Here it was easy; normals are much harder, as we use normalization there...
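A sketch of both halves, under assumptions. `WidenComponents` is a hypothetical stand-in for the template conversion: s8 to s16 keeps every value, since sign extension is exact. `Dequantize` mimics in C++ what the vertex shader would do with the uploaded integer, scaling by 2^-frac for a position with `frac_bits` fractional bits.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Copies integer components into a wider integer type the backend accepts
// (e.g. int8_t -> int16_t for D3D9). Values are preserved exactly.
template <typename From, typename To>
void WidenComponents(const From* src, To* dst, int count)
{
  static_assert(std::is_integral<From>::value && std::is_integral<To>::value,
                "integer formats only");
  static_assert(sizeof(To) >= sizeof(From), "conversion must widen");
  static_assert(std::is_signed<From>::value == std::is_signed<To>::value,
                "signedness must match to keep every value");
  for (int i = 0; i < count; ++i)
    dst[i] = static_cast<To>(src[i]);
}

// GPU-side shift: the raw integer is scaled by 2^-frac_bits.
float Dequantize(int raw, int frac_bits)
{
  assert(frac_bits >= 0 && frac_bits < 31);
  return static_cast<float>(raw) / static_cast<float>(1 << frac_bits);
}
```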

Actions #8

Updated by NeoBrainX about 11 years ago

Revision 6958822f1957 should probably be ported to the D3D9 backend.
We likely want to use IDirect3DDevice9::Reset for that, cf. http://msdn.microsoft.com/en-us/library/bb174425%28v=vs.85%29.aspx .

Actions #9

Updated by NeoBrainX about 11 years ago

  • Perf queries aren't implemented in D3D9 yet. I think it's possible to implement them via occlusion queries, though.
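A sketch of the bookkeeping that occlusion-query-based perf queries would probably need. It assumes each query returns the number of samples that passed at the internal resolution, and that results should be reported relative to the native 640x528 EFB. `ScaleToNativeEFB` is a hypothetical helper; the D3D9 query calls themselves are not shown.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kEfbWidth = 640;   // native EFB size
constexpr uint64_t kEfbHeight = 528;

// Rescales a sample count measured on a target_width x target_height render
// target to the count native resolution would have produced.
uint64_t ScaleToNativeEFB(uint64_t samples, uint64_t target_width,
                          uint64_t target_height)
{
  assert(target_width > 0 && target_height > 0);
  return samples * kEfbWidth * kEfbHeight / (target_width * target_height);
}
```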

(changing the summary to sound less like "Rodolfo fix stuff or I'll kill D3D9" :p)

Actions #10

Updated by NeoBrainX about 11 years ago

Cf. revision 008fdc73106a142212fc5cd4481d13535103b954 about perf queries, btw.

Actions #12

Updated by degasus about 11 years ago

Rodolfo: primitive_restart seems to work on OGL and DX11. Both fans and lists are converted to strips. The degenerate-triangle workaround isn't implemented yet, so without primitive restart they are still converted to lists.
Is there an option to enable primitive restart on DX9?
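One way to turn a fan into restart-separated strips, sketched under assumptions: 16-bit indices, 0xFFFF as the restart value, and a hypothetical `FanToRestartStrips`. Each piece `a(i), a(i+1), c, a(i+2), a(i+3)` covers three fan triangles around the center `c`. The strip's alternating winding reproduces the fan's winding, so no triangle is flipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint16_t kRestartIndex = 0xFFFF;  // assumed restart value

// fan[0] is the center, fan[1..] the rim; fan triangle k is
// (fan[0], fan[k], fan[k+1]).
std::vector<uint16_t> FanToRestartStrips(const std::vector<uint16_t>& fan)
{
  assert(fan.size() >= 3);
  std::vector<uint16_t> out;
  const uint16_t c = fan[0];
  for (std::size_t i = 1; i + 1 < fan.size(); i += 3)
  {
    if (!out.empty())
      out.push_back(kRestartIndex);
    out.push_back(fan[i]);      // strip tri 0: (a_i, a_i+1, c)
    out.push_back(fan[i + 1]);
    out.push_back(c);
    if (i + 2 < fan.size())
      out.push_back(fan[i + 2]);  // strip tri 1: (a_i+1, c, a_i+2), odd -> flipped
    if (i + 3 < fan.size())
      out.push_back(fan[i + 3]);  // strip tri 2: (c, a_i+2, a_i+3)
  }
  return out;
}
```

A 7-vertex fan (5 triangles) becomes 10 indices, versus 15 for a triangle list.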

Actions #13

Updated by rodolfoosvaldobogado about 11 years ago

There is no official support for primitive restart. Some vendors just
implement it by passing -1 as an index value. I think that the only way
will be to insert degenerate triangles.
(In reply to the message of 08/04/2013 15:42)

Actions #14

Updated by Billiard26 about 11 years ago

  • Issue type set to Task
Actions #15

Updated by NeoBrainX about 11 years ago

  • EFB to RAM still has severe pixel->texel mapping issues in D3D9. For example, switching between EFB2RAM and EFB2Tex in the New Super Mario Bros Wii logo screen makes this fairly obvious.
Actions #16

Updated by NeoBrainX almost 11 years ago

  • Line width. It's not possible to fix this properly because the exact rasterization rules can't be fulfilled in D3D9 (unless you do software transformation of the lines), but at least partial support is possible.
Actions #17

Updated by NeoBrainX over 10 years ago

  • Also, we're hitting some very hard problems in PixelShaderGen caused by emulating TEV with floating point: 1) the whole U8 overflow code might cause massive performance issues; 2) it has severe compatibility issues, which make us consider moving away from floating point completely. Keeping D3D9 compatibility is going to be really hard there.
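This sketch (not Dolphin's PixelShaderGen code) illustrates why overflow handling is costly in a float pipeline. It compares 8-bit wraparound on integers, a single AND, with the float equivalent `x - 256 * floor(x / 256)`, which needs a divide, a floor, a multiply and a subtract per channel. The float version is exact only while values are integers. Fractional results from earlier stages would also need explicit rounding, which is one possible source of mismatches.

```cpp
#include <cassert>
#include <cmath>

// Integer pipeline: wrapping a result to 8 bits is a single mask.
int WrapU8Int(int x)
{
  return x & 0xFF;
}

// Float pipeline: the same wrap emulated arithmetically. Exact for
// integer-valued x with |x| < 2^24, since dividing by 256 is exact.
float WrapU8Float(float x)
{
  return x - 256.0f * std::floor(x / 256.0f);
}
```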
Actions #18

Updated by delroth over 10 years ago

Early Z (zcomploc) is now implemented properly in OGL and D3D11. This needs to be done in D3D9 too.

Actions #19

Updated by rodolfoosvaldobogado over 10 years ago

Early Z is now implemented in the dx9-early-depth branch, so testing is needed before it can be merged into the main branch.

Actions #20

Updated by delroth over 10 years ago

Nope, not working. Baten Kaitos (GKBEAF):

D3D11: http://imgur.com/o5opYEB
D3D9: http://imgur.com/6fpQsfR

Actions #21

Updated by delroth over 10 years ago

FWIW, my extremely simple zcomploc test case (http://delroth.net/zcomploc.dol; switch between Z test before/after texturing with the Wiimote "A" button) does not work with D3D9 either.

Actions #22

Updated by rodolfoosvaldobogado over 10 years ago

What is the correct output of the test?

Actions #23

Updated by delroth over 10 years ago

Just try it with D3D11 and OGL. They both emulate zcomploc properly.

Actions #24

Updated by NeoBrainX over 10 years ago

Alternatively, if you don't have a D3D11.0-capable GPU: the software renderer surely renders the test correctly.

Actions #25

Updated by rodolfoosvaldobogado over 10 years ago

After some research I came to the conclusion that zcomploc is impossible to implement in DX9. The problem is that the early depth test is not possible if you use depth output (solved in the last commit) or any conditional pixel function (clip/discard). As these functions are used to emulate the alpha test, there is no way to implement the functionality without breaking other functionality.
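A small software model of what zcomploc changes. It assumes a "less" depth test with depth writes on and an alpha test of the form alpha >= ref; `Fragment` and `DepthAfterFragment` are hypothetical names. With the Z test placed before texturing, a fragment that later fails the alpha test has already written depth. In D3D9 a pixel killed with clip/discard never writes depth, so that case cannot be expressed while the alpha test is emulated with clip.

```cpp
#include <cassert>
#include <cstdint>

struct Fragment
{
  float depth;
  uint8_t alpha;  // alpha after texturing/TEV
};

// Returns the depth buffer value after processing one fragment.
float DepthAfterFragment(float depth_buffer, Fragment f, uint8_t alpha_ref,
                         bool z_before_texturing)
{
  const bool depth_pass = f.depth < depth_buffer;
  const bool alpha_pass = f.alpha >= alpha_ref;
  if (z_before_texturing)
    // Early Z: depth is tested and written before the alpha test runs.
    return depth_pass ? f.depth : depth_buffer;
  // Late Z: a fragment rejected by the alpha test never reaches the depth write.
  return (depth_pass && alpha_pass) ? f.depth : depth_buffer;
}
```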

Actions #26

Updated by rodolfoosvaldobogado over 10 years ago

Just to leave a little log of one idea, and get some feedback from the other devs.
One of the remaining issues that is not properly emulated on the accelerated backends is point and line rendering: the exact behavior of the Wii/GC rasterizer can't be emulated exactly by any of the implementations, except for the software plugin. I think the best approach to achieve accurate emulation would be:
For each point, generate 3 additional vertices, and for each vertex in a line, generate 2, all copies of the original (as DX11 is doing in geometry shaders; in DX9 it could be done on the CPU). As the position offset on the real hardware is applied in screen space, one way to emulate this is to set a flag on each vertex that selects the corresponding offset, and apply it after transformation. In the current implementation this could be achieved using an extra blend index and adding new registers to store the screen-space and texture offsets, applying them after the vertex transformation. This way the issues with point and line sizing could be fixed with the same implementation in all the backends.
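A CPU-side sketch of the expansion step for the DX9 path, with hypothetical names and slot numbering: 0-3 for the point corners, 4-5 for the two sides of a line (the proposal does not fix this layout). The copies keep the untransformed position and only carry an offset index, which the vertex shader applies after transformation. Building the index list for the resulting quads is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex
{
  float x, y, z;         // untransformed position
  uint8_t offset_index;  // selects the screen-space offset (extra blend index)
};

// Each point becomes 4 identical copies, one per quad corner.
std::vector<Vertex> ExpandPoints(const std::vector<Vertex>& points)
{
  std::vector<Vertex> out;
  out.reserve(points.size() * 4);
  for (const Vertex& p : points)
    for (uint8_t corner = 0; corner < 4; ++corner)
    {
      Vertex v = p;
      v.offset_index = corner;
      out.push_back(v);
    }
  return out;
}

// Each line vertex becomes 2 copies, one per side of the widened line.
std::vector<Vertex> ExpandLineVertices(const std::vector<Vertex>& line_vertices)
{
  std::vector<Vertex> out;
  out.reserve(line_vertices.size() * 2);
  for (const Vertex& p : line_vertices)
    for (uint8_t side = 0; side < 2; ++side)
    {
      Vertex v = p;
      v.offset_index = static_cast<uint8_t>(4 + side);
      out.push_back(v);
    }
  return out;
}
```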

Actions #27

Updated by NeoBrainX over 10 years ago

Actually, point + line rendering is done accurately in D3D11, as far as I know. It even respects the correct rasterization rules (i.e. rendering of the line depending on its "steepness" in screen space).
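A sketch of that steepness rule, as it is commonly described for the D3D11 line geometry shader (an assumption here, not a copy of that shader). A line that is mostly horizontal in screen space is widened vertically, and a mostly vertical one horizontally, instead of along its true normal. `LineSideOffset` is hypothetical, and the tie-break at exactly 45 degrees is arbitrary.

```cpp
#include <cassert>
#include <cmath>

struct Vec2
{
  float x, y;  // screen-space coordinates in pixels
};

// Offset (in pixels) applied to one side of a line from a to b whose total
// width is width_px; the opposite side uses the negated offset.
Vec2 LineSideOffset(Vec2 a, Vec2 b, float width_px)
{
  const float dx = b.x - a.x;
  const float dy = b.y - a.y;
  const float half = 0.5f * width_px;
  if (std::fabs(dx) >= std::fabs(dy))
    return {0.0f, half};  // mostly horizontal: widen along y
  return {half, 0.0f};    // mostly vertical: widen along x
}
```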

Generating additional vertices on the other hand isn't going to respect the rasterization rules unless you use software vertex transforming or something.

Generally, I don't think it's worth bothering since it will add a considerable amount of code to VideoCommon shortly before the d3d9 removal (which is planned to happen shortly after the 4.0 release, i.e. in a month or two).

Actions #28

Updated by rodolfoosvaldobogado over 10 years ago

Sorry to disagree, Neo, but point and line rendering is far from accurate. The offsets are added in the geometry shader, which causes the final vertex position to be deformed by the modelview/projection transformation. The generation of additional vertices will be done only in the DX9 plugin, so no modifications to VideoCommon there. The changes in VideoCommon are:
1- add 6 new float4 constants to the vertex shader constant buffer.
2- modify the vertex shader generator, adding the offsets indexed by a second blend index.
3- modify the vertex writer to always write blend indices.
In each backend, change the implementation of SetLineWidth to update these constants,
and generate the additional vertices (geometry shaders in DX11/OGL, software in the DX9 plugin).
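A C++ model of change 2, i.e. what the generated vertex shader would compute, under assumptions. There are six float4 constants whose .xy hold a pixel offset (the slot layout is hypothetical); one is selected by the second blend index and applied after projection. Normalized device coordinates span 2 units across the viewport, and clip = NDC * w. A pixel offset d therefore becomes d * 2 / viewport_size * w in clip space. The sign of y depends on the API's screen-space convention.

```cpp
#include <cassert>

struct Float4
{
  float x, y, z, w;
};

// Adds the screen-space offset of `slot` to an already projected
// (clip-space) position. offsets[i].xy is a pixel offset; .zw are unused
// here (they could hold texture offsets).
Float4 ApplyScreenOffset(Float4 clip, const Float4 (&offsets)[6], int slot,
                         float viewport_width, float viewport_height)
{
  assert(slot >= 0 && slot < 6);
  clip.x += offsets[slot].x * 2.0f / viewport_width * clip.w;
  clip.y += offsets[slot].y * 2.0f / viewport_height * clip.w;
  return clip;
}
```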

Actions #29

Updated by delroth over 10 years ago

Can you provide a testcase (DOL or DFF, both are fine) that shows point and line rendering being inaccurate?

Actions #30

Updated by rodolfoosvaldobogado over 10 years ago

The lines in the Zelda TP map: just compare them using the software plugin and DX11.

Actions #31

Updated by delroth over 10 years ago

Please provide a DFF.

Actions #32

Updated by parlane about 10 years ago

  • Status changed from Accepted to Won't fix