Emulator Issues #3829
closed
Speed improvement for OpenGL Plugin
Description
Hello Devs!
As I like your project, I wanted to contribute my 2 cents :)
I was always annoyed by the heavy slowdown on my machine when watching the Wind Waker intro. The moment the island pops into the viewport, the framerate drops to ~14 fps :(
A look at the GL calls with gDEBugger made the reason quite obvious... there are around 180,000 OpenGL commands executed per frame :( Most of them are related to vertex shader programs.
Well, attached you will find a patch with tweaked SetMultiVSConstant functions. The patch is backward compatible: if the GL_EXT_gpu_program_parameters extension isn't supported, the old, slow version is used :(
If it is available, the plugin is now about as fast as the DX version :)
Now it's also more fun to watch the MP1 intro :)
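For readers without the attachment, here is a minimal sketch of the idea, not the actual patch. FakeGL is a hypothetical stand-in that only counts calls so the example runs without a GL context; its two methods mirror the shape of glProgramEnvParameter4fvARB (one vec4 per call) and glProgramEnvParameters4fvEXT (a range of vec4s per call). The SetMultiVSConstant4fv signature and the 256-constant limit are assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the GL entry points; it only counts calls.
struct FakeGL {
    bool has_ext_gpu_program_parameters = false;  // assumed to be detected at startup
    int calls = 0;                                // number of "driver" calls issued
    std::vector<float> constants = std::vector<float>(256 * 4, 0.0f);

    // Same shape as glProgramEnvParameter4fvARB: uploads one vec4.
    void ProgramEnvParameter4fv(unsigned index, const float* v) {
        ++calls;
        for (unsigned i = 0; i < 4; ++i) constants[index * 4 + i] = v[i];
    }

    // Same shape as glProgramEnvParameters4fvEXT: uploads `count` vec4s.
    void ProgramEnvParameters4fv(unsigned index, unsigned count, const float* v) {
        ++calls;
        for (unsigned i = 0; i < count * 4; ++i) constants[index * 4 + i] = v[i];
    }
};

// Upload `count` consecutive vec4 constants starting at register `first`.
void SetMultiVSConstant4fv(FakeGL& gl, unsigned first, unsigned count, const float* data) {
    assert(first + count <= 256);
    if (gl.has_ext_gpu_program_parameters) {
        // Fast path: one call for the whole range.
        gl.ProgramEnvParameters4fv(first, count, data);
        return;
    }
    // Fallback: the old, slow version with one call per constant.
    for (unsigned i = 0; i < count; ++i)
        gl.ProgramEnvParameter4fv(first + i, data + i * 4);
}
```

With 16 constants, the fallback issues 16 calls and the fast path issues 1, while the resulting constant registers are identical.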
Updated by james.jdunne about 14 years ago
- Status changed from New to Accepted
I will apply your patch locally and test. If it does what you say it does, I just might commit it :)
Updated by james.jdunne about 14 years ago
- Status changed from Accepted to Work started
Sorry, I don't have Wind Waker. I tried your patch and didn't notice any difference while playing NSMBW. I tested against the World 8 map and consistently get ~40FPS with and without your patch.
Perhaps my hardware is unaffected by your changes? nVidia GeForce 9800 GTX+
Since it's such a small patch and doesn't seem (to me) to cause any harm, I'll commit it.
Updated by james.jdunne about 14 years ago
- Status changed from Work started to Fixed
Patch applied in r6713
Updated by Metzelmaennchen about 14 years ago
9800 GTX+ is a heavy shader cruncher... go back and try an HD 4870 ;)
But you should see an effect in an OpenGL debugger (I'm using gDEBugger, which is free). To speak in numbers:
Before, there was a max of 190k OGL calls for the Wind Waker intro (isn't there a demo floating around?) and now the max is around 35k :)
Looking at NSMBW (intro), there were 24k OGL calls, which now drop to 8.5k :)
Updated by james.jdunne about 14 years ago
Well your patch is in so I'll just go ahead and trust you :)
Updated by Metzelmaennchen about 14 years ago
That's nice that you'll trust me ;)
Next I'll go for the redundant Get/Set changes... my profiler reports around 95% of them as unnecessary. As they are not cheap, I think there is more to gain with a bitmask tracking the states.
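A rough sketch of what such bitmask tracking could look like, assuming boolean enable/disable states only. StateBit and StateCache are hypothetical names, and the counter stands in for the real glEnable/glDisable calls; this is not code from the plugin.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical set of tracked states, one bit each.
enum StateBit : std::uint32_t {
    kBlend       = 1u << 0,
    kDepthTest   = 1u << 1,
    kCullFace    = 1u << 2,
    kScissorTest = 1u << 3,
};

struct StateCache {
    std::uint32_t enabled = 0;  // bits believed to be enabled in the driver
    std::uint32_t known = 0;    // bits we have set at least once ourselves
    int driver_calls = 0;       // changes actually forwarded
    int skipped = 0;            // redundant changes filtered out

    void Set(StateBit bit, bool on) {
        const std::uint32_t mask = bit;
        assert(mask != 0 && (mask & (mask - 1)) == 0);  // exactly one bit
        const bool is_known = (known & mask) != 0;
        const bool is_on = (enabled & mask) != 0;
        if (is_known && is_on == on) {
            ++skipped;  // same value as last time: nothing to send
            return;
        }
        ++driver_calls;  // real code would call glEnable/glDisable here
        known |= mask;
        if (on) enabled |= mask;
        else    enabled &= ~mask;
    }
};
```

The `known` mask is there so the first Set for each state always reaches the driver, since the cache cannot assume what the initial driver state is.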
Updated by NeoBrainX about 14 years ago
fwiw, don't all decent gfx drivers optimize out redundant (i.e. duplicate) state changes anyway?
At least that's what I heard for D3D, not sure if they do that for OGL, too.
Updated by Metzelmaennchen about 14 years ago
Hello NeoBrain!
Very interesting point you are raising here, and you are totally right (for DX this only works in pure mode, if I remember correctly). Hopefully every driver writer implements such a thing :)
But there is still the cost of sending calls to the driver. At first glance there are three functions which account for 26% of all calls... and those are reported to be 95% useless.
Cutting down these calls shows up as lower usage of the corresponding driver module; at the moment there is a benefit of 1.6% for the fglrx_dri module :)
Updated by sl1nk3.s about 14 years ago
I didn't see much of a difference here using a 4870 on Windows 7 x64, but that's a good patch tho.