Eevee: Shader recompilation issue
Closed, Resolved, Public

Description

The scene can be as simple as the initial scene with the initial cube. Set "use nodes" for the cube material and drag any of the material output node parameters. The slider dragging is not a smooth experience at all.

Details

Type
Bug

I did some investigation and got the following results comparing Eevee and master. Note that the basic Eevee shader has ~5,000 lines of code, and master has ~3,500. Also, the tests below focus only on the shader. In Blender itself we also spend considerable time in the GWN_shaderinterface_create() function.

Results

In Linux with an AMD Radeon RX480 running Mesa (Gallium 0.4 - 4.10.0-24-generic) we get:

Version: 4.5
Core profile: 1

time start (control):  hello.cpp:115
time end   (control): 0.003271  hello.cpp:117
time start (master):  hello.cpp:101
time end   (master): 0.088870  hello.cpp:103
time start (eevee):  hello.cpp:108
time end   (eevee): 0.189856  hello.cpp:110

That means Eevee shader compilation takes about 2x as long as the master shader. Those ~200ms mean a big lag every time the shader is recompiled (e.g., when the user drags a slider in a node).
Running from within Blender I get a similar result, so this sandbox seems representative of the real production environment.
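
For anyone reproducing these measurements, here is a minimal sketch of a wall-clock timing helper. It is not the sandbox's actual code, and the name time_ms is hypothetical. With OpenGL, the callable would wrap glCompileShader() or glLinkProgram() followed by a status query such as glGetShaderiv(..., GL_COMPILE_STATUS, ...), on the assumption that the query makes the driver finish any deferred work before the clock stops.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn &&work)
{
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by clock adjustments
    const auto start = clock::now();
    std::forward<Fn>(work)();
    const auto end = clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Timing compile and link separately, one call each, shows which stage dominates.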

In a NVIDIA Quadro K6000, proprietary driver 375.39 I get:

Version: 3.3
Core profile: 1
time start (control):  hello.cpp:115
time end   (control): 0.032880  hello.cpp:117
time start (master):  hello.cpp:101
time end   (master): 0.063129  hello.cpp:103
time start (eevee):  hello.cpp:108
time end   (eevee): 0.320869  hello.cpp:110

Now that's more interesting. While the master shader compiles faster than on the AMD setup (~63ms vs. ~89ms), the Eevee one compiles considerably slower (~321ms vs. ~190ms).

Note: All tests were run with __GL_SHADER_DISK_CACHE=0, to prevent cached shaders from being used.

To run it for yourself, check the code on: https://github.com/dfelinto/opengl-sandbox

You could try removing unused functions from that eevee.fp and see if it's still slow. But since linking is not affected by unused functions, it's not going to help reduce that. Maybe some particularly problematic code can be identified by elimination.

Looking at this Unreal tutorial, it takes 3s (!) to update when they edit one value in the shader graph. They made shader compilation non-blocking so it's not as bad, and ideally we should do the same in Blender. Maybe it's slow because of the little shader preview renders or something else, but it still makes you wonder whether it can actually be made as fast as we would like. If you want instant feedback in Unreal, it seems you need to create material instances where some parameters become uniforms.

So perhaps your idea of compiling all node parameters as uniforms is really the main way to get better performance while creating the shader graph. A second more optimized shader with constants could be compiled in the background if needed. Ideally we could avoid users having to think about material instances to get faster feedback when tweaking shaders, but if we do need them then they could be node groups with sockets that you can't link to.

I too think making uniforms for all parameters is the way to go, when editing or animating a material. Constant value nodes can be compiled for faster preview/playback when a material is not being edited. This is one area we can do better than the Unreal editor.
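
A minimal sketch of why uniforms avoid the recompile, under stated assumptions: NodeParam, ParamMode and emit_param_decls are hypothetical names for illustration, not Blender's code generator. When a value is baked in as a constant, the generated GLSL text changes with every slider movement, so the shader has to be compiled and linked again. When it is a uniform, the text stays identical and only the uniform value needs updating.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical description of one node input (not Blender's data structures).
struct NodeParam {
    std::string name;
    float value;
};

enum class ParamMode { Constant, Uniform };

// Emits the GLSL declarations for all parameters in the chosen mode.
std::string emit_param_decls(const std::vector<NodeParam> &params, ParamMode mode)
{
    std::string src;
    for (const NodeParam &p : params) {
        if (mode == ParamMode::Uniform) {
            // The value is not part of the source text: it is set at draw time.
            src += "uniform float " + p.name + ";\n";
        }
        else {
            // The value is baked into the source text: any change alters the shader.
            char buf[64];
            std::snprintf(buf, sizeof(buf), "%.9g", p.value);
            src += "const float " + p.name + " = float(" + buf + ");\n";
        }
    }
    return src;
}
```

Comparing the generated source before and after an edit tells you whether the existing program can be reused. In Uniform mode a value change would then only need a glUniform1f() call on the already linked program.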

I think the performance issue in GWN_shaderinterface_create() should be fixed though; it's taking up 30% of compilation time here. It's not clear to me why Gawain caches all this data about uniforms and attributes, and the lookup by name has O(n²) behavior. If it's done for performance reasons, then I don't think Gawain can do name -> location mappings much faster than glGetUniformLocation(). Caching of those locations needs to happen at a higher level to avoid that mapping entirely.
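
To illustrate the lookup cost: resolving each of n names with a linear scan over n entries is O(n²) in total, while a hash map built once makes each lookup O(1) on average. The sketch below is an assumption-laden illustration, not Gawain's API. UniformLocationCache is a hypothetical class, and the (name, location) pairs stand in for values that would be queried once, e.g. with glGetUniformLocation(), after linking.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache of uniform locations for one linked program.
class UniformLocationCache {
public:
    // `uniforms` holds (name, location) pairs collected once after linking.
    explicit UniformLocationCache(const std::vector<std::pair<std::string, int>> &uniforms)
    {
        locations_.reserve(uniforms.size());
        for (const auto &entry : uniforms) {
            locations_.emplace(entry.first, entry.second);
        }
    }

    // Average O(1) lookup. Returns -1 for unknown names, mirroring the
    // value glGetUniformLocation() returns for an inactive uniform.
    int find(const std::string &name) const
    {
        const auto it = locations_.find(name);
        return it == locations_.end() ? -1 : it->second;
    }

private:
    std::unordered_map<std::string, int> locations_;
};
```

Callers that draw every frame could go one step further and store the returned location itself, which avoids the name mapping entirely as suggested above.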

"But since linking is not affected by unused functions, it's not going to help reduce that."

As it turned out, we can in fact get some benefit from doing it. I updated the GitHub program with a "lean" version of the Eevee shader, with only the needed functions.

Of course there is overhead in including only the required functions. And those numbers are for the simplest shader. There may not be as much to trim in a more production-ready material.
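
One way to picture "only the needed functions" is a reachability walk over a function library. This is a sketch under assumptions, not how the lean shader in the sandbox was produced. GlslFunction, GlslLibrary and required_functions are hypothetical names, and the list of callees per function is assumed to be known already. The walk is the extra work mentioned above, and it returns callees before callers so each function is defined before its first use.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical library entry: GLSL source plus the functions it calls.
struct GlslFunction {
    std::string source;
    std::vector<std::string> calls;
};

using GlslLibrary = std::map<std::string, GlslFunction>;

// Depth-first post-order walk: a function is appended after everything it calls.
static void visit(const GlslLibrary &lib, const std::string &name,
                  std::set<std::string> &seen, std::vector<std::string> &order)
{
    if (!seen.insert(name).second) {
        return;  // already visited
    }
    const auto it = lib.find(name);
    if (it == lib.end()) {
        return;  // not in the library (e.g. a GLSL built-in): nothing to emit
    }
    for (const std::string &callee : it->second.calls) {
        visit(lib, callee, seen, order);
    }
    order.push_back(name);
}

// Returns the names of all library functions reachable from `roots`,
// ordered so that dependencies come first.
std::vector<std::string> required_functions(const GlslLibrary &lib,
                                            const std::vector<std::string> &roots)
{
    std::set<std::string> seen;
    std::vector<std::string> order;
    for (const std::string &root : roots) {
        visit(lib, root, seen, order);
    }
    return order;
}
```

Concatenating the source of the returned functions, in order, gives a trimmed shader body. Functions never reached from the roots are left out.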

In Linux with an AMD Radeon RX480 running Mesa I get:

                 Eevee    Eevee Lean   Master   Control
glCompileShader  65 ms    20 ms        49 ms    3 ms
glLinkProgram    125 ms   83 ms        43 ms    0 ms
Total            190 ms   103 ms       125 ms   4 ms

In a NVIDIA Quadro K6000, proprietary driver 375.39 I get:

                 Eevee    Eevee Lean   Master   Control
glCompileShader  36 ms    10 ms        30 ms    17 ms
glLinkProgram    287 ms   257 ms       38 ms    18 ms
Total            323 ms   267 ms       67 ms    36 ms

Dalai Felinto (dfelinto) closed this task as "Resolved". Mon, Jul 17, 11:33 AM

"Fixed" on 2a489273d7e2 by making uniforms out of all the nodetree inputs. I will close it for now, but I would still like to see the performance on GWN_shaderinterface_create addressed.