Eevee: Shader recompilation issue
Open, Normal, Public

Description

The scene can be as simple as the startup scene with the default cube. Enable "Use Nodes" for the cube material and drag any of the Material Output node parameters. The slider dragging is not a smooth experience at all.

Details

Type
Bug

I did some investigation, and I get the following results comparing Eevee and master. Note that the basic Eevee shader has ~5,000 lines of code, while master's has ~3,500. Also, the tests below focus only on the shader; in Blender itself we also spend considerable time in the GWN_shaderinterface_create() function.

Results

In Linux with an AMD Radeon RX480 running Mesa (Gallium 0.4 - 4.10.0-24-generic) we get:

Version: 4.5
Core profile: 1

time start (control):  hello.cpp:115
time end   (control): 0.003271  hello.cpp:117
time start (master):  hello.cpp:101
time end   (master): 0.088870  hello.cpp:103
time start (eevee):  hello.cpp:108
time end   (eevee): 0.189856  hello.cpp:110

That means Eevee shader compilation takes about twice as long as the master shaders. And those ~200 ms mean there is a big lag every time the shader is recompiled (e.g., when the user drags a slider in a node).
Running from within Blender I get a similar result, so this sandbox seems representative of the real production environment.
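
For context on how these numbers are obtained: the measurement essentially amounts to wrapping glCompileShader() and glLinkProgram() with a wall-clock timer. A minimal sketch of that idea (not the sandbox's actual code; it assumes a current OpenGL context and a loader such as GLEW):

```cpp
#include <GL/glew.h>   // assumption: any GL function loader works here
#include <chrono>
#include <cstdio>

// Compile and link one vertex/fragment pair, printing how long each stage took.
// This mirrors the idea behind the sandbox timings, not its exact code.
static GLuint timed_build(const char *vert_src, const char *frag_src)
{
    using clock = std::chrono::steady_clock;

    GLuint vert = glCreateShader(GL_VERTEX_SHADER);
    GLuint frag = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(vert, 1, &vert_src, nullptr);
    glShaderSource(frag, 1, &frag_src, nullptr);

    auto t0 = clock::now();
    glCompileShader(vert);
    glCompileShader(frag);
    GLint c_ok = GL_FALSE;
    glGetShaderiv(frag, GL_COMPILE_STATUS, &c_ok);   // force the compile to finish
    auto t1 = clock::now();

    GLuint program = glCreateProgram();
    glAttachShader(program, vert);
    glAttachShader(program, frag);
    glLinkProgram(program);
    GLint l_ok = GL_FALSE;
    glGetProgramiv(program, GL_LINK_STATUS, &l_ok);  // same for the link
    auto t2 = clock::now();

    std::printf("compile: %.6f s (ok=%d)  link: %.6f s (ok=%d)\n",
                std::chrono::duration<double>(t1 - t0).count(), c_ok,
                std::chrono::duration<double>(t2 - t1).count(), l_ok);
    return program;
}
```

The status queries are there because some drivers may defer part of the work, which would otherwise skew the timings.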

On an NVIDIA Quadro K6000 with the proprietary driver 375.39 I get:

Version: 3.3
Core profile: 1
time start (control):  hello.cpp:115
time end   (control): 0.032880  hello.cpp:117
time start (master):  hello.cpp:101
time end   (master): 0.063129  hello.cpp:103
time start (eevee):  hello.cpp:108
time end   (eevee): 0.320869  hello.cpp:110

Now that's more interesting: while the master shader compiles faster here, the Eevee one compiles considerably slower.

Note: all tests were run with __GL_SHADER_DISK_CACHE=0, to prevent cached shaders from being used.

To run it for yourself, check the code on: https://github.com/dfelinto/opengl-sandbox

You could try removing unused functions from that eevee.fp and see if it's still slow. But since linking is not affected by unused functions, it's not going to help reduce the link time. Maybe some particularly problematic code can be identified by elimination.

Looking at this Unreal tutorial, it takes 3 s (!) to update when they edit one value in the shader graph. They made shader compilation non-blocking so it's not as bad, and ideally we should do the same in Blender. Maybe it's slow because of the little shader preview renders or something else, but it still makes you wonder whether it can actually be made as fast as we would like. If you want instant feedback in Unreal, it seems you need to create material instances where some parameters become uniforms.

So perhaps your idea of compiling all node parameters as uniforms is really the main way to get better performance while editing the shader graph. A second, more optimized shader with constants could be compiled in the background if needed. Ideally we could avoid users having to think about material instances to get faster feedback when tweaking shaders, but if we do need them, they could be node groups with sockets that you can't link to.

I too think making uniforms for all parameters is the way to go when editing or animating a material. Constant value nodes can be compiled for faster preview/playback when a material is not being edited. This is one area where we can do better than the Unreal editor.
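
To make the uniform-vs-constant distinction concrete, here is a small sketch of what the generated declarations could look like (hypothetical names, not the actual Blender/Eevee codegen): while a material is being edited every node input is emitted as a uniform, so a slider change is just a glUniform1f() call; once editing stops the same inputs can be baked as constants for the optimized variant.

```cpp
#include <sstream>
#include <string>

// Hypothetical node input: just a name and its current value.
struct NodeInput { std::string name; float value; };

// Emit the GLSL declaration for one node input.
// While editing: a uniform, so value changes are a cheap glUniform1f().
// When frozen: a compile-time constant the GLSL compiler can fold away.
static std::string emit_input(const NodeInput &in, bool editing)
{
    std::ostringstream ss;
    if (editing)
        ss << "uniform float " << in.name << ";\n";
    else
        ss << "const float " << in.name << " = " << in.value << ";\n";
    return ss.str();
}
```

Switching the editing flag off would then trigger a single background recompile of the constant-folded variant, as suggested above.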

I think the performance issue in GWN_shaderinterface_create() should be fixed though; it's taking up 30% of the compilation time here. It's not clear to me why Gawain caches all this data about uniforms and attributes, or why the lookup by name has O(n²) behavior. If it's done for performance reasons, I don't think Gawain can do name -> location mappings much faster than glGetUniformLocation(); caching of those locations needs to happen at a higher level to avoid that mapping entirely.
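
For illustration, that higher-level caching could be as simple as remembering glGetUniformLocation() results per program, so the name -> location mapping is paid once per name instead of on every lookup (a sketch with names of my own, not Gawain's actual API):

```cpp
#include <GL/glew.h>   // assumption: any GL function loader
#include <string>
#include <unordered_map>

// Per-program cache of uniform locations, filled lazily on first use.
// The point is that the name -> location mapping happens once per name,
// not on every draw call and not via a repeated linear/quadratic scan.
struct UniformCache {
    GLuint program;
    std::unordered_map<std::string, GLint> locations;

    explicit UniformCache(GLuint prog) : program(prog) {}

    GLint get(const std::string &name)
    {
        auto it = locations.find(name);
        if (it != locations.end())
            return it->second;
        GLint loc = glGetUniformLocation(program, name.c_str());
        locations.emplace(name, loc);   // cache -1 too, so misses are also O(1)
        return loc;
    }
};
```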

> But since linking is not affected by unused functions, it's not going to help reduce the link time.

As it turns out, we can actually get some benefit from doing it. I updated the GitHub program with a "lean" version of the Eevee shader, containing only the needed functions.

Of course there is some overhead in including only the required functions. And those numbers are for the simplest shader; we may not have as much to trim in a more production-ready material.
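
For what it's worth, the trimming done by hand for the lean shader could also be sketched as a codegen step, assuming the GLSL library is kept as one snippet per function (hypothetical structure, not the actual GPU codegen code):

```cpp
#include <map>
#include <string>

// Hypothetical library: GLSL function name -> its source snippet.
using GlslLibrary = std::map<std::string, std::string>;

// Build the final fragment source by appending only the library functions
// whose names appear in the generated node code. Very naive matching:
// real code would also have to track dependencies between library functions.
static std::string build_lean_source(const GlslLibrary &lib, const std::string &node_code)
{
    std::string out;
    for (const auto &entry : lib) {
        if (node_code.find(entry.first) != std::string::npos)
            out += entry.second + "\n";
    }
    out += node_code;
    return out;
}
```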

In Linux with an AMD Radeon RX480 running Mesa I get:

                  Eevee     Eevee Lean   Master    Control
glCompileShader   65 ms     20 ms        49 ms     3 ms
glLinkProgram     125 ms    83 ms        43 ms     0 ms
Total             190 ms    103 ms       125 ms    4 ms

On an NVIDIA Quadro K6000 with the proprietary driver 375.39 I get:

                  Eevee     Eevee Lean   Master    Control
glCompileShader   36 ms     10 ms        30 ms     17 ms
glLinkProgram     287 ms    257 ms       38 ms     18 ms
Total             323 ms    267 ms       67 ms     36 ms
Dalai Felinto (dfelinto) closed this task as Resolved. Jul 17 2017, 11:33 AM

"Fixed" on 2a489273d7e2 by making uniforms out of all the nodetree inputs. I will close it for now, but I would still like to see the performance on GWN_shaderinterface_create addressed.

OK, so I tried to see what causes the huge linking time, and the answer is really not pleasing.

It is caused by the amount of branching in the shader. Changes like rB518e7685790f28789bbe795f370ee3b1a5f776c6, which basically double the number of branches in every Eevee shader, roughly double the linking time.

Using the test program provided by @Dalai Felinto (dfelinto) I got these results:
Current Principled shader: 3011 ms
Previous Principled shader (without the branch): 1781 ms

Most of the shader branching complexity is inside the lamp evaluation, and the recent addition of cascaded shadow maps increased this complexity further.
To reduce the branching I can order lamps by type/shadow and iterate over each type one after the other, like I do for the probes.
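
A sketch of that ordering on the CPU side (hypothetical types and fields, not the actual Eevee lamp cache): if the lamp data is sorted by type before being uploaded, the shader can run one tight loop per contiguous range instead of branching on the type of every lamp.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-lamp data as it would be packed for the GPU.
enum LampType { LAMP_SUN = 0, LAMP_POINT = 1, LAMP_SPOT = 2, LAMP_AREA = 3 };

struct LampData {
    LampType type;
    bool has_shadow;
    float data[12];   // position, color, shadow parameters, ...
};

// Sort lamps so lamps of the same type (and shadow setting) are contiguous,
// and record where each type's run starts, plus a final end sentinel.
// The shader can then loop over [offsets[t], offsets[t + 1]) with no
// per-lamp type branch.
static std::vector<int> sort_lamps_by_type(std::vector<LampData> &lamps)
{
    std::sort(lamps.begin(), lamps.end(), [](const LampData &a, const LampData &b) {
        if (a.type != b.type)
            return a.type < b.type;
        return a.has_shadow < b.has_shadow;
    });

    std::vector<int> offsets(5, 0);
    for (int t = 0, i = 0; t < 4; t++) {
        offsets[t] = i;
        while (i < (int)lamps.size() && lamps[i].type == t)
            i++;
        offsets[t + 1] = i;
    }
    return offsets;
}
```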

Reducing the branching in the lamp code could "hypothetically" get us to 1667 ms (I did a really quick test bypassing all the "if"s and using the worst case).
But of course that will never quite be the case, and I expect the end result to be more like 2000 ms.
And removing the branching from the Principled shader gets us to 1020 ms, but that would defeat the purpose of its runtime optimisation.

Anyway, this would still be more than 1 second for each material shader, so we REALLY need lazy shader compilation.
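
As a sketch of what non-blocking compilation could look like at the GL level, assuming a driver that exposes the KHR_parallel_shader_compile extension (everything besides the GL calls is made up here): issue the link, keep drawing with the previous program, and only swap once the driver reports completion.

```cpp
#include <GL/glew.h>   // assumption: loader exposing KHR_parallel_shader_compile enums

// Tracks an in-flight program link so the viewport never blocks on it.
struct PendingProgram {
    GLuint new_program = 0;
    GLuint current_program = 0;   // last successfully linked program, kept in use

    // Call each frame after glLinkProgram(new_program) was issued.
    // Returns the program that should be bound for drawing right now.
    GLuint poll()
    {
        if (new_program == 0)
            return current_program;

        GLint done = GL_FALSE;
        // With KHR_parallel_shader_compile the driver links in the background;
        // COMPLETION_STATUS_KHR reports whether it has finished, without stalling.
        glGetProgramiv(new_program, GL_COMPLETION_STATUS_KHR, &done);
        if (!done)
            return current_program;   // keep using the old shader this frame

        GLint linked = GL_FALSE;
        glGetProgramiv(new_program, GL_LINK_STATUS, &linked);
        if (linked) {
            if (current_program)
                glDeleteProgram(current_program);
            current_program = new_program;
        }
        else {
            glDeleteProgram(new_program);   // keep the old program on failure
        }
        new_program = 0;
        return current_program;
    }
};
```

Without the extension, a similar pattern can be approximated by compiling and linking on a worker thread with a shared GL context.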

Here are the shader files if someone cares to take a shot:

Confirmed the really slow linking times due to branching. I even added a new --debug-gpu-shaders option that, for now, dumps the compiled shaders to the Blender temporary session folder.