Eevee: Shader recompilation issue #51467

Closed
opened 2017-05-11 12:23:02 +02:00 by Dalai Felinto · 16 comments

The scene can be as simple as the default scene with the default cube. Enable "Use Nodes" for the cube material and drag any of the material output node parameters. Dragging the slider is not a smooth experience at all.

Author
Owner

Changed status to: 'Open'

Author
Owner

Added subscribers: @dfelinto, @fclem

Clément Foucault was assigned by Dalai Felinto 2017-05-18 12:24:09 +02:00
Author
Owner

I did some investigation and got the following results comparing Eevee and master. Note that the basic Eevee shader has ~5,000 lines of code, and master has ~3,500. Also, the tests below focus only on the shader. In Blender itself we also spend considerable time in the GWN_shaderinterface_create() function.

Results


On Linux with an AMD Radeon RX480 running Mesa (Gallium 0.4 - 4.10.0-24-generic) we get:

Version: 4.5
Core profile: 1

time start (control):  hello.cpp:115
time end   (control): 0.003271  hello.cpp:117
time start (master):  hello.cpp:101
time end   (master): 0.088870  hello.cpp:103
time start (eevee):  hello.cpp:108
time end   (eevee): 0.189856  hello.cpp:110

That means the Eevee shader takes roughly 2x as long to compile as the master shader (about 190 ms vs. 89 ms). Those ~200 ms mean there is a big lag every time the shader is recompiled (e.g., when the user drags a slider in a node).
Running from within Blender I get a similar result, so this sandbox seems representative of the real production environment.

On an NVIDIA Quadro K6000 with proprietary driver 375.39 I get:

Version: 3.3
Core profile: 1
time start (control):  hello.cpp:115
time end   (control): 0.032880  hello.cpp:117
time start (master):  hello.cpp:101
time end   (master): 0.063129  hello.cpp:103
time start (eevee):  hello.cpp:108
time end   (eevee): 0.320869  hello.cpp:110

Now that's more interesting. While the master shader compiles faster than on the AMD setup (63 ms vs. 89 ms), the Eevee one compiles considerably slower (321 ms vs. 190 ms).

Note: All tests were run with __GL_SHADER_DISK_CACHE=0 to prevent cached shaders from being used.

To run it for yourself, check the code at: https://github.com/dfelinto/opengl-sandbox
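
For readers who do not want to open the sandbox, the measurement pattern behind timings like these can be sketched roughly as below. This is an assumption about the approach, not the actual `hello.cpp`: `time_ms` is a hypothetical helper, and it takes any callable so the sketch needs no GL context. In a real test the callable would wrap `glCompileShader()` or `glLinkProgram()`, probably followed by a status query so the driver cannot defer the work past the timer.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds. `work` stands in for the GL compile or link call.
double time_ms(const std::function<void()> &work)
{
    assert(work != nullptr && "callable must not be empty");
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

`steady_clock` is used because it is monotonic, so the result is not affected by system clock adjustments during the run.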

Added subscriber: @brecht

You could try removing unused functions from that `eevee.fp` and see if it's still slow. But since linking is not affected by unused functions, that is not going to reduce the link time. Maybe some particularly problematic code can be identified by elimination.

Looking at this [Unreal tutorial](https://youtu.be/6D1pYhxrZ7o?list=PLZlv_N0_O1gbQjgY0nDwZNYe_N8IcYWS-&t=13), it takes 3s (!) to update when they edit one value in the shader graph. They made shader compilation non-blocking so it's not as bad, and ideally we should do the same in Blender. Maybe it's slow because of the little shader preview renders or something else, but it still makes you wonder if it can actually be made as fast as we would like. If you want instant feedback in Unreal, it seems you need to create [material instances](https://youtu.be/sx0BlzGkahw?list=PLZlv_N0_O1gbQjgY0nDwZNYe_N8IcYWS-&t=642) where some parameters become uniforms.

So perhaps your idea of compiling all node parameters as uniforms is really the main way to get better performance while creating the shader graph. A second, more optimized shader with constants could be compiled in the background if needed. Ideally we could avoid users having to think about material instances to get faster feedback when tweaking shaders, but if we do need them then they could be node groups with sockets that you can't link to.
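
To make the uniforms-versus-constants trade-off concrete, here is a minimal sketch of the code generation side. `NodeParam` and `emit_params` are hypothetical names, not Blender types or functions. The point it illustrates: when a parameter is emitted as a uniform, its value never appears in the GLSL text, so dragging a slider leaves the source unchanged and no recompile is needed; when it is baked in as a constant, every value change produces a different source.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one node socket value (not a Blender type).
struct NodeParam {
    std::string name;
    float value;
};

// Emit GLSL declarations for the parameters. With `as_uniform` the value is
// left out of the source, so editing it never changes the shader text.
std::string emit_params(const std::vector<NodeParam> &params, bool as_uniform)
{
    std::ostringstream src;
    for (const NodeParam &p : params) {
        assert(!p.name.empty());
        if (as_uniform) {
            src << "uniform float " << p.name << ";\n";
        }
        else {
            src << "const float " << p.name << " = " << p.value << ";\n";
        }
    }
    return src.str();
}
```

With uniforms, the new value would instead be uploaded at draw time (for example with `glUniform1f()`), which is cheap compared to a compile and link.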
Member

Added subscriber: @MikeErwin

Member

I too think making uniforms for all parameters is the way to go, when editing or animating a material. Constant value nodes can be compiled for faster preview/playback when a material is not being edited. This is one area we can do better than the Unreal editor.

I think the performance issue in `GWN_shaderinterface_create()` should be fixed though; it's taking up 30% of compilation time here. It's not clear to me why Gawain caches all this data about uniforms and attributes, and the lookup by name has O(n²) behavior. If it's done for performance reasons, then I don't think Gawain can do name -> location mappings much faster than `glGetUniformLocation()`. Caching of those locations needs to happen at a higher level to avoid that mapping entirely.
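
As an illustration of the lookup cost being discussed, the sketch below builds a name -> location table once, using a hash map so that each lookup is O(1) on average instead of a linear scan per name. `UniformLocationCache` is a hypothetical class, not Gawain's data structure, and the (name, location) pairs are passed in rather than queried from GL so that the sketch needs no context.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache built once after linking. In real code the pairs would
// come from GL introspection; here they are supplied by the caller.
class UniformLocationCache {
public:
    explicit UniformLocationCache(const std::vector<std::pair<std::string, int>> &uniforms)
    {
        locations_.reserve(uniforms.size());
        for (const auto &u : uniforms) {
            assert(u.second >= 0);
            locations_.emplace(u.first, u.second);
        }
    }

    // Average O(1) per lookup. Returns -1 for unknown names, the same value
    // glGetUniformLocation() reports for names that are not active uniforms.
    int find(const std::string &name) const
    {
        const auto it = locations_.find(name);
        return it == locations_.end() ? -1 : it->second;
    }

private:
    std::unordered_map<std::string, int> locations_;
};
```

The higher-level caching suggested above would go one step further: call `find()` once per uniform when the shader is created, keep the returned integer, and skip the name lookup entirely while drawing.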

Author
Owner

> But since linking is not affected by unused functions, it's not going to help reduce that.

As it turned out, we can get some benefit from doing it. I updated the GitHub program with a "lean" version of the Eevee shader, with only the needed functions.

Of course, there is overhead in including only the required functions, and those numbers are for the simplest shader. We may not have as much to trim in a more production-ready material.

On Linux with an AMD Radeon RX480 running Mesa I get:

| | Eevee | Eevee Lean | Master | Control |
| -- | -- | -- | -- | -- |
| glCompileShader | 65 ms | 20 ms | 49 ms | 3 ms |
| glLinkProgram | 125 ms | 83 ms | 43 ms | 0 ms |
| Total | 190 ms | 103 ms | 125 ms | 4 ms |

On an NVIDIA Quadro K6000 with proprietary driver 375.39 I get:

| | Eevee | Eevee Lean | Master | Control |
| -- | -- | -- | -- | -- |
| glCompileShader | 36 ms | 10 ms | 30 ms | 17 ms |
| glLinkProgram | 287 ms | 257 ms | 38 ms | 18 ms |
| Total | 323 ms | 267 ms | 67 ms | 36 ms |
Author
Owner

Changed status from 'Open' to: 'Resolved'

Author
Owner

"Fixed" in 2a489273d7 by making uniforms out of all the nodetree inputs. I will close it for now, but I would still like to see the performance of GWN_shaderinterface_create addressed.

OK, so I tried to see what causes the huge linking time, and the answer is really not pleasing.

It is caused by the number of branches in the shader. So things like 518e768579, which basically doubles the branches in every Eevee shader, roughly double the linking time.

Using the test program provided by @dfelinto I got these results:
Current principled shader: 3011 ms
Previous principled shader (without the branch): 1781 ms

Most of the shader branching complexity is inside the lamp evaluation, and the recent addition of cascaded shadow maps increased this complexity.
To reduce the branching I can order lamps by type/shadow and iterate over each type one after the other, like I do for the probes.

Reducing branching in the lamp code could "hypothetically" get us to 1667 ms (I did a really short test bypassing all the "if"s and using the worst case).
But of course that will never be the case, and I sense the end result will be more like 2000 ms.
Removing the branching from the principled shader gets us to 1020 ms, but this would defeat the purpose of its runtime optimisation.

Anyway, this would still be more than 1 s for each material shader. So we REALLY need lazy shader compilation.

Here are the shader files if someone cares to take a shot: [eevee-huge.fp](https://archive.blender.org/developer/F884339/eevee-huge.fp) [eevee-huge.vp](https://archive.blender.org/developer/F884341/eevee-huge.vp) [eevee-huge-fixed.fp](https://archive.blender.org/developer/F884347/eevee-huge-fixed.fp)
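
The "order lamps by type and iterate on each type" idea mentioned in this comment can be sketched on the CPU side as below. This is only an illustration under assumed names (`Lamp`, `LampType`, `LampRange`, and `sort_lamps_by_type` are hypothetical, not Eevee's structures): once the lamp array is grouped by type, the shader could run one loop per type over a contiguous index range instead of testing the type of every lamp inside a single loop.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lamp types and data, for illustration only.
enum LampType { LAMP_POINT = 0, LAMP_SUN = 1, LAMP_SPOT = 2, LAMP_TYPE_COUNT = 3 };

struct Lamp {
    LampType type;
    float energy;
};

// Half-open index range [begin, end) of one lamp type in the sorted array.
struct LampRange {
    std::size_t begin = 0;
    std::size_t end = 0;
};

// Group lamps by type (keeping the relative order within each type) and
// return the index range occupied by each type.
std::array<LampRange, LAMP_TYPE_COUNT> sort_lamps_by_type(std::vector<Lamp> &lamps)
{
    std::stable_sort(lamps.begin(), lamps.end(),
                     [](const Lamp &a, const Lamp &b) { return a.type < b.type; });

    std::array<LampRange, LAMP_TYPE_COUNT> ranges{};
    std::size_t i = 0;
    for (int t = 0; t < LAMP_TYPE_COUNT; ++t) {
        ranges[t].begin = i;
        while (i < lamps.size() && lamps[i].type == t) {
            ++i;
        }
        ranges[t].end = i;
    }
    assert(i == lamps.size());  // every lamp must have a valid type
    return ranges;
}
```

The ranges would then be passed to the shader (for example as uniforms) so that each per-type loop has fixed bounds and no per-lamp type test.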

Changed status from 'Resolved' to: 'Open'

Author
Owner

Confirmed the really slow linking times due to branching. I even added a new --debug-gpu-shaders option that for now dumps the compiled shaders in the Blender temp session folder.

Changed status from 'Open' to: 'Resolved'

Lazy (deferred) shader compilation has been in 2.8 for quite some time now. Closing...
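
For context, the general idea of deferred compilation can be sketched as follows. This is not how Blender 2.8 implements it; `DeferredShaderCache` and its methods are hypothetical, and the `compile` callback stands in for the expensive compile and link step. A request for a shader that is not ready returns a fallback program immediately and queues the work, and a small budget of queued shaders is compiled per call to `process()`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical deferred-compilation cache (illustrative names only).
class DeferredShaderCache {
public:
    DeferredShaderCache(std::function<int(const std::string &)> compile, int fallback)
        : compile_(std::move(compile)), fallback_(fallback)
    {
        assert(compile_ != nullptr);
    }

    // Returns the compiled program if ready. Otherwise queues the request
    // (once per key) and returns the fallback so drawing never blocks.
    int get(const std::string &key)
    {
        const auto it = ready_.find(key);
        if (it != ready_.end()) {
            return it->second;
        }
        if (queued_.insert(key).second) {
            pending_.push_back(key);
        }
        return fallback_;
    }

    // Compile at most `budget` queued shaders, e.g. once per redraw.
    // Returns how many were compiled in this call.
    std::size_t process(std::size_t budget)
    {
        std::size_t done = 0;
        while (done < budget && !pending_.empty()) {
            const std::string key = pending_.front();
            pending_.pop_front();
            queued_.erase(key);
            ready_[key] = compile_(key);
            ++done;
        }
        return done;
    }

private:
    std::function<int(const std::string &)> compile_;
    int fallback_;
    std::unordered_map<std::string, int> ready_;
    std::unordered_set<std::string> queued_;
    std::deque<std::string> pending_;
};
```

A real implementation would more likely run the compile step on a separate thread with its own GL context; the single-threaded budget here just keeps the sketch short.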

Reference: blender/blender#51467