Bump map node is doing something strange which is causing unusable viewport performance. #83624

Closed
opened 2020-12-10 12:20:27 +01:00 by michael campbell · 24 comments

System Information
Operating system: Windows-10-10.0.18362-SP0 64 Bits
Graphics card: GeForce GTX 1070/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 456.38

Blender Version
Broken: version: 2.92.0 Alpha, branch: master, commit date: 2020-11-30 18:56, hash: 007a0e43a0
Worked: (newest version of Blender that worked as expected)

Short description of error
Bump map node is doing something strange which is causing unusable viewport performance.

Grab the below file and watch the video to see how to reproduce.

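[bump map eevee performance issue.blend](https://archive.blender.org/developer/F9499985/bump_map_eevee_performance_issue.blend)

[bump problem.mp4](https://archive.blender.org/developer/F9499986/bump_problem.mp4)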

Added subscriber: @3di

Added subscriber: @deadpin

It's not necessarily the bump per se; it's that the final shader is too complex. If you disconnect the last Noise node at the very bottom you'll see "good" behavior. The additional Noise node is what causes the issue in this particular graph, and the same problem has popped up a few times in the past when new features were added to it (it used to not work at all, and might still not work if pushed further with more nodes). The old issue is #69776 and its duplicates.

That's what I thought, but then:

  • Why does zooming out fix the issue if the problem is an overly complex node tree? What is happening differently when zoomed out? It's not related to the amount of detail the viewport is trying to draw, because the single higher-frequency noise didn't cause an issue.
  • Why doesn't an even greater number of noise nodes cause a problem if they're connected to the base input rather than a bump node?
It's a pixel shader at the end of the day, so the fewer pixels that need shading (when zoomed out) the better it'll be. I can get increased performance by keeping the zoom level and just shrinking the viewport even more; remember that this shader math effectively runs per pixel. Bump is also a rather complex node, so there's any number of ways you can stray over the limit.
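
As a rough illustration of the per-pixel point, here is a minimal sketch. The object sizes and the `ops_per_pixel` figure are made-up numbers, not measurements from the example file; the only thing it shows is that the work scales with the number of shaded pixels.

```python
def shaded_pixels(width_px, height_px):
    """Number of screen pixels the object covers (hypothetical bounding size)."""
    return width_px * height_px


def frame_cost(pixels, ops_per_pixel):
    """Total shader work per frame if every covered pixel runs the full shader."""
    return pixels * ops_per_pixel


OPS_PER_PIXEL = 1000  # assumed cost of the material, arbitrary units

zoomed_in = frame_cost(shaded_pixels(800, 800), OPS_PER_PIXEL)
zoomed_out = frame_cost(shaded_pixels(400, 400), OPS_PER_PIXEL)

# Halving the on-screen size in both directions quarters the work.
print(zoomed_in / zoomed_out)  # 4.0
```

The shader itself is unchanged between the two cases; only the number of pixels that have to run it differs.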

I still think there's something that doesn't make sense. It seems that the bump node is forcing all the nodes feeding it to recalculate on each draw unnecessarily; at least that's the only reason I can think of for the below:

This setup plays back in the viewport at 3 fps

![image.png](https://archive.blender.org/developer/F9500901/image.png)

But the node setup below, which contains the same number of nodes, plays back at over 30 fps. The only difference is that the bump node has only one noise node feeding it, and the others go into the base colour. Why do the noise nodes need to be recalculated each draw if they're plugged into the bump node, but not when they're plugged into the base colour? Or, if I'm wrong and they're not being recalculated, why would the bump node perform slower with 4 noise nodes connected to it than with 1, even if the output of 1 noise node is more detailed than the output of the 4 combined noise nodes? Bear in mind that we know the noise nodes are not the bottleneck, because we've tested how quickly they calculate when connected to the base colour.

![image.png](https://archive.blender.org/developer/F9500905/image.png)

Added subscriber: @mano-wii

Unfortunately I'm having problems with drivers and I can't investigate this slowdown for now.
This seems to be the same problem as #79522 (Eevee: Procedural material very slow), but a change from 3 to 30 fps because of the bump does not seem normal.
Bump node works at a higher resolution. So it is expected to be 3 or 4 times slower than the color slot.

Thanks. I can appreciate that the bump node may be 3 to 4 times slower than the colour slot, but why would it be slower with 4 noise nodes connected to it than it is with 1 noise node connected to it? Wouldn't 4 combined noise nodes produce the same amount of data as 1 noise node?

Also, when the noise nodes are connected to the bump node, shouldn't they calculate at the same speed they do when they're connected to the base colour? Or perhaps the bump node is forcing all noise nodes to be recalculated more often than they are when they're connected to the base colour?

> In #83624#1073220, @3di wrote:
> Why would it be slower with 4 noise nodes connected to it than it is with 1 noise node connected to it. Wouldn't the result of 4 combined noise nodes result in the same amount of data as 1 noise node?

No (and I'm not even sure how this could be implemented in any software).
As a general rule: fewer nodes, less complex shader, faster.

> Also, when the noise nodes are connected to the bump node, shouldn't they calculate at the same speed they do when they're connected to the base colour? Or perhaps the bump node is forcing all noise nodes to be recalculated more often than they do when they're connected to the base colour?

As mentioned, the node is being calculated for an internal texture with **higher** resolution, containing 4 times more pixels if I'm not mistaken.
I don't know if Eevee uses the result of this texture to optimize the color slot, but that doesn't change the fact that it is slower.

Hmm, but the 4 identical noise nodes will calculate at the same speed regardless of whether they're going to the bump node or the base colour, and in memory 4 noise nodes added together will still result in the same amount of pixel information as 1 noise node (both 4x the screen resolution internally). So with this in mind, the bump node is receiving the same amount of data from 1 noise node as it is from 4 combined noise nodes, and therefore shouldn't be slower with 4 noise nodes connected to it than it is with 1, because there are still 4 noise nodes being calculated in the entire shader regardless of whether 3 are going to the base colour and 1 to the bump, or 4 to the bump and 0 to the base colour.

Added subscriber: @Stan_Pancakes

@3di that's not the way it works. This isn't about the amount of data (it can be, but not in this case); it's about the size of the resultant shader (i.e. code size) and the complexity of computation. This is the real downside of visual programming with black boxes (which is what a node-based shader constructor is): it is very easy to dramatically increase complexity. It's not at all WYSIWYG.

The whole tree that's connected to the Height input of the Bump node is duplicated, twice, under the hood. So if you have one node feeding it, you really have three. If you have 11 like in your example file, you really have 33. Each noise node evaluates noise not once, but per octave (Detail setting), and each evaluation is quite costly. With the settings from your example, the four noise nodes would run these expensive computations a total of 2+7+2+2 = 13 times, or 39 in total once the duplication is taken into account. All the Add nodes introduce dependencies in the overall shader computation, reducing parallelization on the GPU (depends on hardware and driver). There is only so much code you can feed to the GPU before it can no longer keep up.

You can try this yourself: unplug the bump, group the nodes feeding it, add two more instances, add them together using a mix node and feed the result into base color. You will get a comparable slowdown (and that's before considering that the Bump node itself, which would be skipped entirely in this configuration, has its own overhead). Reduce the Detail level on each noise node to 1, and you'll notice a significant improvement in performance, even in the bloated graph.

To sum it up: adding a tree branch to a Bump input adds in triplicate; Noise nodes are expensive, each made more so by the Detail setting.
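
The counting above can be written out as a small sketch. It only reproduces the arithmetic quoted in this comment (Detail values 2, 7, 2, 2, a branch of 11 nodes, and one original plus two copies); the function names are made up for illustration and nothing here queries Blender.

```python
# Detail settings of the four Noise nodes, as quoted in the comment above.
NOISE_DETAIL = [2, 7, 2, 2]

# The branch feeding Bump's Height input exists three times in the final
# shader: the original plus the two hidden copies.
BUMP_COPIES = 3


def noise_evaluations(details, copies=1):
    """Expensive noise evaluations, counting one per Detail level per copy."""
    return copies * sum(details)


def effective_nodes(nodes_in_branch, copies=1):
    """Number of nodes the GPU actually has to run for the branch."""
    return copies * nodes_in_branch


print(noise_evaluations(NOISE_DETAIL))               # 13
print(noise_evaluations(NOISE_DETAIL, BUMP_COPIES))  # 39
print(effective_nodes(11, BUMP_COPIES))              # 33
```

Plugging the same branch into base colour corresponds to `copies=1`, which is why the two setups with an equal number of visible nodes do not cost the same.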

Thanks. Why does it only triple the node tree if it’s connected to the bump and not the base colour?

Because the Bump needs that data :)

So let’s say you have one noise node, which gets duplicated twice because that’s what the bump node needs apparently. For an individual pixel that’s being calculated, what is the difference between the output of the original tree, the second copy and the third copy?

Could the tree be executed once, and then the result of the two additional iterations be calculated from that instead of having to execute the identical node tree twice more? Presumably the three results are related, considering the nodes are identical.

Could you point me to any documentation which explains the purpose of this tripling up of full branches for bump nodes? Not necessarily Blender specific.

Added subscriber: @EAW

Changed status from 'Needs Triage' to: 'Archived'

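Let's take a trip, way back to 2.80, ~~when shadows were softer, and per light,~~ and bump was aliased.

On the left 2.80, and on the right 2.92.
![2.80_vs_2.92_bump.png](https://archive.blender.org/developer/F9503708/2.80_vs_2.92_bump.png)

In fact, [the manual suggested to render out at twice the needed resolution,](https://docs.blender.org/manual/en/dev/render/eevee/limitations.html#materials) and downscale the result, if you used a bump node in EEVEE.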

> **Bump Mapping**
> As of now, bump mapping is supported using OpenGL derivatives which are the same for each block of 2×2 pixels. This means the bump output value will appear pixelated. It is recommended to use normal mapping instead.
>
> Tip:
> If you absolutely need to render using Bump nodes, render at twice the target resolution and downscale the final output.

(In fact, the manual still says that. I have been meaning to submit an update, but it got lost in my personal TODO pile.)
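
The pixelation described in that manual excerpt can be modelled in one dimension. This is a simplified sketch, not EEVEE's code: `height` is a made-up profile, and `quad_dfdx` only imitates the idea that a hardware derivative is shared by the pixels of a 2-wide block.

```python
def height(x):
    """Hypothetical 1D height profile, used only for illustration."""
    return x * x


def quad_dfdx(f, width):
    """Model of a block derivative: one difference shared by each pixel pair."""
    out = []
    for x in range(width):
        left = x - (x % 2)  # first pixel of the 2-wide block containing x
        out.append(f(left + 1) - f(left))
    return out


def per_pixel_dfdx(f, width):
    """Derivative evaluated separately for every pixel (one extra sample each)."""
    return [f(x + 1) - f(x) for x in range(width)]


print(quad_dfdx(height, 6))       # [1, 1, 5, 5, 9, 9]
print(per_pixel_dfdx(height, 6))  # [1, 3, 5, 7, 9, 11]
```

The repeated values in the first list are the blocky look of the old bump; the second list is smooth but needs an extra evaluation of the height for every pixel.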

> Could you point me to any documentation which explains the purpose for this tripling up of full branches for bump nodes. Not necessarily blender specific.

Sure. Here is a start:
ffd5e1e6ac

Edit: [Another article](http://www.aclockworkberry.com/shader-derivative-functions/) just showing a GLSL implementation of the derivative calculation and how it is calculated on 2x2 pixel blocks. If you want to get the derivative down to the individual pixel size, then you have to manually calculate it by duplicating your texture twice, offsetting one of the duplicates a pixel in the x direction, and the other along the y.
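
A minimal sketch of that manual calculation, assuming a made-up procedural `height(u, v)` that stands in for whatever branch feeds the Bump node. It is not the code from the commit above; it only shows why a per-pixel derivative costs three evaluations of the branch.

```python
import math

calls = []  # records every evaluation of the height branch


def height(u, v):
    """Hypothetical procedural height, standing in for the node branch."""
    calls.append((u, v))
    return math.sin(3.0 * u) * math.cos(2.0 * v)


def bump_gradient(f, u, v, du, dv):
    """Forward differences: the original sample plus two offset copies."""
    h = f(u, v)         # original branch
    h_x = f(u + du, v)  # copy offset along x
    h_y = f(u, v + dv)  # copy offset along y
    return (h_x - h) / du, (h_y - h) / dv


gx, gy = bump_gradient(height, 0.2, 0.1, 1e-4, 1e-4)
print(len(calls))  # 3
```

The two extra samples are taken at different positions, so their values cannot be recovered from the first one without evaluating the branch again.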

I'll be closing this task, as the code is working as designed and therefore doesn't meet this tracker's definition of a bug.

So it's duplicating and recalculating entire node branches just so it can add a 1 pixel offset in the x and y directions?

Would it not be better to just render an additional 1 pixel outside of the screen and then perform a translation on the data from the first branch's execution, rather than recalculating the entire branch 3 times to get identical information moved slightly in two different directions?

Where would be the best place to move this conversation to in order to get a dev to look at it? Seems a shame to cast it off into oblivion when there appears to be an approach which will give a 3x performance increase for materials that use bump in EEVEE.

Ignore that, I just realised it's not just a case of moving the screen pixels, it's a case of moving them along the surface of the object, and the out-of-view pixels aren't there to be moved because it's a screen-space renderer.

Perhaps it could use one iteration of the branch, and render one additional pixel for all borders, and then move the result in uv space instead of executing the node branch 3 times?

Removed subscriber: @Stan_Pancakes

Closed as duplicate of #79522

Reference: blender/blender#83624