Sculpt Mode Stroke Performance #80608

Open
opened 2020-09-08 21:42:03 +02:00 by Pablo Dobarro · 9 comments
Member

The intention of this task is to plan how to tackle the stroke performance issue in Sculpt Mode. This is the issue we are trying to solve:

2020-08-08 13-07-14.mp4 (https://archive.blender.org/developer/F8758865/2020-08-08_13-07-14.mp4)

The plan is to avoid the slowdown that freezes the UI during the stroke. In any case, if the hardware cannot keep up with the mesh deformation, the stroke should lag consistently instead of freezing for several seconds.

This should not focus on:

  • Dyntopo performance (should be handled in a separate task as it has some specific performance issues).
  • Making large brush strokes and mesh filters faster.

These are the attempts that were made. Some experiments were successful, but none of them provides the performance Sculpt Mode should have on a high-end computer:

Control paint operator events rate

  • Dynamic Stroke Spacing D8508: This is the solution with the most noticeable effect so far. For it to work, high-frequency pen tablet events must be available on all platforms. Merging this patch can also be beneficial for reasons unrelated to performance, as it allows reducing the spacing of some brushes that need it.
  • Paint stroke step queue D5676: This is not meant to improve performance (it may even make it worse), but to improve UX by avoiding locking the UI. The idea was to use it for 2D painting, where it works fine, but in 3D projection painting and Sculpt Mode it still has some issues.
  • Ignore INVETWEEN_MOUSE_EVENTS in grab brushes (in master since 2.81) D5429: This also had a noticeable effect on the performance of these brushes.
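
To make the step queue idea more concrete, here is a minimal Python sketch of one way a queue could keep the UI responsive: apply queued stroke steps only until a per-frame time budget is spent, then return so the UI can redraw. This is not the code from D5676. The function name, the `budget` parameter and the injectable `clock` are assumptions made for illustration.

```python
import time
from collections import deque


def process_stroke_steps(queue, apply_step, budget, clock=time.perf_counter):
    """Apply queued stroke steps until the time budget for this frame is spent.

    Returns the number of steps applied. Remaining steps stay in the queue,
    so the caller can redraw the UI and continue on the next frame.
    `budget` uses the same unit as `clock` (seconds for perf_counter).
    """
    start = clock()
    applied = 0
    while queue:
        # Always apply at least one step so the stroke keeps advancing.
        if applied > 0 and clock() - start >= budget:
            break
        apply_step(queue.popleft())
        applied += 1
    return applied


if __name__ == "__main__":
    # Hypothetical usage: 100 queued steps that each take about 2 ms,
    # processed with a 16 ms budget per frame.
    pending = deque(range(100))
    frames = 0
    while pending:
        process_stroke_steps(pending, lambda step: time.sleep(0.002), 0.016)
        frames += 1
    print("frames needed:", frames)
```

With a scheme like this a slow machine finishes the stroke late but at a steady pace, which is the "lag consistently instead of freezing" behavior described in the task.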

Modify PBVH scheduler settings

  • Change Multires to limit the PBVH leaf size using vertices D8454: This reduces the leaf limit for Multires. It improved performance on high-end computers (Julien's workstation). Before this patch, Multires was lagging with 80k vertices and 3 subdivision levels.
  • Change PBVH leaf limit size D8442: This has a noticeable effect on performance. Stroke performance usually improves when lowering the limit size, but this makes tools that modify all nodes slower (like elastic deform or the mesh filters). When increasing the limit size, stroke performance gets worse, but tools that modify the whole mesh are faster.
  • PBVH Scheduler performance design task: #72943
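
The leaf limit trade-off can be illustrated with a toy model. The Python sketch below is not Blender's PBVH. It splits a random point cloud at the median along alternating axes (an assumption made for simplicity) and reports, for several leaf limits, how many leaves exist in total and how many vertices sit in the leaves touched by a small brush sphere. All names are hypothetical.

```python
import random


def build_leaves(points, indices, leaf_limit, depth=0):
    """Split vertex indices at the median along alternating axes until each
    leaf holds at most `leaf_limit` vertices. Returns a flat list of leaves."""
    if len(indices) <= leaf_limit:
        return [indices]
    axis = depth % 3
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return (build_leaves(points, ordered[:mid], leaf_limit, depth + 1)
            + build_leaves(points, ordered[mid:], leaf_limit, depth + 1))


def leaf_touches_sphere(points, leaf, center, radius):
    """Sphere versus axis-aligned bounding box test for one leaf."""
    dist_sq = 0.0
    for axis in range(3):
        lo = min(points[i][axis] for i in leaf)
        hi = max(points[i][axis] for i in leaf)
        nearest = min(max(center[axis], lo), hi)
        dist_sq += (center[axis] - nearest) ** 2
    return dist_sq <= radius * radius


def stroke_cost(points, leaf_limit, center, radius):
    """Return (total leaves, vertices in the leaves touched by the brush)."""
    leaves = build_leaves(points, list(range(len(points))), leaf_limit)
    touched = [leaf for leaf in leaves
               if leaf_touches_sphere(points, leaf, center, radius)]
    return len(leaves), sum(len(leaf) for leaf in touched)


random.seed(0)
points = [(random.random(), random.random(), random.random())
          for _ in range(20000)]
for limit in (4000, 1000, 100):
    leaves, touched_verts = stroke_cost(points, limit, (0.5, 0.5, 0.5), 0.05)
    print(f"limit={limit}: leaves={leaves}, vertices in touched leaves={touched_verts}")
```

In this model a lower limit never increases the number of vertices a small brush has to visit, but it multiplies the number of leaves, which is the cost that tools touching every node have to pay.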

Other info

I tested Sculpt Mode on the following computers, all of them running Ubuntu:

  • Mini-PC media center: i3, 8GB, integrated graphics.
  • Gaming Laptop: i7 9750H, GTX 1650, 16GB
  • Workstation: Threadripper 3990X, Radeon Pro VII, 128GB

This is my experience so far:

  • Performance does not scale: The workstation and the mini-PC have the same stroke performance and lag problem. In the case of the mini-PC, performance is clearly limited by the GPU, as the viewport lags with high-poly meshes even without doing any strokes.
  • Completely removing the normal recalculation before drawing the PBVH does not have any noticeable effect. If we can find a fix that only works for flat shading (so we can move these updates somewhere else), it should be acceptable.
  • I did a refactor during 2.82 development to unify all the loops of the Clay Strips brush, as suggested in #68873. It didn't have any noticeable effect on performance.
  • In 2.81 the scheduler was changed to TBB. Before that change, Sculpt Mode was noticeably faster with multithreaded sculpting disabled. This option is no longer available.
  • A bug fix removed a heavy per-vertex computation in Clay Strips. Even after the fix, the stroke lag is still there.
  • Stroke lag is generally higher in Multires than in meshes, when we should expect Multires to be faster than meshes (fewer cache misses).
  • On the gaming laptop with EEVEE PBVH drawing enabled, stroke performance is almost the same (if not better) in EEVEE as in Workbench. I didn't test this on the workstation because I can't guarantee that the new GPU is working correctly. This makes me think that this problem is not related to viewport drawing.
  • The default Sculpt Vertex Colors paint brush does not need normal updates, area normal sampling or node bounding box updates, but it is also affected by the stroke lag issue.
  • The brush cursor uses a PBVH raycast and a multithreaded task to sample the area normal, but it is not affected by the lag, no matter how fast you move the cursor over the mesh.

Performance reference benchmark

Even if this is not easy to define, we should have a fixed performance reference benchmark for sculpting on supported devices. This would allow people to report performance issues with different setups instead of just assuming that performance is bad because Blender is not specialized software, giving us more data to debug this issue (in the same way that, if we notice that moving elements in Edit Mode on a 20-vertex mesh lags, we know it is a bug).

As an initial proposal, I would expect that with a 500k - 750k vertex mesh, Sculpt Mode should never lag regardless of the stroke size or stroke speed, on any computer that meets the minimum requirements to run Blender.
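
As a sketch of what such a benchmark could report, the Python snippet below summarizes the per-step times of one stroke against a time budget. Nothing here is an existing Blender tool. The function name, the 16 ms budget and the choice of a nearest-rank 95th percentile are assumptions for illustration.

```python
def summarize_stroke_timings(step_seconds, budget_seconds):
    """Summarize the per-step timings of one stroke against a time budget."""
    if not step_seconds:
        raise ValueError("no stroke steps were recorded")
    ordered = sorted(step_seconds)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "steps": len(ordered),
        "worst": ordered[-1],
        "p95": ordered[rank - 1],
        "over_budget": sum(1 for s in ordered if s > budget_seconds),
    }


if __name__ == "__main__":
    # Hypothetical recording: 19 fast steps and one half-second freeze.
    timings = [0.001] * 19 + [0.5]
    print(summarize_stroke_timings(timings, budget_seconds=0.016))
```

Reporting the worst step next to a percentile separates a stroke that is uniformly slow from one that is mostly fast but freezes, and the freeze is the case this task is about.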

Author
Member

Added subscribers: @PabloDobarro, @brecht, @Sergey

Added subscriber: @SirPigeonz

Added subscriber: @TheRedWaxPolice

Added subscriber: @tiagoffcruz

Added subscriber: @ckohl_art

Member

Added subscriber: @JulienKaspar

Author
Member

@brecht @Sergey I think I have an idea that may fix this issue: making all of Sculpt Mode work at the vertex level instead of the PBVH node level. This will require a huge rewrite, but I think I can do it in a reasonable time (except for everything related to drawing).

  • Make the PBVH limit much lower (something between 50 - 100). Consider using an octree instead of a binary tree if traversing a tree with that many nodes takes too long.
  • Make the sculpt_pbvh_gather_generic function traverse the tree and do a distance test per vertex using the search radius. This function will return a list of the indices of the vertices that are inside a sphere, which in most cases (except for brushes that have a square tip) are the ones that are going to be deformed.
  • Push to undo only these vertices. I also have an idea on how to share the undo implementation between meshes and Multires, which will clean up the undo code a lot.
  • Use TBB directly to iterate over the vertex index list instead of the PBVH_VERTEX_ITER macro. This way TBB can decide how to split the loop optimally between threads instead of being constrained to a fixed package of 4000 vertices per node. We can also tweak the minimum number of vertices that are processed per task for each tool, if necessary.
  • Store and combine the proxies in static arrays allocated with the same size as the affected vertex list.
  • After deforming, update the normals of the affected vertices and their neighbors using the existing iterators.
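
The gather and iterate steps above can be sketched in Python as follows. This is only a model of the idea, not Blender code. `leaves` stands in for PBVH leaf nodes, both function names are hypothetical, and a thread pool replaces TBB, so it shows how the index list is split independently of the tree but says nothing about real threading performance.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_vertices_in_sphere(positions, leaves, center, radius):
    """Return the indices of the vertices within `radius` of `center`.

    `leaves` is a list of (bbox_min, bbox_max, vertex_indices) tuples.
    Leaves whose bounding box misses the sphere are skipped entirely.
    """
    radius_sq = radius * radius
    gathered = []
    for bb_min, bb_max, indices in leaves:
        box_dist_sq = sum((c - min(max(c, lo), hi)) ** 2
                          for c, lo, hi in zip(center, bb_min, bb_max))
        if box_dist_sq > radius_sq:
            continue
        # Per-vertex distance test inside the leaves that overlap the sphere.
        for i in indices:
            if sum((p - c) ** 2 for p, c in zip(positions[i], center)) <= radius_sq:
                gathered.append(i)
    return gathered


def deform_gathered(positions, gathered, offset, chunk_size=256, workers=4):
    """Displace the gathered vertices by `offset`.

    The index list is cut into chunks of `chunk_size`, independent of how the
    tree partitioned the mesh. Returns the number of chunks processed.
    """
    def work(chunk):
        # Each task fills its own result list (playing the role of a proxy).
        return [tuple(p + o for p, o in zip(positions[i], offset)) for i in chunk]

    chunks = [gathered[s:s + chunk_size]
              for s in range(0, len(gathered), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(work, chunks))
    # Combine the per-chunk results back into the vertex array.
    for chunk, moved in zip(chunks, results):
        for i, new_position in zip(chunk, moved):
            positions[i] = new_position
    return len(chunks)
```

Because the work list holds only vertices that passed the distance test, the chunk size becomes a per-tool tuning knob rather than a property of the tree.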

The only thing I don't know is how to write the PBVH drawing code so that it can be updated just by knowing exactly which vertices or grids were modified. If this can be done, I think we should at least try this.

I did a quick test just replacing the deforming iterators and using 1000 as the PBVH limit size (creating more draw batches and testing more vertices than necessary). This is how it looks:
2020-09-21 18-48-11.mp4 (https://archive.blender.org/developer/F8906708/2020-09-21_18-48-11.mp4)


Added subscriber: @Fux

Julien Kaspar added this to the Sculpt, Paint & Texture project 2023-02-08 10:20:48 +01:00
Philipp Oeser removed the Interest: Sculpt, Paint & Texture label 2023-02-10 09:12:20 +01:00
Member

I am removing the Needs Triage label. This is under the general rule that Design and TODO tasks should not have a status.

If you believe this task is no longer relevant, feel free to close it.

Alaska removed the Status: Needs Triage label 2024-04-07 05:54:19 +02:00
Reference: blender/blender#80608