Sculpt Mode Stroke Performance #80608


Pablo Dobarro · 2020-09-08T21:42:03+02:00

Pablo Dobarro commented

2020-09-08 21:42:03 +02:00

The intention of this task is to plan how to tackle the stroke performance issue in Sculpt Mode. This is the issue we are trying to solve:

[2020-08-08 13-07-14.mp4](https://archive.blender.org/developer/F8758865/2020-08-08_13-07-14.mp4)

The plan is to avoid the slowdowns that freeze the UI during the stroke. If the hardware cannot keep up with the mesh deformation, the stroke should lag consistently instead of freezing for several seconds.

This should not focus on:

Dyntopo performance (should be handled in a separate task as it has some specific performance issues).
Making large brush strokes and mesh filters faster.

These are the attempts that were made. Some of the experiments were successful, but none of them provides the performance Sculpt Mode should have on a high-end computer:

Control paint operator events rate

Dynamic Stroke Spacing [D8508](https://archive.blender.org/developer/D8508): This is the solution with the most noticeable effect so far. For it to work, high-frequency pen tablet events need to be available on all platforms. Merging this patch can also be beneficial for non-performance reasons, as it allows reducing the spacing of some brushes that need it.
Paint stroke step queue [D5676](https://archive.blender.org/developer/D5676): This is not about improving performance (it may even make performance worse), but about improving UX by avoiding locking the UI. The idea was to use it for 2D painting, where it works fine, but in 3D projection painting and Sculpt Mode it still has some issues.
Ignore INBETWEEN_MOUSE_EVENTS in grab brushes (in master since 2.81) [D5429](https://archive.blender.org/developer/D5429): This also had a noticeable effect on the performance of these brushes.
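
As a rough illustration of the idea behind dynamic spacing (this is not the code in D8508), the sketch below widens the stroke spacing when a step takes longer than a time budget and narrows it again when steps are cheap. The struct, the function name, and the proportional rule are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical state for one stroke; none of these names come from Blender.
struct SpacingState {
  float base_spacing;  // spacing the brush asks for
  float max_spacing;   // upper bound so the stroke does not fall apart
  float current;       // spacing used for the next step
};

// Scale the spacing by how far the last step was over or under the budget,
// then keep it between the brush spacing and the upper bound.
float update_spacing(SpacingState &s, double step_ms, double budget_ms)
{
  assert(budget_ms > 0.0 && s.base_spacing > 0.0f && s.max_spacing >= s.base_spacing);
  const float ratio = static_cast<float>(step_ms / budget_ms);
  const float scaled = s.current * ratio;
  s.current = std::min(std::max(scaled, s.base_spacing), s.max_spacing);
  return s.current;
}
```

With a rule like this, a machine that cannot keep up produces fewer, more widely spaced steps instead of an ever-growing backlog of events.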

Modify PBVH scheduler settings

Change Multires to limit the PBVH leaf size using vertices [D8454](https://archive.blender.org/developer/D8454): This reduces the leaf limit for Multires. It improved performance on high-end computers (Julien's workstation). Before this patch, Multires was lagging with 80k vertices and 3 subdivision levels.
Change PBVH leaf limit size [D8442](https://archive.blender.org/developer/D8442): This has a noticeable effect on performance. Stroke performance usually improves when the limit size is lowered, but this makes tools that modify all nodes slower (like elastic deform or the mesh filters). When the limit size is increased, stroke performance is worse, but tools that modify the whole mesh are faster.
PBVH Scheduler performance design task: #72943
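
The leaf limit trade-off can be made concrete with a toy model. The sketch below is not Blender's PBVH: it assumes vertices laid out in a 1D array that is halved until each leaf holds at most `limit` vertices, and it assumes a stroke has to visit every vertex of each leaf it touches. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Leaf {
  std::size_t begin, end;  // half-open range of vertex indices
};

// Recursively halve [begin, end) until each leaf holds at most `limit` vertices.
void build_leaves(std::size_t begin, std::size_t end, std::size_t limit, std::vector<Leaf> &out)
{
  assert(limit > 0 && begin <= end);
  if (end - begin <= limit) {
    if (end > begin) {
      out.push_back({begin, end});
    }
    return;
  }
  const std::size_t mid = begin + (end - begin) / 2;
  build_leaves(begin, mid, limit, out);
  build_leaves(mid, end, limit, out);
}

struct StrokeCost {
  std::size_t leaves_touched;
  std::size_t vertices_visited;
};

// A brush covering vertices [a, b) pays for every vertex of every leaf it overlaps.
StrokeCost stroke_cost(const std::vector<Leaf> &leaves, std::size_t a, std::size_t b)
{
  StrokeCost cost{0, 0};
  for (const Leaf &leaf : leaves) {
    if (leaf.begin < b && a < leaf.end) {
      cost.leaves_touched++;
      cost.vertices_visited += leaf.end - leaf.begin;
    }
  }
  return cost;
}
```

In this model, an 80k vertex mesh with a limit of 10000 gives 8 leaves, and a brush touching 500 vertices visits 10000 of them. With a limit of 1000 the same brush visits 625, but a tool that modifies the whole mesh now has 128 leaves to schedule.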

Other info

I tested Sculpt Mode on the following computers, all of them running Ubuntu:

Mini-PC media center: i3, 8GB, integrated graphics.
Gaming Laptop: i7 9750H, GTX 1650, 16GB
Workstation: Threadripper 3990X, Radeon Pro VII, 128GB

This is my experience so far:

Performance does not scale: The workstation and the mini-PC have the same stroke performance and lag problem. In the case of the mini-PC, performance is clearly limited by the GPU, as the viewport lags with high poly meshes without doing any strokes.
Removing the normal recalculation completely before drawing the PBVH does not have any noticeable effect. If we can find a fix that only works for flat shading (so we can move these updates somewhere else), it should be acceptable.
I did a refactor during the 2.82 development to unify all the loops for the Clay Strips brush as suggested in #68873. It didn't have any noticeable performance effect.
In 2.81 the scheduler was changed to TBB. Before that change, Sculpt Mode was noticeably faster with multithreaded sculpting disabled. This option is no longer available.
This bug fix removed a heavy per-vertex computation in Clay Strips. Even after the fix, the stroke lag is still there.
Stroke lag is generally higher in Multires than in meshes, when we should expect Multires to be faster than meshes (fewer cache misses).
On the gaming laptop with the EEVEE PBVH drawing enabled, stroke performance is almost the same (if not better) in EEVEE as in Workbench. I didn't test this on the workstation because I can't guarantee that the new GPU is working correctly. This makes me think that this problem is not related to a viewport drawing issue.
The default Sculpt Vertex Colors paint brush does not need normal updates, sampling the area normal or updating the nodes bounding boxes, but it is also affected by the stroke lag issue.
The Brush cursor is using a PBVH raycast and a multithreaded task to sample the area normal, but it is not affected by the lag, no matter how fast you move the cursor over the mesh.

Performance reference benchmark

Even though this is not easy to define, we should have a fixed performance reference benchmark for sculpting on supported devices. This would allow people to report performance issues with different setups instead of just assuming that performance is bad because Blender is not specialized software, so we can have more data to debug this issue (in the same way that, if we notice that moving elements in edit mode on a 20 vertex mesh lags, we know it is a bug).

As an initial proposal, I would expect that with a mesh of 500k - 750k vertices, Sculpt Mode should never lag regardless of the stroke size or stroke speed, on any computer that meets the minimum requirements to run Blender.
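
One possible way to turn "lags consistently" versus "freezes" into something reportable is to record the duration of every stroke step and compare the worst step against the median. The sketch below assumes such timings are available; the function name, the report fields, and the freeze factor are hypothetical and not part of any existing Blender benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct StrokeReport {
  double median_ms;  // typical cost of one stroke step
  double worst_ms;   // slowest step in the stroke
  bool has_freeze;   // true when the slowest step is far above the typical one
};

// Summarize the recorded step durations of one stroke.
// A stroke that is slow but steady has a high median and no freeze;
// a stroke that stalls has a worst step many times larger than the median.
StrokeReport analyze_stroke(std::vector<double> step_ms, double freeze_factor)
{
  assert(!step_ms.empty() && freeze_factor > 1.0);
  std::sort(step_ms.begin(), step_ms.end());
  const double median = step_ms[step_ms.size() / 2];  // upper median for even counts
  const double worst = step_ms.back();
  return {median, worst, worst > freeze_factor * median};
}
```

Reports from different setups could then be compared on the same numbers, for example the median and worst step time for a fixed stroke on a fixed reference mesh.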


Pablo Dobarro commented

2020-09-08 21:42:03 +02:00

Added subscribers: @PabloDobarro, @brecht, @Sergey

Przemyslaw Golab (SirPigeonz) commented

2020-09-09 02:09:02 +02:00

Added subscriber: @SirPigeonz

TheRedWaxPolice commented

2020-09-09 04:48:49 +02:00

Added subscriber: @TheRedWaxPolice

Tiago Cruz commented

2020-09-10 00:18:16 +02:00

Added subscriber: @tiagoffcruz

Chris Kohl commented

2020-09-10 06:14:52 +02:00

Added subscriber: @ckohl_art

Julien Kaspar commented

2020-09-11 12:03:57 +02:00

Added subscriber: @JulienKaspar

Pablo Dobarro commented

2020-09-21 13:30:35 +02:00

@brecht @Sergey I think I have an idea that may fix this issue: making all of Sculpt Mode work at the vertex level instead of at the PBVH node level. This will require a huge rewrite, but I think I can do it in a reasonable time (except for everything related to drawing).

Make the PBVH limit much lower (something between 50 - 100). Consider using an octree instead of a binary tree if traversing a tree with too many nodes takes too long.
Make the `sculpt_pbvh_gather_generic` function traverse the tree and do a distance test per vertex using the search radius. This function will return a list of the indices of the vertices that are inside a sphere, which in most cases (except for brushes that have a square tip) are the ones that are going to be deformed.
Push to undo only these vertices. I also have an idea on how to share the undo implementation between meshes and Multires, which will clean up the undo code a lot.
Use TBB directly to iterate over the vertex index list instead of the `PBVH_VERTEX_ITER` macro. This way TBB can decide how to split the loop optimally between threads instead of being constrained to a fixed package of 4000 vertices per node. We can also tweak the minimum number of vertices that are going to be processed per task per tool, if necessary.
Store and combine the proxies in static arrays allocated with the same size as the affected vertices list.
After deforming, update the normals of the affected vertices and their neighbors using the existing iterators.
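
The gather and task-splitting steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: the gather scans a flat coordinate array instead of traversing the PBVH, and the split only computes the index ranges that a scheduler such as TBB would then run in parallel. `Vec3`, `gather_in_sphere`, `Range`, and `split_for_tasks` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
  float x, y, z;
};

// Distance test per vertex: return the indices of all vertices inside the search sphere.
std::vector<std::size_t> gather_in_sphere(const std::vector<Vec3> &co, Vec3 center, float radius)
{
  assert(radius >= 0.0f);
  const float radius_sq = radius * radius;
  std::vector<std::size_t> indices;
  for (std::size_t i = 0; i < co.size(); ++i) {
    const float dx = co[i].x - center.x;
    const float dy = co[i].y - center.y;
    const float dz = co[i].z - center.z;
    if (dx * dx + dy * dy + dz * dz <= radius_sq) {
      indices.push_back(i);
    }
  }
  return indices;
}

struct Range {
  std::size_t begin, end;  // half-open range into the gathered index list
};

// Split `count` gathered vertices into at most `max_tasks` ranges, using fewer
// ranges when that is needed to keep roughly `min_grain` vertices per range.
std::vector<Range> split_for_tasks(std::size_t count, std::size_t min_grain, std::size_t max_tasks)
{
  assert(min_grain > 0 && max_tasks > 0);
  std::vector<Range> ranges;
  if (count == 0) {
    return ranges;
  }
  const std::size_t tasks = std::max<std::size_t>(1, std::min(max_tasks, count / min_grain));
  const std::size_t per_task = (count + tasks - 1) / tasks;
  for (std::size_t begin = 0; begin < count; begin += per_task) {
    ranges.push_back({begin, std::min(count, begin + per_task)});
  }
  return ranges;
}
```

The point of the split is that the amount of work per task follows the number of vertices actually affected by the brush and a per-tool grain size, rather than a fixed number of vertices per PBVH node.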

The only thing I don't know is how to write the PBVH drawing code in a way that can be updated just by knowing exactly which vertices or grids were modified. If this is something that can be done, I think we should at least try this.

I did a quick test just replacing the deforming iterators and using 1000 as the PBVH limit size (creating more draw batches and testing more vertices than necessary). This is how it looks:
[2020-09-21 18-48-11.mp4](https://archive.blender.org/developer/F8906708/2020-09-21_18-48-11.mp4)


Tobias Fuchsberger commented

2020-09-29 12:16:54 +02:00

Added subscriber: @Fux

Julien Kaspar added this to the Sculpt, Paint & Texture project 2023-02-08 10:20:48 +01:00

Philipp Oeser removed the
