
Sculpt: Avoid reading and writing coordinates at the same time in smooth
Needs Review · Public

Authored by Pablo Dobarro (pablodp606) on Jun 2 2020, 8:18 PM.

Details

Summary

This should be the correct way of implementing Laplacian smoothing. The
coordinates and normals are copied before each iteration to avoid
reading and writing them at the same time from the main brush loop.

We need to consider the following before implementing this in the rest
of the tools:

  • If we want to have a 100% correct implementation, the copy needs to be done for the whole mesh, as it is not guaranteed that the neighbor vertex whose coordinate is read is in the same node as the vertex being modified. Filters will need this to be done for the whole mesh.
  • This spawns twice as many threads in the main loop per brush step. In brushes like Smooth, which can run up to 4 iterations per brush step, this would mean creating threads 8 times in the main brush loop, plus 1 time to store coordinates for undo, 1 time to recalculate the bounding box and 2 times to recalculate the normals (plus redrawing). The current performance of Smooth is already problematic on mid/high-poly meshes, and this will make the problem worse.
  • I don't see any noticeable difference in the resulting brush deformation.
  • This requires another copy of the mesh coordinates and normals. When sculpting with deform modifiers enabled, I think there are already 4 copies of the mesh coordinates in memory.

Diff Detail

Repository
rB Blender
Branch
sculpt-copy-coords (branched from master)
Build Status
Buildable 8335
Build 8335: arc lint + arc unit

Event Timeline

Pablo Dobarro (pablodp606) requested review of this revision. Jun 2 2020, 8:18 PM
Pablo Dobarro (pablodp606) created this revision.

What I would like to see here is a bigger design consideration and planning.

Replying to some points you've mentioned in D7508#191313 (in a different order, though).

The smooth brush was working like this (skipping the proxies and reading/writing at the same time) since 2.79.

Few things here:

  • The Smooth brush implementation predates 2.79 (just to show the age of the code).
  • I am not aware of any reports about artifacts in this brush (so is it working OK in practice, or do people just not report issues?).
  • A theoretically correct implementation will have a memory/performance penalty compared to the current implementation, where reading and writing happen at the same time (taking the rest of the algorithm steps into account, it is hard to tell in advance what the percentage penalty of the correct approach would be).
  • In CCG grids stitching/averaging I had to use a proper accumulator approach, otherwise it was possible to encounter artifacts (due to the access pattern and a different threading schedule, those areas were hammered more heavily from all the threads).

Taking those points into account, do we want to have a proper implementation of smoothing? Is now the time to do so? Is it something to address in the future?
I'm not sure about all that; it is something where the module owner's and, maybe, the Architect's input would be nice to have.

Now that I have more details, it doesn't feel like we should delay D7508: it fixes a real user-level issue and is no different from the implementation of other similar brushes.

Proxies will affect performance a lot on any brush and filter that requires multiple iterations, as the combine-proxies task needs to be executed after each iteration to read the correct coordinates in the next one.

It feels like it should be possible to avoid combining proxies after every iteration.

Is this performance impact measurable after taking into account all other aspects of the stroke code? Is this performance hit big due to a not-so-performant implementation of proxies?

which is basically the same issue as in the patch I proposed.

If it's the same, why add more functionality that does the same thing? Can the existing system be optimized/refined/made better?
Or does it have intrinsic issues that require a completely separate implementation of similar functionality? If so, what are they?


Comments about this specific proposal.

the copy needs to be done for the whole mesh

You make it sound like a requirement; I see this as one possible solution. To me the requirement is: "it should be possible to get the 'original' coordinates of all neighbors", and "copy the entire mesh" is the quickest and easiest solution to implement for this requirement.

Copying an entire mesh is something we shouldn't be doing. If anything, we should be able to query "adjacent" BVH nodes and limit copies as much as possible.

This spawns twice as many threads in the main loop per brush step

I do not see where you're spawning threads. Do you mean that there are twice as many parallel-range calls?

Under no circumstances should painting code spawn new threads, as this is very expensive. Using parallel iterations does not spawn new threads, but uses existing worker threads. This is far cheaper, and its threaded nature gives better performance.

This requires another copy of the mesh coordinates and normals. When sculpting with deform modifiers enabled I think there are already 4 copies of the mesh coordinates in memory.

Two things here:

  • You can optimize the storage and only store what's really needed (as opposed to a full-mesh copy). Or consolidate copies somehow and thereby improve the memory situation.
  • Sculpting on a deformed mesh was never really optimized for memory. It is possible and it should be done, but I don't think it is related to the work being discussed here.

I am not aware of any reports about artifacts in this brush (so is it working OK in practice, or do people just not report issues?).

I think this is one of those cases where the issues are not being reported, and I'm also not particularly convinced by the current smooth tools. The main problem is that their deformation depends entirely on mesh resolution. They are hard to control on low-poly models, and their effect requires a huge number of iterations to smooth surfaces on high-poly meshes, which is something we should try to improve.

Taking those points into account, do we want to have a proper implementation of smoothing? Is now the time to do so? Is it something to address in the future?

There is currently no tool to smooth volumes of high-poly meshes other than Laplacian smoothing with a huge number of iterations (in the brush or filter version), so that is why I think we should be careful about changes that may make the current smooth tools slower. If we could implement a different smooth algorithm to cover this use case, then it would be safer to have the correct Laplacian smooth with the extra performance penalty.

Copying an entire mesh is something we shouldn't be doing. If anything, we should be able to query "adjacent" BVH nodes and limit copies as much as possible.

True. Querying adjacent nodes is also something I would like to have in the future, as it can also be used to improve performance in other operations (anything that propagates data using the mesh topology).

I do not see where you're spawning threads. Do you mean that there are twice as many parallel-range calls?

Yes. Basically, when coding new tools and brushes I always try to minimize the use of BLI_task_parallel_range because (at least on my computer), each time a loop over nodes is multithreaded, performance becomes unstable and brushes/tools are less responsive (more info in T72943). This is what concerns me the most about the correct smooth implementation, not memory usage.
I made some experiments with the smooth brush in the past. Doing something like P1447 (with the strength correctly implemented) makes the smooth brush much faster and more responsive, but the implementation is even more incorrect.

Is this performance impact measurable after taking into account all other aspects of the stroke code? Is this performance hit big due to a not-so-performant implementation of proxies?

Same as above. Using proxies requires an extra loop to combine them, which is what I think is causing the problems (at least I can't think of any correct implementation of smoothing that doesn't add an extra loop). There is also task T68873 about consolidating the loops. I don't think the performance problem is in the proxies implementation or its usage itself.

Overall I'm not sure it's worth making changes here unless significant real-world problems are found with the current code.

Since smoothing uses multiple iterations, the original coordinates from the undo stack can't be reused (they're only relevant for the first iteration). Previous coordinates should definitely only be stored per modified node, although of course in some cases that still means a lot of memory usage (like smoothing the entire mesh).

It's possible in principle to be smarter and store only previous coords for vertices that neighbor another node, or process nodes in a particular order so such memory can be freed more quickly. But it all seems quite messy and probably not worth it.

Proxies could be optimized or removed, at the cost of more code for every brush to handle symmetry. Basically changing:

foreach symmetry:
  foreach node:

to

foreach node:
  foreach symmetry:

or

foreach node:
  foreach vertex:
    foreach symmetry:

That would be a big change though.