
Sculpt: Change default min_iter_per_thread
Abandoned, Public

Authored by Pablo Dobarro (pablodp606) on Sat, Oct 5, 4:55 PM.

Details

Summary

I'm getting this performance improvement by changing the min_iter_per_thread value. Sometimes it lags a little bit more during the first steps of the stroke.
Before/After

Diff Detail

Repository
rB Blender
Branch
pbvh-task-defaults-1 (branched from master)
Build Status
Buildable 5252
Build 5252: arc lint + arc unit

Event Timeline

I'm getting a similar performance improvement if I disable multi-threading in the brush code. Could this also be related to the number of mouse events Linux is sending?

When I changed the defaults here I found 1 to be clearly faster than higher values. I imagine it depends on the CPU, operating system and the operation being performed. I tested on machines with both 4 and 32 cores.

Setting this to 16 may effectively be disabling multithreading for the operation you are doing (probably the brush doesn't touch more than 16 nodes in each step?).
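As a rough illustration of why a high min_iter_per_thread can switch threading off, here is a minimal sketch of how a minimum chunk size caps the number of tasks. The function name and the formula are assumptions made for illustration; this is a simplified model, not the actual BLI_task implementation.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model (not Blender's BLI_task code): how many worker tasks a
// range of `total_iters` items can be split into when every task must receive
// at least `min_iter_per_thread` items.
int effective_task_count(int total_iters, int num_threads, int min_iter_per_thread)
{
  assert(num_threads > 0 && min_iter_per_thread > 0);
  if (total_iters <= 0) {
    return 0;
  }
  // The minimum chunk size limits the task count; never go below one task.
  const int max_tasks_by_size = std::max(1, total_iters / min_iter_per_thread);
  // The thread count limits it as well.
  return std::min(num_threads, max_tasks_by_size);
}
```

Under this model, a brush step that touches 12 nodes on 8 threads is split into 8 tasks with a minimum of 1, but into a single task with a minimum of 16, which amounts to running the step on one thread.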

How many cores does your computer have? Is this on Linux? Can you try leaving this at 1 and running Blender with e.g. half the number of cores, or a quarter, and see if that's faster still?

What may be happening is that the thread scheduling isn't working well. If a core gets switched to another thread or process in the middle of executing work, it can delay everything. If other cores are e.g. stuck on a spinlock they may not get scheduled to help finish that work.

Ideally we'd solve that, since disabling threading leaves a lot of potential performance on the table.

It is an i7 6700K on Manjaro Linux. If I use all cores, performance is terrible. Using half or a quarter of the cores is much faster, but not as fast as having threaded sculpt disabled.
I also tested this in 2.79. Enabling or disabling threaded sculpt there does not have this performance impact (it is almost the same).

We can revert to the previous scheduling (basically just remove the two lines of code that set settings->min_iter_per_thread and settings->scheduling_mode). The new scheduling only gave a 10% speedup here, so reverting it is not that big a problem. That probably makes it behave similarly to 2.79.

However it still effectively disables multithreading in many cases where we might be able to benefit from it.

Here's a test using TBB for multithreading instead. It can help figure out if the problem is in our task scheduler code, or if perhaps it's just not possible to use threading efficiently for the many small sculpt steps we are doing now.
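The patch that follows swaps the finalize callbacks for reduce callbacks: each worker accumulates into its own copy of the userdata chunk, and the copies are merged afterwards. A minimal sketch of that pattern, using only the standard library rather than TBB, could look like this. NearestChunk, nearest_reduce and find_nearest are hypothetical names, loosely modelled on the nearest-vertex code in the diff.

```cpp
#include <cassert>
#include <cfloat>
#include <thread>
#include <vector>

// Per-worker accumulator, loosely modelled on NearestVertexTLSData.
struct NearestChunk {
  int index = -1;
  float distance_squared = FLT_MAX;
};

// Reduce step: fold `other` into `join`, keeping the closer candidate.
void nearest_reduce(NearestChunk &join, const NearestChunk &other)
{
  if (other.index != -1 && other.distance_squared < join.distance_squared) {
    join = other;
  }
}

// Hypothetical helper: split [0, n) into contiguous slices, one per worker
// thread, let each worker fill its own chunk, then join the chunks.
NearestChunk find_nearest(const std::vector<float> &distances_squared, int num_workers)
{
  assert(num_workers > 0);
  const int n = static_cast<int>(distances_squared.size());
  std::vector<NearestChunk> chunks(num_workers);
  std::vector<std::thread> workers;

  for (int w = 0; w < num_workers; w++) {
    workers.emplace_back([&, w]() {
      const int begin = n * w / num_workers;
      const int end = n * (w + 1) / num_workers;
      // Each worker writes only to chunks[w], so no locking is needed.
      for (int i = begin; i < end; i++) {
        if (distances_squared[i] < chunks[w].distance_squared) {
          chunks[w].index = i;
          chunks[w].distance_squared = distances_squared[i];
        }
      }
    });
  }
  for (std::thread &t : workers) {
    t.join();
  }

  // Sequential join of the per-worker results.
  NearestChunk result;
  for (const NearestChunk &chunk : chunks) {
    nearest_reduce(result, chunk);
  }
  return result;
}
```

The point of the design is that no worker writes to shared state while the range is being processed; the only cross-thread step is the final merge, which is what a reduce callback expresses.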

commit eabc6f0678ffd088944e98d2b695aeed94a832f7
Author: Brecht Van Lommel <brechtvanlommel@gmail.com>
Date: Sun Oct 6 12:10:39 2019 +0200

    Sculpt: experiment using TBB instead of BLI_task

diff --git a/source/blender/blenkernel/BKE_pbvh.h b/source/blender/blenkernel/BKE_pbvh.h
index dedf76ee839..44868236938 100644
--- a/source/blender/blenkernel/BKE_pbvh.h
+++ b/source/blender/blenkernel/BKE_pbvh.h
@@ -28,6 +28,10 @@
 /* For embedding CCGKey in iterator. */
 #include "BKE_ccg.h"

+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct BMLog;
 struct BMesh;
 struct CCGElem;
@@ -44,6 +48,7 @@ struct PBVH;
 struct PBVHNode;
 struct SubdivCCG;
 struct TaskParallelSettings;
+struct TaskParallelTLS;

 typedef struct PBVH PBVH;
 typedef struct PBVHNode PBVHNode;
@@ -430,14 +435,37 @@ void BKE_pbvh_node_get_bm_orco_data(PBVHNode *node,

 bool BKE_pbvh_node_vert_update_check_any(PBVH *bvh, PBVHNode *node);

-void BKE_pbvh_parallel_range_settings(struct TaskParallelSettings *settings,
- bool use_threading,
- int totnode);
-
 // void BKE_pbvh_node_BB_reset(PBVHNode *node);
 // void BKE_pbvh_node_BB_expand(PBVHNode *node, float co[3]);

 bool pbvh_has_mask(PBVH *bvh);
 void pbvh_show_mask_set(PBVH *bvh, bool show_mask);

+/* Parallelization */
+typedef void (*PBVHParallelReduceFunc)(void *__restrict userdata, void *__restrict userdata_chunk);
+
+typedef void (*PBVHParallelRangeFunc)(void *__restrict userdata,
+ const int iter,
+ const struct TaskParallelTLS *__restrict tls);
+
+typedef struct PBVHParallelSettings {
+ void *userdata_chunk;
+ size_t userdata_chunk_size;
+ PBVHParallelReduceFunc func_reduce;
+} PBVHParallelSettings;
+
+void BKE_pbvh_parallel_range_settings(struct PBVHParallelSettings *settings,
+ bool use_threading,
+ int totnode);
+
+void BKE_pbvh_parallel_range(const int start,
+ const int stop,
+ void *userdata,
+ PBVHParallelRangeFunc func,
+ const struct PBVHParallelSettings *settings);
+
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* __BKE_PBVH_H__ */
diff --git a/source/blender/blenkernel/CMakeLists.txt b/source/blender/blenkernel/CMakeLists.txt
index 47b44c3828a..1525a3e7661 100644
--- a/source/blender/blenkernel/CMakeLists.txt
+++ b/source/blender/blenkernel/CMakeLists.txt
@@ -58,6 +58,7 @@ set(INC
 set(INC_SYS
 ${GLEW_INCLUDE_PATH}
 ${ZLIB_INCLUDE_DIRS}
+ ${TBB_INCLUDE_DIRS}
 )

 set(SRC
@@ -183,6 +184,7 @@ set(SRC
 intern/particle_system.c
 intern/pbvh.c
 intern/pbvh_bmesh.c
+ intern/pbvh_parallel.cpp
 intern/pointcache.c
 intern/report.c
 intern/rigidbody.c
diff --git a/source/blender/blenkernel/intern/pbvh.c b/source/blender/blenkernel/intern/pbvh.c
index 7ed986204d5..e9bf7893cda 100644
--- a/source/blender/blenkernel/intern/pbvh.c
+++ b/source/blender/blenkernel/intern/pbvh.c
@@ -1093,12 +1093,11 @@ static void pbvh_faces_update_normals(PBVH *bvh, PBVHNode **nodes, int totnode)
 .vnors = vnors,
 };

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, true, totnode);

- BLI_task_parallel_range(0, totnode, &data, pbvh_update_normals_accum_task_cb, &settings);
-
- BLI_task_parallel_range(0, totnode, &data, pbvh_update_normals_store_task_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, pbvh_update_normals_accum_task_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, pbvh_update_normals_store_task_cb, &settings);

 MEM_freeN(vnors);
 }
@@ -1148,9 +1147,9 @@ static void pbvh_update_mask_redraw(PBVH *bvh, PBVHNode **nodes, int totnode, in
 .flag = flag,
 };

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, true, totnode);
- BLI_task_parallel_range(0, totnode, &data, pbvh_update_mask_redraw_task_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, pbvh_update_mask_redraw_task_cb, &settings);
 }

 static void pbvh_update_BB_redraw_task_cb(void *__restrict userdata,
@@ -1186,9 +1185,9 @@ void pbvh_update_BB_redraw(PBVH *bvh, PBVHNode **nodes, int totnode, int flag)
 .flag = flag,
 };

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, true, totnode);
- BLI_task_parallel_range(0, totnode, &data, pbvh_update_BB_redraw_task_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, pbvh_update_BB_redraw_task_cb, &settings);
 }

 static int pbvh_get_buffers_update_flags(PBVH *bvh, bool show_vcol)
@@ -1295,9 +1294,9 @@ static void pbvh_update_draw_buffers(
 .show_vcol = show_vcol,
 };

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, true, totnode);
- BLI_task_parallel_range(0, totnode, &data, pbvh_update_draw_buffer_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, pbvh_update_draw_buffer_cb, &settings);
 }

 static int pbvh_flush_bb(PBVH *bvh, PBVHNode *node, int flag)
@@ -2712,13 +2711,10 @@ void pbvh_show_mask_set(PBVH *bvh, bool show_mask)
 bvh->show_mask = show_mask;
 }

-void BKE_pbvh_parallel_range_settings(TaskParallelSettings *settings,
- bool use_threading,
- int totnode)
+void BKE_pbvh_parallel_range_settings(PBVHParallelSettings *settings,
+ /* TODO: remove or use */
+ bool UNUSED(use_threading),
+ int UNUSED(totnode))
 {
- const int threaded_limit = 1;
- BLI_parallel_range_settings_defaults(settings);
- settings->use_threading = use_threading && (totnode > threaded_limit);
- settings->min_iter_per_thread = 1;
- settings->scheduling_mode = TASK_SCHEDULING_DYNAMIC;
+ memset(settings, 0, sizeof(*settings));
 }
diff --git a/source/blender/blenkernel/intern/pbvh_parallel.cpp b/source/blender/blenkernel/intern/pbvh_parallel.cpp
new file mode 100644
index 00000000000..7500b791ef2
--- /dev/null
+++ b/source/blender/blenkernel/intern/pbvh_parallel.cpp
@@ -0,0 +1,112 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include "MEM_guardedalloc.h"
+
+#include "BLI_task.h"
+
+#include "BKE_pbvh.h"
+
+#include <tbb/tbb.h>
+
+struct PBVHReduceTask {
+ PBVHParallelRangeFunc func;
+ void *userdata;
+
+ PBVHParallelReduceFunc func_reduce;
+ void *userdata_chunk;
+ size_t userdata_chunk_size;
+ bool userdata_chunk_free;
+
+ PBVHReduceTask()
+ {
+ }
+
+ PBVHReduceTask(PBVHReduceTask &other, tbb::split)
+ : func(other.func),
+ userdata(other.userdata),
+ func_reduce(other.func_reduce),
+ userdata_chunk_size(other.userdata_chunk_size),
+ userdata_chunk_free(true)
+ {
+ /* TODO: is there significant malloc overhead? */
+ userdata_chunk = MEM_mallocN(userdata_chunk_size, "PBVHReduceTask");
+ memcpy(userdata_chunk, other.userdata_chunk, userdata_chunk_size);
+ }
+
+ ~PBVHReduceTask()
+ {
+ if (userdata_chunk_free) {
+ MEM_freeN(userdata_chunk);
+ }
+ }
+
+ void operator()(const tbb::blocked_range<int> &r) const
+ {
+ TaskParallelTLS tls;
+ tls.thread_id = 0; /* TODO */
+ tls.userdata_chunk = userdata_chunk;
+ for (int i = r.begin(); i != r.end(); ++i) {
+ func(userdata, i, &tls);
+ }
+ }
+
+ void join(const PBVHReduceTask &other)
+ {
+ func_reduce(userdata_chunk, other.userdata_chunk);
+ }
+};
+
+struct PBVHTask {
+ PBVHParallelRangeFunc func;
+ void *userdata;
+
+ void operator()(const tbb::blocked_range<int> &r) const
+ {
+ TaskParallelTLS tls;
+ tls.thread_id = 0; /* TODO */
+ tls.userdata_chunk = NULL;
+ for (int i = r.begin(); i != r.end(); ++i) {
+ func(userdata, i, &tls);
+ }
+ }
+};
+
+void BKE_pbvh_parallel_range(const int start,
+ const int stop,
+ void *userdata,
+ PBVHParallelRangeFunc func,
+ const struct PBVHParallelSettings *settings)
+{
+ if (settings->func_reduce) {
+ PBVHReduceTask task;
+ task.func = func;
+ task.userdata = userdata;
+
+ task.func_reduce = settings->func_reduce;
+ task.userdata_chunk = settings->userdata_chunk;
+ task.userdata_chunk_size = settings->userdata_chunk_size;
+ task.userdata_chunk_free = false;
+
+ parallel_for(tbb::blocked_range<int>(start, stop), task);
+ }
+ else {
+ PBVHTask task;
+ task.func = func;
+ task.userdata = userdata;
+ parallel_for(tbb::blocked_range<int>(start, stop), task);
+ }
+}
diff --git a/source/blender/editors/sculpt_paint/paint_mask.c b/source/blender/editors/sculpt_paint/paint_mask.c
index a93e55685d2..d160fba4013 100644
--- a/source/blender/editors/sculpt_paint/paint_mask.c
+++ b/source/blender/editors/sculpt_paint/paint_mask.c
@@ -166,9 +166,9 @@ static int mask_flood_fill_exec(bContext *C, wmOperator *op)
 .value = value,
 };

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
- BLI_task_parallel_range(0, totnode, &data, mask_flood_fill_task_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, mask_flood_fill_task_cb, &settings);

 if (multires) {
 multires_mark_as_modified(depsgraph, ob, MULTIRES_COORDS_MODIFIED);
@@ -343,9 +343,9 @@ bool ED_sculpt_mask_box_select(struct bContext *C, ViewContext *vc, const rcti *
 .clip_planes_final = clip_planes_final,
 };

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
- BLI_task_parallel_range(0, totnode, &data, mask_box_select_task_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, mask_box_select_task_cb, &settings);

 if (nodes) {
 MEM_freeN(nodes);
@@ -532,9 +532,9 @@ static int paint_mask_gesture_lasso_exec(bContext *C, wmOperator *op)
 data.task_data.mode = mode;
 data.task_data.value = value;

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
- BLI_task_parallel_range(0, totnode, &data, mask_gesture_lasso_task_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, mask_gesture_lasso_task_cb, &settings);

 if (nodes) {
 MEM_freeN(nodes);
diff --git a/source/blender/editors/sculpt_paint/paint_vertex.c b/source/blender/editors/sculpt_paint/paint_vertex.c
index 81e3f04a0d1..77c95c6acb3 100644
--- a/source/blender/editors/sculpt_paint/paint_vertex.c
+++ b/source/blender/editors/sculpt_paint/paint_vertex.c
@@ -2072,9 +2072,9 @@ static void calculate_average_weight(SculptThreadedTaskData *data,
 struct WPaintAverageAccum *accum = MEM_mallocN(sizeof(*accum) * totnode, __func__);
 data->custom_data = accum;

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (data->sd->flags & SCULPT_USE_OPENMP), totnode);
- BLI_task_parallel_range(0, totnode, data, do_wpaint_brush_calc_average_weight_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, data, do_wpaint_brush_calc_average_weight_cb_ex, &settings);

 uint accum_len = 0;
 double accum_weight = 0.0;
@@ -2120,22 +2120,22 @@ static void wpaint_paint_leaves(bContext *C,
 data.strength = BKE_brush_weight_get(scene, brush);

 /* NOTE: current mirroring code cannot be run in parallel */
- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, !(me->editflag & ME_EDIT_MIRROR_X), totnode);

 switch ((eBrushWeightPaintTool)brush->weightpaint_tool) {
 case WPAINT_TOOL_AVERAGE:
 calculate_average_weight(&data, nodes, totnode);
- BLI_task_parallel_range(0, totnode, &data, do_wpaint_brush_draw_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_wpaint_brush_draw_task_cb_ex, &settings);
 break;
 case WPAINT_TOOL_SMEAR:
- BLI_task_parallel_range(0, totnode, &data, do_wpaint_brush_smear_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_wpaint_brush_smear_task_cb_ex, &settings);
 break;
 case WPAINT_TOOL_BLUR:
- BLI_task_parallel_range(0, totnode, &data, do_wpaint_brush_blur_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_wpaint_brush_blur_task_cb_ex, &settings);
 break;
 case WPAINT_TOOL_DRAW:
- BLI_task_parallel_range(0, totnode, &data, do_wpaint_brush_draw_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_wpaint_brush_draw_task_cb_ex, &settings);
 break;
 }
 }
@@ -3126,9 +3126,9 @@ static void calculate_average_color(SculptThreadedTaskData *data,
 struct VPaintAverageAccum *accum = MEM_mallocN(sizeof(*accum) * totnode, __func__);
 data->custom_data = accum;

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, true, totnode);
- BLI_task_parallel_range(0, totnode, data, do_vpaint_brush_calc_average_color_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, data, do_vpaint_brush_calc_average_color_cb_ex, &settings);

 uint accum_len = 0;
 uint accum_value[3] = {0};
@@ -3172,21 +3172,21 @@ static void vpaint_paint_leaves(bContext *C,
 .lcol = (uint *)me->mloopcol,
 .me = me,
 };
- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, true, totnode);
 switch ((eBrushVertexPaintTool)brush->vertexpaint_tool) {
 case VPAINT_TOOL_AVERAGE:
 calculate_average_color(&data, nodes, totnode);
- BLI_task_parallel_range(0, totnode, &data, do_vpaint_brush_draw_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_vpaint_brush_draw_task_cb_ex, &settings);
 break;
 case VPAINT_TOOL_BLUR:
- BLI_task_parallel_range(0, totnode, &data, do_vpaint_brush_blur_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_vpaint_brush_blur_task_cb_ex, &settings);
 break;
 case VPAINT_TOOL_SMEAR:
- BLI_task_parallel_range(0, totnode, &data, do_vpaint_brush_smear_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_vpaint_brush_smear_task_cb_ex, &settings);
 break;
 case VPAINT_TOOL_DRAW:
- BLI_task_parallel_range(0, totnode, &data, do_vpaint_brush_draw_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_vpaint_brush_draw_task_cb_ex, &settings);
 break;
 }
 }
diff --git a/source/blender/editors/sculpt_paint/sculpt.c b/source/blender/editors/sculpt_paint/sculpt.c
index 853324c7e17..320853bffeb 100644
--- a/source/blender/editors/sculpt_paint/sculpt.c
+++ b/source/blender/editors/sculpt_paint/sculpt.c
@@ -402,17 +402,17 @@ static void do_nearest_vertex_get_task_cb(void *__restrict userdata,
 BKE_pbvh_vertex_iter_end;
 }

-static void nearest_vertex_get_finalize(void *__restrict userdata, void *__restrict tls)
+static void nearest_vertex_get_reduce(void *__restrict tls_join, void *__restrict tls)
 {
- SculptThreadedTaskData *data = userdata;
+ NearestVertexTLSData *join = tls_join;
 NearestVertexTLSData *nvtd = tls;
- if (data->nearest_vertex_index == -1) {
- data->nearest_vertex_index = nvtd->nearest_vertex_index;
- data->nearest_vertex_distance_squared = nvtd->nearest_vertex_distance_squared;
+ if (join->nearest_vertex_index == -1) {
+ join->nearest_vertex_index = nvtd->nearest_vertex_index;
+ join->nearest_vertex_distance_squared = nvtd->nearest_vertex_distance_squared;
 }
- else if (nvtd->nearest_vertex_distance_squared < data->nearest_vertex_distance_squared) {
- data->nearest_vertex_index = nvtd->nearest_vertex_index;
- data->nearest_vertex_distance_squared = nvtd->nearest_vertex_distance_squared;
+ else if (nvtd->nearest_vertex_distance_squared < join->nearest_vertex_distance_squared) {
+ join->nearest_vertex_index = nvtd->nearest_vertex_index;
+ join->nearest_vertex_distance_squared = nvtd->nearest_vertex_distance_squared;
 }
 }

@@ -439,25 +439,23 @@ static int sculpt_nearest_vertex_get(
 .ob = ob,
 .nodes = nodes,
 .max_distance_squared = max_distance * max_distance,
- .nearest_vertex_index = -1,
 };

 copy_v3_v3(task_data.nearest_vertex_search_co, co);
- task_data.nearest_vertex_distance_squared = FLT_MAX;
 NearestVertexTLSData nvtd;
 nvtd.nearest_vertex_index = -1;
 nvtd.nearest_vertex_distance_squared = FLT_MAX;

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
- settings.func_finalize = nearest_vertex_get_finalize;
+ settings.func_reduce = nearest_vertex_get_reduce;
 settings.userdata_chunk = &nvtd;
 settings.userdata_chunk_size = sizeof(NearestVertexTLSData);
- BLI_task_parallel_range(0, totnode, &task_data, do_nearest_vertex_get_task_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &task_data, do_nearest_vertex_get_task_cb, &settings);

 MEM_SAFE_FREE(nodes);

- return task_data.nearest_vertex_index;
+ return nvtd.nearest_vertex_index;
 }

 static bool is_symmetry_iteration_valid(char i, char symm)
@@ -934,9 +932,9 @@ static void paint_mesh_restore_co(Sculpt *sd, Object *ob)
 .nodes = nodes,
 };

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP) && !ss->bm, totnode);
- BLI_task_parallel_range(0, totnode, &data, paint_mesh_restore_co_task_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, paint_mesh_restore_co_task_cb, &settings);

 MEM_SAFE_FREE(nodes);
 }
@@ -1417,9 +1415,10 @@ static float calc_symmetry_feather(Sculpt *sd, StrokeCache *cache)
 * \{ */

 typedef struct AreaNormalCenterTLSData {
- float private_co[2][3];
- float private_no[2][3];
- int private_count[2];
+ /* 0=towards view, 1=flipped */
+ float area_cos[2][3];
+ float area_nos[2][3];
+ int area_count[2];
 } AreaNormalCenterTLSData;

 static void calc_area_normal_and_center_task_cb(void *__restrict userdata,
@@ -1429,8 +1428,8 @@ static void calc_area_normal_and_center_task_cb(void *__restrict userdata,
 SculptThreadedTaskData *data = userdata;
 SculptSession *ss = data->ob->sculpt;
 AreaNormalCenterTLSData *anctd = tls->userdata_chunk;
- float(*area_nos)[3] = data->area_nos;
- float(*area_cos)[3] = data->area_cos;
+ const bool use_area_nos = data->use_area_nos;
+ const bool use_area_cos = data->use_area_cos;

 PBVHVertexIter vd;
 SculptUndoNode *unode = NULL;
@@ -1483,13 +1482,13 @@ static void calc_area_normal_and_center_task_cb(void *__restrict userdata,
 normal_tri_v3(no, UNPACK3(co_tri));

 flip_index = (dot_v3v3(ss->cache->view_normal, no) <= 0.0f);
- if (area_cos) {
- add_v3_v3(anctd->private_co[flip_index], co);
+ if (use_area_cos) {
+ add_v3_v3(anctd->area_cos[flip_index], co);
 }
- if (area_nos) {
- add_v3_v3(anctd->private_no[flip_index], no);
+ if (use_area_nos) {
+ add_v3_v3(anctd->area_nos[flip_index], no);
 }
- anctd->private_count[flip_index] += 1;
+ anctd->area_count[flip_index] += 1;
 }
 }
 }
@@ -1535,40 +1534,35 @@ static void calc_area_normal_and_center_task_cb(void *__restrict userdata,

 flip_index = (dot_v3v3(ss->cache ? ss->cache->view_normal : ss->cursor_view_normal, no) <=
 0.0f);
- if (area_cos) {
- add_v3_v3(anctd->private_co[flip_index], co);
+ if (use_area_cos) {
+ add_v3_v3(anctd->area_cos[flip_index], co);
 }
- if (area_nos) {
- add_v3_v3(anctd->private_no[flip_index], no);
+ if (use_area_nos) {
+ add_v3_v3(anctd->area_nos[flip_index], no);
 }
- anctd->private_count[flip_index] += 1;
+ anctd->area_count[flip_index] += 1;
 }
 }
 BKE_pbvh_vertex_iter_end;
 }
 }

-static void calc_area_normal_and_center_finalize(void *__restrict userdata, void *__restrict tls)
+static void calc_area_normal_and_center_reduce(void *__restrict tls_join, void *__restrict tls)
 {
- SculptThreadedTaskData *data = userdata;
+ AreaNormalCenterTLSData *join = tls_join;
 AreaNormalCenterTLSData *anctd = tls;
- float(*area_nos)[3] = data->area_nos;
- float(*area_cos)[3] = data->area_cos;
+
 /* for flatten center */
- if (area_cos) {
- add_v3_v3(area_cos[0], anctd->private_co[0]);
- add_v3_v3(area_cos[1], anctd->private_co[1]);
- }
+ add_v3_v3(join->area_cos[0], anctd->area_cos[0]);
+ add_v3_v3(join->area_cos[1], anctd->area_cos[1]);

 /* for area normal */
- if (area_nos) {
- add_v3_v3(area_nos[0], anctd->private_no[0]);
- add_v3_v3(area_nos[1], anctd->private_no[1]);
- }
+ add_v3_v3(join->area_nos[0], anctd->area_nos[0]);
+ add_v3_v3(join->area_nos[1], anctd->area_nos[1]);

 /* weights */
- data->count[0] += anctd->private_count[0];
- data->count[1] += anctd->private_count[1];
+ join->area_count[0] += anctd->area_count[0];
+ join->area_count[1] += anctd->area_count[1];
 }

 static void calc_area_center(
@@ -1579,11 +1573,6 @@ static void calc_area_center(
 const bool has_bm_orco = ss->bm && sculpt_stroke_is_dynamic_topology(ss, brush);
 int n;

- /* 0=towards view, 1=flipped */
- float area_cos[2][3] = {{0.0f}};
-
- int count[2] = {0};
-
 /* Intentionally set 'sd' to NULL since we share logic with vertex paint. */
 SculptThreadedTaskData data = {
 .sd = NULL,
@@ -1592,24 +1581,22 @@ static void calc_area_center(
 .nodes = nodes,
 .totnode = totnode,
 .has_bm_orco = has_bm_orco,
- .area_cos = area_cos,
- .area_nos = NULL,
- .count = count,
+ .use_area_cos = true,
 };

 AreaNormalCenterTLSData anctd = {{{0}}};

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
- settings.func_finalize = calc_area_normal_and_center_finalize;
+ settings.func_reduce = calc_area_normal_and_center_reduce;
 settings.userdata_chunk = &anctd;
 settings.userdata_chunk_size = sizeof(AreaNormalCenterTLSData);
- BLI_task_parallel_range(0, totnode, &data, calc_area_normal_and_center_task_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, calc_area_normal_and_center_task_cb, &settings);

 /* for flatten center */
- for (n = 0; n < ARRAY_SIZE(area_cos); n++) {
- if (count[n] != 0) {
- mul_v3_v3fl(r_area_co, area_cos[n], 1.0f / count[n]);
+ for (n = 0; n < ARRAY_SIZE(anctd.area_cos); n++) {
+ if (anctd.area_count[n] != 0) {
+ mul_v3_v3fl(r_area_co, anctd.area_cos[n], 1.0f / anctd.area_count[n]);
 break;
 }
 }
@@ -1637,11 +1624,6 @@ bool sculpt_pbvh_calc_area_normal(const Brush *brush,
 SculptSession *ss = ob->sculpt;
 const bool has_bm_orco = ss->bm && sculpt_stroke_is_dynamic_topology(ss, brush);

- /* 0=towards view, 1=flipped */
- float area_nos[2][3] = {{0.0f}};
-
- int count[2] = {0};
-
 /* Intentionally set 'sd' to NULL since this is used for vertex paint too. */
 SculptThreadedTaskData data = {
 .sd = NULL,
@@ -1650,24 +1632,22 @@ bool sculpt_pbvh_calc_area_normal(const Brush *brush,
 .nodes = nodes,
 .totnode = totnode,
 .has_bm_orco = has_bm_orco,
- .area_cos = NULL,
- .area_nos = area_nos,
- .count = count,
+ .use_area_nos = true,
 .any_vertex_sampled = false,
 };

 AreaNormalCenterTLSData anctd = {{{0}}};

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, use_threading, totnode);
- settings.func_finalize = calc_area_normal_and_center_finalize;
+ settings.func_reduce = calc_area_normal_and_center_reduce;
 settings.userdata_chunk = &anctd;
 settings.userdata_chunk_size = sizeof(AreaNormalCenterTLSData);
- BLI_task_parallel_range(0, totnode, &data, calc_area_normal_and_center_task_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, calc_area_normal_and_center_task_cb, &settings);

 /* for area normal */
- for (int i = 0; i < ARRAY_SIZE(area_nos); i++) {
- if (normalize_v3_v3(r_area_no, area_nos[i]) != 0.0f) {
+ for (int i = 0; i < ARRAY_SIZE(anctd.area_nos); i++) {
+ if (normalize_v3_v3(r_area_no, anctd.area_nos[i]) != 0.0f) {
 break;
 }
 }
@@ -1685,12 +1665,6 @@ static void calc_area_normal_and_center(
 const bool has_bm_orco = ss->bm && sculpt_stroke_is_dynamic_topology(ss, brush);
 int n;

- /* 0=towards view, 1=flipped */
- float area_cos[2][3] = {{0.0f}};
- float area_nos[2][3] = {{0.0f}};
-
- int count[2] = {0};
-
 /* Intentionally set 'sd' to NULL since this is used for vertex paint too. */
 SculptThreadedTaskData data = {
 .sd = NULL,
@@ -1699,24 +1673,23 @@ static void calc_area_normal_and_center(
 .nodes = nodes,
 .totnode = totnode,
 .has_bm_orco = has_bm_orco,
- .area_cos = area_cos,
- .area_nos = area_nos,
- .count = count,
+ .use_area_cos = true,
+ .use_area_nos = true,
 };

 AreaNormalCenterTLSData anctd = {{{0}}};

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
- settings.func_finalize = calc_area_normal_and_center_finalize;
+ settings.func_reduce = calc_area_normal_and_center_reduce;
 settings.userdata_chunk = &anctd;
 settings.userdata_chunk_size = sizeof(AreaNormalCenterTLSData);
- BLI_task_parallel_range(0, totnode, &data, calc_area_normal_and_center_task_cb, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, calc_area_normal_and_center_task_cb, &settings);

 /* for flatten center */
- for (n = 0; n < ARRAY_SIZE(area_cos); n++) {
- if (count[n] != 0) {
- mul_v3_v3fl(r_area_co, area_cos[n], 1.0f / count[n]);
+ for (n = 0; n < ARRAY_SIZE(anctd.area_cos); n++) {
+ if (anctd.area_count[n] != 0) {
+ mul_v3_v3fl(r_area_co, anctd.area_cos[n], 1.0f / anctd.area_count[n]);
 break;
 }
 }
@@ -1725,8 +1698,8 @@ static void calc_area_normal_and_center(
 }

 /* for area normal */
- for (n = 0; n < ARRAY_SIZE(area_nos); n++) {
- if (normalize_v3_v3(r_area_no, area_nos[n]) != 0.0f) {
+ for (n = 0; n < ARRAY_SIZE(anctd.area_nos); n++) {
+ if (normalize_v3_v3(r_area_no, anctd.area_nos[n]) != 0.0f) {
 break;
 }
 }
@@ -2825,7 +2798,7 @@ static void smooth(Sculpt *sd,
 .strength = strength,
 };

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);

 switch (type) {
@@ -2843,16 +2816,16 @@ static void smooth(Sculpt *sd,

 settings.userdata_chunk = data_chunk;
 settings.userdata_chunk_size = size;
- BLI_task_parallel_range(0, totnode, &data, do_smooth_brush_multires_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_smooth_brush_multires_task_cb_ex, &settings);

 MEM_freeN(data_chunk);
 break;
 }
 case PBVH_FACES:
- BLI_task_parallel_range(0, totnode, &data, do_smooth_brush_mesh_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_smooth_brush_mesh_task_cb_ex, &settings);
 break;
 case PBVH_BMESH:
- BLI_task_parallel_range(0, totnode, &data, do_smooth_brush_bmesh_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_smooth_brush_bmesh_task_cb_ex, &settings);
 break;
 }

@@ -2884,10 +2857,10 @@ static void bmesh_topology_rake(
 .nodes = nodes,
 .strength = factor,
 };
- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);

- BLI_task_parallel_range(0, totnode, &data, do_topology_rake_bmesh_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_topology_rake_bmesh_task_cb_ex, &settings);
 }
 }

@@ -2941,9 +2914,9 @@ static void do_mask_brush_draw(Sculpt *sd, Object *ob, PBVHNode **nodes, int tot
 .nodes = nodes,
 };

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
- BLI_task_parallel_range(0, totnode, &data, do_mask_brush_draw_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_mask_brush_draw_task_cb_ex, &settings);
 }

 static void do_mask_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnode)
@@ -3028,9 +3001,9 @@ static void do_draw_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnode)
 .offset = offset,
 };

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
- BLI_task_parallel_range(0, totnode, &data, do_draw_brush_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_draw_brush_task_cb_ex, &settings);
 }

 static void do_draw_sharp_brush_task_cb_ex(void *__restrict userdata,
@@ -3104,9 +3077,9 @@ static void do_draw_sharp_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int to
 .offset = offset,
 };

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
- BLI_task_parallel_range(0, totnode, &data, do_draw_sharp_brush_task_cb_ex, &settings);
+ BKE_pbvh_parallel_range(0, totnode, &data, do_draw_sharp_brush_task_cb_ex, &settings);
 }

 /**
@@ -3220,9 +3193,9 @@ static void do_crease_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnod
 .flippedbstrength = flippedbstrength,
 };

- TaskParallelSettings settings;
+ PBVHParallelSettings settings;
 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
808- BLI_task_parallel_range(0, totnode, &data, do_crease_brush_task_cb_ex, &settings);
809+ BKE_pbvh_parallel_range(0, totnode, &data, do_crease_brush_task_cb_ex, &settings);
810 }
811
812 static void do_pinch_brush_task_cb_ex(void *__restrict userdata,
813@@ -3282,9 +3255,9 @@ static void do_pinch_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnode
814 .nodes = nodes,
815 };
816
817- TaskParallelSettings settings;
818+ PBVHParallelSettings settings;
819 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
820- BLI_task_parallel_range(0, totnode, &data, do_pinch_brush_task_cb_ex, &settings);
821+ BKE_pbvh_parallel_range(0, totnode, &data, do_pinch_brush_task_cb_ex, &settings);
822 }
823
824 static void do_grab_brush_task_cb_ex(void *__restrict userdata,
825@@ -3354,9 +3327,9 @@ static void do_grab_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnode)
826 .grab_delta = grab_delta,
827 };
828
829- TaskParallelSettings settings;
830+ PBVHParallelSettings settings;
831 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
832- BLI_task_parallel_range(0, totnode, &data, do_grab_brush_task_cb_ex, &settings);
833+ BKE_pbvh_parallel_range(0, totnode, &data, do_grab_brush_task_cb_ex, &settings);
834 }
835
836 /* Regularized Kelvinlets: Sculpting Brushes based on Fundamental Solutions of Elasticity
837@@ -3603,9 +3576,9 @@ static void do_elastic_deform_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, in
838 .grab_delta = grab_delta,
839 };
840
841- TaskParallelSettings settings;
842+ PBVHParallelSettings settings;
843 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
844- BLI_task_parallel_range(0, totnode, &data, do_elastic_deform_brush_task_cb_ex, &settings);
845+ BKE_pbvh_parallel_range(0, totnode, &data, do_elastic_deform_brush_task_cb_ex, &settings);
846 }
847
848 static void do_pose_brush_task_cb_ex(void *__restrict userdata,
849@@ -3696,14 +3669,14 @@ static void do_pose_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnode)
850 .transform_trans_inv = transform_trans_inv,
851 };
852
853- TaskParallelSettings settings;
854+ PBVHParallelSettings settings;
855 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
856- BLI_task_parallel_range(0, totnode, &data, do_pose_brush_task_cb_ex, &settings);
857+ BKE_pbvh_parallel_range(0, totnode, &data, do_pose_brush_task_cb_ex, &settings);
858 }
859
860 typedef struct PoseGrowFactorTLSData {
861 float pos_avg[3];
862- int tot_pos_avg;
863+ int pos_count;
864 } PoseGrowFactorTLSData;
865
866 static void pose_brush_grow_factor_task_cb_ex(void *__restrict userdata,
867@@ -3732,7 +3705,7 @@ static void pose_brush_grow_factor_task_cb_ex(void *__restrict userdata,
868 data->pose_factor[vd.index] = max;
869 if (check_vertex_pivot_symmetry(vd.co, active_co, symm)) {
870 add_v3_v3(gftd->pos_avg, vd.co);
871- gftd->tot_pos_avg++;
872+ gftd->pos_count++;
873 }
874 }
875 }
876@@ -3740,12 +3713,12 @@ static void pose_brush_grow_factor_task_cb_ex(void *__restrict userdata,
877 BKE_pbvh_vertex_iter_end;
878 }
879
880-static void pose_brush_grow_factor_finalize(void *__restrict userdata, void *__restrict tls)
881+static void pose_brush_grow_factor_reduce(void *__restrict tls_join, void *__restrict tls)
882 {
883- SculptThreadedTaskData *data = userdata;
884+ PoseGrowFactorTLSData *join = tls_join;
885 PoseGrowFactorTLSData *gftd = tls;
886- add_v3_v3(data->tot_pos_avg, gftd->pos_avg);
887- data->tot_pos_count += gftd->tot_pos_avg;
888+ add_v3_v3(join->pos_avg, gftd->pos_avg);
889+ join->pos_count += gftd->pos_count;
890 }
891
892 /* Grow the factor until its boundary is near to the offset pose origin */
893@@ -3764,12 +3737,12 @@ static void sculpt_pose_grow_pose_factor(
894 .totnode = totnode,
895 .pose_factor = pose_factor,
896 };
897- TaskParallelSettings settings;
898+ PBVHParallelSettings settings;
899 PoseGrowFactorTLSData gftd;
900- gftd.tot_pos_avg = 0;
901+ gftd.pos_count = 0;
902 zero_v3(gftd.pos_avg);
903 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
904- settings.func_finalize = pose_brush_grow_factor_finalize;
905+ settings.func_reduce = pose_brush_grow_factor_reduce;
906 settings.userdata_chunk = &gftd;
907 settings.userdata_chunk_size = sizeof(PoseGrowFactorTLSData);
908
909@@ -3777,15 +3750,13 @@ static void sculpt_pose_grow_pose_factor(
910 float prev_len = FLT_MAX;
911 data.prev_mask = MEM_mallocN(sculpt_vertex_count_get(ss) * sizeof(float), "prev mask");
912 while (grow_next_iteration) {
913- zero_v3(data.tot_pos_avg);
914- data.tot_pos_count = 0;
915 zero_v3(gftd.pos_avg);
916- gftd.tot_pos_avg = 0;
917+ gftd.pos_count = 0;
918 memcpy(data.prev_mask, pose_factor, sculpt_vertex_count_get(ss) * sizeof(float));
919- BLI_task_parallel_range(0, totnode, &data, pose_brush_grow_factor_task_cb_ex, &settings);
920- if (data.tot_pos_count != 0) {
921- mul_v3_fl(data.tot_pos_avg, 1.0f / (float)data.tot_pos_count);
922- float len = len_v3v3(data.tot_pos_avg, pose_origin);
923+ BKE_pbvh_parallel_range(0, totnode, &data, pose_brush_grow_factor_task_cb_ex, &settings);
924+ if (gftd.pos_count != 0) {
925+ mul_v3_fl(gftd.pos_avg, 1.0f / (float)gftd.pos_count);
926+ float len = len_v3v3(gftd.pos_avg, pose_origin);
927 if (len < prev_len) {
928 prev_len = len;
929 grow_next_iteration = true;
930@@ -3954,9 +3925,9 @@ static void sculpt_pose_brush_init(
931
932 /* Smooth the pose brush factor for cleaner deformation */
933 for (int i = 0; i < 4; i++) {
934- TaskParallelSettings settings;
935+ PBVHParallelSettings settings;
936 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
937- BLI_task_parallel_range(0, totnode, &data, pose_brush_init_task_cb_ex, &settings);
938+ BKE_pbvh_parallel_range(0, totnode, &data, pose_brush_init_task_cb_ex, &settings);
939 }
940
941 MEM_SAFE_FREE(nodes);
942@@ -4024,9 +3995,9 @@ static void do_nudge_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnode
943 .cono = cono,
944 };
945
946- TaskParallelSettings settings;
947+ PBVHParallelSettings settings;
948 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
949- BLI_task_parallel_range(0, totnode, &data, do_nudge_brush_task_cb_ex, &settings);
950+ BKE_pbvh_parallel_range(0, totnode, &data, do_nudge_brush_task_cb_ex, &settings);
951 }
952
953 static void do_snake_hook_brush_task_cb_ex(void *__restrict userdata,
954@@ -4145,9 +4116,9 @@ static void do_snake_hook_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int to
955 .grab_delta = grab_delta,
956 };
957
958- TaskParallelSettings settings;
959+ PBVHParallelSettings settings;
960 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
961- BLI_task_parallel_range(0, totnode, &data, do_snake_hook_brush_task_cb_ex, &settings);
962+ BKE_pbvh_parallel_range(0, totnode, &data, do_snake_hook_brush_task_cb_ex, &settings);
963 }
964
965 static void do_thumb_brush_task_cb_ex(void *__restrict userdata,
966@@ -4217,9 +4188,9 @@ static void do_thumb_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnode
967 .cono = cono,
968 };
969
970- TaskParallelSettings settings;
971+ PBVHParallelSettings settings;
972 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
973- BLI_task_parallel_range(0, totnode, &data, do_thumb_brush_task_cb_ex, &settings);
974+ BKE_pbvh_parallel_range(0, totnode, &data, do_thumb_brush_task_cb_ex, &settings);
975 }
976
977 static void do_rotate_brush_task_cb_ex(void *__restrict userdata,
978@@ -4290,9 +4261,9 @@ static void do_rotate_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnod
979 .angle = angle,
980 };
981
982- TaskParallelSettings settings;
983+ PBVHParallelSettings settings;
984 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
985- BLI_task_parallel_range(0, totnode, &data, do_rotate_brush_task_cb_ex, &settings);
986+ BKE_pbvh_parallel_range(0, totnode, &data, do_rotate_brush_task_cb_ex, &settings);
987 }
988
989 static void do_layer_brush_task_cb_ex(void *__restrict userdata,
990@@ -4387,9 +4358,9 @@ static void do_layer_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnode
991 };
992 BLI_mutex_init(&data.mutex);
993
994- TaskParallelSettings settings;
995+ PBVHParallelSettings settings;
996 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
997- BLI_task_parallel_range(0, totnode, &data, do_layer_brush_task_cb_ex, &settings);
998+ BKE_pbvh_parallel_range(0, totnode, &data, do_layer_brush_task_cb_ex, &settings);
999
1000 BLI_mutex_end(&data.mutex);
1001 }
1002@@ -4455,9 +4426,9 @@ static void do_inflate_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totno
1003 .nodes = nodes,
1004 };
1005
1006- TaskParallelSettings settings;
1007+ PBVHParallelSettings settings;
1008 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1009- BLI_task_parallel_range(0, totnode, &data, do_inflate_brush_task_cb_ex, &settings);
1010+ BKE_pbvh_parallel_range(0, totnode, &data, do_inflate_brush_task_cb_ex, &settings);
1011 }
1012
1013 static void calc_sculpt_plane(
1014@@ -4664,9 +4635,9 @@ static void do_flatten_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totno
1015 .area_co = area_co,
1016 };
1017
1018- TaskParallelSettings settings;
1019+ PBVHParallelSettings settings;
1020 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1021- BLI_task_parallel_range(0, totnode, &data, do_flatten_brush_task_cb_ex, &settings);
1022+ BKE_pbvh_parallel_range(0, totnode, &data, do_flatten_brush_task_cb_ex, &settings);
1023 }
1024
1025 static void do_clay_brush_task_cb_ex(void *__restrict userdata,
1026@@ -4762,9 +4733,9 @@ static void do_clay_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnode)
1027 .area_co = area_co,
1028 };
1029
1030- TaskParallelSettings settings;
1031+ PBVHParallelSettings settings;
1032 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1033- BLI_task_parallel_range(0, totnode, &data, do_clay_brush_task_cb_ex, &settings);
1034+ BKE_pbvh_parallel_range(0, totnode, &data, do_clay_brush_task_cb_ex, &settings);
1035 }
1036
1037 static void do_clay_strips_brush_task_cb_ex(void *__restrict userdata,
1038@@ -4892,9 +4863,9 @@ static void do_clay_strips_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int t
1039 .mat = mat,
1040 };
1041
1042- TaskParallelSettings settings;
1043+ PBVHParallelSettings settings;
1044 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1045- BLI_task_parallel_range(0, totnode, &data, do_clay_strips_brush_task_cb_ex, &settings);
1046+ BKE_pbvh_parallel_range(0, totnode, &data, do_clay_strips_brush_task_cb_ex, &settings);
1047 }
1048
1049 static void do_fill_brush_task_cb_ex(void *__restrict userdata,
1050@@ -4985,9 +4956,9 @@ static void do_fill_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnode)
1051 .area_co = area_co,
1052 };
1053
1054- TaskParallelSettings settings;
1055+ PBVHParallelSettings settings;
1056 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1057- BLI_task_parallel_range(0, totnode, &data, do_fill_brush_task_cb_ex, &settings);
1058+ BKE_pbvh_parallel_range(0, totnode, &data, do_fill_brush_task_cb_ex, &settings);
1059 }
1060
1061 static void do_scrape_brush_task_cb_ex(void *__restrict userdata,
1062@@ -5077,9 +5048,9 @@ static void do_scrape_brush(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnod
1063 .area_co = area_co,
1064 };
1065
1066- TaskParallelSettings settings;
1067+ PBVHParallelSettings settings;
1068 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1069- BLI_task_parallel_range(0, totnode, &data, do_scrape_brush_task_cb_ex, &settings);
1070+ BKE_pbvh_parallel_range(0, totnode, &data, do_scrape_brush_task_cb_ex, &settings);
1071 }
1072
1073 static void do_gravity_task_cb_ex(void *__restrict userdata,
1074@@ -5146,9 +5117,9 @@ static void do_gravity(Sculpt *sd, Object *ob, PBVHNode **nodes, int totnode, fl
1075 .offset = offset,
1076 };
1077
1078- TaskParallelSettings settings;
1079+ PBVHParallelSettings settings;
1080 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1081- BLI_task_parallel_range(0, totnode, &data, do_gravity_task_cb_ex, &settings);
1082+ BKE_pbvh_parallel_range(0, totnode, &data, do_gravity_task_cb_ex, &settings);
1083 }
1084
1085 void sculpt_vertcos_to_key(Object *ob, KeyBlock *kb, const float (*vertCos)[3])
1086@@ -5315,9 +5286,9 @@ static void do_brush_action(Sculpt *sd, Object *ob, Brush *brush, UnifiedPaintSe
1087 .nodes = nodes,
1088 };
1089
1090- TaskParallelSettings settings;
1091+ PBVHParallelSettings settings;
1092 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1093- BLI_task_parallel_range(0, totnode, &task_data, do_brush_action_task_cb, &settings);
1094+ BKE_pbvh_parallel_range(0, totnode, &task_data, do_brush_action_task_cb, &settings);
1095
1096 if (sculpt_brush_needs_normal(ss, brush)) {
1097 update_sculpt_normal(sd, ob, nodes, totnode);
1098@@ -5537,9 +5508,9 @@ static void sculpt_combine_proxies(Sculpt *sd, Object *ob)
1099 .nodes = nodes,
1100 };
1101
1102- TaskParallelSettings settings;
1103+ PBVHParallelSettings settings;
1104 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1105- BLI_task_parallel_range(0, totnode, &data, sculpt_combine_proxies_task_cb, &settings);
1106+ BKE_pbvh_parallel_range(0, totnode, &data, sculpt_combine_proxies_task_cb, &settings);
1107 }
1108
1109 MEM_SAFE_FREE(nodes);
1110@@ -5627,9 +5598,9 @@ static void sculpt_flush_stroke_deform(Sculpt *sd, Object *ob, bool is_proxy_use
1111 .vertCos = vertCos,
1112 };
1113
1114- TaskParallelSettings settings;
1115+ PBVHParallelSettings settings;
1116 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1117- BLI_task_parallel_range(0, totnode, &data, sculpt_flush_stroke_deform_task_cb, &settings);
1118+ BKE_pbvh_parallel_range(0, totnode, &data, sculpt_flush_stroke_deform_task_cb, &settings);
1119
1120 if (vertCos) {
1121 sculpt_vertcos_to_key(ob, ss->kb, vertCos);
1122@@ -8219,10 +8190,10 @@ static void sculpt_filter_cache_init(Object *ob, Sculpt *sd)
1123 .nodes = ss->filter_cache->nodes,
1124 };
1125
1126- TaskParallelSettings settings;
1127+ PBVHParallelSettings settings;
1128 BKE_pbvh_parallel_range_settings(
1129 &settings, (sd->flags & SCULPT_USE_OPENMP), ss->filter_cache->totnode);
1130- BLI_task_parallel_range(
1131+ BKE_pbvh_parallel_range(
1132 0, ss->filter_cache->totnode, &data, filter_cache_init_task_cb, &settings);
1133 }
1134
1135@@ -8418,10 +8389,10 @@ static int sculpt_mesh_filter_modal(bContext *C, wmOperator *op, const wmEvent *
1136 .filter_strength = filter_strength,
1137 };
1138
1139- TaskParallelSettings settings;
1140+ PBVHParallelSettings settings;
1141 BKE_pbvh_parallel_range_settings(
1142 &settings, (sd->flags & SCULPT_USE_OPENMP), ss->filter_cache->totnode);
1143- BLI_task_parallel_range(0, ss->filter_cache->totnode, &data, mesh_filter_task_cb, &settings);
1144+ BKE_pbvh_parallel_range(0, ss->filter_cache->totnode, &data, mesh_filter_task_cb, &settings);
1145
1146 if (ss->modifiers_active || ss->kb) {
1147 sculpt_flush_stroke_deform(sd, ob, true);
1148@@ -8695,9 +8666,9 @@ static int sculpt_mask_filter_exec(bContext *C, wmOperator *op)
1149 .prev_mask = prev_mask,
1150 };
1151
1152- TaskParallelSettings settings;
1153+ PBVHParallelSettings settings;
1154 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1155- BLI_task_parallel_range(0, totnode, &data, mask_filter_task_cb, &settings);
1156+ BKE_pbvh_parallel_range(0, totnode, &data, mask_filter_task_cb, &settings);
1157
1158 if (ELEM(filter_type, MASK_FILTER_GROW, MASK_FILTER_SHRINK)) {
1159 MEM_freeN(prev_mask);
1160@@ -8806,13 +8777,12 @@ static void dirty_mask_compute_range_task_cb(void *__restrict userdata,
1161 BKE_pbvh_vertex_iter_end;
1162 }
1163
1164-static void dirty_mask_compute_range_finalize(void *__restrict userdata, void *__restrict tls)
1165+static void dirty_mask_compute_range_reduce(void *__restrict tls_join, void *__restrict tls)
1166 {
1167- SculptThreadedTaskData *data = userdata;
1168+ DirtyMaskRangeData *join = tls_join;
1169 DirtyMaskRangeData *range = tls;
1170-
1171- data->dirty_mask_min = min_ff(range->min, data->dirty_mask_min);
1172- data->dirty_mask_max = max_ff(range->max, data->dirty_mask_max);
1173+ join->min = min_ff(range->min, join->min);
1174+ join->max = max_ff(range->max, join->max);
1175 }
1176
1177 static void dirty_mask_apply_task_cb(void *__restrict userdata,
1178@@ -8883,8 +8853,6 @@ static int sculpt_dirty_mask_exec(bContext *C, wmOperator *op)
1179 .sd = sd,
1180 .ob = ob,
1181 .nodes = nodes,
1182- .dirty_mask_min = FLT_MAX,
1183- .dirty_mask_max = -FLT_MAX,
1184 .dirty_mask_dirty_only = RNA_boolean_get(op->ptr, "dirty_only"),
1185 };
1186 DirtyMaskRangeData range = {
1187@@ -8892,15 +8860,17 @@ static int sculpt_dirty_mask_exec(bContext *C, wmOperator *op)
1188 .max = -FLT_MAX,
1189 };
1190
1191- TaskParallelSettings settings;
1192+ PBVHParallelSettings settings;
1193 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1194
1195- settings.func_finalize = dirty_mask_compute_range_finalize;
1196+ settings.func_reduce = dirty_mask_compute_range_reduce;
1197 settings.userdata_chunk = &range;
1198 settings.userdata_chunk_size = sizeof(DirtyMaskRangeData);
1199
1200- BLI_task_parallel_range(0, totnode, &data, dirty_mask_compute_range_task_cb, &settings);
1201- BLI_task_parallel_range(0, totnode, &data, dirty_mask_apply_task_cb, &settings);
1202+ BKE_pbvh_parallel_range(0, totnode, &data, dirty_mask_compute_range_task_cb, &settings);
1203+ data.dirty_mask_min = range.min;
1204+ data.dirty_mask_max = range.max;
1205+ BKE_pbvh_parallel_range(0, totnode, &data, dirty_mask_apply_task_cb, &settings);
1206
1207 MEM_SAFE_FREE(nodes);
1208
1209@@ -9059,10 +9029,10 @@ static int sculpt_mask_expand_modal(bContext *C, wmOperator *op, const wmEvent *
1210 int smooth_iterations = RNA_int_get(op->ptr, "smooth_iterations");
1211 BKE_sculpt_update_object_for_edit(depsgraph, ob, true, false);
1212 for (int i = 0; i < smooth_iterations; i++) {
1213- TaskParallelSettings settings;
1214+ PBVHParallelSettings settings;
1215 BKE_pbvh_parallel_range_settings(
1216 &settings, (sd->flags & SCULPT_USE_OPENMP), ss->filter_cache->totnode);
1217- BLI_task_parallel_range(0, ss->filter_cache->totnode, &data, mask_filter_task_cb, &settings);
1218+ BKE_pbvh_parallel_range(0, ss->filter_cache->totnode, &data, mask_filter_task_cb, &settings);
1219 }
1220
1221 /* Pivot position */
1222@@ -9128,10 +9098,10 @@ static int sculpt_mask_expand_modal(bContext *C, wmOperator *op, const wmEvent *
1223 .mask_expand_invert_mask = RNA_boolean_get(op->ptr, "invert"),
1224 .mask_expand_keep_prev_mask = RNA_boolean_get(op->ptr, "keep_previous_mask"),
1225 };
1226- TaskParallelSettings settings;
1227+ PBVHParallelSettings settings;
1228 BKE_pbvh_parallel_range_settings(
1229 &settings, (sd->flags & SCULPT_USE_OPENMP), ss->filter_cache->totnode);
1230- BLI_task_parallel_range(0, ss->filter_cache->totnode, &data, sculpt_expand_task_cb, &settings);
1231+ BKE_pbvh_parallel_range(0, ss->filter_cache->totnode, &data, sculpt_expand_task_cb, &settings);
1232 ss->filter_cache->mask_update_current_it = mask_expand_update_it;
1233 }
1234
1235@@ -9266,10 +9236,10 @@ static int sculpt_mask_expand_invoke(bContext *C, wmOperator *op, const wmEvent
1236 .mask_expand_invert_mask = RNA_boolean_get(op->ptr, "invert"),
1237 .mask_expand_keep_prev_mask = RNA_boolean_get(op->ptr, "keep_previous_mask"),
1238 };
1239- TaskParallelSettings settings;
1240+ PBVHParallelSettings settings;
1241 BKE_pbvh_parallel_range_settings(
1242 &settings, (sd->flags & SCULPT_USE_OPENMP), ss->filter_cache->totnode);
1243- BLI_task_parallel_range(0, ss->filter_cache->totnode, &data, sculpt_expand_task_cb, &settings);
1244+ BKE_pbvh_parallel_range(0, ss->filter_cache->totnode, &data, sculpt_expand_task_cb, &settings);
1245
1246 const char *status_str = TIP_(
1247 "Move the mouse to expand the mask from the active vertex. LBM: confirm mask, ESC/RMB: "
1248@@ -9583,10 +9553,10 @@ void ED_sculpt_update_modal_transform(struct bContext *C)
1249 mul_m4_m4m4(data.transform_mats[i], pivot_mat, data.transform_mats[i]);
1250 }
1251
1252- TaskParallelSettings settings;
1253+ PBVHParallelSettings settings;
1254 BKE_pbvh_parallel_range_settings(
1255 &settings, (sd->flags & SCULPT_USE_OPENMP), ss->filter_cache->totnode);
1256- BLI_task_parallel_range(
1257+ BKE_pbvh_parallel_range(
1258 0, ss->filter_cache->totnode, &data, sculpt_transform_task_cb, &settings);
1259
1260 if (ss->modifiers_active || ss->kb) {
1261diff --git a/source/blender/editors/sculpt_paint/sculpt_intern.h b/source/blender/editors/sculpt_paint/sculpt_intern.h
1262index cbc13e5f0d2..be79ebbd6e5 100644
1263--- a/source/blender/editors/sculpt_paint/sculpt_intern.h
1264+++ b/source/blender/editors/sculpt_paint/sculpt_intern.h
1265@@ -195,10 +195,8 @@ typedef struct SculptThreadedTaskData {
1266 int filter_type;
1267 float filter_strength;
1268
1269- /* 0=towards view, 1=flipped */
1270- float (*area_cos)[3];
1271- float (*area_nos)[3];
1272- int *count;
1273+ bool use_area_cos;
1274+ bool use_area_nos;
1275 bool any_vertex_sampled;
1276
1277 float *prev_mask;
1278@@ -208,13 +206,8 @@ typedef struct SculptThreadedTaskData {
1279 float *pose_factor;
1280 float (*transform_rot)[4], (*transform_trans)[4], (*transform_trans_inv)[4];
1281
1282- float tot_pos_avg[3];
1283- int tot_pos_count;
1284-
1285 float max_distance_squared;
1286 float nearest_vertex_search_co[3];
1287- int nearest_vertex_index;
1288- float nearest_vertex_distance_squared;
1289
1290 int mask_expand_update_it;
1291 bool mask_expand_invert_mask;
1292diff --git a/source/blender/editors/sculpt_paint/sculpt_undo.c b/source/blender/editors/sculpt_paint/sculpt_undo.c
1293index 74e5444bc3e..3e45e82ee66 100644
1294--- a/source/blender/editors/sculpt_paint/sculpt_undo.c
1295+++ b/source/blender/editors/sculpt_paint/sculpt_undo.c
1296@@ -343,9 +343,9 @@ static void sculpt_undo_bmesh_restore_generic(bContext *C,
1297
1298 BKE_pbvh_search_gather(ss->pbvh, NULL, NULL, &nodes, &totnode);
1299
1300- TaskParallelSettings settings;
1301+ PBVHParallelSettings settings;
1302 BKE_pbvh_parallel_range_settings(&settings, (sd->flags & SCULPT_USE_OPENMP), totnode);
1303- BLI_task_parallel_range(
1304+ BKE_pbvh_parallel_range(
1305 0, totnode, nodes, sculpt_undo_bmesh_restore_generic_task_cb, &settings);
1306
1307 if (nodes) {
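
The `func_finalize` to `func_reduce` change in the diff above moves the accumulation out of the shared task data and into the per-thread chunk. A minimal sketch of that pattern follows. It is a sequential simulation, not the Blender scheduler: `RangeChunk`, `range_task`, `range_reduce`, `range_compute` and `MAX_WORKERS` are hypothetical names, and it assumes each worker starts from a copy of the caller's chunk and that the chunks are joined back into the caller's chunk once the loop is done.

```c
#include <assert.h>
#include <float.h>

#define MAX_WORKERS 8 /* Hypothetical upper bound for this sketch. */

/* Hypothetical stand-in for a TLS chunk such as DirtyMaskRangeData. */
typedef struct RangeChunk {
  float min;
  float max;
} RangeChunk;

/* Per-item callback: writes only to the worker's private chunk, so no lock is needed. */
static void range_task(const float *values, int i, RangeChunk *tls)
{
  if (values[i] < tls->min) {
    tls->min = values[i];
  }
  if (values[i] > tls->max) {
    tls->max = values[i];
  }
}

/* Reduce callback: folds one worker's chunk into the joined chunk. */
static void range_reduce(RangeChunk *join, const RangeChunk *tls)
{
  if (tls->min < join->min) {
    join->min = tls->min;
  }
  if (tls->max > join->max) {
    join->max = tls->max;
  }
}

/* Sequential simulation of a parallel range over `count` items.
 * A real scheduler would run each worker's share on its own thread. */
static void range_compute(const float *values, int count, int num_workers, RangeChunk *r_join)
{
  assert(num_workers > 0 && num_workers <= MAX_WORKERS);

  RangeChunk tls[MAX_WORKERS];
  for (int w = 0; w < num_workers; w++) {
    tls[w] = *r_join; /* Assumption: every worker starts from a copy of the caller's chunk. */
  }
  for (int i = 0; i < count; i++) {
    range_task(values, i, &tls[i % num_workers]);
  }
  for (int w = 0; w < num_workers; w++) {
    range_reduce(r_join, &tls[w]);
  }
}
```

After `range_compute` returns, the caller reads the result straight from its own chunk, which mirrors how the patch reads `range.min` and `range.max` after the first parallel range. Copying the caller's chunk into every worker is harmless for min/max, but for a sum it would count a non-zero starting value once per worker, which is presumably why the pose brush code zeroes its chunk before each iteration.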

Another way to reduce threading overhead would be to apply multiple stroke steps with one task callback per node. But that's a rather drastic change, and it is probably not possible for all brushes.
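
To make the idea of one task callback per node concrete, here is a rough sketch that compares the two loop orders. It is a sequential simulation with hypothetical names (`Node`, `apply_step`, `run_per_step`, `run_batched`), not sculpt code, and it assumes each step only reads and writes the node it is applied to. A brush that reads neighbouring nodes between steps would not satisfy that assumption.

```c
#include <assert.h>

/* Hypothetical node payload; a real PBVH node holds vertices, not a single float. */
typedef struct Node {
  float value;
} Node;

/* Hypothetical brush step that only touches its own node. */
static void apply_step(Node *node, float step_strength)
{
  node->value += step_strength;
}

/* Current shape: one dispatch over all nodes for every stroke step.
 * Returns the number of dispatches the scheduler would have to perform. */
static int run_per_step(Node *nodes, int totnode, const float *steps, int totstep)
{
  assert(totnode >= 0 && totstep >= 0);
  int dispatches = 0;
  for (int s = 0; s < totstep; s++) {
    dispatches++; /* One parallel range per step. */
    for (int n = 0; n < totnode; n++) {
      apply_step(&nodes[n], steps[s]);
    }
  }
  return dispatches;
}

/* Batched shape: a single dispatch, each node task applies every pending step. */
static int run_batched(Node *nodes, int totnode, const float *steps, int totstep)
{
  assert(totnode >= 0 && totstep >= 0);
  for (int n = 0; n < totnode; n++) {
    for (int s = 0; s < totstep; s++) {
      apply_step(&nodes[n], steps[s]);
    }
  }
  return (totstep > 0) ? 1 : 0;
}
```

Both variants apply the steps to each node in the same order, so the results match while the number of dispatches drops from one per step to one in total. The saving only holds while steps are independent across nodes.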

TBB seems to be much better.
Comparison: our current scheduler / TBB / multithreading disabled

Note that the Threaded Sculpt setting doesn't do anything with the TBB test patch, it always uses threads.

I haven't checked how many threads TBB is actually using, or if there are different TBB parameters that work better.

The main downside is that another thread pool sits in memory next to our own, which isn't ideal for memory usage. But when OpenVDB is used, that pool will be there anyway, since OpenVDB uses TBB.

Perhaps ideally we would replace our task scheduler entirely by the one from TBB, but that's more work than we can do for 2.81.

If this bug also affects other platforms, I think we should either switch the PBVH scheduler to TBB or disable multithreading by default in 2.81. I don't have a Windows or Mac computer to test the performance of this patch on those platforms.

I don't have that slowdown on Windows 10.
Here are my specs:
System Information
Operating system: Windows-10-10.0.17134 64 Bits
Graphics card: GeForce GTX 770/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 399.24
CPU: Intel 4770K
Blender Version
2.81 (sub 13), branch: master, commit date: 2019-10-04 22:33, hash: rBab519b91b2c4

Please test if reverting rBc931a00 avoids the performance issue. That would be the safest thing to do for 2.81.

Then we can do TBB for 2.82.

Reverting that commit is faster on my setup, but disabling multithreading is always faster. Most users reported that the new settings are faster, so if we can't properly test TBB on all setups for 2.81, we should probably leave this as it is.
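
For readers weighing the defaults discussed in this thread, the sketch below shows how a minimum-iterations-per-thread threshold can cap the number of tasks a range is split into. The heuristic and the name `tasks_for_range` are assumptions made for illustration only; this is not the formula Blender's scheduler uses.

```c
#include <assert.h>

/* Hypothetical heuristic: never create more tasks than there are threads,
 * and never give a task fewer than `min_iter_per_thread` items
 * unless the whole range is smaller than that. */
static int tasks_for_range(int tot_items, int num_threads, int min_iter_per_thread)
{
  assert(num_threads > 0 && min_iter_per_thread > 0);
  if (tot_items <= 0) {
    return 0;
  }
  int by_size = tot_items / min_iter_per_thread;
  if (by_size < 1) {
    by_size = 1; /* A small range still needs one task. */
  }
  return (by_size < num_threads) ? by_size : num_threads;
}
```

Under this assumed heuristic, a brush step touching 12 nodes on 8 threads becomes a single task with a threshold of 16, but 8 tasks with a threshold of 1. That matches the earlier observation that a high threshold can effectively disable multithreading for small brush steps.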