BLI: avoid invoking tbb for small workloads

We often call `parallel_for` in places with very variable sized workloads. When many elements are processed, using multi-threading is great, but when processing few elements (possibly many times) using `parallel_for` can result in significant overhead. I measured that this improves performance by >20% in the refactored realize instances code I'm working on separately. The change might also help with debugging sometimes, because the stack trace is smaller and contains fewer irrevelant symbols.
Referenced by commit c2737913db, BLI: Avoid invoking tbb for small parallel_reduce calls Referenced by commit e9334c5df8, Geometry Nodes: Avoid parallel_for_each with a single geometry
2021-12-02 12:54:35 +01:00 · 2021-12-02 12:54:35 +01:00 · e130903060 · 2023-10-04 09:42:55 +02:00
parent 0f89d05848
commit e130903060
1 changed files with 10 additions and 5 deletions
--- a/source/blender/blenlib/BLI_task.hh
+++ b/source/blender/blenlib/BLI_task.hh
@ -67,14 +67,19 @@ void parallel_for(IndexRange range, int64_t grain_size, const Function &function
    return;
  }
 #ifdef WITH_TBB
-  tbb::parallel_for(tbb::blocked_range<int64_t>(range.first(), range.one_after_last(), grain_size),
-                    [&](const tbb::blocked_range<int64_t> &subrange) {
-                      function(IndexRange(subrange.begin(), subrange.size()));
-                    });
+  /* Invoking tbb for small workloads has a large overhead. */
+  if (range.size() >= grain_size) {
+    tbb::parallel_for(
+        tbb::blocked_range<int64_t>(range.first(), range.one_after_last(), grain_size),
+        [&](const tbb::blocked_range<int64_t> &subrange) {
+          function(IndexRange(subrange.begin(), subrange.size()));
+        });
+    return;
+  }
 #else
  UNUSED_VARS(grain_size);
-  function(range);
 #endif
+  function(range);
 }

 template<typename Value, typename Function, typename Reduction>