
BLI_task: Initial implementation of pooled threaded index range iterator.
Needs Review
Public

Authored by Bastien Montagne (mont29) on Tue, Nov 5, 12:31 PM.



This code allows pushing a set of different operations, all based on
iterating over a range of indices, and then processing them all at once
over multiple threads.
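
As a rough illustration of the idea (not the code from this patch), the sketch below queues several index ranges and then lets one set of threads drain all of them in a single pass. Every name in it (`RangePool`, `range_pool_init`, `range_pool_push`, `range_pool_work_and_wait`, `example_sum_index`) is hypothetical, and it uses plain pthreads and C11 atomics instead of the BLI_task scheduler. Starting and joining threads once per pool rather than once per range is one plausible source of the gain, but that is an assumption, not something measured here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_MAX_RANGES 16
#define POOL_MAX_THREADS 64

/* Callback invoked once per index of a pushed range. */
typedef void (*RangeFunc)(void *userdata, int index);

typedef struct RangeTask {
  int start, stop; /* Half-open range [start, stop). */
  void *userdata;
  RangeFunc func;
  atomic_int next; /* First index not yet claimed by any thread. */
} RangeTask;

typedef struct RangePool {
  RangeTask tasks[POOL_MAX_RANGES];
  int num_tasks;
  int chunk_size; /* Number of indices a thread claims at once. */
} RangePool;

void range_pool_init(RangePool *pool, int chunk_size)
{
  assert(chunk_size > 0);
  pool->num_tasks = 0;
  pool->chunk_size = chunk_size;
}

/* Queue a range; nothing runs yet. Returns 1 on success, 0 if rejected. */
int range_pool_push(RangePool *pool, int start, int stop, void *userdata, RangeFunc func)
{
  if (pool->num_tasks >= POOL_MAX_RANGES || stop < start) {
    return 0;
  }
  RangeTask *task = &pool->tasks[pool->num_tasks++];
  task->start = start;
  task->stop = stop;
  task->userdata = userdata;
  task->func = func;
  atomic_init(&task->next, start);
  return 1;
}

/* Each thread walks the queued ranges in order and claims chunks atomically.
 * Assumes `stop` is far enough below INT_MAX that `next` cannot overflow. */
static void *range_pool_worker(void *arg)
{
  RangePool *pool = arg;
  for (int i = 0; i < pool->num_tasks; i++) {
    RangeTask *task = &pool->tasks[i];
    for (;;) {
      const int begin = atomic_fetch_add(&task->next, pool->chunk_size);
      if (begin >= task->stop) {
        break;
      }
      int end = begin + pool->chunk_size;
      if (end > task->stop) {
        end = task->stop;
      }
      for (int j = begin; j < end; j++) {
        task->func(task->userdata, j);
      }
    }
  }
  return NULL;
}

/* Process every queued range with `num_threads` extra threads plus the caller. */
void range_pool_work_and_wait(RangePool *pool, int num_threads)
{
  pthread_t threads[POOL_MAX_THREADS];
  int started = 0;
  if (num_threads > POOL_MAX_THREADS) {
    num_threads = POOL_MAX_THREADS;
  }
  for (int i = 0; i < num_threads; i++) {
    if (pthread_create(&threads[started], NULL, range_pool_worker, pool) == 0) {
      started++;
    }
  }
  /* The calling thread helps too, so work completes even if no thread started. */
  range_pool_worker(pool);
  for (int i = 0; i < started; i++) {
    pthread_join(threads[i], NULL);
  }
}

/* Example callback: atomically adds the index to a shared sum. */
void example_sum_index(void *userdata, int index)
{
  atomic_fetch_add((atomic_long *)userdata, (long)index);
}
```

A caller would initialise the pool, push one range per operation, and then call `range_pool_work_and_wait()` once instead of issuing one parallel call per operation.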

As expected, this is mainly interesting for a relatively low number of
individual tasks.

E.g. performance tests on a 32-thread machine, for a set of 10
different tasks, show the following improvements when using the pooled
version instead of ten sequential calls to BLI_task_parallel_range():

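| Num Items | Sequential | Pooled  | Speed-up |
| --------- | ---------- | ------- | -------- |
| 10K       | 365 us     | 138 us  | 2.5 x    |
| 100K      | 877 us     | 530 us  | 1.66 x   |
| 1000K     | 5521 us    | 4625 us | 1.25 x   |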

Diff Detail

rB Blender
tmp-task-foreach-pool (branched from master)
Build Status
Buildable 5627
Build 5627: arc lint + arc unit

Event Timeline

Updated against latest master.

It seems that I am actually getting a 2x slowdown from this change.

I've applied this patch on top of 2412451595c, and then P1151 (so that the averaged time is printed; it also forces a minimal workload per thread of 1, since every task is quite heavy).
With the attached .blend file I got about 0.34 sec on average.

Then I applied P1152 and ran the numbers again. This time the average was about 0.6 sec.

In the worst case I would expect such a change to give no performance gain, but not a performance loss.

@Sergey Sharybin (sergey) I checked this morning, and your use case is actually the worst possible one for testing this feature, since you have a few very heavy tasks (not even 10K in total, taking over 1.5 sec on a single thread...). The current code is fairly suboptimal here, by the way: it should use dynamic scheduling, not static.

The doubled time in your example is due to static scheduling, combined with the 'pooled' task computing chunk sizes from the summed range lengths, which gives even bigger chunks. This is a bad idea. I think that in the static scheduling case I'll compute chunks based on the smallest range only. That will give it a taste of dynamic scheduling, and it makes more sense since the pooled range iterator is already inherently non-static anyway.

  • Merge branch 'master' into tmp-task-foreach-pool
  • Pool range iter: compute 'static' scheduler chunk size from smallest range.
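
To make the chunk-size point concrete, here is a small sketch comparing the two choices discussed above. The formula (items divided by thread count, clamped to at least 1) is an assumed stand-in, not the heuristic BLI_task actually uses, and all function names are hypothetical.

```c
#include <assert.h>

/* Assumed heuristic: spread `num_items` evenly over `num_threads`,
 * never returning a chunk smaller than 1. */
long long static_chunk_size(long long num_items, int num_threads)
{
  assert(num_threads > 0);
  const long long chunk = num_items / num_threads;
  return chunk > 0 ? chunk : 1;
}

/* Chunk size derived from the summed length of all pooled ranges. */
long long pooled_chunk_from_sum(const int *range_lengths, int num_ranges, int num_threads)
{
  long long total = 0;
  for (int i = 0; i < num_ranges; i++) {
    total += range_lengths[i];
  }
  return static_chunk_size(total, num_threads);
}

/* Chunk size derived from the smallest pooled range only. */
long long pooled_chunk_from_smallest(const int *range_lengths, int num_ranges, int num_threads)
{
  if (num_ranges <= 0) {
    return 1;
  }
  int smallest = range_lengths[0];
  for (int i = 1; i < num_ranges; i++) {
    if (range_lengths[i] < smallest) {
      smallest = range_lengths[i];
    }
  }
  return static_chunk_size(smallest, num_threads);
}
```

With ranges of 1000, 64 and 5000 items on 32 threads, the summed variant yields chunks of 189 items, so the whole 64-item range lands in a single chunk on one thread. The smallest-range variant yields chunks of 2, which spreads even that short range across all 32 threads.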