This is a proposal to bring BMesh in the age of parallelization and multi-tasking. ;)

Authored by Bastien Montagne (mont29) on Nov 22 2017, 10:26 PM.



Add ability to use more than one mempool iterator simultaneously.

This will allow threaded tasks to 'consume' all mempool items in
parallel tasks, each one working on a whole chunk at once (to reduce
concurrency managing overhead).

Add a new parallel looper for MemPool items to BLI_task.

It merely uses the new thread-safe iterators system of mempool, quite
straight forward.

Note that unlike listbase case, here we can loop in a totally lock-free way,
we only need one memory barrier (and only when we have to get a new chuk to
iterate over).

Also added a basic gtest for this new feature.

BMesh: add limited support for parallelization over some basic iterators.

This merely uses new memloop/task looper over vertex/edge/face mempools.

As a demonstration, BM_mesh_normals_update was converted from OMP to
new code, basic test with heavily subdivided cube (24.5k faces) gives:

  • old OMP code: average 10ms per run.
  • new task code: average 6ms per run.

So new code seems to be easily 40% quicker, in addition to getting rid
of OMP. ;)

Diff Detail

rB Blender
631 ↗(On Diff #9591)

yuck, should be return ret; here, will fix.

This revision is now accepted and ready to land.Nov 23 2017, 4:01 AM
199 ↗(On Diff #9591)

This could be a static assert, although we dont have BLI_utildefines from here so no strong opinion.

155 ↗(On Diff #9591)

Could iter be typed? these would be easy to accidentally swap.

1376 ↗(On Diff #9591)

might some functions want to use the thread-id?

199 ↗(On Diff #9591)

Not sure about static assert in C, seems fairly recent and not that much portable yet… We could copy the stuff from BLI_utildefines.h here though, but meh… :/

In anycase, we have those assert all over that atomic_ops code currently, so think this is rather side cleanup work.

155 ↗(On Diff #9591)

Do you mean, something like that?

typedef void * MempoolItemPtr;

typedef void (*TaskParallelMempoolFunc)(void *userdata,
                                        MempoolIterPtr iter);
1376 ↗(On Diff #9591)

Most likely yes, but then I'd rather add an _ex version, with also the per-thread data chuck thingy, etc., when needs arise (as was the case with regular BLI_task_range_parallel one.

199 ↗(On Diff #9591)

In fact, in that 'size of type' check case, we can even do much better, and test this in pre-processor in atomic_ops_utils.h! Will do that now.

This revision was automatically updated to reflect the committed changes.