Edit Mesh: Use multithreading in other parts of Auto Merge & Split

Authored by Germano Cavalcante (mano-wii) on Sun, Jan 5, 3:08 PM.



In the Auto Merge & Split feature, multithreading was previously used only to
find duplicates between one vertex and another.

With this patch, multithreading is also used to find edge-edge and
edge-vertex intersections.

In my tests I measured a speedup of around 1.8x
(from 0.017151 secs to 0.009373 secs).

Diff Detail

rB Blender

Event Timeline

Bastien Montagne (mont29) requested changes to this revision. Sun, Jan 5, 4:24 PM

Generally looks good, most of those changes are just due to API changes afaict, and other straight-forward refactoring.

But the atomics usage noted in the comment below is not acceptable as-is; I suggested two directions to go depending on whether edges (tree leaves) actually need thread protection or not…


I cannot really wrap my head around the calling code enough to understand whether the same edge (i.e. tree leaf) can be processed several times in different threads… But in any case, this code is not valid as-is. It is a typical pattern for race conditions: since incrementing r_data_cut_edge_len depends on the result of the previous check in the if() clause, there is no guarantee that nothing happens in between those two atomic ops, which could e.g. lead to double-incrementing the counter.

Assuming a given edge can only be processed once, then there is no need to use atomics for operations over edge members, and we are all good (you only need atomic add on r_data_cut_edges_len).

Otherwise, you'll have to use a different approach. E.g. define a specific value for edge->head.index that would work as a 'lock' for that edge (a per-edge spinlock, if you will): use an atomic CAS inside a busy loop to set it, change all required values in the edge (with regular operations), and then clear it with a final atomic CAS.


thread_num is the worst, most confusing name possible here… this computes the number of tasks to be processed in parallel, so it should be named something like parallel_tasks_num, or something along that idea.

Not to be addressed in that patch of course (as this is used everywhere in that code), but nice for some future cleanup.

This revision now requires changes to proceed. Sun, Jan 5, 4:24 PM
Germano Cavalcante (mano-wii) marked an inline comment as done.
  • Rename thread_num to parallel_tasks_num

e->head.index indicates how many times this function has been called for the same edge (thus it indicates how many cuts the edge will have).

r_data_cut_edge_len indicates how many different edges have any cut. (This value could also be obtained outside the function by counting how many e->head.index values are different from 0.)

I liked the idea of using a specific e->head.index value as a 'lock'. e->head.index always starts at 0, so if e->head.index is 1, r_data_cut_edges_len can be incremented.

I still can't see the race condition: atomic_fetch_and_and_char(&e->head.hflag, ~BM_ELEM_TAG) & BM_ELEM_TAG is only true the first time for each edge (BM_ELEM_TAG is not set anywhere else).

On second read, I think you are right, there is no race condition here. So patch LGTM. :)


Hrmmmpfff… Actually, I think you are right… getting a proper understanding of multi-threaded protection when using atomics is always tricky.

I would add a comment here then, explaining things, something like:

/* Even though we use multiple atomic operations here, this is fine, since there is
 * no dependency on their order.
 * The atomic clear-and-check on the e->head.hflag bitflag will only ever be true once, as expected.
 * We do not care which instance of the code actually ends up incrementing
 * r_data_cut_edge_len, so there is no race condition here. */
This revision is now accepted and ready to land. Sun, Jan 5, 5:50 PM