
Fix BLI_task_test deadlock on Windows.
ClosedPublic

Authored by LazyDodo (LazyDodo) on May 24 2019, 4:38 AM.

Details

Summary

This patch makes BLI_task_scheduler_create wait for all worker threads to have started before
returning to the caller. For very short workloads (BLI_task_test) there is a chance that the
worker threads have not fully started yet, and the main thread calls pthread_join at
the same time as pthread_setspecific is being called on the worker threads, which causes a
deadlock on pthreads4w.
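A minimal standalone sketch of the handshake the patch introduces, using raw pthreads instead of Blender's BLI_ wrappers (all names below are illustrative, not the identifiers in task.c): the creator blocks on a condition variable until every worker has checked in, so pthread_join can no longer race against a worker's own setup.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for TaskScheduler: a mutex/condition pair plus a
 * counter of workers that have finished their per-thread setup. */
typedef struct DemoScheduler {
  pthread_mutex_t startup_mutex;
  pthread_cond_t startup_cond;
  int num_thread_started;
  int num_threads;
} DemoScheduler;

static void *demo_worker(void *arg)
{
  DemoScheduler *s = arg;
  /* Real code would do per-thread setup (e.g. pthread_setspecific) here. */
  pthread_mutex_lock(&s->startup_mutex);
  s->num_thread_started++;
  if (s->num_thread_started == s->num_threads) {
    pthread_cond_signal(&s->startup_cond);
  }
  pthread_mutex_unlock(&s->startup_mutex);
  return NULL;
}

int scheduler_create_and_wait(int num_threads)
{
  DemoScheduler s = {
      PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, num_threads};
  pthread_t *threads = malloc(sizeof(pthread_t) * (size_t)num_threads);
  for (int i = 0; i < num_threads; i++) {
    pthread_create(&threads[i], NULL, demo_worker, &s);
  }
  /* The while loop guards against spurious wake-ups. */
  pthread_mutex_lock(&s.startup_mutex);
  while (s.num_thread_started != num_threads) {
    pthread_cond_wait(&s.startup_cond, &s.startup_mutex);
  }
  pthread_mutex_unlock(&s.startup_mutex);
  for (int i = 0; i < num_threads; i++) {
    pthread_join(threads[i], NULL);
  }
  free(threads);
  return s.num_thread_started;
}
```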

Diff Detail

Repository
rB Blender
Branch
task_fix (branched from master)
Build Status
Buildable 3733
Build 3733: arc lint + arc unit

Event Timeline

Generally looks fine, however I am not sure having the whole condition + mutex is necessary here… I am tempted to just use atomics to increment a num_thread_started counter, and loop in BLI_task_scheduler_create() until the expected number is reached.

On the other hand, this probably has no real importance (it would just make the code a tad smaller); I don't think there is any performance issue here anyway, since we typically only run that code once.

@Sergey Sharybin (sergey) will most certainly want to have a look at that too, btw. ;)
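For reference, the atomics-and-spin alternative floated above could look roughly like this (a sketch with hypothetical names, not the code that landed; as discussed further down, busy-waiting was ultimately rejected in favour of a condition variable):

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Each worker bumps a shared counter atomically; the creator spins until
 * the expected count is reached. No mutex or condition variable involved. */
static atomic_int num_thread_started;

static void *demo_worker(void *arg)
{
  (void)arg;
  atomic_fetch_add(&num_thread_started, 1);
  return NULL;
}

int create_and_busy_wait(int num_threads)
{
  pthread_t *threads = malloc(sizeof(pthread_t) * (size_t)num_threads);
  atomic_store(&num_thread_started, 0);
  for (int i = 0; i < num_threads; i++) {
    pthread_create(&threads[i], NULL, demo_worker, NULL);
  }
  while (atomic_load(&num_thread_started) != num_threads) {
    sched_yield(); /* yield instead of burning the core while spinning */
  }
  for (int i = 0; i < num_threads; i++) {
    pthread_join(threads[i], NULL);
  }
  free(threads);
  return atomic_load(&num_thread_started);
}
```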

source/blender/blenlib/intern/task.c
216

Would rather see it named e.g. num_thread_started (the num_ prefix is mandatory).

Wow, what a catch!

I find it suspicious to use busy-wait while threads are being started. Smells like a straight way to hit some side effects of OS level scheduler. While condition variable seems overkill, it's not that bad.

Just use num_started_threads and then it looks good to me.

Brecht Van Lommel (brecht) added inline comments.
source/blender/blenlib/intern/task.c
439

Shouldn't this notify all only when scheduler->started_threads == num_threads?

It may not be an issue in practice, but if you've got 64 cores then 64 x 64 notifications sounds like it could be inefficient.

source/blender/blenlib/intern/task.c
439

Why is it 64 x 64 ?

source/blender/blenlib/intern/task.c
439

OK, nevermind, I was confused about what this algorithm was doing.

source/blender/blenlib/intern/task.c
439

Waking up the main thread once is still slightly better I think, but not as much of a problem.

source/blender/blenlib/intern/task.c
439

Agree with @Brecht Van Lommel (brecht), I think only notifying when the expected number of activated threads is reached sounds nicer; this is a cheap change to make, and sounds totally safe to me?

  • Updated with feedback
  • Only signal main thread when all threads have started
  • Cleanup mutex/condition at shutdown

I kept the while loop on the main thread since I saw a note on spurious wake-ups and I'm not entirely sure if that is something I need to worry about.

  • missed rename of variable in feedback.
LazyDodo (LazyDodo) marked 6 inline comments as done. May 24 2019, 2:54 PM
Bastien Montagne (mont29) requested changes to this revision. May 24 2019, 4:19 PM
Bastien Montagne (mont29) added inline comments.
source/blender/blenlib/intern/task.c
438

Eeeek: you may use atomics here to avoid too much locking, but you still need to lock the mutex before notifying the condition!

See https://stackoverflow.com/questions/4544234/calling-pthread-cond-signal-without-locking-mutex
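The race behind this rule can be sketched as follows (pseudocode): if the predicate is set and the condition signalled without holding the mutex, the notification can fall into the gap between the waiter's predicate check and its call to cond_wait, and the wake-up is lost.

```
Waiter                                  Signaller (no mutex held)
------                                  -------------------------
lock(mutex)
sees predicate not yet satisfied
                                        predicate = true
                                        notify(cond)      <- nobody waiting yet: LOST
cond_wait(cond, mutex)                  (will never be notified again)
```

Locking the mutex around the store and the notify closes the gap, because the signaller cannot enter that section until the waiter has released the mutex from inside cond_wait.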

This revision now requires changes to proceed. May 24 2019, 4:19 PM
  • add mutex back in.
LazyDodo (LazyDodo) marked an inline comment as done. Edited May 24 2019, 4:33 PM

I'm not sure where this patch is going.

You do need to notify condition variable from inside a mutex-lock. This you can not avoid.

But then why try to be too smart and add the extra complexity of atomics? Stay away from them unless they are absolutely required. They aren't as cheap as one might think, and whacking them in just because is a bad idea.

Why not just call BLI_condition_notify_all() when num_started_threads reaches its final value, but from the commonly used mutex-lock section?

Also, if the condition is notified once, you don't need a loop in the main thread.

source/blender/blenlib/intern/task.c
527

Isn't BLI_condition_wait supposed to be called with the mutex locked?

The loop in the main thread is there because this comment scared me.

It's not needed?

Good point.
Then add a comment mentioning that it's needed to deal with possibly spontaneous wake-ups.

source/blender/blenlib/intern/task.c
527

I have honestly no idea; on win32 I would have just called CreateEvent/SetEvent/WaitOne and been done with it, no mutex/condition variable combo wonkiness.

that being said, if the main thread has the mutex, while it's waiting, how do the worker threads ever get it to fire off the signal?

It's pretty clear I'm no pthreads expert. If any of you more qualified pthreads coders wants to take over this diff, I'd be happy to walk away here.

I would just do a simple bare-bones condition variable approach.

In the main thread:

BLI_mutex_lock(&scheduler->startup_mutex);
/* NOTE: Use loop here to avoid false-positive everything-is-ready caused by spontaneous thread wake up. */
while (scheduler->num_thread_started != num_threads) {
   BLI_condition_wait(&scheduler->startup_cond, &scheduler->startup_mutex);
}
BLI_mutex_unlock(&scheduler->startup_mutex);

In the worker thread:

BLI_mutex_lock(&scheduler->startup_mutex);
++scheduler->num_thread_started;
if (scheduler->num_thread_started == scheduler->num_threads) {
  BLI_condition_notify_one(&scheduler->startup_cond);
}
BLI_mutex_unlock(&scheduler->startup_mutex);

Simple and clear.

source/blender/blenlib/intern/task.c
527

BLI_condition_wait() will atomically release the mutex and block the calling thread until the condition is notified.

Some details: https://linux.die.net/man/3/pthread_cond_wait

  • hopefully last update with feedback.

@LazyDodo (LazyDodo) the code looks good and follows other patterns of usage of condition variables in task.c.

However, now that I'm thinking about it, are we doing it correctly? Namely, BLI_condition_wait() in loops. Shouldn't existing code, which is very close to what's happening in this patch,

BLI_mutex_lock(&pool->num_mutex);
while (pool->num) {
  BLI_condition_wait(&pool->num_cond, &pool->num_mutex);
}
BLI_mutex_unlock(&pool->num_mutex);

become

while (pool->num) {
  BLI_mutex_lock(&pool->num_mutex);
  BLI_condition_wait(&pool->num_cond, &pool->num_mutex);
}
BLI_mutex_unlock(&pool->num_mutex);

?

Or, to protect against non-thread-safe access

BLI_mutex_lock(&pool->num_mutex);
while (pool->num) {
  BLI_mutex_unlock(&pool->num_mutex);
  BLI_mutex_lock(&pool->num_mutex);
  BLI_condition_wait(&pool->num_cond, &pool->num_mutex);
}
BLI_mutex_unlock(&pool->num_mutex);

This seems weird though =\

BLI_mutex_lock(&pool->num_mutex);
while (pool->num) {
  BLI_condition_wait(&pool->num_cond, &pool->num_mutex);
}
BLI_mutex_unlock(&pool->num_mutex);

This one is correct. BLI_condition_wait will unlock the mutex, wait, then lock it again before it returns.

@Brecht Van Lommel (brecht), yes. You're right.

@LazyDodo (LazyDodo) , sorry for the last minute confusion. If you've tested the code you've got green light from me :)

Uhhhh, I'd like to use a helpline and page @Bastien Montagne (mont29) / @Brecht Van Lommel (brecht).

also since we're having second thoughts, I kinda feel num_thread_started should be marked volatile now?

> also since we're having second thoughts, I kinda feel num_thread_started should be marked volatile now?

Ah, yes. That too :)

  • I really hope this is the last one.
LazyDodo (LazyDodo) added a comment. Edited May 24 2019, 6:02 PM

Since this review was somewhat chaotic, I'll hold off committing until @Bastien Montagne (mont29) has had a final look.

LGTM, besides note inlined in code.


Side note (for those interested in this :P): About volatile: it won't hurt to have it here, but it's not really useful either. volatile is useless when accesses are already protected by things like mutexes.

volatile may only be useful when your loop is controlled by an (unprotected) variable that may be modified by another thread, and the loop body does (nearly) nothing, something like int foobar = 0; while (scheduler->num_thread_started != num_threads) { foobar++; }. The compiler will probably not be smart enough to detect that scheduler->num_thread_started might be modified elsewhere, and could generate an infinite loop by reading num_thread_started just once and keeping it in a register…

Some interesting docs: https://stackoverflow.com/questions/4557979/when-to-use-volatile-with-multi-threading and linked pages (and all the comments to the answers!), https://mechanical-sympathy.blogspot.com/2011/07/memory-barriersfences.html (and https://mechanical-sympathy.blogspot.com/2013/02/cpu-cache-flushing-fallacy.html )…
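That busy-loop scenario can be made concrete with a small standalone sketch (hypothetical names; note that in modern C an atomic would be preferred over volatile, which forces a fresh load on each iteration but provides no memory-ordering guarantees):

```c
#include <pthread.h>
#include <unistd.h>

/* Without `volatile` (or an atomic) the compiler may legally read `done`
 * once, cache it in a register, and spin forever. `volatile` forces a
 * re-read on every iteration, which is all this empty loop needs. */
static volatile int done = 0;

static void *setter(void *arg)
{
  (void)arg;
  usleep(1000); /* give the main thread time to start spinning */
  done = 1;
  return NULL;
}

int wait_for_done(void)
{
  pthread_t t;
  pthread_create(&t, NULL, setter, NULL);
  while (!done) {
    /* empty busy-wait: exactly the case described above */
  }
  pthread_join(t, NULL);
  return done;
}
```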

source/blender/blenlib/intern/task.c
438

Really a point of detail:
I know this can have an effect on non-trivial types in C++, and I know that this file uses both in a rather inconsistent way, but… in most of Blender we use post-increment (unless pre-increment is absolutely needed), and I really find it more readable, especially when applied to struct members…

532

You forgot to unlock your mutex here (see @Brecht Van Lommel (brecht)'s comment above: BLI_condition_wait will unlock the mutex, wait, then lock it again before it returns, so you need to unlock it after the end of the loop). ;)

This revision is now accepted and ready to land. May 24 2019, 11:43 PM
  • Update with feedback.
LazyDodo (LazyDodo) marked 2 inline comments as done. May 26 2019, 1:12 AM
This revision was automatically updated to reflect the committed changes.