Page MenuHome

Use mutex for lock in image.c
ClosedPublic

Authored by LazyDodo (LazyDodo) on Sun, Nov 17, 8:16 PM.

Details

Summary

Originally found in looking into a production file I cannot share (synthetic test case atttached though) but the core issue was while rendering, blender stalled on 'updating shaders' for about a minute on this one specific file.

from a CLI render I can confirm the issue

Fra:1 Mem:1852.10M (0.00M, Peak 2153.82M) | Time:00:05.56 | Mem:0.00M, Peak:0.00M | Scene, RenderLayer | Updating Shaders
Fra:1 Mem:8947.51M (0.00M, Peak 8947.54M) | Time:01:11.64 | Mem:0.03M, Peak:0.03M | Scene, RenderLayer | Updating Background

that's 66 seconds

Allright, quick profile, just to see what's going on.

seems reasonably multi threaded, lets see what it is spending time on

Ok...that is *A LOT* of spinning, at this point it was too hard to see which thread was doing work and which one was just spinning so just to see better I replaced the spinlock with a mutex.

Let there be light! Seems like in this job there are a few huge image files of which the biggest takes about 13 seconds to load. (Which we may or may not be able to speed up, but that's a story for a different diff)

Now back in the spinlock version while this file loading one one thread 7 other threads are spinning their little brains out, which would have been ok, if we were the only process on this system.. but we're not, there's many many other processes on a system generally. So what's gonna happen is, the OS scheduler will at random pick a thread to swap out and have the other processes a go, however the OS doesn't discriminate between our loading thread vs our spinners, so at times the only thing executing for blender are spinning threads while our loading threads just sits there suspended. yikes, not good!

So theoretically if we were to put the spinning threads to sleep, we *should* get at least a small speedup since the loading thread would be running more often, wait.. a mutex does exactly that and that's already implemented! checking the render log:

Fra:1 Mem:1852.10M (0.00M, Peak 2153.82M) | Time:00:06.36 | Mem:0.00M, Peak:0.00M | Scene, RenderLayer | Updating Shaders
Fra:1 Mem:8947.51M (0.00M, Peak 8947.54M) | Time:00:42.59 | Mem:0.03M, Peak:0.03M | Scene, RenderLayer | Updating Background

Down to 36 seconds! about half!

Ok, so spinlocks are clearly a bad choice for when one of the threads holds the lock for an extended period of time, they were introduced in rB5c6f6301b02a68c6569e14a70b3968a69fa099e7 which mentioned they were faster than mutexes, but doesn't mention on what workload

So I'm somewhat cautious on blindly replacing the spinlock with a mutex (hence the WIP on this) and left the code in to easily switch between the mutex and spinlock (export IMAGE_MUTEX=1 to test with the mutex codepath) for easy testing.

Diff Detail

Repository
rB Blender

Event Timeline

LazyDodo (LazyDodo) edited the summary of this revision. (Show Details)Sun, Nov 17, 8:19 PM
LazyDodo (LazyDodo) edited the summary of this revision. (Show Details)Sun, Nov 17, 8:59 PM

After having some discussion in blender-chat and having a second look i actually think this is a right way to go: we must not have spin lock while we are reading file from disk.

In majority of other cases the affect of using mutex instead of spin is likely to be neglectable.

The only special case is BKE_image_pool_acquire_ibuf where "heavy" lock is only supposed to happen if there is no entry in the pool yet. Having a second look, we actually don't have any locks for that case, which sounds a bit weird and prone for errors, but this is how code was for years now.

Probably, proper fix for BKE_image_pool_acquire_ibuf would be to use single linked list and use atomic pointer assignment. But this should happen outside of this patch. Just add a TODO/NOTE about it.

So just clean up some debug code and go ahead i'd say.

source/blender/blenkernel/intern/image.c
105

Is it just for the benchmarking purposes?

107

static void image_lock(void)

117

static void image_unlock(void)

183–185

For the final patch we should get rid of this.

LazyDodo (LazyDodo) edited the summary of this revision. (Show Details)

Update with feedback

LazyDodo (LazyDodo) retitled this revision from WIP: Use mutex for lock in image.c to Use mutex for lock in image.c.Mon, Nov 18, 5:45 PM
source/blender/blenkernel/intern/image.c
103
static ThreadMutex *image_mutex;

possibly just a brain fart, but T71510 could be related?

  • update with feedback
LazyDodo (LazyDodo) marked an inline comment as done.Tue, Nov 19, 2:53 PM

Looks good to me now.

Also tested on Linux on E5-2699 v4, got speedup from 26sec to 16sec.

P.S. Just to state the obvious: while .blend file could be considered too synthetic, spin while reading EXR leaves less CPU power to do proper threaded EXR decode. So there is a real benefit for general case as well.

This revision is now accepted and ready to land.Tue, Nov 19, 3:10 PM
This revision was automatically updated to reflect the committed changes.