Originally found while looking into a production file I cannot share (a synthetic test case is attached, though). The core issue: while rendering, Blender stalled on 'Updating Shaders' for about a minute on this one specific file.
A CLI render confirms the issue:
Fra:1 Mem:1852.10M (0.00M, Peak 2153.82M) | Time:00:05.56 | Mem:0.00M, Peak:0.00M | Scene, RenderLayer | Updating Shaders
Fra:1 Mem:8947.51M (0.00M, Peak 8947.54M) | Time:01:11.64 | Mem:0.03M, Peak:0.03M | Scene, RenderLayer | Updating Background
That's 66 seconds between the two status lines.
Alright, a quick profile, just to see what's going on.
Seems reasonably multithreaded; let's see what it is spending time on.
OK... that is *A LOT* of spinning. At this point it was too hard to see which thread was doing work and which one was just spinning, so to get a clearer picture I replaced the spinlock with a mutex.
Let there be light! It seems this job has a few huge image files, the biggest of which takes about 13 seconds to load (which we may or may not be able to speed up, but that's a story for a different diff).
Now, back in the spinlock version: while this file is loading on one thread, 7 other threads are spinning their little brains out. That would have been OK if we were the only process on the system, but we're not; there are generally many, many other processes running. So what's going to happen is that the OS scheduler will pick one of our threads, more or less at random, to swap out so the other processes get a go. The OS doesn't discriminate between our loading thread and our spinners, so at times the only things executing for Blender are spinning threads, while the loading thread just sits there suspended. Yikes, not good!
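To make the difference concrete, here is a minimal sketch of a busy-waiting lock next to a blocking `std::mutex`. This is not the Cycles code; `SpinLock` and `run_workers` are hypothetical names made up for illustration, and the spinlock is the simplest possible `std::atomic_flag` variant.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock (not the Cycles implementation).
class SpinLock {
 public:
  void lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // Busy-wait: the waiter stays runnable, so the OS scheduler treats it
      // like any other working thread and may run it instead of the holder.
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical harness: every worker increments a shared counter while
// holding a lock of type Lock. Works with SpinLock and with std::mutex.
template <typename Lock>
long run_workers(int num_threads, int iterations) {
  Lock lock;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        std::lock_guard<Lock> guard(lock);  // released at end of scope
        ++counter;
      }
    });
  }
  for (auto &thread : threads) {
    thread.join();
  }
  return counter;
}
```

Both `run_workers<SpinLock>(4, 1000)` and `run_workers<std::mutex>(4, 1000)` produce the same count; the only difference is what the waiters do while the lock is held. With the spinlock they keep burning CPU, with the mutex they are put to sleep until the lock is released.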
So theoretically, if we were to put the spinning threads to sleep, we *should* get at least a small speedup, since the loading thread would get to run more often. Wait... a mutex does exactly that, and that's already implemented! Checking the render log:
Fra:1 Mem:1852.10M (0.00M, Peak 2153.82M) | Time:00:06.36 | Mem:0.00M, Peak:0.00M | Scene, RenderLayer | Updating Shaders
Fra:1 Mem:8947.51M (0.00M, Peak 8947.54M) | Time:00:42.59 | Mem:0.03M, Peak:0.03M | Scene, RenderLayer | Updating Background
Down to 36 seconds! That's a bit over half of the original 66!
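For anyone who wants to redo the arithmetic from the log lines, a small sketch follows. The helper names `log_time_to_seconds` and `elapsed_seconds` are made up for this example, and it assumes the `Time:` field always has the `MM:SS.ss` layout seen in the logs above.

```cpp
#include <cstdio>

// Hypothetical helper: converts a "MM:SS.ss" stamp from the render log into
// seconds. Returns -1.0 if the string does not match that layout.
inline double log_time_to_seconds(const char *stamp) {
  int minutes = 0;
  double seconds = 0.0;
  if (std::sscanf(stamp, "%d:%lf", &minutes, &seconds) != 2) {
    return -1.0;
  }
  return minutes * 60.0 + seconds;
}

// Time between two status lines, e.g. 'Updating Shaders' and the next one.
// Assumes both stamps are well-formed.
inline double elapsed_seconds(const char *start, const char *end) {
  return log_time_to_seconds(end) - log_time_to_seconds(start);
}
```

With the stamps from the two runs, `elapsed_seconds("00:05.56", "01:11.64")` comes out at about 66.08 seconds for the spinlock and `elapsed_seconds("00:06.36", "00:42.59")` at about 36.23 seconds for the mutex, so the mutex run takes roughly 55% of the spinlock time.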
OK, so spinlocks are clearly a bad choice when one of the threads holds the lock for an extended period of time. They were introduced in rB5c6f6301b02a68c6569e14a70b3968a69fa099e7, which mentioned they were faster than mutexes but doesn't mention on what workload.
So I'm somewhat cautious about blindly replacing the spinlock with a mutex (hence the WIP on this), and I left the code in to switch easily between the mutex and the spinlock for testing (export IMAGE_MUTEX=1 to use the mutex codepath).
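Roughly, a runtime switch of this kind could look like the sketch below. This is an illustration, not the code in the diff: `use_image_mutex` and `SwitchableLock` are hypothetical names, and treating any value other than empty or "0" as "on" is an assumption about how IMAGE_MUTEX would be interpreted.

```cpp
#include <atomic>
#include <cstdlib>
#include <mutex>

// Hypothetical check: true when IMAGE_MUTEX is set to something other than
// an empty string or "0". The exact rule is an assumption for this sketch.
inline bool use_image_mutex() {
  const char *value = std::getenv("IMAGE_MUTEX");
  if (value == nullptr || value[0] == '\0') {
    return false;
  }
  return !(value[0] == '0' && value[1] == '\0');
}

// Hypothetical lock that picks its strategy once, at construction time.
class SwitchableLock {
 public:
  SwitchableLock() : use_mutex_(use_image_mutex()) {}

  void lock() {
    if (use_mutex_) {
      mutex_.lock();  // waiters sleep until the lock is released
    }
    else {
      while (flag_.test_and_set(std::memory_order_acquire)) {
        // waiters busy-wait
      }
    }
  }

  void unlock() {
    if (use_mutex_) {
      mutex_.unlock();
    }
    else {
      flag_.clear(std::memory_order_release);
    }
  }

  bool uses_mutex() const { return use_mutex_; }

 private:
  const bool use_mutex_;
  std::mutex mutex_;
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

Reading the variable once per lock keeps the hot path down to a single branch, and because the class has `lock()` and `unlock()` it can be used with `std::lock_guard<SwitchableLock>` on either codepath.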