
Dragging in the timeline causes Blender to freeze
Closed, Resolved · Public · Bug

Description

System Information
Operating system: Linux 5.4.0-7626-generic #30~1588169883~20.04~bbe668a-Ubuntu SMP
Graphics card: Quadro RTX 8000

(this happens on a variety of artist workstations, though)

Blender Version
Broken: rBd8a3f3595af
Worked: rBa18ad3c3b61

The artists at Blender Studio have been having issues with Blender randomly hanging since last Friday. It seems to be caused by this commit: https://developer.blender.org/rBd8a3f3595af0fb3ca5937e41c2728fd750d986ef

The file to reproduce it:

Playing back the animation or dragging in the timeline should cause Blender to freeze immediately.

Event Timeline

Able to reproduce; it wasn't instant, but it froze on my system after about 30 seconds.

Jeroen Bakker (jbakker) changed the subtype of this task from "Report" to "Bug".
Jeroen Bakker (jbakker) edited projects, added BF Blender (2.90); removed BF Blender.
Sebastian Parborg (zeddb) changed the task status from Needs Triage to Confirmed. (Fri, May 8, 4:07 PM)
Sebastian Parborg (zeddb) triaged this task as High priority.

I can confirm the freeze on my end. If I open up the file and start playback, it will freeze after a few frames.

@Jeroen Bakker (jbakker) I guess this should be high prio, right?

Ah, I guess we submitted nearly at the same time :P

Also confirmed on Windows 10 with an RTX 2080 Ti.

I tried to pause the loop, and the debugger stops here:

task.h

>	blender.exe!tbb::task::wait_for_all() Line 810	C++
 	blender.exe!tbb::internal::task_group_base::wait() Line 168	C++
 	blender.exe!tbb_task_pool_work_and_wait(TaskPool * pool) Line 249	C++
 	blender.exe!BLI_task_pool_work_and_wait(TaskPool * pool) Line 503	C++
 	blender.exe!DEG::deg_evaluate_on_refresh(DEG::Depsgraph * graph) Line 399	C++
 	blender.exe!DEG_evaluate_on_framechange(Main * bmain, Depsgraph * graph, float ctime) Line 83	C++
 	blender.exe!BKE_scene_graph_update_for_newframe(Depsgraph * depsgraph, Main * bmain) Line 1394	C
 	blender.exe!ED_update_for_newframe(Main * bmain, Depsgraph * depsgraph) Line 1572	C
 	blender.exe!wm_event_do_notifiers(bContext * C) Line 492	C
 	blender.exe!WM_main(bContext * C) Line 456	C
 	blender.exe!main(int argc, const unsigned char * * UNUSED_argv_c) Line 530	C

It seems like a pthread read lock isn't being released, making all write locks wait forever when building BVH trees.

https://developer.blender.org/P1377

I will continue working on this on Monday (if it is still open). @Andy Goralczyk (eyecandy), the current workaround, in my opinion, is to use b283.

Here's a backtrace from a thread that shows the problem.

There's a dependency graph task here that does a parallel range while holding the mutex lock, which causes another dependency graph task to be executed recursively on the same thread.

Thread 77 (Thread 0x7fffb7e10700 (LWP 38407)):
#0  __lll_lock_wait (futex=futex@entry=0xc74a1a8 <cache_rwlock>, private=0) at lowlevellock.c:52
#1  0x00007ffff7c430a3 in __GI___pthread_mutex_lock (mutex=0xc74a1a8 <cache_rwlock>) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000000000e30445 in BKE_bvhtree_from_mesh_get (data=0x7fffb7e0d738, mesh=0x7fffa8a5bb08, bvh_cache_type=3, tree_type=4) at /home/brecht/dev/blender/source/blender/blenkernel/intern/bvhutils.c:1317
#3  0x0000000000cb520d in BKE_shrinkwrap_init_tree (data=0x7fffb7e0d728, mesh=0x7fffa8a5bb08, shrinkType=3, shrinkMode=4, force_normals=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/shrinkwrap.c:134
#4  0x0000000000e4b3d6 in shrinkwrap_get_tarmat (UNUSED_depsgraph=<optimized out>, con=<optimized out>, cob=0x7fff99d0e608, ct=0x7fff99d0e548, UNUSED_ctime=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/constraint.c:4021
#5  0x0000000000e44242 in BKE_constraint_targets_for_solving_get (depsgraph=0x7fffd5865a08, con=0x7fffab965808, cob=0x7fff99d0e608, targets=0x7fffb7e0d860, ctime=12) at /home/brecht/dev/blender/source/blender/blenkernel/intern/constraint.c:5780
#6  BKE_constraints_solve (depsgraph=0x7fffd5865a08, conlist=<optimized out>, cob=0x7fff99d0e608, ctime=12) at /home/brecht/dev/blender/source/blender/blenkernel/intern/constraint.c:5847
#7  0x0000000000e22bc3 in BKE_pose_where_is_bone (depsgraph=0x7fffd5865a08, scene=<optimized out>, ob=0x7fffd58c6e08, pchan=0x7fffab986808, ctime=0, do_extra=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/armature.c:2897
#8  0x0000000001772591 in std::function<void (Depsgraph*)>::operator()(Depsgraph*) const (this=0x7fffacc95dc8, __args=0x7fffd5865a08) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:688
#9  DEG::(anonymous namespace)::evaluate_node(DEG::(anonymous namespace)::DepsgraphEvalState const*, DEG::OperationNode*) (state=<optimized out>, operation_node=0x7fffacc95d08) at /home/brecht/dev/blender/source/blender/depsgraph/intern/eval/deg_eval.cc:113
#10 0x00000000017724ee in DEG::(anonymous namespace)::deg_task_run_func(TaskPool*, void*) (pool=0x7fffacabb108, taskdata=0x7fffacc95d08) at /home/brecht/dev/blender/source/blender/depsgraph/intern/eval/deg_eval.cc:124
#11 0x00000000076ccf0f in Task::operator()() const (this=0xfffffffffffffe08) at /home/brecht/dev/blender/source/blender/blenlib/intern/task_pool.cc:117
#12 tbb::internal::function_task<Task>::execute() (this=0xfffffffffffffe00) at /home/brecht/dev/lib/linux_centos7_x86_64/tbb/include/tbb/internal/../task.h:1048
#13 0x0000000000ec2355 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop(tbb::internal::context_guard_helper<false>&, tbb::task*, long) ()
#14 0x0000000000ec3a25 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) ()
#15 0x0000000000ec0d00 in tbb::internal::generic_scheduler::local_spawn_root_and_wait(tbb::task*, tbb::task*&) ()
#16 0x00000000076cdc74 in tbb::task::spawn_root_and_wait(tbb::task&) (root=...) at /home/brecht/dev/lib/linux_centos7_x86_64/tbb/include/tbb/internal/../task.h:798
#17 tbb::interface9::internal::start_for<tbb::blocked_range<int>, RangeTask, tbb::auto_partitioner const>::run(tbb::blocked_range<int> const&, RangeTask const&, tbb::auto_partitioner const&) (range=..., body=..., partitioner=...) at /home/brecht/dev/lib/linux_centos7_x86_64/tbb/include/tbb/parallel_for.h:95
#18 0x00000000076cd046 in tbb::parallel_for<tbb::blocked_range<int>, RangeTask>(tbb::blocked_range<int> const&, RangeTask const&) (range=..., body=...) at /home/brecht/dev/lib/linux_centos7_x86_64/tbb/include/tbb/parallel_for.h:201
#19 BLI_task_parallel_range(int, int, void*, TaskParallelRangeFunc, TaskParallelSettings const*) (start=1366, stop=1878, userdata=0x7fffb7e0dd78, func=<optimized out>, settings=0x7fffb7e0ddb0) at /home/brecht/dev/blender/source/blender/blenlib/intern/task_range.cc:127
#20 0x000000000765d1f1 in non_recursive_bvh_div_nodes (tree=0x7fffabeb2208, branches_array=<optimized out>, leafs_array=<optimized out>, num_leafs=<optimized out>) at /home/brecht/dev/blender/source/blender/blenlib/intern/BLI_kdopbvh.c:848
#21 BLI_bvhtree_balance (tree=0x7fffabeb2208) at /home/brecht/dev/blender/source/blender/blenlib/intern/BLI_kdopbvh.c:961
#22 0x0000000000e30392 in bvhtree_from_mesh_looptri_create_tree (epsilon=<optimized out>, tree_type=<optimized out>, axis=<optimized out>, vert=0x7fffa1f4f288, mloop=0x7fffa30eae48, looptri=<optimized out>, looptri_num=<optimized out>, looptri_mask=0x0, looptri_num_active=5632) at /home/brecht/dev/blender/source/blender/blenkernel/intern/bvhutils.c:1072
#23 0x0000000000e300c1 in bvhtree_from_mesh_looptri_ex (data=0x7fffb7e0e1a8, vert=0x7fffa1f4f288, vert_allocated=<optimized out>, mloop=0x7ffff7c4b110 <__lll_lock_wait+48>, loop_allocated=<optimized out>, looptri=0x0, looptri_num=<optimized out>, looptri_allocated=<optimized out>, looptri_mask=<optimized out>, looptri_num_active=<optimized out>, epsilon=0, tree_type=<optimized out>, axis=<optimized out>, bvh_cache_type=<optimized out>, bvh_cache=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/bvhutils.c:1187
#24 0x0000000000e309c9 in BKE_bvhtree_from_mesh_get (data=0x7fffb7e0e1a8, mesh=0x7fffa8a5bb08, bvh_cache_type=3, tree_type=4) at /home/brecht/dev/blender/source/blender/blenkernel/intern/bvhutils.c:1438
#25 0x0000000000cb520d in BKE_shrinkwrap_init_tree (data=0x7fffb7e0e198, mesh=0x7fffa8a5bb08, shrinkType=3, shrinkMode=4, force_normals=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/shrinkwrap.c:134
#26 0x0000000000e4b3d6 in shrinkwrap_get_tarmat (UNUSED_depsgraph=<optimized out>, con=<optimized out>, cob=0x7fff99ca2dc8, ct=0x7fff99ca2e88, UNUSED_ctime=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/constraint.c:4021
#27 0x0000000000e44242 in BKE_constraint_targets_for_solving_get (depsgraph=0x7fffd5865a08, con=0x7fffab965708, cob=0x7fff99ca2dc8, targets=0x7fffb7e0e2d0, ctime=12) at /home/brecht/dev/blender/source/blender/blenkernel/intern/constraint.c:5780
#28 BKE_constraints_solve (depsgraph=0x7fffd5865a08, conlist=<optimized out>, cob=0x7fff99ca2dc8, ctime=12) at /home/brecht/dev/blender/source/blender/blenkernel/intern/constraint.c:5847
#29 0x0000000000e22bc3 in BKE_pose_where_is_bone (depsgraph=0x7fffd5865a08, scene=<optimized out>, ob=0x7fffd58c6e08, pchan=0x7fffab98f808, ctime=0, do_extra=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/armature.c:2897
#30 0x0000000001772591 in std::function<void (Depsgraph*)>::operator()(Depsgraph*) const (this=0x7fffac7ad7c8, __args=0x7fffd5865a08) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:688
#31 DEG::(anonymous namespace)::evaluate_node(DEG::(anonymous namespace)::DepsgraphEvalState const*, DEG::OperationNode*) (state=<optimized out>, operation_node=0x7fffac7ad708) at /home/brecht/dev/blender/source/blender/depsgraph/intern/eval/deg_eval.cc:113
#32 0x00000000017724ee in DEG::(anonymous namespace)::deg_task_run_func(TaskPool*, void*) (pool=0x7fffacabb108, taskdata=0x7fffac7ad708) at /home/brecht/dev/blender/source/blender/depsgraph/intern/eval/deg_eval.cc:124
#33 0x00000000076ccf0f in Task::operator()() const (this=0xfffffffffffffe08) at /home/brecht/dev/blender/source/blender/blenlib/intern/task_pool.cc:117
#34 tbb::internal::function_task<Task>::execute() (this=0xfffffffffffffe00) at /home/brecht/dev/lib/linux_centos7_x86_64/tbb/include/tbb/internal/../task.h:1048
#35 0x0000000000ec2355 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop(tbb::internal::context_guard_helper<false>&, tbb::task*, long) ()
#36 0x0000000000ec3a25 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) ()
#37 0x0000000000ec5188 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) ()
#38 0x0000000000ebdd63 in tbb::internal::market::process(rml::job&) ()
#39 0x0000000000ebf386 in tbb::internal::rml::private_worker::run() ()
#40 0x0000000000ebf5c9 in tbb::internal::rml::private_worker::thread_routine(void*) ()
#41 0x00007ffff7c40609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#42 0x00007ffff7ecb103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
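
The core failure mode in the backtrace, a nested task running on the same thread and trying to take a non-recursive lock that thread already holds, can be sketched in Python. This is only an illustration; the names are hypothetical stand-ins for cache_rwlock and the depsgraph tasks, and a non-blocking acquire is used in place of the real blocking lock so the sketch terminates:

```python
import threading

cache_lock = threading.Lock()  # non-recursive, like the pthread cache_rwlock

def stolen_depsgraph_task():
    # In the real deadlock this is a blocking lock() that never returns;
    # here we use a non-blocking acquire just to show it cannot succeed.
    got_lock = cache_lock.acquire(blocking=False)
    if got_lock:
        cache_lock.release()
    return got_lock

def build_bvhtree_with_parallel_range():
    # Simulates the work-stealing in BLI_task_parallel_range: while this
    # thread waits on the parallel loop, the scheduler runs another
    # depsgraph task *on this same thread*, with cache_lock still held.
    return stolen_depsgraph_task()

with cache_lock:  # BKE_bvhtree_from_mesh_get takes the cache lock...
    nested_task_got_lock = build_bvhtree_with_parallel_range()

print(nested_task_got_lock)  # → False: the nested task can never get the lock
```

Task isolation prevents exactly this: a thread waiting inside an isolated region will not pick up unrelated outer-level tasks.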

Using task isolation will likely have some performance impact, but hopefully it's negligible with the right choice of implementation.

The tricky part is that mutex locking in one part of the code causes problems in parts of the code quite far away from it. It is hard to tell whether a function called from within a mutex-locked region uses parallelization, and then to ensure that assumption remains correct as the code changes.

I can see a few ways of doing this:

  • Use task isolation inside every task being executed. I expect this would have the most overhead for small tasks, likely too much, but it would probably solve the entire problem without changes in other code (apart from potential performance issues).
  • A variation of the above, with the ability to turn off task isolation in specific cases where we know there will be no further nested parallelism (e.g. in sculpting). It would still need to be enabled for all depsgraph tasks, for example, which may be too much.
  • Use task isolation in BLI_task_pool_work_and_wait and BLI_task_parallel_range. The downside is that any other usage of TBB will not be handled automatically. For example, OpenVDB uses TBB, and any OpenVDB processing in a mutex-locked region will need its own task-isolation code.
  • Abstract the mutex locking / lazy initialization code, and make task isolation part of it. This would be the most efficient, applying task isolation only when necessary, but it is also easy to forget to use it in all the right places.

I'm testing this locally with the daily rendering/lighting routine, and so far it seems to stop Blender from freezing like it has been doing all week. Nice!

Based on Brecht's backtrace, I wonder if the following would be a better long-term solution to this particular problem.

It seems we lock cache_rwlock whenever a new BVH tree is built (e.g. in bvhtree_from_mesh_looptri_ex), so we can only ever build a single BVH tree at a time when BKE_bvhtree_from_mesh_get is used. I don't see why it has to be this way. It should be possible to change the locking mechanism so that multiple BVH trees can be built in parallel. Once we have that, the deadlock shown in Brecht's backtrace should never happen.

This is roughly how I think it might work; I'm not sure about the precise details, as I haven't done any thread synchronization in quite a while. All of the locking logic could probably be abstracted away by a nice cache data structure.

def get_bvhtree(mesh):
    lock()
    is_in_cache, bvhtree = find_in_cache(mesh)
    unlock()

    if is_in_cache:
        return bvhtree

    lock()
    is_built_by_other, condition = check_if_tree_is_currently_being_built(mesh)
    if not is_built_by_other:
        condition = register_that_this_thread_starts_building_now(mesh)
    unlock()

    if is_built_by_other:
        wait(condition)
        lock()
        is_in_cache, bvhtree = find_in_cache(mesh)
        unlock()
        return bvhtree
    else:
        bvhtree = build_bvhtree(mesh)
        lock()
        store_bvhtree_in_cache(mesh, bvhtree)
        unlock()
        notify_all(condition)
        return bvhtree
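
As a sanity check, the scheme above can be made runnable with Python's threading.Condition. This is only a sketch of the idea, not Blender's implementation; it assumes one condition variable per mesh, and all names are illustrative:

```python
import threading

class BVHTreeCache:
    """Lazy per-key cache: the tree is built outside the lock, so
    different meshes can build their trees in parallel, and threads
    that need the same tree wait on a per-key condition variable."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}     # mesh -> built tree
        self._building = {}  # mesh -> Condition other threads wait on

    def get(self, mesh, build_bvhtree):
        with self._lock:
            if mesh in self._cache:
                return self._cache[mesh]
            condition = self._building.get(mesh)
            if condition is None:
                # Register that this thread starts building now.
                self._building[mesh] = threading.Condition(self._lock)

        if condition is not None:
            # Being built by another thread: wait until it is cached.
            with condition:
                while mesh not in self._cache:
                    condition.wait()
                return self._cache[mesh]

        tree = build_bvhtree(mesh)  # the heavy work, with no lock held
        with self._lock:
            self._cache[mesh] = tree
            self._building.pop(mesh).notify_all()
        return tree

# Usage: eight threads request the same tree; it is built exactly once.
cache = BVHTreeCache()
builds, results = [], []

def build(mesh):
    builds.append(mesh)
    return ("tree", mesh)

threads = [threading.Thread(target=lambda: results.append(cache.get("suzanne", build)))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(builds), len(results))  # → 1 8
```

Because the lock is only held for the dictionary lookups and bookkeeping, two threads building trees for different meshes never block each other, and a waiter never holds the lock while work is in flight.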

In general, we should probably never do a lot of work within a critical section, which is exactly what the current implementation does.

@Jacques Lucke (JacquesLucke) I agree that kind of setup would be better, but we would need to check all mutex-locking code for this; task isolation is the safer option until then.

This kind of lazy initialization needs an abstraction to hide the complicated logic. There may also be a way in TBB to make waiting threads help build the BVH tree.