Dragging in the timeline causes Blender to freeze #76553

Closed
opened 2020-05-08 15:56:57 +02:00 by Andy Goralczyk · 23 comments
Member

System Information
Operating system: Linux 5.4.0-7626-generic #30~1588169883~20.04~bbe668a-Ubuntu SMP
Graphics card: Quadro RTX 8000

(this happens on a variety of artist workstations, though)

Blender Version
Broken: d8a3f3595a
Worked: a18ad3c3b6

The artists at Blender Studio have been seeing Blender hang at random since last Friday. This commit seems to cause it: https://developer.blender.org/rBd8a3f3595af0fb3ca5937e41c2728fd750d986ef

The file to reproduce it: [gabby.V2.blend](https://archive.blender.org/developer/F8520799/gabby.V2.blend)

Playing back the animation or dragging in the timeline should cause Blender to freeze immediately.

Author
Member

Added subscriber: @eyecandy

Member

Added subscriber: @Jeroen-Bakker

Member

I was able to reproduce it. It wasn't instant, but after 30 seconds it froze on my system.

Jeroen Bakker self-assigned this 2020-05-08 16:07:02 +02:00

Added subscriber: @ZedDB


Changed status from 'Needs Triage' to: 'Confirmed'


I can confirm the freeze on my end. If I open up the file and start playback, it will freeze after a few frames.

@Jeroen-Bakker I guess this should be high prio, right?


Ah, I guess we submitted nearly at the same time :P

Member

Added subscriber: @SimonThommes


Added subscriber: @antoniov


Also confirmed on Windows 10 with an RTX 2080 Ti.


When I pause execution in the debugger, it stops here, in `task.h` (screenshot: [image.png](https://archive.blender.org/developer/F8520837/image.png)):

>	blender.exe!tbb::task::wait_for_all() Line 810	C++

 	blender.exe!tbb::internal::task_group_base::wait() Line 168	C++
 	blender.exe!tbb_task_pool_work_and_wait(TaskPool * pool) Line 249	C++
 	blender.exe!BLI_task_pool_work_and_wait(TaskPool * pool) Line 503	C++
 	blender.exe!DEG::deg_evaluate_on_refresh(DEG::Depsgraph * graph) Line 399	C++
 	blender.exe!DEG_evaluate_on_framechange(Main * bmain, Depsgraph * graph, float ctime) Line 83	C++
 	blender.exe!BKE_scene_graph_update_for_newframe(Depsgraph * depsgraph, Main * bmain) Line 1394	C
 	blender.exe!ED_update_for_newframe(Main * bmain, Depsgraph * depsgraph) Line 1572	C
 	blender.exe!wm_event_do_notifiers(bContext * C) Line 492	C
 	blender.exe!WM_main(bContext * C) Line 456	C
 	blender.exe!main(int argc, const unsigned char * * UNUSED_argv_c) Line 530	C
Member

It looks like a pthread read lock is never released, so every write lock waits forever when building BVH trees.

https://developer.blender.org/P1377

Member

I will continue working on this on Monday (if it is still open). @eyecandy, the current workaround, IMO, is to use b283.


Added subscriber: @brecht


Here's a backtrace from a thread that shows the problem.

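There's a dependency graph task here that runs a parallel range while holding the mutex lock. While that task waits for the parallel range to finish, its thread picks up another dependency graph task, which then runs recursively on the same stack and blocks on the same mutex.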

Thread 77 (Thread 0x7fffb7e10700 (LWP 38407)):
- 0  __lll_lock_wait (futex=futex@entry=0xc74a1a8 <cache_rwlock>, private=0) at lowlevellock.c:52
- 1  0x00007ffff7c430a3 in __GI___pthread_mutex_lock (mutex=0xc74a1a8 <cache_rwlock>) at ../nptl/pthread_mutex_lock.c:80
- 2  0x0000000000e30445 in BKE_bvhtree_from_mesh_get (data=0x7fffb7e0d738, mesh=0x7fffa8a5bb08, bvh_cache_type=3, tree_type=4) at /home/brecht/dev/blender/source/blender/blenkernel/intern/bvhutils.c:1317
- 3  0x0000000000cb520d in BKE_shrinkwrap_init_tree (data=0x7fffb7e0d728, mesh=0x7fffa8a5bb08, shrinkType=3, shrinkMode=4, force_normals=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/shrinkwrap.c:134
- 4  0x0000000000e4b3d6 in shrinkwrap_get_tarmat (UNUSED_depsgraph=<optimized out>, con=<optimized out>, cob=0x7fff99d0e608, ct=0x7fff99d0e548, UNUSED_ctime=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/constraint.c:4021
- 5  0x0000000000e44242 in BKE_constraint_targets_for_solving_get (depsgraph=0x7fffd5865a08, con=0x7fffab965808, cob=0x7fff99d0e608, targets=0x7fffb7e0d860, ctime=12) at /home/brecht/dev/blender/source/blender/blenkernel/intern/constraint.c:5780
- 6  BKE_constraints_solve (depsgraph=0x7fffd5865a08, conlist=<optimized out>, cob=0x7fff99d0e608, ctime=12) at /home/brecht/dev/blender/source/blender/blenkernel/intern/constraint.c:5847
- 7  0x0000000000e22bc3 in BKE_pose_where_is_bone (depsgraph=0x7fffd5865a08, scene=<optimized out>, ob=0x7fffd58c6e08, pchan=0x7fffab986808, ctime=0, do_extra=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/armature.c:2897
- 8  0x0000000001772591 in std::function<void (Depsgraph*)>::operator()(Depsgraph*) const (this=0x7fffacc95dc8, __args=0x7fffd5865a08) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:688
- 9  DEG::(anonymous namespace)::evaluate_node(DEG::(anonymous namespace)::DepsgraphEvalState const*, DEG::OperationNode*) (state=<optimized out>, operation_node=0x7fffacc95d08) at /home/brecht/dev/blender/source/blender/depsgraph/intern/eval/deg_eval.cc:113
- 10 0x00000000017724ee in DEG::(anonymous namespace)::deg_task_run_func(TaskPool*, void*) (pool=0x7fffacabb108, taskdata=0x7fffacc95d08) at /home/brecht/dev/blender/source/blender/depsgraph/intern/eval/deg_eval.cc:124
- 11 0x00000000076ccf0f in Task::operator()() const (this=0xfffffffffffffe08) at /home/brecht/dev/blender/source/blender/blenlib/intern/task_pool.cc:117
- 12 tbb::internal::function_task<Task>::execute() (this=0xfffffffffffffe00) at /home/brecht/dev/lib/linux_centos7_x86_64/tbb/include/tbb/internal/../task.h:1048
- 13 0x0000000000ec2355 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop(tbb::internal::context_guard_helper<false>&, tbb::task*, long) ()
- 14 0x0000000000ec3a25 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) ()
- 15 0x0000000000ec0d00 in tbb::internal::generic_scheduler::local_spawn_root_and_wait(tbb::task*, tbb::task*&) ()
- 16 0x00000000076cdc74 in tbb::task::spawn_root_and_wait(tbb::task&) (root=...) at /home/brecht/dev/lib/linux_centos7_x86_64/tbb/include/tbb/internal/../task.h:798
- 17 tbb::interface9::internal::start_for<tbb::blocked_range<int>, RangeTask, tbb::auto_partitioner const>::run(tbb::blocked_range<int> const&, RangeTask const&, tbb::auto_partitioner const&) (range=..., body=..., partitioner=...) at /home/brecht/dev/lib/linux_centos7_x86_64/tbb/include/tbb/parallel_for.h:95
- 18 0x00000000076cd046 in tbb::parallel_for<tbb::blocked_range<int>, RangeTask>(tbb::blocked_range<int> const&, RangeTask const&) (range=..., body=...) at /home/brecht/dev/lib/linux_centos7_x86_64/tbb/include/tbb/parallel_for.h:201
- 19 BLI_task_parallel_range(int, int, void*, TaskParallelRangeFunc, TaskParallelSettings const*) (start=1366, stop=1878, userdata=0x7fffb7e0dd78, func=<optimized out>, settings=0x7fffb7e0ddb0) at /home/brecht/dev/blender/source/blender/blenlib/intern/task_range.cc:127
- 20 0x000000000765d1f1 in non_recursive_bvh_div_nodes (tree=0x7fffabeb2208, branches_array=<optimized out>, leafs_array=<optimized out>, num_leafs=<optimized out>) at /home/brecht/dev/blender/source/blender/blenlib/intern/BLI_kdopbvh.c:848
- 21 BLI_bvhtree_balance (tree=0x7fffabeb2208) at /home/brecht/dev/blender/source/blender/blenlib/intern/BLI_kdopbvh.c:961
- 22 0x0000000000e30392 in bvhtree_from_mesh_looptri_create_tree (epsilon=<optimized out>, tree_type=<optimized out>, axis=<optimized out>, vert=0x7fffa1f4f288, mloop=0x7fffa30eae48, looptri=<optimized out>, looptri_num=<optimized out>, looptri_mask=0x0, looptri_num_active=5632) at /home/brecht/dev/blender/source/blender/blenkernel/intern/bvhutils.c:1072
- 23 0x0000000000e300c1 in bvhtree_from_mesh_looptri_ex (data=0x7fffb7e0e1a8, vert=0x7fffa1f4f288, vert_allocated=<optimized out>, mloop=0x7ffff7c4b110 <__lll_lock_wait+48>, loop_allocated=<optimized out>, looptri=0x0, looptri_num=<optimized out>, looptri_allocated=<optimized out>, looptri_mask=<optimized out>, looptri_num_active=<optimized out>, epsilon=0, tree_type=<optimized out>, axis=<optimized out>, bvh_cache_type=<optimized out>, bvh_cache=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/bvhutils.c:1187
- 24 0x0000000000e309c9 in BKE_bvhtree_from_mesh_get (data=0x7fffb7e0e1a8, mesh=0x7fffa8a5bb08, bvh_cache_type=3, tree_type=4) at /home/brecht/dev/blender/source/blender/blenkernel/intern/bvhutils.c:1438
- 25 0x0000000000cb520d in BKE_shrinkwrap_init_tree (data=0x7fffb7e0e198, mesh=0x7fffa8a5bb08, shrinkType=3, shrinkMode=4, force_normals=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/shrinkwrap.c:134
- 26 0x0000000000e4b3d6 in shrinkwrap_get_tarmat (UNUSED_depsgraph=<optimized out>, con=<optimized out>, cob=0x7fff99ca2dc8, ct=0x7fff99ca2e88, UNUSED_ctime=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/constraint.c:4021
- 27 0x0000000000e44242 in BKE_constraint_targets_for_solving_get (depsgraph=0x7fffd5865a08, con=0x7fffab965708, cob=0x7fff99ca2dc8, targets=0x7fffb7e0e2d0, ctime=12) at /home/brecht/dev/blender/source/blender/blenkernel/intern/constraint.c:5780
- 28 BKE_constraints_solve (depsgraph=0x7fffd5865a08, conlist=<optimized out>, cob=0x7fff99ca2dc8, ctime=12) at /home/brecht/dev/blender/source/blender/blenkernel/intern/constraint.c:5847
- 29 0x0000000000e22bc3 in BKE_pose_where_is_bone (depsgraph=0x7fffd5865a08, scene=<optimized out>, ob=0x7fffd58c6e08, pchan=0x7fffab98f808, ctime=0, do_extra=<optimized out>) at /home/brecht/dev/blender/source/blender/blenkernel/intern/armature.c:2897
- 30 0x0000000001772591 in std::function<void (Depsgraph*)>::operator()(Depsgraph*) const (this=0x7fffac7ad7c8, __args=0x7fffd5865a08) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:688
- 31 DEG::(anonymous namespace)::evaluate_node(DEG::(anonymous namespace)::DepsgraphEvalState const*, DEG::OperationNode*) (state=<optimized out>, operation_node=0x7fffac7ad708) at /home/brecht/dev/blender/source/blender/depsgraph/intern/eval/deg_eval.cc:113
- 32 0x00000000017724ee in DEG::(anonymous namespace)::deg_task_run_func(TaskPool*, void*) (pool=0x7fffacabb108, taskdata=0x7fffac7ad708) at /home/brecht/dev/blender/source/blender/depsgraph/intern/eval/deg_eval.cc:124
- 33 0x00000000076ccf0f in Task::operator()() const (this=0xfffffffffffffe08) at /home/brecht/dev/blender/source/blender/blenlib/intern/task_pool.cc:117
- 34 tbb::internal::function_task<Task>::execute() (this=0xfffffffffffffe00) at /home/brecht/dev/lib/linux_centos7_x86_64/tbb/include/tbb/internal/../task.h:1048
- 35 0x0000000000ec2355 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop(tbb::internal::context_guard_helper<false>&, tbb::task*, long) ()
- 36 0x0000000000ec3a25 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) ()
- 37 0x0000000000ec5188 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) ()
- 38 0x0000000000ebdd63 in tbb::internal::market::process(rml::job&) ()
- 39 0x0000000000ebf386 in tbb::internal::rml::private_worker::run() ()
- 40 0x0000000000ebf5c9 in tbb::internal::rml::private_worker::thread_routine(void*) ()
- 41 0x00007ffff7c40609 in start_thread (arg=<optimized out>) at pthread_create.c:477
- 42 0x00007ffff7ecb103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
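
The pattern in this backtrace can be reproduced in miniature. The following is only a toy sketch in Python, not Blender or TBB code: `pending`, `parallel_range` and `get_bvhtree` are hypothetical stand-ins, and the lock acquisition uses a timeout so the script reports the self-deadlock instead of hanging.

```python
import threading
from collections import deque

cache_lock = threading.Lock()  # non-reentrant, standing in for cache_rwlock
pending = deque()              # other tasks waiting in the toy scheduler
log = []

def parallel_range():
    # Toy nested parallel loop: while "waiting" for its chunks, the calling
    # thread runs whatever else is queued, as a work-stealing scheduler may.
    while pending:
        pending.popleft()()

def get_bvhtree(name):
    # The timeout is only here so that the demo terminates; a real mutex
    # lock would block forever at this point.
    if not cache_lock.acquire(timeout=0.1):
        log.append(f"{name}: would block forever on cache_lock")
        return
    try:
        log.append(f"{name}: building under lock")
        parallel_range()  # heavy work done while the lock is held
    finally:
        cache_lock.release()

# Task B is queued, then task A runs and picks B up on its own stack.
pending.append(lambda: get_bvhtree("task B"))
get_bvhtree("task A")
print(log)
```

Task A takes the lock, enters the nested loop, and ends up running task B on its own stack; task B then waits on a lock that cannot be released until task B returns.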

Task isolation might be a solution:
https://software.intel.com/en-us/node/684814


Using task isolation will likely have some performance impact, but hopefully it's negligible with the right choice of implementation.

The tricky part is that mutex locking in one part of the code causes problems in other parts of the code quite far away from it. It is hard to tell if a function call from within a mutex locked region uses parallelization or not, and then ensure that assumption remains correct as the code changes.

I can see a few ways of doing this:

  • Use task isolation inside every task being executed. This would have the most overhead for small tasks I expect, likely too much. But it would probably solve the entire problem without changes in other code (except for potential performance issues).
  • A variation of the above, with the ability to turn off task isolation in specific cases where we know there will be no further nested parallelism (e.g. in sculpting). Still would require it to be enabled for all depsgraph tasks for example, which may be too much.
  • Use task isolation in BLI_task_pool_work_and_wait and BLI_task_parallel_range. The downside of this is that any other usage of TBB will not be handled automatically. For example OpenVDB uses TBB, and any OpenVDB processing in a mutex locked region will need code for task isolation.
  • Abstract mutex locking / lazy initialization type code, and make task isolation part of it. This would be most efficient in doing task isolation just when necessary, but it's also easy to forget to use this in all the right places.
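
To illustrate what task isolation changes, here is a toy model in Python. It is a sketch under stated assumptions, not TBB: the shared `queue`, the group tags and `wait_for` are hypothetical names, and the lock timeout only exists so the non-isolated run terminates. In TBB itself the corresponding facility is `this_task_arena::isolate`.

```python
import threading
from collections import deque

cache_lock = threading.Lock()  # non-reentrant lock guarding the toy cache
queue = deque()                # (group, callable) pairs in one shared queue
log = []

def wait_for(group, isolated):
    # Run queued work until `group` has no tasks left. Without isolation
    # the waiting thread takes whatever is at the front of the queue; with
    # isolation it only runs tasks that belong to `group`.
    while any(g == group for g, _ in queue):
        idx = 0
        if isolated:
            idx = next(i for i, (g, _) in enumerate(queue) if g == group)
        _, fn = queue[idx]
        del queue[idx]
        fn()

def build_tree(name, isolated):
    if not cache_lock.acquire(timeout=0.1):  # timeout so the demo terminates
        log.append(f"{name}: stuck on cache_lock")
        return
    try:
        for i in range(2):
            queue.append(("bvh", lambda i=i: log.append(f"{name}: chunk {i}")))
        wait_for("bvh", isolated)
        log.append(f"{name}: tree built")
    finally:
        cache_lock.release()

def run(isolated):
    log.clear()
    queue.clear()
    queue.append(("depsgraph", lambda: build_tree("B", isolated)))
    build_tree("A", isolated)
    while queue:  # outer tasks left over once A has released the lock
        queue.popleft()[1]()
    return list(log)

print(run(isolated=False))
print(run(isolated=True))
```

Without isolation, task B is pulled into A's wait and gets stuck on the lock. With isolation, A only runs its own chunks, and B runs after the lock has been released.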


This issue was referenced by 08ac4d3d71dee9fc4ec7f878e57de59c87115280
Author
Member

I'm testing this locally for the daily rendering/lighting routine and so far it seems to stop Blender from freezing like it's been doing all week. nice!

Member

Added subscriber: @JacquesLucke

Member

Based on Brecht's backtrace, I wonder if the following would be a better long-term solution to this particular problem.

It seems we lock cache_rwlock whenever a new BVH tree is built (e.g. in bvhtree_from_mesh_looptri_ex). So only a single BVH tree can be built at a time when BKE_bvhtree_from_mesh_get is used. I don't see why it has to be this way. It feels like it should be possible to change the locking mechanism so that multiple BVH trees can be built in parallel. Once we have that, the deadlock shown in Brecht's backtrace should never happen.

This is how I think it might work. I'm not sure how to do it more precisely, as I haven't done any thread synchronization for quite a while. All of the locking logic could probably be abstracted away by some nice cache data structure.

lang=python
def get_bvhtree(mesh):
    # Fast path: the tree is already cached.
    lock()
    is_in_cache, bvhtree = find_in_cache(mesh)
    unlock()

    if is_in_cache:
        return bvhtree

    # Either join a build that is already running, or claim the build.
    lock()
    is_built_by_other, condition = check_if_tree_is_currently_built(mesh)
    if not is_built_by_other:
        condition = register_that_this_thread_starts_building_now(mesh)
    unlock()

    if is_built_by_other:
        wait(condition)
        lock()
        _, bvhtree = find_in_cache(mesh)
        unlock()
        return bvhtree
    else:
        # The expensive build runs outside the critical section.
        bvhtree = build_bvhtree(mesh)
        lock()
        store_bvhtree_in_cache(mesh, bvhtree)
        unlock()
        notify_all(condition)
        return bvhtree

In general we should probably never do a lot of work within a critical section, which is exactly what the current implementation does.
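
For reference, the idea can be written as a small runnable Python sketch. This is an illustration under assumptions, not Blender code: `LazyCache`, `slow_build` and the `"mesh"` key are hypothetical, a `threading.Event` plays the role of the condition, and error handling for a failing build is left out.

```python
import threading
import time

class LazyCache:
    """Cache whose values are built outside the lock, once per key."""

    def __init__(self, build):
        self._build = build
        self._lock = threading.Lock()
        self._values = {}
        self._in_progress = {}  # key -> Event set when the build is done

    def get(self, key):
        # Short critical section: look up the value or claim the build.
        with self._lock:
            if key in self._values:
                return self._values[key]
            event = self._in_progress.get(key)
            is_builder = event is None
            if is_builder:
                event = self._in_progress[key] = threading.Event()
        if is_builder:
            value = self._build(key)  # expensive work, no lock held
            with self._lock:
                self._values[key] = value
                del self._in_progress[key]
            event.set()  # wake every thread waiting for this key
            return value
        event.wait()  # another thread is building this key
        with self._lock:
            return self._values[key]

calls = []

def slow_build(key):
    calls.append(key)
    time.sleep(0.05)
    return f"tree({key})"

cache = LazyCache(slow_build)
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get("mesh")))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(calls, results)
```

Four threads request the same key, but `slow_build` runs only once, and builds for different keys would not block each other. One caveat: a waiter still blocks in `event.wait()`, so if a work-stealing scheduler ran that waiter on the builder's own thread, as in the backtrace, it would presumably still hang without task isolation.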


@JacquesLucke I agree that kind of setup would be better. But we need to check all mutex locking code for this; task isolation is the safer option until then.

This kind of lazy initialization needs an abstraction to hide the complicated logic. There may also be a way in TBB to make any waiting threads help build the BVH tree.

Member

Changed status from 'Confirmed' to: 'Resolved'

Thomas Dinges added this to the 2.90 milestone 2023-02-08 16:27:21 +01:00
Reference: blender/blender#76553