Eevee volume render test memory leak in Mantaflow
Confirmed, Normal | Public | BUG

Description

System Information
Operating system: Linux-4.15.0-76-generic-x86_64-with-debian-buster-sid 64 Bits
Graphics card: Quadro RTX 5000/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 435.21

Blender Version
Broken: version: 2.83 (sub 4), rB5b22713

Exact steps for others to reproduce the error
Running this command hangs. It appears to be a problem with the Python GIL.

blender -b ../lib/tests/render/volume/smoke_color.blend -E BLENDER_EEVEE -P tests/python/eevee_render_tests.py -f 1

Backtrace:

Blender 2.83 (sub 4)
Read prefs: /home/brecht/.config/blender/2.83/config/userpref.blend
found bundled python: /home/brecht/dev/worktree_build/bin/2.83/python
Read blend: /home/brecht/dev/lib/tests/render/volume/smoke_color.blend
Traceback (most recent call last):
  File "<string>", line 55, in <module>
NameError: name 'Vec3Grid' is not defined
Traceback (most recent call last):
  File "<string>", line 3, in <module>
NameError: name 'PcMGDynamic' is not defined
MANTA::updateGridFromFile(): cannot read into uninitialized grid, grid is null
MANTA::updateGridFromFile(): cannot read into uninitialized grid, grid is null
^C
Thread 1 "blender" received signal SIGINT, Interrupt.
0x00007ffff7231f85 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7fffffffc340, expected=0, futex_word=0xbcd7678 <_PyRuntime+1336>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) bt
#0 0x00007ffff7231f85 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7fffffffc340, expected=0, futex_word=0xbcd7678 <_PyRuntime+1336>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 __pthread_cond_wait_common (abstime=0x7fffffffc340, mutex=0xbcd7680 <_PyRuntime+1344>, cond=0xbcd7650 <_PyRuntime+1296>) at pthread_cond_wait.c:539
#2 __pthread_cond_timedwait (cond=0xbcd7650 <_PyRuntime+1296>, mutex=0xbcd7680 <_PyRuntime+1344>, abstime=0x7fffffffc340) at pthread_cond_wait.c:667
#3 0x000000000560c7c9 in PyCOND_TIMEDWAIT (cond=<optimized out>, mut=<optimized out>, us=<optimized out>) at Python/condvar.h:90
#4 take_gil (tstate=tstate@entry=0x7fffd2a18b80) at Python/ceval_gil.h:208
#5 0x000000000560cd2f in PyEval_RestoreThread (tstate=tstate@entry=0x7fffd2a18b80) at Python/ceval.c:271
#6 0x0000000005639fed in PyGILState_Ensure () at Python/pystate.c:1071
#7 0x0000000005158e06 in MANTA::runPythonString (this=0xbcd7678 <_PyRuntime+1336>, commands=...) at /home/brecht/dev/worktree/intern/mantaflow/intern/MANTA_main.cpp:528
#8 0x0000000005159140 in MANTA::~MANTA (this=0x7fff96443400) at /home/brecht/dev/worktree/intern/mantaflow/intern/MANTA_main.cpp:523
#9 0x00000000051592b9 in MANTA::~MANTA (this=0x7fff96443400) at /home/brecht/dev/worktree/intern/mantaflow/intern/MANTA_main.cpp:508
#10 0x000000000150bbaf in BKE_fluid_modifier_freeDomain (mmd=0x7fffbb59d3a8) at /home/brecht/dev/worktree/source/blender/blenkernel/intern/fluid.c:4192
#11 0x000000000150ba85 in BKE_fluid_modifier_free (mmd=0x7fffbb59d3a8) at /home/brecht/dev/worktree/source/blender/blenkernel/intern/fluid.c:4315
#12 0x00000000013f7a47 in modifier_free_ex (md=0x7fffbb59d3a8, flag=<optimized out>) at /home/brecht/dev/worktree/source/blender/blenkernel/intern/modifier.c:169
#13 0x000000000140f6aa in BKE_object_free_modifiers (ob=0x7fffd299d208, flag=2) at /home/brecht/dev/worktree/source/blender/blenkernel/intern/object.c:190
#14 0x0000000001410223 in BKE_object_free (ob=0x7fffd299d208) at /home/brecht/dev/worktree/source/blender/blenkernel/intern/object.c:516
#15 0x0000000001d6c70e in DEG::deg_free_copy_on_write_datablock (id_cow=0x7fffd299d208) at /home/brecht/dev/worktree/source/blender/depsgraph/intern/eval/deg_eval_copy_on_write.cc:1066
#16 0x0000000001d736f9 in DEG::IDNode::destroy (this=0x7fffd275fd08) at /home/brecht/dev/worktree/source/blender/depsgraph/intern/node/deg_node_id.cc:165
#17 0x0000000001d4f1e3 in DEG::Depsgraph::clear_id_nodes_conditional(std::function<bool (ID_Type)> const&) (this=<optimized out>, filter=...) at /home/brecht/dev/worktree/source/blender/depsgraph/intern/depsgraph.cc:156
#18 DEG::Depsgraph::clear_id_nodes (this=0x7fff88df7488) at /home/brecht/dev/worktree/source/blender/depsgraph/intern/depsgraph.cc:167
#19 0x0000000001d4efec in DEG::Depsgraph::~Depsgraph (this=0x7fff88df7488) at /home/brecht/dev/worktree/source/blender/depsgraph/intern/depsgraph.cc:93
#20 0x0000000001d4f689 in OBJECT_GUARDED_DESTRUCTOR<DEG::Depsgraph> (what=0x7fff88df7488) at /home/brecht/dev/worktree/intern/guardedalloc/MEM_guardedalloc.h:264
#21 DEG_graph_free (graph=0x7fff88df7488) at /home/brecht/dev/worktree/source/blender/depsgraph/intern/depsgraph.cc:297
#22 0x00000000016dd4fc in EEVEE_lightbake_job_data_free (custom_data=0x7fffd29ef288) at /home/brecht/dev/worktree/source/blender/draw/engines/eevee/eevee_lightcache.c:638
#23 0x00000000028449c8 in light_cache_bake_exec (C=0x7ffff2efa288, op=0x7fffd294c608) at /home/brecht/dev/worktree/source/blender/editors/render/render_shading.c:928
#24 0x000000000156be79 in wm_operator_invoke (C=0x7ffff2efa288, ot=0x7fffd284a468, event=0x0, properties=<optimized out>, reports=<optimized out>, poll_only=<optimized out>, use_last_properties=<optimized out>)
    at /home/brecht/dev/worktree/source/blender/windowmanager/intern/wm_event_system.c:1279
#25 0x0000000001566ead in wm_operator_call_internal (C=0x7ffff2efa288, ot=0x7fffd284a468, properties=0x7fffffffc840, reports=0x7fffd2a15a18, context=6, poll_only=false, event=0x0) at /home/brecht/dev/worktree/source/blender/windowmanager/intern/wm_event_system.c:1514
#26 0x00000000015674ed in WM_operator_call_py (C=0x7ffff2efa288, ot=0x7fffd284a468, context=6, properties=0x7fffffffc840, reports=0x7fffd2a15a18, is_undo=false) at /home/brecht/dev/worktree/source/blender/windowmanager/intern/wm_event_system.c:1614
#27 0x000000000191629f in pyop_call (UNUSED_self=<optimized out>, args=<optimized out>) at /home/brecht/dev/worktree/source/blender/python/intern/bpy_operator.c:268
#28 0x000000000555971b in _PyMethodDef_RawFastCallKeywords (method=0x98c5800 <bpy_ops_methods+32>, self=<optimized out>, args=<optimized out>, nargs=3, kwnames=0x0) at Objects/call.c:698
#29 0x0000000005559945 in _PyCFunction_FastCallKeywords (func=0x7fff921393c0, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Objects/call.c:734
#30 0x000000000137a132 in call_function (kwnames=0x0, oparg=3, pp_stack=<synthetic pointer>) at Python/ceval.c:4568
#31 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3124
#32 0x000000000560dc5e in _PyEval_EvalCodeWithName (_co=_co@entry=0x7fff92165390, globals=globals@entry=0x7fff9214e780, locals=locals@entry=0x0, args=args@entry=0x7fffffffcbd0, argcount=1, kwnames=kwnames@entry=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0,
    kwdefs=0x0, closure=0x0, name=0x7ffff7f7d1b0, qualname=0x7fff920e6030) at Python/ceval.c:3930
#33 0x000000000555927f in _PyFunction_FastCallDict (func=0x7fff92165e60, args=0x7fffffffcbd0, nargs=<optimized out>, kwargs=0x0) at Objects/call.c:376
#34 0x000000000555a42c in _PyObject_Call_Prepend (callable=callable@entry=0x7fff92165e60, obj=obj@entry=0x7fff8353fa90, args=args@entry=0x7ffff7f7d050, kwargs=kwargs@entry=0x0) at Objects/call.c:908
#35 0x00000000055bb4d9 in slot_tp_call (self=0x7fff8353fa90, args=0x7ffff7f7d050, kwds=0x0) at Objects/typeobject.c:6402
#36 0x0000000005559a23 in _PyObject_FastCallKeywords (callable=0x7fff8353fa90, stack=<optimized out>, nargs=<optimized out>, kwnames=0x0) at Objects/call.c:199
#37 0x0000000001379aa8 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at Python/ceval.c:4619
#38 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3093
#39 0x000000000137120f in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=0, globals=<optimized out>) at Objects/call.c:283
#40 0x00000000055594e6 in _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Objects/call.c:415
#41 0x0000000001378cef in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at Python/ceval.c:4616
#42 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3124
#43 0x000000000560dc5e in _PyEval_EvalCodeWithName (_co=_co@entry=0x7fff837ab6f0, globals=globals@entry=0x7fff901c88c0, locals=locals@entry=0x7fff901c88c0, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0,
    defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:3930
#44 0x000000000560dd3e in PyEval_EvalCodeEx (_co=_co@entry=0x7fff837ab6f0, globals=globals@entry=0x7fff901c88c0, locals=locals@entry=0x7fff901c88c0, args=args@entry=0x0, argcount=argcount@entry=0, kws=kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0)
    at Python/ceval.c:3959
#45 0x000000000560dd6b in PyEval_EvalCode (co=co@entry=0x7fff837ab6f0, globals=globals@entry=0x7fff901c88c0, locals=locals@entry=0x7fff901c88c0) at Python/ceval.c:524
#46 0x000000000564290a in run_mod (arena=0x7fff902cb2b0, flags=0x0, locals=0x7fff901c88c0, globals=0x7fff901c88c0, filename=0x7fff83792c00, mod=0x7fffbb57d210) at Python/pythonrun.c:1035
#47 PyRun_FileExFlags (fp=<optimized out>, filename_str=<optimized out>, start=<optimized out>, globals=0x7fff901c88c0, locals=0x7fff901c88c0, closeit=0, flags=0x0) at Python/pythonrun.c:988
#48 0x0000000001902146 in python_script_exec (C=0x7ffff2efa288, fn=0x7fffffffd4d0 "/home/brecht/dev/worktree/tests/python/eevee_render_tests.py", text=0x0, reports=0x0, do_jump=<optimized out>) at /home/brecht/dev/worktree/source/blender/python/intern/bpy_interface.c:532
#49 0x000000000137e0df in arg_handle_python_file_run (argc=<optimized out>, argv=0x7fffffffdab0, data=0x7ffff2efa288) at /home/brecht/dev/worktree/source/creator/creator_args.c:1772
#50 0x0000000006875131 in BLI_argsParse (ba=0x7ffff305f8c8, pass=4, default_cb=0x137e9d0 <arg_handle_load_file>, default_data=0x7ffff2efa288) at /home/brecht/dev/worktree/source/blender/blenlib/intern/BLI_args.c:302
#51 0x000000000137b915 in main (argc=9, argv=0x7fffffffda88) at /home/brecht/dev/worktree/source/creator/creator.c:483

Event Timeline

Brecht Van Lommel (brecht) created this task.
Brecht Van Lommel (brecht) changed the subtype of this task from "Report" to "Bug".

Alternatively, you can enable the WITH_OPENGL_RENDER_TESTS build option and run ctest -R eevee.

Aaron Carlisle (Blendify) changed the task status from Needs Triage to Confirmed. Feb 18 2020, 5:58 PM
Sebastián Barschkis (sebbas) triaged this task as High priority. Feb 19 2020, 11:27 AM

I fixed the hang which was caused by not properly releasing the GIL when a Python error happens.

But that Python error is what really needs to be fixed to solve this.
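
The pattern behind the hang fix can be illustrated with a small Python analogy. This is not the Mantaflow code: `fake_gil` and `run_python_string` are hypothetical stand-ins for the GIL and for `MANTA::runPythonString`, and the sketch only shows that the lock has to be released on the error path as well as on success.

```python
import threading

# Hypothetical stand-in for the interpreter lock; this is not the real GIL.
fake_gil = threading.Lock()


def run_python_string(commands):
    """Run each command string while holding the lock.

    Returns True on success and False if any command raised.
    The lock is released in both cases.
    """
    fake_gil.acquire()
    try:
        for command in commands:
            exec(command, {})
        return True
    except Exception as err:
        # Comparable to the NameError lines in the log of this report.
        print(f"script error: {err!r}")
        return False
    finally:
        # If this release were skipped on the error path, the next caller
        # would block on acquire(), much like the take_gil frame in the backtrace.
        fake_gil.release()


# The second command raises NameError because the name is undefined.
ok = run_python_string(["x = 1", "Vec3Grid"])

# The lock can only be taken again if the failed call released it.
still_free = fake_gil.acquire(timeout=1.0)
if still_free:
    fake_gil.release()
```

In the C API the equivalent pairing is `PyGILState_Ensure()` with `PyGILState_Release()`; the sketch above only mirrors that pairing with an ordinary lock.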

Brecht Van Lommel (brecht) renamed this task from Eevee volume render test hang in Mantaflow to Eevee volume render test crash in Mantaflow. Mon, Mar 9, 6:45 PM
Sebastián Barschkis (sebbas) lowered the priority of this task from High to Normal. Fri, Mar 13, 3:39 PM

93ac4709ebe8 fixes the crash that happened when light baking was still enabled in the script. However, with light baking enabled in the script, there is still a memory leak that needs to be fixed.

Once that is out of the way, the temporary workaround f3a33a92987f can be removed too.

Sebastián Barschkis (sebbas) renamed this task from Eevee volume render test crash in Mantaflow to Eevee volume render test memory leak in Mantaflow. Fri, Mar 13, 5:40 PM
Brecht Van Lommel (brecht) reopened this task as Confirmed. Tue, Mar 17, 2:38 PM

This is still crashing for me when doing multiple renders in one command (as the tests do). Example:

blender -b ../lib/tests/render/volume/smoke.blend -E BLENDER_EEVEE -P tests/python/eevee_render_tests.py -f 1 -b ../lib/tests/render/volume/smoke_color.blend -E BLENDER_EEVEE -P tests/python/eevee_render_tests.py -f 1
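
For repeated experiments with different file combinations, a command of this shape can be assembled programmatically. The helper below is a minimal sketch: `build_multi_render_command` is a hypothetical name, and it uses only the flags that appear in the command above (`-b`, `-E`, `-P`, `-f`).

```python
import shlex


def build_multi_render_command(blend_files, script,
                               engine="BLENDER_EEVEE", frame=1,
                               blender="blender"):
    """Build one Blender command line that renders several .blend files in sequence."""
    args = [blender]
    for blend in blend_files:
        # Same per-file argument group as in the example command.
        args += ["-b", blend, "-E", engine, "-P", script, "-f", str(frame)]
    return args


cmd = build_multi_render_command(
    ["../lib/tests/render/volume/smoke.blend",
     "../lib/tests/render/volume/smoke_color.blend"],
    "tests/python/eevee_render_tests.py",
)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd)` from the build directory, assuming `blender` is on the PATH.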

Interesting. On my OSX machine with the latest master (8cb463f4ff24), I don't get a crash with the example command. Does the crash exit with the same log?

It always hits this assert in debug mode for me, even when rendering a single image. Crashing without the assert seems to happen mostly with multiple renders, but it is a bit random.

/home/brecht/dev/build_linux_debug_lite/bin/blender(_start+0x2a) [0x55555d3d782a]
BLI_assert failed: /home/brecht/dev/blender/source/blender/draw/intern/draw_manager_exec.c:970, draw_update_uniforms(), at 'tex'

Thread 1 "blender" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff56d1801 in __GI_abort () at abort.c:79
#2  0x000055555e796104 in draw_update_uniforms (shgroup=0x62d0002308a0, state=0x7fffffff8f10, use_tfeedback=0x7fffffff8e50) at /home/brecht/dev/blender/source/blender/draw/intern/draw_manager_exec.c:970
#3  0x000055555e79d30f in draw_shgroup (shgroup=0x62d0002308a0, pass_state=2147485698) at /home/brecht/dev/blender/source/blender/draw/intern/draw_manager_exec.c:1289
#4  0x000055555e79edc8 in drw_draw_pass_ex (pass=0x62c0000588e8, start_group=0x62d0002308a0, end_group=0x62d0002308a0) at /home/brecht/dev/blender/source/blender/draw/intern/draw_manager_exec.c:1458
#5  0x000055555e79f7f0 in DRW_draw_pass (pass=0x62c0000588e8) at /home/brecht/dev/blender/source/blender/draw/intern/draw_manager_exec.c:1514
#6  0x000055555e84be28 in EEVEE_volumes_compute (sldata=0x618000014488, vedata=0x616000363988) at /home/brecht/dev/blender/source/blender/draw/engines/eevee/eevee_volumes.c:676
#7  0x000055555e818a89 in EEVEE_render_draw (vedata=0x616000363988, engine=0x61800002dc88, rl=0x60d000032578, rect=0x7fffffff94e0)
    at /home/brecht/dev/blender/source/blender/draw/engines/eevee/eevee_render.c:580
#8  0x000055555e7c6dc9 in eevee_render_to_image (vedata=0x616000363988, engine=0x61800002dc88, render_layer=0x60d000032578, rect=0x7fffffff94e0)
    at /home/brecht/dev/blender/source/blender/draw/engines/eevee/eevee_engine.c:429
#9  0x000055555e7744d6 in DRW_render_to_image (engine=0x61800002dc88, depsgraph=0x61800002e088) at /home/brecht/dev/blender/source/blender/draw/intern/draw_manager.c:1817
#10 0x00005555610490df in RE_engine_render (re=0x62200001f908, do_all=0) at /home/brecht/dev/blender/source/blender/render/intern/source/external_engine.c:778

The new call to EEVEE_volumes_free_smoke_textures() at the end of light cache baking seems to be the problem. Maybe it's freeing textures that are still being referenced somewhere? I also find it strange that it is called at the end of eevee_lightbake_delete_resources, after all the code to dispose of or disable the OpenGL context. I tried moving it higher up but it didn't seem to help.

Overall though, it's not clear to me what the lifetime of these smoke textures is supposed to be. Image textures have their own garbage collection. For meshes and other geometry there is the batch cache. These smoke textures seem to be freed by workbench and eevee, which is yet another way of doing things.

It would make more sense to me if these were somehow part of the batch cache the way it's done for the new volume object.
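
The suspected failure mode (freeing a texture that a draw call still expects) can be modeled with a toy example. This is only an illustration of the hypothesis, not a diagnosis and not Blender code: `Texture`, `SmokeDomain`, `ShadingGroup` and `free_smoke_textures` are hypothetical names.

```python
class Texture:
    def __init__(self, name):
        self.name = name


class SmokeDomain:
    """Hypothetical owner of a texture slot, loosely modeled on a fluid domain."""

    def __init__(self):
        self.tex_density = Texture("density")


class ShadingGroup:
    """Hypothetical draw-time consumer that looks the texture up through its owner."""

    def __init__(self, domain):
        self.domain = domain

    def draw(self):
        tex = self.domain.tex_density
        if tex is None:
            # Plays the role of the failed 'tex' assert in the report.
            raise RuntimeError("texture was freed while still referenced")
        return tex.name


def free_smoke_textures(domain):
    """Free the texture from outside, without telling the consumers."""
    domain.tex_density = None


domain = SmokeDomain()
group = ShadingGroup(domain)
group.draw()                 # works while the texture exists
free_smoke_textures(domain)  # e.g. at the end of an unrelated job
# A later group.draw() now raises RuntimeError.
```

With a single owner for the texture lifetime, such as a cache that both frees and rebuilds on demand, the consumer would request the texture rather than assume it is still present.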

Ah, now I am also seeing the crash. Before looking for the reason, are we sure that the .blend files are doing what they're supposed to do?
I am only having problems with lib/tests/render/volume/smoke.blend and lib/tests/render/volume/smoke_fire.blend.

Testing with smoke_color.blend multiple times seems to be fine:

blender -b ../lib/tests/render/volume/smoke_color.blend -E BLENDER_EEVEE -P tests/python/eevee_render_tests.py -f 1 -b ../lib/tests/render/volume/smoke_color.blend -E BLENDER_EEVEE -P tests/python/eevee_render_tests.py -f 1

I think smoke.blend and smoke_fire.blend might need to be remodeled. Aren't their setups supposed to look like smoke_color.blend, just with different flow types?