Crash during render with frame_change_post handler updating materials
Closed, Invalid (Public)

Description

System Information
Operating system: Ubuntu 18.04.4 LTS Linux-4.15.0-96-generic-x86_64-with-debian-buster-sid 64 Bits
Graphics card: Mesa DRI Intel(R) UHD Graphics 620 (Kabylake GT2) Intel Open Source Technology Center 4.5 (Core Profile) Mesa 19.2.8

Blender Version
Broken: version: 2.90 (sub 3), branch: master, commit date: 2020-05-24 21:18, hash: rB5db2d9c82b7f
Also tested in Blender 2.82a on the same system with the same result, so the problem has existed since at least that version.

Short description of error
Open the provided blend file, enable running of scripts, and press Ctrl-F12 to render the animation. Possibly after rendering some frames, Blender crashes with a segfault. The crash is non-deterministic, so try a few times if the render succeeds at first.
The provided blend file uses Cycles; the problem still occurs when the render engine is set to Eevee or Workbench.
The scene is updated on each frame change by a small Python script. In a frame_change_post handler, the script looks for objects with a custom property called 'num' and updates the material on every child whose name has the form Dot[1-10].<foo>, so that one of them has the "on" material and the others have the "off" material. The script runs fine, without crashing, when seeking through the animation in the timeline view. This holds with the viewport in any mode, although the animation is only visible in rendered or shaded modes.

I'm ready to admit that there's a problem in my python script, but if there is, it's not obvious where it is, and it's not clear why it should work in the normal 3d viewport but not during render.

Regardless, it shouldn't be possible to segfault Blender from a Python script (without trying really hard). If this is not a bug in Blender but a bug in the Python script, then the condition for this segfault should be caught and an error message shown.

Exact steps for others to reproduce the error

  1. Open file
  2. Allow script execution
  3. Render animation

Crash file:

For reference, this is the script:

import bpy
from copy import copy

def handler(scene, deps):
    for obj in deps.objects:        
        eval_obj = obj.evaluated_get(deps)
        if 'num' in eval_obj:
            sobj = scene.objects[eval_obj.name]
            for child in sobj.children:
                if child.name.startswith('Dot'):
                    if len(child.material_slots) < 1:
                        child.data.materials.append(bpy.data.materials.get('off'))
                    if child.name.split('.')[0].endswith(str(eval_obj['num'])):
                        child.material_slots[0].material = bpy.data.materials.get('on')
                    else:
                        child.material_slots[0].material = bpy.data.materials.get('off')
    deps.update()

def register():
    bpy.app.handlers.frame_change_post.append(handler)
    
def unregister():
    for handler in copy(bpy.app.handlers.frame_change_post):
        if handler.__name__ == 'handler':
            bpy.app.handlers.frame_change_post.remove(handler)
    
if __name__ == "__main__":
    unregister()
    register()

Event Timeline

Richard Antalik (ISS) changed the task status from Needs Triage to Confirmed. May 26 2020, 4:57 PM

Can confirm the crash. I ran with different settings and consistently get this stack trace:

Main:

>	[Inline Frame] blender.exe!std::vector<DEG::DepsgraphNodeBuilder::SavedEntryTag,std::allocator<DEG::DepsgraphNodeBuilder::SavedEntryTag>>::emplace_back(const DEG::DepsgraphNodeBuilder::SavedEntryTag &) Line 708	C++
 	[Inline Frame] blender.exe!std::vector<DEG::DepsgraphNodeBuilder::SavedEntryTag,std::allocator<DEG::DepsgraphNodeBuilder::SavedEntryTag>>::push_back(const DEG::DepsgraphNodeBuilder::SavedEntryTag &) Line 717	C++
 	blender.exe!DEG::DepsgraphNodeBuilder::begin_build() Line 349	C++
 	blender.exe!DEG_graph_build_from_view_layer(Depsgraph * graph, Main * bmain, Scene * scene, ViewLayer * view_layer) Line 255	C++
 	blender.exe!scene_graph_update_tagged(Depsgraph * depsgraph, Main * bmain, bool only_if_tagged) Line 1448	C
 	blender.exe!wm_event_do_depsgraph(bContext * C, bool is_after_open_file) Line 359	C
 	blender.exe!wm_event_do_refresh_wm_and_depsgraph(bContext * C) Line 387	C
 	blender.exe!wm_event_do_notifiers(bContext * C) Line 552	C
 	blender.exe!WM_main(bContext * C) Line 481	C
 	blender.exe!main(int argc, const unsigned char * * UNUSED_argv_c) Line 530	C
 	[External Code]

Thread(crashed):
id->cow was nullptr

>	blender.exe!DEG::`anonymous namespace'::graph_id_tag_update_single_flag(Main * bmain, DEG::Depsgraph * graph, ID * id, DEG::IDNode * id_node, IDRecalcFlag tag, DEG::eUpdateSource update_source) Line 398	C++
 	blender.exe!DEG::graph_id_tag_update(Main * bmain, DEG::Depsgraph * graph, ID * id, int flag, DEG::eUpdateSource update_source) Line 680	C++
 	blender.exe!DEG::id_tag_update(Main * bmain, ID * id, int flag, DEG::eUpdateSource update_source) Line 621	C++
 	blender.exe!rna_MaterialSlot_update(Main * bmain, Scene * scene, PointerRNA * ptr) Line 1324	C
 	blender.exe!rna_property_update(bContext * C, Main * bmain, Scene * scene, PointerRNA * ptr, PropertyRNA * prop) Line 2242	C
 	blender.exe!RNA_property_update(bContext * C, PointerRNA * ptr, PropertyRNA * prop) Line 2302	C
 	blender.exe!pyrna_py_to_prop(PointerRNA * ptr, PropertyRNA * prop, void * data, _object * value, const unsigned char * error_prefix) Line 2211	C
 	blender.exe!pyrna_struct_setattro(BPy_StructRNA * self, _object * pyname, _object * value) Line 4524	C
 	[External Code]	
 	blender.exe!bpy_app_generic_callback(Main * UNUSED_main, PointerRNA * * pointers, const int num_pointers, void * arg) Line 344	C
 	blender.exe!BKE_callback_exec(Main * bmain, PointerRNA * * pointers, const int num_pointers, eCbEvent evt) Line 42	C
 	blender.exe!BKE_callback_exec_id_depsgraph(Main * bmain, ID * id, Depsgraph * depsgraph, eCbEvent evt) Line 75	C
 	blender.exe!BKE_scene_graph_update_for_newframe(Depsgraph * depsgraph, Main * bmain) Line 1537	C
 	blender.exe!engine_depsgraph_init(RenderEngine * engine, ViewLayer * view_layer) Line 612	C
 	blender.exe!RE_engine_render(Render * re, int do_all) Line 864	C
 	blender.exe!do_render(Render * re) Line 1235	C
 	blender.exe!do_render_composite(Render * re) Line 1373	C
 	blender.exe!do_render_all_options(Render * re) Line 1635	C
 	blender.exe!RE_RenderAnim(Render * re, Main * bmain, Scene * scene, ViewLayer * single_layer, Object * camera_override, int sfra, int efra, int tfra) Line 2611	C
 	blender.exe!render_startjob(void * rjv, short * stop, short * do_update, float * progress) Line 671	C
 	blender.exe!do_job_thread(void * job_v) Line 398	C
 	[External Code]
Richard Antalik (ISS) renamed this task from Non deterministic segfault during render when a python frame_change_post handler updates materials to Crash during render with frame_change_post handler updating materials. May 26 2020, 5:01 PM
Richard Antalik (ISS) updated the task description.
Sybren A. Stüvel (sybren) claimed this task.

This is a known issue, described in the Application Handlers: Note on Altering Data section of the Python API docs.

In short: there are two dependency graphs being evaluated at the same time (one for the viewport, and the other for the render). Changing Blender data is not thread-safe, hence when that is done from a frame_change_post handler, it can cause problems. The solution is to lock the interface (Render → Lock Interface).
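
The Lock Interface option mentioned above is also exposed to Python as `scene.render.use_lock_interface`. A minimal sketch, assuming the script runs inside Blender and that switching the option on from the script is acceptable for the scene. The helper name `lock_interface_for_render` is hypothetical and is not part of the original script:

```python
def lock_interface_for_render(scene):
    """Enable Render > Lock Interface on the given scene.

    Returns the previous value so the caller can restore it later.
    """
    previous = scene.render.use_lock_interface
    scene.render.use_lock_interface = True
    return previous


if __name__ == "__main__":
    try:
        import bpy  # only importable inside Blender
    except ImportError:
        bpy = None
    if bpy is not None:
        lock_interface_for_render(bpy.context.scene)
```

Calling this from `register()` before appending the handler would mean the option is already set by the time the animation render starts.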

Ah, thanks for the info, I did not know this. Will keep it in mind for the future.

I don't think this is an invalid bug report.
Sure it's a known issue, and it's documented, but it's not likely that any user is going to find that documentation upon encountering what appears to be a random segfault.

It shouldn't be possible to segfault blender from within a python script without deliberately trying to do it.

So, I would propose that at least one of the following things happens:

  1. The root cause of this thread-unsafety is eliminated
  2. Attempting to alter data from a handler without the lock interface option set should raise an exception in python with a descriptive error message, so that it's obvious to the python developer what needs to be done to handle the fault, and maybe present an error message to the user explaining to render with the lock interface option
  3. Or, handle the segfault by aborting the python code and presenting an error to the user, or by raising a python exception for the python script to handle.

Ideally I'd like to see option 1 happen, but I have no idea how complicated that solution could be. I'm guessing it would be quite complicated.
Option 2 feels like a much easier one to implement, but could still be tricky. At least it would avoid crashing the whole of Blender, and would make it obvious how to work around the problem.
Option 3 should be the easiest of the three, and would go some way towards fixing this problem, but still won't provide obvious error messages. In general, though, I think this could be helpful for other possible segfaults, for example by raising a "BlenderInternalSegfault" exception in Python land. This would enable some level of recovery from this and other bugs, and make it obvious that it was a problem in the Python code that caused the issue, and not a problem in Blender. Option 3 could also be implemented even if one of the other two is also implemented.

This is not the first time I've encountered a segfault caused from python code.
I would love to help implementing any of these three solutions, if there is enthusiasm to merge such a fix.

Thoughts?

Thanks.

The root cause of the thread-unsafety is the multi-threaded nature of the dependency graph evaluation. This is necessary to get decent performance on modern multi-core/multi-threading CPUs. It can be resolved by adding mutexes/locks, but that will reduce Blender's performance. In a similar way, refusing to change data from Python when "Lock Interface" is disabled would require a check on each and every data change. This is also very likely to cause performance issues. The last option also isn't that easy, as it requires clean handling of a segmentation fault (which isn't possible except with rather hacky code). I'm happy to be proven wrong here, though.
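
A per-change check inside Blender is what would be costly. A script can approximate the second proposal on its own side by checking the option once per handler call. The sketch below is an assumption-laden illustration, not something Blender provides: `InterfaceNotLockedError` and `require_locked_interface` are hypothetical names, and the check also fires when scrubbing the timeline with Lock Interface off, where the original script is reported to work fine.

```python
import functools


class InterfaceNotLockedError(RuntimeError):
    """Hypothetical error: handler refused to edit data without Lock Interface."""


def require_locked_interface(handler):
    """Wrap a frame-change handler so it only runs when Lock Interface is on."""

    @functools.wraps(handler)  # keeps handler.__name__ for unregister()
    def wrapper(scene, *args):
        # One check per handler call, not one per data change.
        if not scene.render.use_lock_interface:
            raise InterfaceNotLockedError(
                "Enable Render > Lock Interface before letting "
                f"{handler.__name__!r} modify Blender data."
            )
        return handler(scene, *args)

    return wrapper
```

Decorating `handler` in the script above with `@require_locked_interface` should turn the silent crash condition into a Python exception with an actionable message.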

@Sybren A. Stüvel (sybren) After a little bit of reading of the Python C API docs, I've found it fairly easy to generate a Python backtrace upon a segfault. And given that it's easy to catch a segfault (on Linux at least, I haven't tried Windows) without doing anything hacky, I have been able to get Blender to provide a Python backtrace when Python code causes a segfault.

https://developer.blender.org/differential/diff/27377/

Would you mind taking a look at this patch and letting me know what you think of it?
It almost certainly doesn't work on Windows, so the patch will need some work on that front.

Also, I'm not sure what the normal etiquette on Blender's Phabricator is. Should I open another ticket for this? (It's related to _this_ issue, but it's also just the addition of a debugging feature.)

Previously I would get this output on a segfault:

Writing: /tmp/test_segfault.crash.txt
fish: “../build_linux_debug/bin/blender” terminated by signal SIGSEGV (Address boundary error)
daniel@dan-l13 ~/b/blender (master) [SIGSEGV]>

This patch enables me now to have some clue of where the problem occurred.

Segfault at 0x70 during execution of python code.
This could be either a blender bug or a user-provided python bug.
Stack trace (most recent call first):
  File "/home/daniel/Downloads/test_segfault.blend/test.py", line 16 in handler
Writing: /tmp/test_segfault.crash.txt
fish: “../build_linux_debug/bin/blender” terminated by signal SIGSEGV (Address boundary error)
daniel@dan-l13 ~/b/blender (master) [SIGSEGV]>
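
For script authors who cannot rebuild Blender, the standard library's `faulthandler` module offers a comparable Python-level traceback on fatal signals. This is not what the patch above does; it is a script-side sketch, assuming Blender's bundled Python ships `faulthandler` and that nothing replaces the signal handler afterwards. It has not been verified inside Blender here, and the helper name `enable_crash_traceback` is hypothetical.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Dump the Python stack of all threads to log_path on a fatal signal.

    The returned file object must stay open for as long as the fault
    handler is enabled, because faulthandler writes to its file descriptor.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling this once at the top of the script, with a path in a writable directory, should leave a Python traceback in that file if the handler triggers a segfault.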

We can discuss the patch at D8457, no need to open another ticket.