Regression: BLI_findstring encountering segfault when script calls bpy.ops.wm.save_as_mainfile #97627
Reference: blender/blender#97627
System Information
system-info.txt
Blender Version
Broken: (Blender 3.2, 2882cbe685, Debug, 24th April 2022, built from source on macOS 10.15.7 using VSCode)
Worked: (Blender 2.93 LTS, 31712ce77a, master, 19 April 2022)
Caused by 7a9cfd08a8
Segmentation fault when saving a file from a script rather than from the UI, accompanied by strange visual glitches. For a recording of the visual glitching, see {F13029978}
Exact steps for others to reproduce the error
Note:
Crash occurs using any blend file including the default scene.
Crash seems to be limited to macOS only
To reproduce
Investigations done
I have looked into the problem. Since the integration of the Cycles X branch, or thereabouts, a new feature came to Blender that lets the user save a thumbnail of their Blender UI at the time of saving their blend file and associate it with that particular project. This lets the OS display the thumbnail so the user can search for a project visually. However, it also introduced a design flaw (in my opinion).
The problem is that the UI has to be redrawn to close any open windows, most obviously the File menu, since the user would have had to open the File menu to reach the save entry. The save file dialog also needs to be closed so that the thumbnail does not contain these elements, as they would overlap much of the UI, potentially obscuring the recognisable features of the project and making the thumbnail far less useful.
The real issue, however, stems from how the feature has been implemented. In order to call save from a script, external devs can use the bpy.ops.wm.save_xxxxxx operators. However, the save_as_mainfile and save_mainfile versions both trigger a redraw of the UI, even when called from a script that doesn't have any UI elements open. It is this redraw, a side effect of saving a copy of a blend file from a script, that causes the crash. Avoiding the redraw avoids the crash.
This wouldn't be the first time this feature has caused similar issues. In [#92704](https://developer.blender.org/T92704) I discovered that this problem happens when attempting to use bpy.ops.wm.save_xxxxxxx from a thread that is not the main thread. The solution was [D13140](https://developer.blender.org/D13140), which was accepted and is now in master.
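For scripts that do their work on a background thread, one common pattern is to hand the save request over to the main thread instead of calling the operator directly. The sketch below assumes Blender's `bpy.app.timers` module is available. The queue and the names `request_on_main_thread` and `drain_main_thread_queue` are made up for illustration and are not what D13140 does. It only covers the threading case, not the crash reported here.

```python
import queue

try:
    import bpy  # only importable inside Blender
except ImportError:
    bpy = None

# Hypothetical queue of callables that must run on the main thread.
_pending = queue.Queue()

POLL_INTERVAL = 0.5  # seconds between timer runs (arbitrary choice)


def request_on_main_thread(func):
    """Called from any thread: store the callable instead of running it."""
    _pending.put(func)


def drain_main_thread_queue():
    """Timer callback: run everything queued so far, then ask to be
    called again after POLL_INTERVAL seconds."""
    while True:
        try:
            func = _pending.get_nowait()
        except queue.Empty:
            break
        func()
    return POLL_INTERVAL


if bpy is not None:
    # Timers registered here are executed by Blender on the main thread.
    bpy.app.timers.register(drain_main_thread_queue)
```

A worker thread would then call something like `request_on_main_thread(lambda: bpy.ops.wm.save_as_mainfile(filepath=path, copy=True))`, where `path` is whatever the script wants to write to.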
Can I say I think it was a mistake to accept that solution? Based on what I'm experiencing now, Blender is randomly crashing even when calling bpy.ops.wm.save_xxxxx from the main thread. I have tried to offer a solution in [D14160](https://developer.blender.org/D14160); however, I've been asked for a backtrace. Soooo...
Crash dumps
This crash dump, generated by macOS after Blender crashed, is typical. It shows the code path from redrawing the UI right down to where the segfault happens - Minimal_test_case_crashdump1.rtf
This crash dump is a stranger version. I don't recognise the code path at all, but it happened on one of many test runs - Minimal_test_case_crashdump2.rtf
The vast majority of crashes are of the first type, where the problem is in BLI_findstring, which I have to say is a confusing function at first. There are no comments and one really has to strive to understand what is going on. It certainly doesn't return a string; rather, it seems to use strings to locate a struct/data block given the input. I've built Blender from source on macOS and the debugger has paused on the exception.
The problem appears to be the id_iter variable: its pointer is invalid, but not equal to NULL or nullptr. It has a non-zero value which, when dereferenced either with the * operator or by indexing, causes a segmentation fault.
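To make the lookup easier to picture, here is a toy Python model of a "find by string" walk over a linked list. It is not Blender code: `Link`, `find_by_string` and the `list_id` field are invented for illustration. As I understand it, the C function compares a string stored at a byte offset inside each list element and returns the matching element; here an attribute name plays the role of that offset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    """Toy stand-in for a list element; not a Blender type."""
    list_id: str
    next: Optional["Link"] = None


def find_by_string(first: Optional[Link], wanted: str, field: str) -> Optional[Link]:
    """Walk the chain starting at `first` and return the first element whose
    `field` attribute equals `wanted`, or None if nothing matches."""
    link = first
    while link is not None:
        # In C this step dereferences the element pointer, which is where
        # a dangling (freed but non-NULL) pointer would blow up.
        if getattr(link, field) == wanted:
            return link
        link = link.next
    return None


tail = Link("materials")
head = Link("objects", next=tail)
```

With this setup, `find_by_string(head, "materials", "list_id")` returns the `tail` element itself rather than a string. A NULL check like `link is not None` cannot catch a pointer to memory that has already been freed, which matches the non-zero but invalid value seen in the debugger.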
It is very hard for me to offer any explanation of why, though. I have looked at whether my Python implementation of the UI list is at fault, since it is on the UI list that BLI_findstring crashes. But I am no expert and the code is quite opaque to me. I've also used the UI list examples in the Blender Python API docs to create the minimal test case, which reproduces the crash. I would be so happy if this were a known limitation (or an as yet unknown limitation that could be documented, pretty please with sugar on top for also providing a workaround?), but I doubt it; that's opinion though.
What to do about this?
My suggestion would be to make it possible to avoid unnecessary changes to the UI when saving a file from a script. The separation of concerns principle makes sense in this use case, where scripts want to save a copy of the blend file with no unintended side effects, including unnecessary redraws of the UI and especially the associated instability. Though I do agree with Brecht that the underlying cause should be fixed, it would serve in the meantime not to force redraws unless the save is triggered by the user from the UI, and to have a separate code path for scripts.
I've already provided one example of how we could change the call signature of the bpy.ops.wm.save_as_mainfile operator to allow a boolean flag that can be used to avoid redrawing the UI. I am open to other suggestions on how to ensure separation of concerns for this use case other than those in [D14160](https://developer.blender.org/D14160).
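For context, this is roughly what the script-side call looks like today. It is a minimal sketch: `save_copy` and its `save_op` parameter are hypothetical helpers (the parameter exists only so the function can be exercised outside Blender), while `filepath` and `copy` are existing arguments of `bpy.ops.wm.save_as_mainfile`. The flag proposed in D14160 is not shown, since its name is not given here, and this call does nothing by itself to avoid the redraw.

```python
def save_copy(filepath, save_op=None):
    """Save a copy of the current .blend without making the copy the
    active file. Returns True if the operator reported FINISHED."""
    if save_op is None:
        import bpy  # only importable inside Blender
        save_op = bpy.ops.wm.save_as_mainfile
    # Operators return a set of status strings, e.g. {'FINISHED'}.
    result = save_op(filepath=filepath, copy=True)
    return "FINISHED" in result
```

Inside Blender this would be called as `save_copy("/path/to/copy.blend")`, with the path chosen by the script.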
Added subscriber: @JamesCrowther
Added subscriber: @rjg
I can reliably reproduce this issue with an ASAN build:
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
Changed status from 'Needs Triage' to: 'Confirmed'
Edit: The `&region->ui_lists.first` and `&region->ui_lists.last` are `nullptr` in `ui_list_ensure()`.
Indeed tracked it down to this 🙏
It is indeed null before the button is clicked,
but it is worse: it gets freed before usage.
It seems that the region gets freed by `BKE_area_region_free` before `ui_list_ensure` gets called.
These are the values of the ui_lists pointers when this happens in a debug build (the kill switch OK button is clicked, memory is freed by `BKE_area_region_free`, then accessed after free in `ui_list_ensure`).
Discussing with Jamesy from Crowd Render on Discord.
@rjg this is the call stack right after clicking the button: it calls free, then afterwards `ui_list_ensure` accesses that freed memory.
It seems that closing the pop-up leads to the free, but then the operator is executed, which leads to an access after free.
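The order of events described here can be replayed with a toy Python model. Nothing in it is Blender code: `ToyRegion`, `click_ok` and the `redraw_on_save` flag are invented for illustration. Where C would read freed memory (undefined behaviour that may or may not crash), the model raises an exception instead.

```python
class UseAfterFree(RuntimeError):
    """Raised when the toy region is used after it was freed."""


class ToyRegion:
    """Toy stand-in for the temporary popup region; not Blender's ARegion."""

    def __init__(self):
        self.ui_lists = {}
        self.freed = False

    def free(self):
        # Models the region being released when the popup closes.
        self.ui_lists = None
        self.freed = True

    def ensure_ui_list(self, list_id):
        # Models a lookup-or-create on the region's UI lists.
        if self.freed:
            raise UseAfterFree(f"region freed before looking up {list_id!r}")
        return self.ui_lists.setdefault(list_id, {"list_id": list_id})


def click_ok(region, redraw_on_save):
    """Replay the reported order: the popup closes and frees its region,
    then the operator runs and optionally triggers a redraw."""
    region.free()
    if redraw_on_save:
        region.ensure_ui_list("my_list")
    return "FINISHED"
```

In this model `click_ok(ToyRegion(), redraw_on_save=False)` completes, while `redraw_on_save=True` raises `UseAfterFree`, which mirrors the observation that avoiding the redraw avoids the crash.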
Looks like the pop-up handler is incorrectly freeing the template list? The preceding stack frames show some operations related to a "popup" which I assume to be the props dialog that we use (both in our addon and I included it in the minimal test case script).
Willing to bet that if I remove the pop-up, the crash will disappear. Going to try that and report back. I'll modify the script so that we don't generate a props dialog and save the blend file directly from a button located in the render properties panel.
Confirmed! If I modify the script so that the bpy.ops.wm.save_as_mainfile operator is called from a button located within a sub-panel in the render properties area of the Blender UI, I can't crash Blender no matter how hard I try. So it seems the popup handler is somehow freeing the UI list incorrectly?! The UI list is instantiated in the panel, not the popup, so maybe @rjg could let us know why this is happening? I would imagine it makes sense to free the popup, but not the UI_List from which the button was pressed to generate the popup! It's still supposed to be there after the popup is gone!
Added subscriber: @lichtwerk
Marking as High prio since it is a regression.
Changed title from 'BLI_findstring encountering segfault when script calls bpy.ops.wm.save_as_mainfile' to: 'Regression: BLI_findstring encountering segfault when script calls bpy.ops.wm.save_as_mainfile'
@IyadAhmed Yes. I thought that was obvious from the ASAN report, hence I didn't point that out specifically. The allocation is done by `ui_region_temp_add`, and `ui_region_temp_remove` frees it before the call to `ui_list_ensure` happens.
Added subscribers: @Harley, @ideasman42
Looked into this; it looks to be caused by 7a9cfd08a8 (CC'ing @Harley).
I think it's only practical to redraw when this operator is called in the main event loop.
Suggest to revert, longer term we could have a way for operators to detect if they're running in the main event loop.
Added subscriber: @IyadAhmed
Will do.
Ah ok
This issue was referenced by c6ce2be496
Changed status from 'Confirmed' to: 'Resolved'