Crash when changing torus properties #80203
18 Participants
Reference: blender/blender#80203
System Information
Operating system: Darwin-19.6.0-x86_64-i386-64bit 64 Bits (macOS Catalina) 10.15.6
Graphics card: NVIDIA GeForce GT 650M OpenGL Engine NVIDIA Corporation 4.1 NVIDIA-14.0.32 355.11.11.10.10.143
Blender Version
Broken: version: 2.90.0 Beta, branch: master, commit date: 2020-08-28 14:51, hash: ddbf41d88d
Worked: -
Short description of error
When changing properties of a torus mesh, Blender crashes.
I recorded a video:
https://imgur.com/a/ghQjmGG
Exact steps for others to reproduce the error
Disable undo legacy. (it doesn't crash there)
On release/relwithdebinfo build, Default scene > add torus > move one radius slider, then ferociously move the other radius slider.
If the crash doesn't occur, add cone, do the same with its radius sliders.
Added subscriber: @giveuptheghost
#84387 was marked as duplicate of this issue
#84368 was marked as duplicate of this issue
#84015 was marked as duplicate of this issue
#81516 was marked as duplicate of this issue
#80857 was marked as duplicate of this issue
Added subscriber: @HooglyBoogly
Thanks for the report! Although I can't reproduce this here. (Fedora 32 Linux). I wonder if this is a macOS only issue.
Added subscriber: @jenkm
2.81.16 - OK
2.82.7 - OK
2.83.2 - Crash
2.91.0 - Crash
No crash with "Undo Legacy".
(macOS)
Added subscriber: @iss
No crash on windows either, so I will add mac tag
Added subscriber: @girafic
Also crashes here with Blender 2.83.5
torus_crash.mp4
System Information
Operating system: Darwin-19.5.0-x86_64-i386-64bit 64 Bits
Graphics card: AMD Radeon RX 480 OpenGL Engine ATI Technologies Inc. 4.1 ATI-3.9.15
Blender Version
Broken: version: 2.83.5, branch: master, commit date: 2020-08-19 06:07, hash: c2b144df39
Now it's working fine for me in the 2.90.0 release
That's quite strange (at least when I look at backtrace)
I see only 2/3 new functional commits in log since ddbf41d88d:
0330d1af29
cb0b0416f4
defe21a7bb
@jenkm can you check if this still crashes?
The 2.90.0 release still crashes.
Added subscribers: @jakubsz.123, @rjg
Looks like the bug is fixed with 2.90.1 (branch: master, commit date: 2020-09-23 06:43, hash: 3e85bb34d0)
Changed status from 'Needs Triage' to: 'Archived'
Thanks for update.
It still crashes for me. (Although it seems to have become harder to reproduce.)
9f9dbaf22b 2020-10-01 10:13
Hmm, now this happens for other objects (cone, sphere) as well.
Changed status from 'Archived' to: 'Needs Triage'
@jenkm Thanks for checking this. I still can't reproduce. Perhaps with asan I would be able to. Will check again.
Edit: I must do something wrong, because I wasn't able to run Blender with ASAN, will look into that later.
only easier for the Torus, while 2.91 is easier to crash with a Cone by changing the Radius.
And the most interesting thing, when I drag the value the cursor disappears (since Continuous Grab),
but this doesn't happen in the Redo panel for Add object. (maybe it means something)
Added subscriber: @ankitm
Changed status from 'Needs Triage' to: 'Confirmed'
Can confirm on Release and RelWithDebInfo build.
63c906e0a7
No config built with ASan crashes.
Added subscriber: @ani528123
Added subscriber: @mont29
Changed status from 'Confirmed' to: 'Needs User Info'
Can you please check whether it still crashes for you if you disable the new undo system? In user preferences, Experimental section, Debugging panel, enable the Undo Legacy checkbox.
I tried with 2.90 and 2.91 Alpha with legacy undo enabled. 20 minutes playing with different properties of different meshes and no crashes.
Thanks, then it is probably some kind of memory corruption... Will see if I can reproduce.
Changed status from 'Needs User Info' to: 'Needs Developer To Reproduce'
Tried to reproduce the issue here on linux for quite some time, with release and debug+Asan builds, with one or thousands of objects already in scene... Not a single crash. :(
@ankitm since you can reproduce, I am afraid this is on your desk for now?
Enabling Asan makes the crash disappear for me too, no matter what config.
Did you forget to disable legacy undo ?
I had tried to get a trace the same day I confirmed it (relwithdebinfo), but the crash was happening at too many places, no consistent stacktrace. In some places, even C was NULL. Is that normal?
Note to self: https://stackoverflow.com/questions/14045208/how-to-set-a-breakpoint-in-malloc-error-break-to-debug/
@ankitm I obviously had legacy undo disabled! Crash just does not happen on linux apparently...
Got lucky and Debug build (without asan) also crashes. Four attempts, four traces. https://developer.blender.org/P1709
Then launched Blender without debugger and got this crash report: https://developer.blender.org/P1710 Line 7 sticks out.. is that the bug ?
Got its malloc history and corresponding --debug-all event log too. At the end of the latter are the malloc messages about corruption.
T80203_malloc_history_prettyprint.txt.xz T80203_debug_all.txt
Hmmmm those traces remind me a lot about that other issue: #74067 (Crash: Custom Panel UI Toggle while Undoing)
This only happening with new undo (and never with ASAN build) could be explained by the fact that the operation triggering that out-of-sync situation in UI has to be very fast, otherwise UI will get updated in-between and issue will not happen?
Anyway, can you try and see if D7795: Fix #74067: Crash when the UI accesses stale data fixes it for you? Hopefully this is still applicable easily on current master.
With D7795 rebased on master bb872b25f2, crash happens.
Crash report/trace being: https://developer.blender.org/P1709#8866 and there's this black screen on the panel too in this crash. {F8999635, layout=inline}
On another attempt: https://developer.blender.org/P1709#8867
yeah, I move the cursor from one corner of the screen to the other really fast.
With the same patch (as in my last comment), and running Blender without debugger, https://developer.blender.org/P1710#8869
see the lines which wrongly set the property which is intended for the radii.
One more thing @jenkm also mentioned above:
with legacy undo, the cursor disappears while moving the slider.
with new undo, the cursor moves on the screen while moving the slider, and returns back to its original state (on the slider) when released.
Ignore if it's intended.
Don't really know what to add here... This is likely some stale data somewhere in UI code, but besides that, being unable to reproduce...
Added subscriber: @sebbas
@sebbas Can you please have a look at this, see if you can reproduce and then investigate it? I cannot do anything about it here on linux it'd seem, but this is a rather annoying issue still, and the bug is unlikely to actually be mac-only... Thanks. :)
@mont29 Yes, will take look!
Added subscriber: @Memento
On an old MacBook Pro running 10.15.6 I can reproduce the problem with 2.91.0 Beta, branch: master, commit date: 2020-11-13 00:36, hash: 2e08500d04. Blender will completely crash and disappear, leading to the macOS "has unexpectedly crashed" message.
Taking today's (November 17th) version 906ff7b8fe and doing the same steps, it will freeze Blender completely, causing the mouse cursor to display Apple's famous "Beachball of Death". I was able to limp back into the Finder and launch a Terminal from there, and could see using "top" that Blender is still running at 100% CPU usage. However, after moving the mouse back into the Blender window the beachball of death is still there and Blender does not respond to any clicks or keypresses. Also the mouse cursor remains completely hidden.
Added subscriber: @JulianEisel
This definitely seems like some sort of memory error. Most likely related to undo/redo.
Each time the crash log is entirely different. Once in the depsgraph code, once in drawing, once in the UI code... Oh and once, every change to the radius value even created a new torus object (as seen in the Outliner), no other weird things and no crash happened.
That is with recent master.
--debug-memory doesn't make a difference. So it doesn't seem like a use-after-free, more like an invalid write or so.
Apparently Valgrind isn't available for modern macOS? A memory tool like this would be useful.
I was going to do a bisect (from 2.82 to 2.83), but realized it will likely just end up being the new undo system merge or its enable-by-default commit. If somebody still wants to give it a go, by all means, feel welcome.
So I guess we need to give @mont29 access to a Mac to debug this.
Or maybe somebody can reproduce on Linux+Clang?
I don't see a reason to believe this is caused by stale UI data just because some of the seemingly random logs show this. After redo is performed, most code that runs is UI code, so of course the logs after an invalid memory operation on undo/redo are biased towards that. And if you move the mouse faster, more value changes happen and it's more likely to fail in that period - I was able to reproduce it with slow mouse movements as well.
Not a fun one at all!
Even if it was available on our macOS versions, it would introduce slowdown and that would make the bug go away. That's why we couldn't redo it with Asan or debug + Asan builds.
I can reliably reproduce the crash with other mesh objects as well. E.g. with cubes and spheres.
However I can not reproduce it with non-mesh objects, e.g. curves, lamps and meta-balls.
I'm not at all convinced that it's the overhead of ASan that makes it go away. There are many more variables, AFAIK ASan has an own allocator.
Like I said I was able to reproduce it with really slow movements. Based on how mouse moves are handled in the main-loop (we drop all mouse moves but the most recent one), I kinda doubt that these could lead to a data-race.
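To make the "drop all mouse moves but the most recent one" point concrete, here is a minimal sketch of that kind of event coalescing. It is an illustration only, not Blender's main-loop code: `Event`, its fields and `coalesce_mouse_moves` are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; not a Blender type."""
    kind: str  # e.g. "MOUSEMOVE" or "KEY"
    x: int = 0
    y: int = 0


def coalesce_mouse_moves(events):
    """Return the events with every mouse move dropped except the most recent one."""
    last_move = None
    for i, ev in enumerate(events):
        if ev.kind == "MOUSEMOVE":
            last_move = i
    # Keep all non-move events, plus the single latest move.
    return [ev for i, ev in enumerate(events)
            if ev.kind != "MOUSEMOVE" or i == last_move]


queue = [
    Event("MOUSEMOVE", 1, 1),
    Event("KEY"),
    Event("MOUSEMOVE", 5, 5),
    Event("MOUSEMOVE", 9, 9),
]
handled = coalesce_mouse_moves(queue)
```

With a scheme like this, a fast drag and a slow drag both hand at most one move per loop iteration to the handlers, which fits the observation that mouse speed alone probably does not explain the crash.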
Added subscriber: @Walles
Two suggestions that I was unable to find in the comments so far, based on that this looks like a race condition.
Helgrind on Linux
https://www.valgrind.org/docs/manual/hg-manual.html
Helgrind will sometimes report failing synchronization even when they don't lead to crashes, so this would be my suggestion for any Valgrind savvy on-Linux developers who want to track this down.
And it might be that this problem exists on Linux as well, but it just blows up a lot less often. In that case Helgrind might help.
I spent some time a number of years ago fixing threading issues in a largeish piece of code and Helgrind was gold! Super slow to execute but back then I got good diagnostics out of it and was able to fix some really non-obvious real world problems.
Thread Sanitizer on macOS
Asan is mentioned a lot in this thread, but if this is a race condition of some kind then Tsan might be what should really be used:
https://clang.llvm.org/docs/ThreadSanitizer.html
Haven't used Tsan, can't vouch for it, except for the fact that it exists and claims to have macOS support on its web page.
FWIW I removed address sanitiser, and added thread sanitizer. See the flags here. Couldn't get the crash in debug or relwithdebinfo builds.
Tried again and clang (9 and 11) does not crash for me either on linux.
Added subscriber: @nos-2
I am not a developer but wanted to add that it also crashes on my Mac when changing the torus radius.
Nothing on the project. Just a new torus and crashes.
MacBook Pro 2015, Mojave, AMD Radeon R9 M370X.
edit: just tried on 2.83.9 and the same thing happens.
At this point I am wondering why I am bothering with learning a program that has such a basic bug unsolved for at least the last 2 publicly released versions
Perhaps a better quality control should take place?
edit2: "No compatible GPU's found" either for OpenCL or CUDA Perhaps that's the problem?
edit3: After installing all versions from 2.82a to 2.90.1, I can confirm that only version 2.82a works without crashing
Added subscriber: @pixeltrain3d
This comment was removed by @rjg
Added subscriber: @blenderrocket
Added subscribers: @Goodneutrino, @Quantumplation-4, @robbott
Added subscriber: @oweissbarth
Let's say somebody would want to use git bisect to figure out where this problem started.
git repo with complete history for the new undo system? Where?
This problem already existed when the undo speedup first entered Blender master, I could repro at the below change.
If you want to try this, first enable Interface / Developer Extras, then in the new Experimental tab check Undo Speedup.
Two ways forward that I can come up with:
b852db57ba with the new knowledge that it introduces this issue. @mont29, since your name is on the change, do you think you could find the problem this way?
git bisect. Maybe look for this in #60695 or D6580, since those are mentioned?
Found the undo-experiments branch.
The latest commit into that branch did not seem to have this problem, or at least I was unable to repro.
@Walles I don't think bisect is going to help really, once again: new undo revealed several issues in existing code, it did not create them.
The proper way to solve this imho is to actually investigate the reasons of the crash (which requires to be able to reproduce it), using debug sessions etc.
Also, since this only happens on a specific OS/compiler version, I would suspect one of those two sources as being the issue here:
-t 1 option and see if they could still reproduce the crash?).
...but again, these are wild guesses.
Crash also happens with 1 thread, see #84368 for debug output.
I ran Thread Sanitizer (tsan) on Blender version bc788929aa2bd259670a5562a1f403f25cad4625 (recent master) on macOS.
Changing a torus radius one notch got me 30 warnings:
tsan-change-torus-radius-bc788929.txt
To find some common theme I did this:
... and found 26 of these (out of 30):
To find examples in the attached file, just search it for add_id_node.
Read Before Write
One way to read a tsan warning is to convert it into a timeline.
Picking the first warning in the tsan output, which is one of the add_id_node() ones, a timeline could look like this (see attachment for full backtraces):
add_id_node() allocates a new node by calling
Not sure how much this helps, but I do think it would be worth it for somebody familiar with how synchronization is meant to work to have a look at these diagnostics and see if something can be improved.
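For anyone who wants to group a tsan log by function the same way, here is a rough sketch. It is not the command used above. It assumes each report starts with the "WARNING: ThreadSanitizer:" line that tsan prints, and the sample log text and the helper name `count_reports_mentioning` are made up for illustration.

```python
def count_reports_mentioning(log_text, needle, marker="WARNING: ThreadSanitizer:"):
    """Count how many tsan reports in log_text contain needle.

    Returns (reports containing needle, total reports).
    """
    # Anything before the first marker is not a report, so skip it.
    reports = log_text.split(marker)[1:]
    hits = sum(1 for report in reports if needle in report)
    return hits, len(reports)


# Made-up sample; a real run would read the attached log file instead.
sample_log = """\
WARNING: ThreadSanitizer: data race (pid=1)
    #0 add_id_node (made-up frame)
WARNING: ThreadSanitizer: data race (pid=1)
    #0 some_other_function (made-up frame)
WARNING: ThreadSanitizer: data race (pid=1)
    #0 add_id_node (made-up frame)
"""

hits, total = count_reports_mentioning(sample_log, "add_id_node")
print(f"{hits} of {total} reports mention add_id_node")
```

On a real log, replace `sample_log` with the contents of the tsan output file.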
For reference, possibly related issues are being investigated in #84397.
@Walles Thank you for looking into this. Could you check how this looks when you start Blender with --debug-depsgraph-no-threads -t 1 as well?
I can't redo this bug with the patch in https://developer.blender.org/T84397#1089432 applied.
But it could be that I didn't move the slider fast enough, so would be nice if someone else also tests it.
With --debug-depsgraph-no-threads -t 1 I get no thread warnings whatsoever.
With just --debug-depsgraph-no-threads I get a similar amount of warnings as above, but none about add_id_node():
debug-depsgraph-no-threads.txt
With just -t 1 I get no thread warnings whatsoever.
It seems like you might have found another issue then, as the problem from #84397 still happens with --debug-depsgraph-no-threads -t 1.
I tried with that patch and I couldn't repro either after I applied it.
Without the patch I was able to crash Blender by resizing the torus.
Can people who are able to reproduce the issue confirm whether D10077 (Fix #84397, #80203: use session_uuid instead of ID pointers in depsgraph storage) fixes it for them? Thanks.
A modified version of D10077 has been committed as abbc43e4e4, please report if this fixes this issue or not.
Changed status from 'Needs Developer To Reproduce' to: 'Resolved'
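As a general illustration of why a lookup keyed by pointer can be riskier than one keyed by a session-wide identifier, here is a small simulation. It is not the code from D10077, and the thread does not state that address reuse was the exact mechanism behind this crash. `FakeID`, the address value and the two dictionaries are hypothetical stand-ins.

```python
import itertools

_next_uuid = itertools.count(1)


class FakeID:
    """Stand-in for a data-block: 'address' may be recycled, 'session_uuid' never is."""

    def __init__(self, name, address):
        self.name = name
        self.address = address                # simulated pointer value
        self.session_uuid = next(_next_uuid)  # unique for the whole session


# A data-block exists and gets an entry under both kinds of key.
old = FakeID("Torus", address=0x1000)
by_address = {old.address: "node for " + old.name}
by_uuid = {old.session_uuid: "node for " + old.name}

# Assume it is freed and the allocator hands the same address to a new data-block.
new = FakeID("Cone", address=0x1000)

stale_hit = by_address.get(new.address)     # finds the Torus entry by accident
clean_miss = by_uuid.get(new.session_uuid)  # None: no entry for the new data-block
```

The address-keyed lookup silently returns data that belongs to the old object, while the uuid-keyed lookup reports that nothing is stored yet.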
No more crash for me! Double checked the commit before, and that still crashes reliably.
Glad to see such a nasty one gone.
Can confirm. abbc43e4e4 also solved #84387. Thanks.