Threading issue in sculpt mode #76858
Reference: blender/blender#76858
System Information
Operating system: Linux-5.6.13-arch1-1-x86_64-with-arch 64 Bits
Graphics card: AMD Radeon RX 5700 (NAVI10, DRM 3.36.0, 5.6.13-arch1-1, LLVM 10.0.0) X.Org 4.6 (Core Profile) Mesa 20.0.7
Blender Version
Broken: version: 2.83 (sub 17) and an older version from the end of March (817c38f7).
Worked: (newest version of Blender that worked as expected)
Short description of error
When starting to sculpt, ASAN detects a heap-use-after-free error.
Exact steps for others to reproduce the error
In P1400 I did the following:
- BKE_pbvh_parallel_range2 that executes a given callback in parallel, but with the C++ threads library instead of TBB. The result is the same.
- pbvh_update_draw_buffer_cb for testing.
The issue is fixed by uncommenting the mutex in pbvh_update_draw_buffer_cb such that it is only executed by a single thread at once. I tried reducing the scope of the mutex, but still got crashes; it just took a bit longer.
So far, the issue has only happened with dynamic topology enabled, so it could also be related to that. I have not been able to narrow the issue down further.
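The experiment described above can be sketched roughly as follows. This is a minimal illustration and not the code from P1400: parallel_range_sketch is a hypothetical stand-in for BKE_pbvh_parallel_range2, and update_nodes_serialized stands in for a callback such as pbvh_update_draw_buffer_cb whose body is not thread safe. Holding the mutex for the whole callback means only one thread runs it at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for BKE_pbvh_parallel_range2: splits [start, stop)
// into contiguous chunks and calls `callback(i)` for every index, using
// plain C++ threads instead of TBB.
void parallel_range_sketch(std::size_t start,
                           std::size_t stop,
                           std::size_t num_threads,
                           const std::function<void(std::size_t)> &callback)
{
  assert(num_threads > 0);
  assert(start <= stop);
  const std::size_t total = stop - start;
  const std::size_t chunk = (total + num_threads - 1) / num_threads;

  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < num_threads; t++) {
    const std::size_t begin = start + std::min(total, t * chunk);
    const std::size_t end = start + std::min(total, (t + 1) * chunk);
    if (begin == end) {
      continue; /* Nothing left for this thread. */
    }
    threads.emplace_back([begin, end, &callback]() {
      for (std::size_t i = begin; i < end; i++) {
        callback(i);
      }
    });
  }
  for (std::thread &thread : threads) {
    thread.join();
  }
}

// Hypothetical callback body: appending to a shared vector is not thread
// safe on its own, so the mutex is held for the entire callback.
std::size_t update_nodes_serialized(std::size_t node_count, std::size_t num_threads)
{
  std::vector<std::size_t> updated;
  std::mutex mutex;
  parallel_range_sketch(0, node_count, num_threads, [&](std::size_t i) {
    std::lock_guard<std::mutex> lock(mutex);
    updated.push_back(i);
  });
  return updated.size();
}
```

Serializing the whole callback hides a data race rather than explaining it, which is consistent with the observation that narrowing the mutex scope only made the crash take longer to appear.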
I originally investigated this in #76544, but decided to make this a more specific separate report.
Added subscriber: @JacquesLucke
Changed status from 'Needs Triage' to: 'Needs Developer To Reproduce'
Added subscribers: @PabloDobarro, @brecht
@brecht, @PabloDobarro Can you check if you can reproduce this please. I can reproduce it reliably.
In release builds I did not get a crash from this yet. Maybe the memory leaks described in #76544 are a result of the threading issue.
I'm setting this to high priority, since this could be quite a bad bug that we should really get rid of before releasing 2.83.
This issue was referenced by e0ae229acb
I'm unable to reproduce the crash, with a debug build using ASAN. I can reproduce the "member call on address 0x613000adefc0 which does not point to an object of type 'task'" problem.
From looking at the backtrace, I guessed there is a GPU batch referencing a freed GPU vertex buffer. I found a case where that could happen and committed a fix.
But I don't think it's causing this specific issue, and looking at the backtrace more closely I'm not sure that's actually what is going on.
Some random things to try:
- update_only_visible to false in BKE_pbvh_draw_cb always, and see if that stops the crash.
- PBVH_RebuildDrawBuffers in BKE_pbvh_node_mark_update, BKE_pbvh_node_mark_update_mask or BKE_pbvh_node_mark_redraw, to force a full rebuild of the draw buffers instead of a partial. Maybe partial rebuilding has some issue.
- GPU_pbvh_bmesh_buffers_update_free could be changed to free more buffers with flat shading like it does for smooth shading.
If I could repro this I could probably figure out what is going on. I'm not sure why I can't or how to make it work.
That TBB warning seems to be a known issue in TBB.
https://github.com/oneapi-src/oneTBB/issues/140
It's unclear if this is a bug in TBB or a false positive from address sanitizer, but I guess it's not related to this bug.
I'll try to reproduce it on another OS/computer tomorrow.
Added subscriber: @mont29
Trying this morning, I cannot get an ASAN report, but I can 'reliably' reproduce a crash (including in a debugger) with the following backtrace (which looks similar to the ASAN report from @JacquesLucke):
To do so, I have to generate some 'broken' geometry, typically (with dyntopo enabled):
Operating system: Linux-5.6.13-arch1-1-x86_64-with-arch-Arch-Linux 64 Bits
Graphics card: Mesa DRI Intel(R) HD Graphics 4600 (HSW GT2) Intel Open Source Technology Center 4.5 (Core Profile) Mesa 20.0.7
I just freshly installed EndeavourOS on my laptop (so another device from my original report). I can reproduce the same issues on that device as well.
Unfortunately, none of that helped.
Here is a quick video showing the issue, just to make sure we are on the same page. Note that I get a different ASAN output here, because I did not deactivate threading for a couple of functions. Here are two other ASAN reports I get in a Blender 2.83 build: P1405.
2020-05-19 13-45-00.mp4
When calling make, I get a couple of CMake warnings (P1406). I have no idea if those are related to the problem. It does not look like they are.
I did not get the error on Windows yet.
This issue was referenced by 8d63d7337c
This issue was referenced by 59cfb20fa1
This issue was referenced by 499c0229f7
Changed status from 'Needs Developer To Reproduce' to: 'Resolved'
This turned out to be a harmless assert and a problem only when using --debug-memory.
Thanks Brecht! That solved the issue indeed, and it looks like a couple of issues were solved in the process as well, so it was not all for nothing.
I was actually stepping through the allocator to find the issue, because I could not find it anywhere else. There I saw mem_lock_thread and mem_unlock_thread and then wrongly assumed that it was thread safe... Wouldn't it be better to make sure that our memory allocator is always thread safe? I find the idea of having a non-thread-safe main allocator a bit uncomfortable. This might become more of an issue if we decide to use TBB in more places directly, instead of using our own abstractions.
I agree, I committed 183ba284f2 after this fix.
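To illustrate what "always thread safe" could mean for an allocator that keeps bookkeeping, here is a minimal sketch. It is not Blender's guarded allocator and says nothing about what 183ba284f2 actually changed. TrackingAllocator and stress_allocator are hypothetical names, and the only assumption is that the allocator tracks the number of live blocks. The lock is taken unconditionally around the bookkeeping, so correctness does not depend on callers having enabled any locking first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical tracking allocator: counts live blocks and always protects
// that counter with its own mutex.
class TrackingAllocator {
 public:
  void *allocate(std::size_t size)
  {
    void *ptr = std::malloc(size);
    if (ptr != nullptr) {
      std::lock_guard<std::mutex> lock(mutex_);
      live_blocks_++;
    }
    return ptr;
  }

  void release(void *ptr)
  {
    if (ptr == nullptr) {
      return;
    }
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(live_blocks_ > 0);
      live_blocks_--;
    }
    std::free(ptr);
  }

  std::size_t live_blocks()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return live_blocks_;
  }

 private:
  std::mutex mutex_;
  std::size_t live_blocks_ = 0;
};

// Allocates and frees from several threads at once and returns the number
// of blocks the allocator still considers live afterwards.
std::size_t stress_allocator(std::size_t num_threads, std::size_t iterations)
{
  TrackingAllocator allocator;
  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < num_threads; t++) {
    threads.emplace_back([&allocator, iterations]() {
      for (std::size_t i = 0; i < iterations; i++) {
        void *ptr = allocator.allocate(64);
        allocator.release(ptr);
      }
    });
  }
  for (std::thread &thread : threads) {
    thread.join();
  }
  return allocator.live_blocks();
}
```

The trade-off is an uncontended lock on every call in single-threaded code, in exchange for not having to reason about whether some caller forgot to switch locking on.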