
Crash on entering Edit Mode of high poly mesh used by Shrink Wrap
Closed, Resolved (Public)

Description

System Information/Blender Version:

Win8.1x64, 3x gtx580:

  • 2.79t2 crashes,
  • 2.78c works

Ubuntu 14 x64, IntelGMA:

  • 2.79t2 works,
  • 2.78c works

Short description of error
Entering Edit Mode on a particular, relatively high-poly mesh that is used by Shrinkwrap modifiers crashes Blender 2.79 test build 2. If the Shrinkwrap modifiers using the mesh are disabled, there is no crash. There is also no crash on a much weaker, virtually GPU-less Ubuntu machine, or on either machine in 2.78c.

Running from cmd:

Switching to fully guarded memory allocator.
Blender 2.78 (sub 5)
Build: Fri 07/14/2017 12:01 AM Windows
argv[0] = C:\blender\blender-2.79-testbuild2-windows64\blender-2.79-testbuild2-windows64\blender.exe
argv[1] = -d
read file C:\Users\kivig\AppData\Roaming\Blender Foundation\Blender\2.78\config\userpref.blend
  Version 278 sub 5 date 2017-06-09 14:30 hash e0bc5b5
Read prefs: C:\Users\kivig\AppData\Roaming\Blender Foundation\Blender\2.78\config\userpref.blend
read file
  Version 272 sub 2 date unknown hash unknown

ordered
 OBCube
 OBLamp
 OBCamera
found bundled python: C:\blender\blender-2.79-testbuild2-windows64\blender-2.79-testbuild2-windows64\2.78\python
Read blend: Y:\***\err3b.blend
read file Y:\***\err3b.blend
  Version 278 sub 5 date 2017-07-13 10:50 hash 248bba81e7a

ordered
 OBBezierCurve.005
 OBBezierCircle
 OBBezierCurve
 OBPlane.005
 OBPlane.004
ED_undo_push: Toggle Editmode
Error: EXCEPTION_ACCESS_VIOLATION

Exact steps for others to reproduce the error

  • Open
  • Enter Edit Mode

Event Timeline

I tried with the latest daily build and it didn't crash. Win 7 + GTX 750.

ordered
 OBBezierCurve.005
 OBBezierCircle
 OBBezierCurve
 OBPlane.005
 OBPlane.004
ED_undo_push: Toggle Editmode
ED_undo_push: Toggle Editmode
ED_undo_push: Toggle Editmode
ED_undo_push: Translate

Daily build (e982ebd) crashes for me.

OK, I can confirm this. The crash appears to be random for me. I was testing writing the debug log to disk, and switching to Edit Mode caused Blender to close with the error you mentioned.

C:\Blender\blender-2.78-e982ebd-win64\blender-2.78.0-git.e982ebd-windows64>blender -d > test2.txt
Error: EXCEPTION_ACCESS_VIOLATION

C:\Blender\blender-2.78-e982ebd-win64\blender-2.78.0-git.e982ebd-windows64>

Note to developer:

Try this several times if the first attempt doesn't crash. The first mode change may not crash, but it might still do something to memory; subsequently reopening the file and changing the mode then crashes Blender with EXCEPTION_ACCESS_VIOLATION.

Also of note: when Blender crashes with this error, nothing is flushed to the text file, even though the output shows up fine in the cmd window. test2.txt remains empty.

C:\Blender\blender-2.78-e982ebd-win64\blender-2.78.0-git.e982ebd-windows64>blender -d > test2.txt
Error: EXCEPTION_ACCESS_VIOLATION

C:\Blender\blender-2.78-e982ebd-win64\blender-2.78.0-git.e982ebd-windows64>blender -d > test2.txt
Error: EXCEPTION_ACCESS_VIOLATION

C:\Blender\blender-2.78-e982ebd-win64\blender-2.78.0-git.e982ebd-windows64>

OK, this bug is really strange. If I run blender -d without redirecting the output to a text file, I get the same result as the first time I tested this bug (no crash).

Another note: now that I try redirecting to a text file again, there is no crash.

Conclusion: I can confirm this bug, but for me it is apparently quite random...

Bastien Montagne (mont29) triaged this task as Needs Information from User priority.

EXCEPTION_ACCESS_VIOLATION means Blender tried to access invalid memory, typically memory that has already been freed.

This could be a threading issue. Can you reproduce it with the -t 1 option? Also, @Fable Fox (fablefox), if you have a debug build of Blender, you should be able to use the MSVC debugger to locate the bug precisely.

@Bastien Montagne (mont29) here's a stack dump:

>	blender.exe!bvhtree_from_mesh_looptri_create_tree(float epsilon=0.000000000, int tree_type=0x00000002, int axis=0x00000006, const MVert * vert=0x0000000090522f38, const MLoop * mloop=0x0000000091ff0ff8, const MLoopTri * looptri=0x0000000094ff0ff8, const int looptri_num=0x00087500, const unsigned int * looptri_mask=0x0000000000000000, int looptri_num_active=0x00087500) Line 1028	C
 	blender.exe!bvhtree_from_mesh_looptri(BVHTreeFromMesh * data=0x00000000229cec90, DerivedMesh * dm=0x000000008fd6c838, float epsilon=0.000000000, int tree_type=0x00000002, int axis=0x00000006) Line 1175	C
 	blender.exe!shrinkwrap_calc_nearest_surface_point(ShrinkwrapCalcData * calc=0x00000000229cee00) Line 565	C
 	blender.exe!shrinkwrapModifier_deform(ShrinkwrapModifierData * smd=0x00000000192c2f28, Object * ob=0x0000000019240a58, DerivedMesh * dm=0x0000000000000000, float[3] * vertexCos=0x000000008fbfc488, int numVerts=0x000000f4, bool for_render=false) Line 662	C
 	blender.exe!deformVerts(ModifierData * md=0x00000000192c2f28, Object * ob=0x0000000019240a58, DerivedMesh * derivedData=0x0000000000000000, float[3] * vertexCos=0x000000008fbfc488, int numVerts=0x000000f4, ModifierApplyFlag flag=MOD_APPLY_USECACHE) Line 125	C
 	blender.exe!modwrap_deformVerts(ModifierData * md=0x00000000192c2f28, Object * ob=0x0000000019240a58, DerivedMesh * dm=0x0000000000000000, float[3] * vertexCos=0x000000008fbfc488, int numVerts=0x000000f4, ModifierApplyFlag flag=MOD_APPLY_USECACHE) Line 776	C
 	blender.exe!mesh_calc_modifiers(Scene * scene=0x00000000191c8a98, Object * ob=0x0000000019240a58, float[3] * inputVertexCos=0x0000000000000000, const bool useRenderParams=false, int useDeform=0x00000001, const bool need_mapping=false, unsigned __int64 dataMask=0x0000000026000009, const int index=0xffffffff, const bool useCache=true, const bool build_shapekey_layers=false, const bool allow_gpu=true, DerivedMesh * * r_deform=0x0000000019240f68, DerivedMesh * * r_final=0x0000000019240f70) Line 1845	C
 	blender.exe!mesh_build_data(Scene * scene=0x00000000191c8a98, Object * ob=0x0000000019240a58, unsigned __int64 dataMask=0x0000000026000009, const bool build_shapekey_layers=false, const bool need_mapping=false) Line 2637	C
 	blender.exe!makeDerivedMesh(Scene * scene=0x00000000191c8a98, Object * ob=0x0000000019240a58, BMEditMesh * em=0x0000000000000000, unsigned __int64 dataMask=0x0000000026000009, const bool build_shapekey_layers=false) Line 2731	C
 	blender.exe!BKE_object_handle_data_update(EvaluationContext * eval_ctx=0x000000001b0e7fe8, Scene * scene=0x00000000191c8a98, Object * ob=0x0000000019240a58) Line 192	C
 	blender.exe!BKE_object_handle_update_ex(EvaluationContext * eval_ctx=0x000000001b0e7fe8, Scene * scene=0x00000000191c8a98, Object * ob=0x0000000019240a58, RigidBodyWorld * rbw=0x0000000000000000, const bool do_proxy_update=false) Line 2650	C
 	blender.exe!scene_update_object_func(TaskPool * pool=0x00000000311a2768, void * taskdata=0x0000000018b9cf78, int threadid=0x00000004) Line 1527	C
 	blender.exe!task_scheduler_thread_run(void * thread_p=0x000000001a9abf38) Line 441	C
 	pthreadVC2.dll!000007feeab1627b()	Unknown
 	pthreadVC2.dll!000007feeab18eb7()	Unknown
 	pthreadVC2.dll!000007feeab19102()	Unknown
 	[External Code]

looptri is pointing to invalid / freed memory... I don't know the code well enough to actually fix the bug here.

Germano Cavalcante (mano-wii) raised the priority of this task from Needs Information from User to Confirmed, Medium.

In my case, looptri 553472 points to the loops {4261281277, 0, 0} :\

I made these changes to the code and the problem was solved:

diff --git a/source/blender/blenkernel/intern/subsurf_ccg.c b/source/blender/blenkernel/intern/subsurf_ccg.c
index c4665c40ec4..c227d30194a 100644
--- a/source/blender/blenkernel/intern/subsurf_ccg.c
+++ b/source/blender/blenkernel/intern/subsurf_ccg.c
@@ -4476,7 +4476,6 @@ static void ccgDM_recalcTessellation(DerivedMesh *UNUSED(dm))
 
 static void ccgDM_recalcLoopTri(DerivedMesh *dm)
 {
-	BLI_rw_mutex_lock(&loops_cache_rwlock, THREAD_LOCK_WRITE);
 	MLoopTri *mlooptri;
 	const int tottri = dm->numPolyData * 2;
 	int i, poly_index;
@@ -4502,7 +4501,6 @@ static void ccgDM_recalcLoopTri(DerivedMesh *dm)
 		lt->tri[2] = (poly_index * 4) + 2;
 		lt->poly = poly_index;
 	}
-	BLI_rw_mutex_unlock(&loops_cache_rwlock);
 }
 
 static const MLoopTri *ccgDM_getLoopTriArray(DerivedMesh *dm)
@@ -4511,7 +4509,11 @@ static const MLoopTri *ccgDM_getLoopTriArray(DerivedMesh *dm)
 		BLI_assert(poly_to_tri_count(dm->numPolyData, dm->numLoopData) == dm->looptris.num);
 	}
 	else {
-		dm->recalcLoopTri(dm);
+		BLI_rw_mutex_lock(&loops_cache_rwlock, THREAD_LOCK_WRITE);
+		if (!dm->looptris.array) {
+			dm->recalcLoopTri(dm);
+		}
+		BLI_rw_mutex_unlock(&loops_cache_rwlock);
 	}
 
 	return dm->looptris.array;

I don't know exactly why this change fixes the bug, but I noticed something that goes against the main goal of multithreading: performance.

In the file, two objects have a Shrinkwrap modifier with the high-poly object as target, so two threads execute the code that requests the looptris of the high-poly object.
The problem is that both threads execute the same ccgDM_recalcLoopTri function: one waits at the beginning of the function (at BLI_rw_mutex_lock) and then repeats the whole computation as soon as the other finishes.

This repetition is a total waste of processing. So it should be avoided.
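
To illustrate the point about wasted work, here is a minimal standalone sketch, not Blender code. `LazyCache`, `lazy_cache_get` and `run_concurrent_gets` are hypothetical names, and a plain pthread mutex stands in for `loops_cache_rwlock`. Unlike the patch above, the sketch always takes the lock before looking at the array, which keeps the example simple. Because the NULL check sits inside the locked region, a thread that had to wait finds the array already built and skips the rebuild.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lazily built looptri cache (not Blender's struct). */
typedef struct LazyCache {
    pthread_mutex_t lock;
    int *array;       /* NULL until the first request */
    int num;
    int recalc_count; /* number of rebuilds, for demonstration only */
} LazyCache;

/* Rebuild the array. The caller must hold cache->lock. */
static void lazy_cache_recalc(LazyCache *cache)
{
    free(cache->array);
    cache->array = malloc(sizeof(int) * (size_t)cache->num);
    for (int i = 0; i < cache->num; i++) {
        cache->array[i] = i * 2;
    }
    cache->recalc_count++;
}

/* The NULL check happens under the lock, so waiting threads do not rebuild. */
static const int *lazy_cache_get(LazyCache *cache)
{
    pthread_mutex_lock(&cache->lock);
    if (cache->array == NULL) {
        lazy_cache_recalc(cache);
    }
    const int *result = cache->array;
    pthread_mutex_unlock(&cache->lock);
    return result;
}

static void *lazy_cache_worker(void *arg)
{
    lazy_cache_get((LazyCache *)arg);
    return NULL;
}

/* Start num_threads concurrent getters and return how many rebuilds happened. */
static int run_concurrent_gets(int num_threads)
{
    LazyCache cache = {.array = NULL, .num = 1024, .recalc_count = 0};
    pthread_mutex_init(&cache.lock, NULL);
    pthread_t *threads = malloc(sizeof(pthread_t) * (size_t)num_threads);
    for (int i = 0; i < num_threads; i++) {
        pthread_create(&threads[i], NULL, lazy_cache_worker, &cache);
    }
    for (int i = 0; i < num_threads; i++) {
        pthread_join(threads[i], NULL);
    }
    const int count = cache.recalc_count;
    free(threads);
    free(cache.array);
    pthread_mutex_destroy(&cache.lock);
    return count;
}
```

Calling `run_concurrent_gets(8)` from a test program should report a single rebuild regardless of how the threads are scheduled, since only the first thread to take the lock sees a NULL array.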

Thanks for the investigations so far. :)

The problem is thread concurrency, as usual:

  • Thread 1 (T1) calls getLoopTriArray(), which finds the looptri array is NULL and needs to be computed.
  • T2 does the same thing, and also finds array is NULL and needs to be recomputed.
  • Meanwhile, T1 has called recalcLoopTri() and locked the RWmutex.
  • When T2 calls it, it has to wait for the mutex to be unlocked, before… overriding the whole looptri array just computed by T1.
  • But!!! T1 has already returned its just-computed looptri array, which is now being freed by T2, so code in T1 trying to use returned array will use invalid memory.

Note that this is just an example; the order could be totally different. The point is that there is no real protection against concurrency here.
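
The interleaving described above can be replayed deterministically in a single thread. The following is only a sketch under stated assumptions: `ReplayCache` and `replay_interleaving` are hypothetical names, and a generation counter is used to detect the stale array so the example never touches freed memory. With `recheck_under_lock` set to false it follows the steps as listed; set to true, T2 looks at the array again once it holds the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the looptri cache (not Blender's struct). */
typedef struct ReplayCache {
    int *array;
    int generation; /* bumped on every rebuild; identifies which array is current */
} ReplayCache;

/* Frees the previous array and builds a new one, like a recalc callback would. */
static void replay_recalc(ReplayCache *cache)
{
    free(cache->array);
    cache->array = malloc(sizeof(int) * 4);
    cache->generation++;
}

/* Returns true if T1 ends up holding an array that T2 has since freed. */
static bool replay_interleaving(bool recheck_under_lock)
{
    ReplayCache cache = {NULL, 0};

    /* T1 and T2 both look at the cache before either one rebuilds it. */
    const bool t1_saw_null = (cache.array == NULL);
    const bool t2_saw_null = (cache.array == NULL);

    /* T1 takes the lock, rebuilds, unlocks, and hands the array to its caller. */
    if (t1_saw_null) {
        replay_recalc(&cache);
    }
    const int t1_generation = cache.generation;

    /* T2 now gets the lock. Without a re-check it acts on its earlier observation. */
    bool t2_rebuilds = t2_saw_null;
    if (recheck_under_lock) {
        t2_rebuilds = (cache.array == NULL);
    }
    if (t2_rebuilds) {
        replay_recalc(&cache); /* frees the array T1's caller is still using */
    }

    /* T1's array is stale if another rebuild happened after T1 returned it. */
    const bool t1_is_stale = (t1_generation != cache.generation);
    free(cache.array);
    return t1_is_stale;
}
```

`replay_interleaving(false)` reports that T1 is left with a stale array, while `replay_interleaving(true)` reports that it is not, because T2 skips the second rebuild.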


Your solution works, but only in that specific case: if someone calls recalcLoopTri without going through getLoopTriArray(), you'll get concurrency issues again.

I think the proper solution here would be to not expose recalcLoopTri in the DM API at all, and instead provide an invalidateLoopTri callback that tags the looptri array as needing recomputation. That way, both the get and invalidate functions could lock properly and do their safety checks inside the locked area, to ensure they do not step on other threads' toes. recalcLoopTri would then just be an internal static helper. Note that you also have to handle CDDM and EMDM!
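
A rough sketch of what such an API shape could look like, outside of Blender and with hypothetical names throughout (`TriCache`, `tri_cache_get`, `tri_cache_invalidate`): only get and invalidate are meant to be called from outside, both take the lock, and the recalc helper is internal. This is not the DerivedMesh API, and it does not address the lifetime of arrays already handed out before an invalidation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cache type; the real DerivedMesh callbacks look different. */
typedef struct TriCache {
    pthread_mutex_t lock;
    int *array;
    int num;
    bool dirty;       /* set by invalidate, cleared by the internal recalc */
    int recalc_count; /* for demonstration only */
} TriCache;

/* Internal helper, never called directly by users. Caller holds cache->lock. */
static void tri_cache_recalc(TriCache *cache)
{
    free(cache->array);
    cache->array = malloc(sizeof(int) * (size_t)cache->num);
    for (int i = 0; i < cache->num; i++) {
        cache->array[i] = i;
    }
    cache->dirty = false;
    cache->recalc_count++;
}

static void tri_cache_init(TriCache *cache, int num)
{
    pthread_mutex_init(&cache->lock, NULL);
    cache->array = NULL;
    cache->num = num;
    cache->dirty = true;
    cache->recalc_count = 0;
}

/* Only tags the array as outdated; the next get rebuilds it under the lock. */
static void tri_cache_invalidate(TriCache *cache)
{
    pthread_mutex_lock(&cache->lock);
    cache->dirty = true;
    pthread_mutex_unlock(&cache->lock);
}

/* The check and the rebuild happen inside the same locked region. */
static const int *tri_cache_get(TriCache *cache)
{
    pthread_mutex_lock(&cache->lock);
    if (cache->array == NULL || cache->dirty) {
        tri_cache_recalc(cache);
    }
    const int *result = cache->array;
    pthread_mutex_unlock(&cache->lock);
    return result;
}

static void tri_cache_free(TriCache *cache)
{
    free(cache->array);
    cache->array = NULL;
    pthread_mutex_destroy(&cache->lock);
}
```

With this shape, two consecutive `tri_cache_get()` calls trigger one rebuild, and a rebuild only happens again after `tri_cache_invalidate()` has been called.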

@Campbell Barton (campbellbarton), @Sergey Sharybin (sergey), thoughts?