BLI_assert fails in extract_tris_finish on Duplicate Linked
Closed, Duplicate (Public)

Description

System Information
Operating system: Windows-10-10.0.14393-SP0 64 Bits
Graphics card: GeForce GTX 1060 6GB/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 442.19

Blender Version
Broken: version: 2.90.0 Alpha, branch: master (modified), commit date: 2020-06-29 16:20, hash: rB5d31ef082057
Worked: 2.83

Short description of error
BLI_assert(mr->cache->surface_per_mat[0]->elem == ibo) in extract_tris_finish (draw_cache_extract_mesh.c) fails when performing the Duplicate Linked operation (Alt+D). If asserts are disabled, Blender may crash later (or sometimes instantly) when working with the duplicates (not sure if this is related). Example stack trace of such a crash (it may vary; this one happened later, when switching to LookDev iirc):

 	00007ffb278f66c4()	Unknown
 	00007ffb275c4b08()	Unknown
 	00007ffb2744dae5()	Unknown
 	00007ffb27450906()	Unknown
 	00007ffb27809b5e()	Unknown
 	00007ffb2744ea8e()	Unknown
 	00007ffb2744ed19()	Unknown
 	00007ffb278be520()	Unknown
 	00007ffb278bf285()	Unknown
 	00007ffb27454910()	Unknown
>	GPU_batch_draw_advanced(batch=0x0000026dfd1965b8, v_first, v_count=36, i_first=1, i_count=1) Line 757	C
 	GPU_draw_list_submit(list) Line 974	C
 	draw_call_batching_flush(shgroup=0x0000026d92eb4130, state=0x00000083235ff220) Line 977	C
 	[Inline Frame] draw_call_batching_finish() Line 1078	C
 	draw_shgroup(shgroup=0x0000026d92eb4130, pass_state=-2147483599) Line 1246	C
 	drw_draw_pass_ex(pass=0x0000026d92e1f668, start_group=0x0000026d92eb4130, end_group=0x0000026d92eb4130) Line 1305	C
 	DRW_draw_pass(pass=0x0000026d92e1f668) Line 1344	C
 	eevee_draw_scene(vedata=0x0000026d89a1a408) Line 284	C
 	drw_engines_draw_scene() Line 1055	C
 	DRW_draw_render_loop_ex(depsgraph=0x0000026dfe2215e8, engine_type=0x00007ff742445630, region=0x0000026dfd6c4a48, v3d=0x0000026dfd6c6348, viewport=0x0000026d85277fe8, evil_C=0x0000026dfaaa9bd8) Line 1523	C
 	DRW_draw_view(C=0x0000026dfaaa9bd8) Line 1405	C
 	[Inline Frame] view3d_draw_view() Line 1608	C
 	view3d_main_region_draw(C=0x0000026dfaaa9bd8, region=0x0000026dfd6c4a48) Line 1634	C
 	ED_region_do_draw(C=0x0000026dfaaa9bd8, region=0x0000026dfd6c4a48) Line 543	C
 	wm_draw_window_offscreen(C=0x0000026dfaaa9bd8, win=0x0000026dfd178d58, stereo) Line 713	C
 	wm_draw_window(C=0x0000026dfaaa9bd8, win=0x0000026dfd178d58) Line 841	C
 	wm_draw_update(C=0x0000026dfaaa9bd8) Line 1042	C
 	WM_main(C=0x0000026dfaaa9bd8) Line 482	C
 	main(argc=1, UNUSED_argv_c=0x0000000000000000) Line 534	C
 	[External Code]

Exact steps for others to reproduce the error

  1. Open the default scene.
  2. With the default cube selected, press Alt+D. Expect a crash.

Output of the failed assert:

Event Timeline

I cannot replicate this on Windows 10 with an RTX 2080 Ti.

Vincent Blankfield (vvv) changed the task status from Needs Triage to Needs Information from User. Tue, Jun 30, 3:12 PM

This was tested with two builds: my own build of yesterday's master (Release with asserts enabled) and yesterday's buildbot build. The assert failure was reliably reproducible with my build. Crashes were unreliable with both configurations. The only thing modified in my build was undefining NDEBUG to enable asserts; I'm not aware of a better way to get asserts in a Release build. I will make a more "official" Debug build and test with that, as well as search for other clues. I will get back with the results after all the rebuilds (it will take ages on my CPU).
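For context on the NDEBUG change: standard C `assert` compiles to nothing when `NDEBUG` is defined, which is the usual state of an optimized build. The following is a minimal sketch of re-enabling assertions for one translation unit. It assumes plain `assert` rather than Blender's `BLI_assert`, and `asserts_enabled` and `check_same_buffer` are hypothetical names used only for illustration; how NDEBUG was actually undefined in the build above is not stated.

```c
/* Force assertions on for this file, even in an optimized build.
 * The #undef must come before <assert.h> is included. */
#undef NDEBUG
#include <assert.h>
#include <stdbool.h>

/* Reports whether assert() is active in this translation unit. */
static bool asserts_enabled(void)
{
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}

/* Hypothetical check with the same shape as the failing assert:
 * the buffer referenced by the batch must be the one just built. */
static void check_same_buffer(const void *batch_elem, const void *ibo)
{
  assert(batch_elem == ibo);
}
```

With `NDEBUG` left defined, `check_same_buffer` would silently accept a mismatch, which matches the observation that the stock Release build shows no assert failure.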

Vincent Blankfield (vvv) changed the task status from Needs Information from User to Needs Triage. (Edited) Tue, Jun 30, 7:30 PM

Updated and reset git and svn. Deleted the user preferences directory. Rebuilt both Debug and Release in a clean directory. The only CMake changes: enabled CUDA and OptiX.

First try (all clean)
Debug: no crash, no assert.
Release: no crash.
Conclusion: Unreproducible in the stock Debug and Release builds (crash may or may not be there, but it's hard to reproduce in any case).

With modification

diff --git a/source/blender/draw/intern/draw_cache_extract_mesh.c b/source/blender/draw/intern/draw_cache_extract_mesh.c
index f1a7ab8c9d8..e2061c34b60 100644
--- a/source/blender/draw/intern/draw_cache_extract_mesh.c
+++ b/source/blender/draw/intern/draw_cache_extract_mesh.c
@@ -674,6 +674,10 @@ static void extract_tris_finish(const MeshRenderData *mr, void *ibo, void *_data
   GPU_indexbuf_build_in_place(&data->elb, ibo);
   /* HACK: Create ibo sub-ranges and assign them to each #GPUBatch. */
   if (mr->use_final_mesh && mr->cache->surface_per_mat && mr->cache->surface_per_mat[0]) {
+    if (mr->cache->surface_per_mat[0]->elem != ibo) {
+      printf("mr->cache->surface_per_mat[0]->elem: %x; ibo: %x\n", mr->cache->surface_per_mat[0]->elem, ibo);
+      printf("mr->cache->surface_per_mat[0]->elem->data: %x\n", mr->cache->surface_per_mat[0]->elem->data);
+    }
     BLI_assert(mr->cache->surface_per_mat[0]->elem == ibo);
     for (int i = 0; i < mr->mat_len; i++) {
       /* Multiply by 3 because these are triangle indices. */

Debug: no crash, no assert, no output.
Release: no crash, but got reliable output on each Alt+D:

Read prefs: C:\Users\vvv\AppData\Roaming\Blender Foundation\Blender\2.90\config\userpref.blend
found bundled python: C:\Users\vvv\Documents\HOME\blender\blender-source\build\bin\Release\2.90\python
mr->cache->surface_per_mat[0]->elem: c29d47e8; ibo: c29d4068
mr->cache->surface_per_mat[0]->elem->data: c29d4068
mr->cache->surface_per_mat[0]->elem: c29d7668; ibo: c29d7b28
mr->cache->surface_per_mat[0]->elem->data: c29d7b28
mr->cache->surface_per_mat[0]->elem: afb27b98; ibo: c300d888
mr->cache->surface_per_mat[0]->elem->data: c300d888

Conclusion: The condition only reliably happens in the release builds, where asserts are disabled.
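A note on reading the debug output: the printed values are only eight hex digits wide, most likely because `%x` formats an `unsigned int`, so on a 64-bit build only part of each pointer is shown (passing a pointer to `%x` is also formally undefined behaviour). A sketch of a full-width alternative follows; `format_ptr` is a hypothetical helper, not part of Blender.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write the full pointer value as lowercase hex
 * into buf. Returns the snprintf result (characters needed, excluding
 * the terminating NUL). */
static int format_ptr(char *buf, size_t buf_size, const void *ptr)
{
  assert(buf != NULL && buf_size > 0);
  /* uintptr_t is wide enough to hold any object pointer value. */
  return snprintf(buf, buf_size, "%" PRIxPTR, (uintptr_t)ptr);
}
```

In a quick debug print, the same effect can be had with `printf("%p\n", (void *)ptr);`, whose exact output format is implementation-defined.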

Side notes
Release without NDEBUG is highly recommended for testing this and other possibly related issues. I stumbled onto this issue while poking around on something like T78054. While at it, I noticed a lot of unreproducible issues related to the different draw_cache_extract.c functions. I've got a strong feeling that it's related to the multithreaded execution of those functions. The actual crashes happen all over the place. Sometimes it's corrupted data (noticed especially often with custom data, but that may be because I was mostly looking into it), sometimes attempts to double free, sometimes just pure randomness. It would be cool to be able to isolate all the data that may be accessed cross-thread behind some getters/setters and lock it with mutexes, but the scope of that is huge, and it's not feasible with the current data structures. Anyway, all this may not even be related to the current issue.
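To illustrate the getter/setter idea mentioned above, here is a minimal sketch of a pointer slot guarded by a mutex. It assumes POSIX threads, and `GuardedSlot` and its functions are hypothetical names; this is not a Blender data structure and not a proposed fix.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared slot: every access goes through the mutex. */
typedef struct GuardedSlot {
  pthread_mutex_t mutex;
  void *elem;
} GuardedSlot;

static void guarded_slot_init(GuardedSlot *slot)
{
  pthread_mutex_init(&slot->mutex, NULL);
  slot->elem = NULL;
}

/* Setter: store a new pointer while holding the lock. */
static void guarded_slot_set(GuardedSlot *slot, void *elem)
{
  pthread_mutex_lock(&slot->mutex);
  slot->elem = elem;
  pthread_mutex_unlock(&slot->mutex);
}

/* Getter: read the pointer while holding the lock. */
static void *guarded_slot_get(GuardedSlot *slot)
{
  pthread_mutex_lock(&slot->mutex);
  void *elem = slot->elem;
  pthread_mutex_unlock(&slot->mutex);
  return elem;
}

static void guarded_slot_free(GuardedSlot *slot)
{
  pthread_mutex_destroy(&slot->mutex);
}
```

Note that this only makes each individual read or write atomic. It does not protect the pointed-to data, and it does not prevent one thread from freeing a buffer that another thread fetched earlier, which is part of why the scope of such a change would be large.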

We are already aware of this problem.
It has already been reported in T77867.