NVIDIA issue prevents full indirect draw call batching performance
Open, Confirmed, Medium (Public)

Description

System Information
Operating system: Linux-5.2.0-2-amd64-x86_64-with-debian-bullseye-sid 64 Bits
Graphics card: Quadro RTX 6000/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 430.40

Blender Version
Broken: version: 2.81 (sub 11), branch: master, commit date: 2019-09-18 08:49, hash: rBe2cbf8b1174d
Worked: (optional)

Short description of error
Entering edit mode makes links in the node editor disappear.

Exact steps for others to reproduce the error
Open attached file, go to object's edit mode.

This seems to be related to the /* Add link to batch. */ code path. Forcing the /* Draw single link. */ code path makes all links display properly.

Details

Type
Bug

Event Timeline

Jacques Lucke (JacquesLucke) lowered the priority of this task from Needs Triage by Developer to Confirmed, Medium. Sep 18 2019, 12:16 PM
Brecht Van Lommel (brecht) raised the priority of this task from Confirmed, Medium to Unbreak Now!. Sep 18 2019, 2:53 PM

This is only a display issue, but highly confusing. It should be fixed asap.

I cannot reproduce this on any of my setups (even Linux + NVIDIA 430.40). I'm trying to figure out what's happening.

Ok both Linux + GTX 960M (driver 430.40) and Win 10 x64 + GTX 960 with latest drivers are working correctly. So I'm guessing it's an issue with the RTX card driver.

The issue is caused by glMultiDrawElementsIndirect and glMultiDrawArraysIndirect, which apparently break the rendering of instanced draw calls (glDrawElementsInstanced, glDrawArraysInstanced and the like).

Cleaning up the mapping (unmapping before changing context or issuing another draw) does not fix it.
Removing the MultiDrawIndirect calls fixes it.
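
For readers unfamiliar with these calls: a multi-draw-indirect call reads an array of small command records from a buffer and behaves like one instanced draw per record. Below is a minimal sketch of the non-indirect fallback, assuming the record layout that OpenGL defines for glMultiDrawElementsIndirect. The callback type and all function names are hypothetical and are not Blender's actual code; the callback stands in for a real call such as glDrawElementsInstancedBaseVertexBaseInstance so that the sketch compiles without a GL context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One record of the indirect buffer, as laid out by OpenGL for
 * glMultiDrawElementsIndirect: five tightly packed 32-bit values. */
typedef struct DrawElementsIndirectCommand {
  uint32_t count;          /* number of indices to draw */
  uint32_t instance_count; /* number of instances */
  uint32_t first_index;    /* offset into the index buffer, in indices */
  int32_t base_vertex;     /* value added to each index */
  uint32_t base_instance;  /* first instance for instanced attributes */
} DrawElementsIndirectCommand;

/* Hypothetical stand-in for the real per-draw GL call. */
typedef void (*DrawInstancedFn)(const DrawElementsIndirectCommand *cmd, void *user_data);

/* Fallback path: issue one instanced draw per record instead of a single
 * multi-draw-indirect call. Returns the number of draws actually issued. */
static size_t draw_commands_one_by_one(const DrawElementsIndirectCommand *cmds,
                                       size_t len,
                                       DrawInstancedFn draw,
                                       void *user_data)
{
  assert(cmds != NULL || len == 0);
  assert(draw != NULL);
  size_t issued = 0;
  for (size_t i = 0; i < len; i++) {
    /* Records with nothing to draw are skipped to save a driver call. */
    if (cmds[i].count == 0 || cmds[i].instance_count == 0) {
      continue;
    }
    draw(&cmds[i], user_data);
    issued++;
  }
  return issued;
}

/* Example callback: accumulates the instance count instead of drawing. */
static void count_instances(const DrawElementsIndirectCommand *cmd, void *user_data)
{
  *(uint32_t *)user_data += cmd->instance_count;
}
```

The trade-off is the one discussed in this task: the loop costs one driver call per record, which is exactly the overhead that indirect batching is meant to remove.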

I'll report the issue to NVIDIA.

EDIT: Something I forgot to mention: no error is reported by GL, either through the debug facility (--debug-gpu) or by manually querying glGetError().

"So I'm guessing it's an issue with the RTX card driver."

Jacques is using a GTX card as far as I know, though.

Happens here on Linux GTX 1660 with Nvidia driver

Happens here on Win10 RTX2070 driver 431.53

@Clément Foucault (fclem), please commit whatever workaround is needed to fix this issue, even if it means disabling some of the batching optimizations for NVIDIA. We can't leave an issue with this priority broken for so long in master.

This comment was removed by Christoph (ChrisGraz).

Select NVIDIA-Color Settings and 10 bit Color instead of the Standard Color Settings in the NVIDIA Control Panel to fix this in Windows 10.
Graphics card: GeForce RTX 2070/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 431.86
Blender Version
Broken: version: 2.81 (sub 12), branch: master, commit date: 2019-09-27 22:44, hash: rB1c1a3198af9d

I'm using 10 bit colour with my RTX 2080 Ti and the bug is still there.

The committed workaround breaks meshes/objects. Trying to move the default cube in object mode just moves the origin and not the mesh.
Opening a file saved pre-commit moves all meshes (but not object origins) and empties to the world center.
Sorry if this isn't the place to report it and I'll open a new bug report if it's just on my end.
(Linux / 2070)

Brecht Van Lommel (brecht) renamed this task from Shader node link disappears when going to edit mode to Various display issues on NVIDIA after draw call batching. Sep 29 2019, 9:14 PM

So we went for an aggressive workaround, since the bug could also affect bones. The workaround was to disable indirect drawing for all NVIDIA hardware. This costs some performance, but part of the batching performance gain is still in.
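
As an illustration only, a vendor-wide switch of this kind could look like the sketch below. It assumes the decision is based on the GL_VENDOR string (reported as "NVIDIA Corporation" in the system information of this report); the function name is hypothetical and this is not the committed code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decide whether the indirect draw path may be used,
 * given the string returned by glGetString(GL_VENDOR). */
static bool use_indirect_draw(const char *gl_vendor)
{
  /* No vendor string available: stay on the conservative path. */
  if (gl_vendor == NULL) {
    return false;
  }
  /* Disable indirect drawing on every NVIDIA device, regardless of model. */
  return strstr(gl_vendor, "NVIDIA") == NULL;
}
```

Calling use_indirect_draw("NVIDIA Corporation") returns false, so the caller would fall back to issuing individual instanced draws.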

About the bug itself: only the next instanced draw call (the one issued right after a buggy indirect draw call) fails.
The indirect draw call responsible for this behavior seems to be any draw call inside the edit mesh pass (bypassing edit_mesh_draw_components fixes it). For the other problems (other missing instanced draw calls), the culprit is a different indirect draw call.

We tried to replicate the bug in a small test program but couldn't. The issue must come from a special state setup or a special sequence of events.

Is it possible to capture the render state with something like RenderDoc? It might give you a few hints.
If you haven't tried it already, it might be an issue with the alignment of the data in the indirect buffer (wild guess though).

@Antony Riakiotakis (psy-fi) We already tried RenderDoc, but nothing stood out.

The indirect buffer is aligned correctly, as all indirect calls are rendered correctly on every platform that supports them. We also tried uploading the indirect buffer using glBufferData or glBufferSubData, but that changed nothing. So it's not buffer management.

Brecht Van Lommel (brecht) lowered the priority of this task from Unbreak Now! to Confirmed, Medium. Oct 3 2019, 2:25 PM

Lowering priority since there is no known breakage here, just a potential performance improvement by getting indirect calls to work on NVIDIA.

Brecht Van Lommel (brecht) renamed this task from Various display issues on NVIDIA after draw call batching to NVIDIA issue prevents full indirect draw call batching performance. Oct 3 2019, 2:26 PM

For 2.81 we can just keep this disabled and live with somewhat reduced performance, but still better than 2.80.

So NVIDIA replied and said it is indeed a known driver bug (affecting Turing GPUs) that will be fixed in the next iteration of their drivers (version 440), to be released early next month.

So we should preferably whitelist this driver version before the release.
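
A possible shape for such a whitelist, sketched under the assumption that the driver version can be read from the GL_VERSION string in the form shown in this report ("4.5.0 NVIDIA 430.40"). The function names are hypothetical, and the 440 threshold is taken from NVIDIA's reply above and has not been verified here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the NVIDIA driver major version from a
 * GL_VERSION string such as "4.5.0 NVIDIA 430.40".
 * Returns -1 if the string does not carry an NVIDIA driver version. */
static long nvidia_driver_major(const char *gl_version)
{
  static const char tag[] = "NVIDIA ";
  if (gl_version == NULL) {
    return -1;
  }
  const char *p = strstr(gl_version, tag);
  if (p == NULL) {
    return -1;
  }
  p += sizeof(tag) - 1; /* skip past "NVIDIA " */
  char *end = NULL;
  long major = strtol(p, &end, 10); /* stops at the '.' before the minor part */
  return (end == p || major < 0) ? -1 : major;
}

/* Assumption from this thread: drivers from version 440 on contain the fix. */
static bool nvidia_driver_has_indirect_fix(const char *gl_version)
{
  return nvidia_driver_major(gl_version) >= 440;
}
```

With the string from the original report, nvidia_driver_major returns 430, so indirect drawing would stay disabled on that driver.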

On Windows they have already released 440 and 441. On Linux there is a 440 beta driver.

This is probably not important enough to risk for the 2.81 release; it seems simpler to just postpone it to 2.82.

@Brecht Van Lommel (brecht) @Clément Foucault (fclem) Hey, that is not a good idea. Could you at least test it? You know which driver version fixed it. It may also be worth checking which of the connected issues were really caused by those indirect calls; if some of them used the calls incorrectly, that is a separate problem. Also, are you aware of the new Studio driver with optimizations for V-Ray and Blender [1]? Did you already check it?

It also has some OptiX fixes [2], in particular:
[OptiX]: OptiX unable to select GPUs for use when all GPUs in the system are in TCC mode [2722686]
[OptiX]: Applications built against OptiX SDK 7.0.0 throw error message when using OptixBuildInputTriangleArray::preTransform. [2742029]
[OptiX denoiser]: Errors occur when using the denoiser and specifying Color + Albedo + Normals.
[OptiX denoiser]: The Alpha-denoising path does not support blend mode. [2727732]
[OptiX denoiser]: OptiX errors occur when specifying Normals + Albedo for input. [2712572]
[OptiX]: noinline function calls do not use custom ABI. [2631716]
[OptiX]: denoiser device handling preempts user control. [2647842]
[OptiX denoiser]: OptiX denoiser produces small color values for black pixels and zero alpha. [2545368]

I would be very happy if you polished Blender for the NVIDIA driver the way NVIDIA polished its driver for you ;)


[1] https://www.nvidia.com/en-us/geforce/forums/studio-drivers/39/329442/nvidia-studio-driver-44128-feedback-thread-release/
[2] https://us.download.nvidia.com/Windows/441.28/441.28-win10-nsd-release-notes.pdf