Complex scene, one GPU renders successfully, but multiple GPUs fail to render
Closed, Resolved · Public

Description

System Information
Operating system: Windows 10 64bit
Graphics card: NVIDIA GeForce 980 Ti + NVIDIA GeForce RTX 2080 SUPER

Blender Version
Broken: v2.82 Alpha. Previous versions also do not seem to have worked.
Worked: after patching D6126

Short description of error
A very complex scene, which one GPU can render successfully with the help of enough AGP memory, cannot be properly rendered by multiple GPUs. The render may fail from the start or may produce a mosaic result of normal tiles and buggy tiles.

Exact steps for others to reproduce the error

  1. Open the 'Cosmos Laundromat Demo' (https://download.blender.org/demo/test/benchmark.zip). I know it is a CPU render demo; we just need a very complex scene that drains GPU memory.
  2. Select GPU compute device under Render properties.
  3. Render image.

I submitted a patch for this bug as D6126.

Event Timeline

Ha Hyung-jin (robobeg) updated the task description.

Unfortunately this does not fix OptiX rendering if one of the GPUs is driving the screen.
With two identical RTX 2080s and the Spring test scene, I get:

OptiX error OPTIX_ERROR_LAUNCH_FAILURE in optixLaunch(pipelines[PIP_PATH_TRACE], cuda_stream[thread_index], launch_params_ptr, launch_params.data_elements, &sbt_params, wtile.w * wtile.num_samples, wtile.h, 1), line 668

and then only the device that is not driving the screen renders the scene. CUDA seems fine now.

Jens

Does the error still occur when only the GPU driving the screen is used for the OptiX render? If so, it may be a different issue. The memory management errors usually occurred in calls such as cuMemcpyHtoD and cuMemsetD8.

If the error still occurs with one GPU, I may be able to debug it, since I have an RTX 2080 SUPER.

Hi robobeg
Yes. Using the Spring scene as a test again, the same error occurs when only the GPU driving the screen is set to render.
So the issue does not seem to be the balancing between GPUs with different amounts of VRAM; your fix looks okay here.
I now agree with your assumption that it is another issue.
I can get the scene to render by reducing the hair count or by other simplifications that reduce memory usage.
I now think that the memory used for the attached screens is not taken into account. (?)

You might end up with the same result as me. Would it be okay for you to file a new bug report about this finding yourself?
You are the specialist here.

Greetings ... Jens
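
To make the hypothesis above concrete, here is a minimal sketch in Python, not the Cycles code. It only models the arithmetic: a budget computed from a GPU's total memory can pick device memory even though the display already holds part of it, while a budget computed from free memory plus some headroom falls back to host memory. All names (`Gpu`, `placement`, `naive_placement`) and all sizes, including the 128 MiB headroom and the 600 MiB display usage, are assumptions chosen for illustration. On a real device, the free and total figures would come from a query such as `cuMemGetInfo`.

```python
from dataclasses import dataclass

MiB = 1024 * 1024


@dataclass
class Gpu:
    """Hypothetical model of one GPU; sizes are illustrative, not measured."""
    name: str
    total_bytes: int
    display_bytes: int = 0  # assumed memory already held by attached screens

    @property
    def free_bytes(self) -> int:
        return self.total_bytes - self.display_bytes


def naive_placement(gpu: Gpu, scene_bytes: int) -> str:
    """Budget against total memory, ignoring what the display already uses."""
    return "device" if scene_bytes <= gpu.total_bytes else "host"


def placement(gpu: Gpu, scene_bytes: int, headroom_bytes: int = 128 * MiB) -> str:
    """Budget against free memory and keep some headroom (assumed value)."""
    if scene_bytes + headroom_bytes <= gpu.free_bytes:
        return "device"
    return "host"


if __name__ == "__main__":
    scene = 7600 * MiB  # hypothetical scene size
    gpus = [
        Gpu("compute-only", total_bytes=8192 * MiB),
        Gpu("screen-driving", total_bytes=8192 * MiB, display_bytes=600 * MiB),
    ]
    for gpu in gpus:
        print(gpu.name, naive_placement(gpu, scene), placement(gpu, scene))
```

With these made-up numbers, both checks choose device memory for the compute-only GPU, but only the free-memory check sends the screen-driving GPU to host memory. That would match the observation that the scene renders once its memory usage is reduced, although it does not prove that this is the cause of the launch failure.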

Actually, I am not familiar with OptiX, and I am not even a member of the Blender Foundation. I had to modify the OptiX device code because most of it was copied and pasted from the CUDA device code and shares the same memory management module. To test the render, I would first have to install the OptiX SDK and build Blender with it. As far as I know, OptiX support is an experimental feature at the moment.

Why don't you report the bug yourself? Then an OptiX expert here may fix it faster than I can. I will try to fix it if the bug report goes unanswered for a while.