
Cycles render fail on large scene with Titan X Pascal
Closed, Resolved, Public

Description

New bug report requested by Sergey via T49059.

Windows 10 64 Bit, 64GB of system RAM
NVIDIA Titan X (Pascal), 12.2GB of video RAM, Driver 372.54 (WHQL)
Blender BuildBot: blender-2.77.0-git.50a44ed-windows64 (2016-08-26)

Cycles rendering of the “Victor” benchmark file fails with a CUDA error, and Blender occasionally crashes. Some tiles render successfully before the failure; multiple tile sizes were tested.
At the point of failure, Blender and the system use only 26% of system RAM (16.7GB of 63.9GB) and only 72% of video RAM (8.8GB of 12.2GB), so Blender does not appear to be hardware-constrained.

Reproduction steps:

  1. Download “cycles_benchmark_20160228.zip” from: https://code.blender.org/2016/02/new-cycles-benchmark/
  2. Unzip and open “victor_cpu.blend”
  3. In Render Properties, change Device to “GPU Compute”
  4. In Render Properties, change Feature Set to “Experimental” (the issue also occurs with “Supported”)
  5. In Render Properties, change Tiles to 480x270
  6. Click the Render button
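Steps 3 to 5 can also be applied from Blender's Python console. This is a minimal sketch that assumes the Blender 2.7x property names `scene.cycles.device`, `scene.cycles.feature_set`, `scene.render.tile_x` and `scene.render.tile_y`. The helper name `apply_repro_settings` is hypothetical, and the sketch assumes a CUDA device is already selected in User Preferences > System.

```python
def apply_repro_settings(scene, tile_x=480, tile_y=270, experimental=True):
    """Apply reproduction steps 3-5 to a scene (hypothetical helper).

    `scene` is expected to be bpy.context.scene; Blender 2.7x property
    names are assumed.
    """
    # Step 3: render on the GPU instead of the CPU
    scene.cycles.device = 'GPU'
    # Step 4: the failure was seen with either feature set
    scene.cycles.feature_set = 'EXPERIMENTAL' if experimental else 'SUPPORTED'
    # Step 5: tile size in pixels
    scene.render.tile_x = tile_x
    scene.render.tile_y = tile_y
    return scene
```

In Blender this would be called as `apply_repro_settings(bpy.context.scene)` followed by `bpy.ops.render.render()`. Passing the scene in, rather than importing `bpy` inside the helper, keeps it testable outside Blender.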

Attached files:
blender --debug-cycles console log: shows Cycles failing with CUDA errors, including "Launch failed in cuCtxSynchronize()"
GPU-Z sensors log: shows the maximum video RAM used during the Cycles failure was 8886MB
Screenshots before, during and after the render: Task Manager shows 64GB total system RAM with 16.7GB used; GPU-Z shows 12,288MB total video memory with 8823MB used during the render

Event Timeline

Sergey Sharybin (sergey) lowered the priority of this task from 90 to 30. Aug 29 2016, 10:08 AM

Does the crash happen with smaller tile size and/or simplified render settings?

Some points:

  • There is some degree of memory fragmentation in VRAM, meaning actual physical memory usage will be larger than the requested amount.
  • GPU-Z seems to report requested memory usage, without taking fragmentation into account.
  • Your GPU is already using 300 MB before rendering.
  • Rendering requires 8.5 GB of device memory (requested; we can't predict fragmentation).
  • The kernel itself will require 1.3 GB of memory with these settings, and to my knowledge it is allocated just after the whole scene is in memory. That would explain why the render fails after everything is already in VRAM.
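Adding up the figures above gives a rough VRAM budget. This back-of-the-envelope sketch assumes the 12,288 MB reported by GPU-Z is the usable total and treats 1 GB as 1,024 MB. It deliberately leaves out fragmentation, which can't be predicted.

```python
def vram_headroom_gb(total_gb, *requested_gb):
    """Memory left after the requested allocations (fragmentation ignored)."""
    return total_gb - sum(requested_gb)

total_gb = 12288 / 1024                        # 12.0 GB card total from GPU-Z
baseline_gb, scene_gb, kernel_gb = 0.3, 8.5, 1.3  # figures from the points above
requested = baseline_gb + scene_gb + kernel_gb
headroom = vram_headroom_gb(total_gb, baseline_gb, scene_gb, kernel_gb)
print(f"requested: {requested:.1f} GB, headroom: {headroom:.1f} GB")
# prints "requested: 10.1 GB, headroom: 1.9 GB"
```

On paper this leaves roughly 1.9 GB. Whether the render actually fits then depends on how much of that margin fragmentation consumes.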

If it's just fragmented memory which makes it impossible to render we can't do much from our side, unfortunately.

Testing has always been done after a fresh reboot. I'm not sure how that impacts VRAM fragmentation. Let me know if that would rule it out as the culprit.

I've tested with multiple render tile sizes and get either a CUDA error or a Blender crash with each setting. I've just retested using 256 x 215 tiles, and Blender crashes partway through the render, leaving the video card at full throttle. The last message in the system console is this (I can provide the full console log if needed):

I0902 10:00:55.278880 10160 scene.cpp:248] System memory statistics after full device sync:

Usage: 9,076,432,535 (8.45G)
Peak: 9,177,015,479 (8.55G)

I monitored video card memory usage with the EVGA PrecisionX utility during that test, and the largest amount reported was 9,026 MB out of 12,288 MB. Let me know if you recommend a different utility for monitoring video memory usage.
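The Cycles statistics are printed in bytes, while monitoring tools report MB. The sketch below pulls the byte counts out of log lines like the ones above and converts them. It assumes the "G" suffix Cycles prints means GiB (1024³ bytes), which matches the printed 8.45G and 8.55G. The function names are hypothetical.

```python
import re

def parse_memory_stats(log_text):
    """Extract the Usage and Peak byte counts from Cycles memory statistics lines."""
    stats = {}
    for key in ("Usage", "Peak"):
        match = re.search(rf"{key}:\s*([\d,]+)", log_text)
        if match:
            stats[key.lower()] = int(match.group(1).replace(",", ""))
    return stats

def to_gib(n_bytes):
    return n_bytes / 1024**3

def to_mib(n_bytes):
    return n_bytes / 1024**2

log = """Usage: 9,076,432,535 (8.45G)
Peak: 9,177,015,479 (8.55G)"""
for name, value in parse_memory_stats(log).items():
    print(f"{name}: {to_gib(value):.2f} GiB = {to_mib(value):,.0f} MiB")
# usage: 8.45 GiB = 8,656 MiB
# peak: 8.55 GiB = 8,752 MiB
```

Expressing the peak as about 8,752 MiB makes it easier to compare with the 9,026 MB reported by PrecisionX, assuming that tool also counts in binary megabytes.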

Regarding your question about simplifying the benchmark file, please let me know if there is a simple option you'd recommend. I can enable the Scene > Simplify option and retest if that helps.

After a reboot, I ran another render test at 256 x 215 tile size and Blender produces the following error (duplicate lines removed):

I0904 16:37:00.042017 7984 scene.cpp:248] System memory statistics after full device sync:

Usage: 9,076,231,823 (8.45G)
Peak: 9,176,814,767 (8.55G)

AL lib: (EE) ALCmmdevPlayback_mixerProc: Failed to get padding: 0x88890004
CUDA error: Launch failed in cuCtxSynchronize()

Refer to the Cycles GPU rendering documentation for possible solutions:
http://www.blender.org/manual/render/cycles/gpu_rendering.html

CUDA error: Launch failed in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)(mem.device_pointer + offset), size)
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
I0904 18:02:13.656242 956 blender_session.cpp:550] Total render time: 5208.46
I0904 18:02:13.656242 956 blender_session.cpp:551] Render time (without synchronization): 5161.91
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
Error: CUDA error: Launch failed in cuCtxSynchronize()
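Repeated lines can be collapsed before a log like the one above is posted, while keeping the order in which each error first appeared. The first entry is usually the most informative. In the CUDA driver API, a launch failure generally leaves the context unusable, so later calls such as cuMemcpyDtoH, cuMemFree and cuMemAlloc just report the same failure. This is a small sketch; the helper names are hypothetical.

```python
import re

def unique_cuda_errors(log_text):
    """Distinct 'CUDA error:' lines in first-seen order, duplicates dropped."""
    lines = (line.strip() for line in log_text.splitlines())
    # dict.fromkeys keeps insertion order and drops repeats
    return list(dict.fromkeys(l for l in lines if "CUDA error:" in l))

def first_failing_call(log_text):
    """Name of the first CUDA call reported as 'Launch failed', or None."""
    match = re.search(r"Launch failed in (\w+)\(", log_text)
    return match.group(1) if match else None
```

For the console output above, `first_failing_call` would return `"cuCtxSynchronize"`.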

Did you try rendering with Simplify enabled?

Just re-ran the 256 x 215 tile test with Simplify enabled and got the following errors (duplicate/info lines removed):

I0906 20:59:00.478320 5064 scene.cpp:248] System memory statistics after full device sync:

Usage: 9,076,231,791 (8.45G)
Peak: 9,176,814,735 (8.55G)

AL lib: (EE) ALCmmdevPlayback_mixerProc: Failed to get padding: 0x88890004
CUDA error: Launch failed in cuCtxSynchronize()
CUDA error: Launch failed in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)(mem.device_pointer + offset), size)
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
I0906 21:26:18.371513 10044 blender_session.cpp:550] Total render time: 1732.9
I0906 21:26:18.371513 10044 blender_session.cpp:551] Render time (without synchronization): 1686.31
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
Error: CUDA error: Launch failed in cuCtxSynchronize()

Let me know if you've ruled out memory fragmentation, as rebooting before each test is time-consuming.

The latest BuildBot builds use over 11GB of VRAM when rendering with the new Cycles displacement feature, so Blender appears capable of accessing most of the card's VRAM with a different scene.

Did you change any of the Simplify settings, like lowering the subdivision level or the child particles value? With the default settings, just enabling Scene Simplify results in the same scene (since subdivisions are set to 6 and child particles to 1).
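The Simplify values can also be set from Python. This minimal sketch assumes the Blender 2.7x `RenderSettings` properties `use_simplify`, `simplify_subdivision` and `simplify_child_particles`, and the helper `enable_simplify` is hypothetical. Following the point above, it also reports whether the chosen values can change anything. With subdivision levels of 6 in the scene and a child-particle factor of 1, enabling Simplify alone leaves the scene unchanged.

```python
def enable_simplify(render, max_subdiv, child_fraction, scene_subdiv_levels=6):
    """Enable Scene > Simplify; return True if it can reduce scene complexity.

    `render` is expected to be bpy.context.scene.render (2.7x names assumed).
    `scene_subdiv_levels` is the highest subdivision level used in the scene.
    """
    render.use_simplify = True
    render.simplify_subdivision = max_subdiv
    render.simplify_child_particles = child_fraction
    # Simplify caps subdivision and scales child particles, so it only has an
    # effect if the cap is below the scene's level or the factor is below 1.
    return max_subdiv < scene_subdiv_levels or child_fraction < 1.0
```

With the values tried later in this thread, `enable_simplify(bpy.context.scene.render, 3, 0.5)` returns True.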

Some more things to try:

  • Try disabling the Hair BVH option in the Performance panel. A bug was discovered there today.
  • This might be the same timeout problem as in [1], so try increasing the TDR delay or disabling TDR [2].
  • Try the most recent buildbot build (Windows builds are in progress there now; wait for them to finish, then grab the build).

[1] https://www.blender.org/manual/render/cycles/gpu_rendering.html#the-nvidia-opengl-driver-lost-connection-with-the-display-driver
[2] https://msdn.microsoft.com/en-us/library/windows/hardware/ff569918(v=vs.85).aspx

Thanks for the troubleshooting tips. I've retested with blender-2.78.0-git.1558f5b-windows64 (2016-09-09).

Disabled Hair BVH and received the following error:
CUDA error: Launch failed in cuCtxSynchronize()
CUDA error: Launch failed in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)(mem.device_pointer + offset), size)
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)

Enabled Simplify, reduced subdivision to 3 and child particles to 0.5, and received the following errors:
CUDA error: Launch failed in cuCtxSynchronize()
CUDA error: Launch failed in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)(mem.device_pointer + offset), size)
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
I0909 22:31:43.240526 8392 blender_session.cpp:550] Total render time: 33349.4
I0909 22:31:43.240526 8392 blender_session.cpp:551] Render time (without synchronization): 33315.3
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
Error: CUDA error: Launch failed in cuCtxSynchronize()

The TdrDelay REG_DWORD has been set to 120 seconds from the start, and no NVIDIA/Windows TDR driver timeout errors occurred during any of the renders.
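To confirm the value that is actually set, here is a small read-only Python sketch using the standard-library `winreg` module. The key path and the `TdrDelay` value name follow Microsoft's TDR registry documentation [2], and the helper name is hypothetical. Per that documentation, the default delay is 2 seconds when the value is absent.

```python
import sys

TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

def read_tdr_delay():
    """Return TdrDelay in seconds, or None if unset or not running on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard-library module
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, value_type = winreg.QueryValueEx(key, "TdrDelay")
    except FileNotFoundError:
        return None  # value not set: Windows uses its default delay
    return value if value_type == winreg.REG_DWORD else None
```

On the machine described here, it would be expected to return 120.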

Besides available VRAM, what else could be causing these error messages?

I'm pleased to report that the test file is now rendering correctly on the latest BuildBot Blender builds. Please close this bug report.

The successful render used the following software (no other changes):
BuildBot 2.78 2016-09-27 (blender-2.78.0-git.7f76f6f-windows64) and NVIDIA driver 372.90

The card peaked at only 9381MB of VRAM during the render (VRAM wasn't the issue).

An older build, BuildBot 2.78 2016-09-11 (blender-2.78.0-git.96f28f9-windows64), still fails consistently with the error above, so the NVIDIA driver doesn't appear to have been the culprit; a recent change in Blender fixed it. Props to you and the other developers, whoever fixed the issue. Now that it renders all the Blender Cycles Benchmark files, we can say with more confidence that Blender 2.78 is Titan X Pascal compatible.

The render completed in 42:33.37 with stock GPU clock settings and air cooling

Thanks!

Sergey Sharybin (sergey) changed the task status from Unknown Status to Resolved. Sep 29 2016, 11:37 AM

I don't see what exactly might have fixed the error, but there were some BVH fixes, and the bug they addressed might have caused this one.

Anyway, considering this resolved for as long as it works for you :)