CUDA error: Launch failed in cuCtxSynchronize()
Closed, Archived, Public

Description

System Information
Windows 10 Home
CPU: Intel core i7-5500u CPU @ 2.40GHz 8GB RAM
GPU: NVIDIA GeForce 940M (GM108) 2048 MB DDR3
GPU-z Snapshot: http://gpuz.techpowerup.com/16/08/09/8s4.png

Blender Version
Broken: 2.77 abf6f08

Short description of error
CUDA error: Launch failed in cuCtxSynchronize()


Exact steps for others to reproduce the error
Open BMW27.blend
Set Render >> Device >> GPU Compute
Click Render

Details

Type
Bug
Sergey Sharybin (sergey) triaged this task as Incomplete priority. Aug 11 2016, 12:19 PM

Several questions:

  • Does this still happen with latest builds from builder.blender.org?
  • Does it depend on the tile size you use (try using much smaller tile size)?
  • Does it work in previous versions of Blender?
  • Attaching full console log as a file would give extra clues here.
Maximilian (MaxW) added a comment. Edited Aug 12 2016, 10:21 AM

I get that error too, but with a different scene

System Information
Windows 10 Pro
CPU: Intel Xeon E5-1620 v3 @ 3,50Ghz
RAM: 32GB
GPU: 2x Nvidia GTX 1060 6144MB GDDR5
GPU-Z Snapshot: https://postimg.org/image/7hoolmc1n/

Blender Version
2.77 fdc43f9, 11 August

Short description of error
CUDA error: Launch failed in cuCtxSynchronize()
(Scene is too big to upload, 1,3GB)
The error pops up after all textures are loaded and the BVH is built, when Blender begins to render

Answers

  • Does this still happen with latest builds from builder.blender.org?

~ yes

  • Does it depend on the tile size you use (try using much smaller tile size)?

~ Bigger or smaller tile sizes both don't work

  • Does it work in previous versions of Blender?

~ The first daily build I tested with GPU rendering and bigger scenes was 2.77 1f19fba (the latest stable build couldn't fall back to RAM when VRAM was full). That build doesn't work with this scene either.

  • Attaching full console log as a file would give extra clues here.
[Build 2.77 1f19fba]

CUDA error: Launch exceeded timeout in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)(mem.device_pointer + offset), size)
CUDA error: Launch exceeded timeout in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch exceeded timeout in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch exceeded timeout in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)(mem.device_pointer + offset), size)
CUDA error: Launch exceeded timeout in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch exceeded timeout in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch exceeded timeout in cuMemAlloc(&device_pointer, size)
CUDA error: Launch exceeded timeout in cuMemAlloc(&device_pointer, size)
**Same CUDA error: Launch exceeded timeout .... for the next 400 or more rows!**
CUDA error: Launch exceeded timeout in cuMemFree(cuda_device_ptr(mem.device_pointer))
**Same error like 20 times more**
CUDA error: Launch exceeded timeout in cuMemFree(cuda_device_ptr(mem.device_pointer))
Error: CUDA error: Launch exceeded timeout in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)(mem.device_pointer + offset), size)

[Build 2.77 fdc43f9]

CUDA error: Launch failed in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)(mem.device_pointer + offset), size)
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)(mem.device_pointer + offset), size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)(mem.device_pointer + offset), size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
**Same CUDA error: Launch failed.... for the next 400 or more rows!**
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
**Same error like 20 times more**
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
Error: CUDA error: Launch failed in cuCtxSynchronize()

Seems to be the same reason why it crashed, but with a different output in the newer version.
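
For anyone attaching logs like the two above: a minimal sketch, in plain Python, of collapsing runs of identical consecutive lines so the log stays readable without hand-written "same error N more times" notes. The function name `collapse_repeats` and the `(xN)` suffix are made up for illustration; nothing here is part of Blender.

```python
from itertools import groupby


def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into 'line  (xN)'."""
    out = []
    for line, run in groupby(lines):
        count = sum(1 for _ in run)
        out.append(line if count == 1 else f"{line}  (x{count})")
    return out


# Three sample lines copied from the console output above.
log = [
    "CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)",
    "CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)",
    "CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))",
]
for line in collapse_repeats(log):
    print(line)
```

The order of the errors is preserved, so the first failing call is still visible at the top.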

We tested the latest stable build on a GTX 750 with 2 GB VRAM and loaded enough textures to make it crash. Then we tested build 2.77 1f19fba with the same scene and it worked, so running out of VRAM was no longer a problem.

Then we used a scene with a lot more textures that also used up all the RAM. As expected, Blender crashed.

Now we have 6 GB VRAM and an additional 32 GB of RAM; depending on how it works, this should give 38 GB or 32 GB of space for rendering. I might be wrong, but I think this should be enough.

Edit: Here is a picture from the load on VRAM and RAM right before the error comes up -
https://postimg.org/image/3kdfuy66t/

Hi, RAM and VRAM are not added together. If you render on the GPU you are limited to 6 GB minus system usage (300-400 MB).
The timeout error is a separate issue; search for "nvidia timeout".

Mib
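
The arithmetic in the reply above can be sketched as follows. This is only an illustration: the helper names are made up, the 6144 MB figure is the GTX 1060 from the report, and 400 MB is the upper end of the 300-400 MB system-usage estimate. System RAM deliberately does not appear in the calculation.

```python
def usable_vram_mb(card_mb, system_overhead_mb):
    """VRAM left for the render after the OS/display take their share."""
    return card_mb - system_overhead_mb


def fits_on_gpu(scene_mb, card_mb, system_overhead_mb):
    """True if the scene's GPU footprint fits in the usable VRAM."""
    return scene_mb <= usable_vram_mb(card_mb, system_overhead_mb)


card_mb = 6144      # GTX 1060, from the report above
overhead_mb = 400   # assumed: upper end of the 300-400 MB estimate

print(usable_vram_mb(card_mb, overhead_mb))     # 5744
print(fits_on_gpu(6000, card_mb, overhead_mb))  # False
```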

Maximilian (MaxW) added a comment. Edited Aug 12 2016, 11:03 AM

Hi, but I found this:

developer.blender.org/D2056

This patch will allow CUDA devices to use system memory in addition to VRAM. While this is obviously slower than VRAM, I think it is still better than not rendering at all

Edit: I put a third card in this PC. Blender now shows all three cards as separate CUDA devices, and as a fourth option I can only choose all three cards together for compute. To work around the timeout I can only compute on one of my cards, not on both GTX 1060s.

Is there a way to use both 1060s for compute and ignore the small third card that the display is connected to? I am thinking about trying it with Linux.

Is there a way to use both 1060s for compute and ignore the small third card that the display is connected to? I am thinking about trying it with Linux.

No.
This patch has not been accepted into master at the moment.

These are not bugs; please start a thread at https://blenderartists.org/forum/forum.php > Technical Support
Maybe I can help.
This tracker is for bugs only, not support. :)

Mib

Does this still happen with latest builds from builder.blender.org?
Yes. Hash: 294eac2
Does it depend on the tile size you use (try using much smaller tile size)?
Yes. The tile size was set to 128x128; I reduced it to 16x16 and now it's rendering. (I feel like an idiot)
Does it work in previous versions of Blender?
I don't know
Attaching full console log as a file would give extra clues here.

C:\Program Files\Blender Foundation\Blender>blender.exe
Read new prefs: C:\Users\jtorres\AppData\Roaming\Blender Foundation\Blender\2.77\config\userpref.blend
found bundled python: C:\Program Files\Blender Foundation\Blender\2.77\python
[Lux 2016-Aug-17 17:26:35] Attempting to import pylux module from "C:/Program Files/LuxRender"
[Lux 2016-Aug-17 17:26:35] Failed to import pylux module from "C:/Program Files/LuxRender"
[Lux 2016-Aug-17 17:26:35] Attempting to import pylux module from "C:\Users\jtorres\AppData\Roaming\Blender Foundation\Blender\2.77\scripts\addons\luxrender"
[Lux 2016-Aug-17 17:26:35] Pylux module imported successfully
[Lux 2016-Aug-17 17:26:35] Using pylux version 1.6.0  Build 16132
[Lux 2016-Aug-17 17:26:35] Attempting to import pyluxcore module from "C:/Program Files/LuxRender"
[Lux 2016-Aug-17 17:26:35] Failed to import pyluxcore module from "C:/Program Files/LuxRender"
[Lux 2016-Aug-17 17:26:35] Attempting to import pyluxcore module from "C:\Users\jtorres\AppData\Roaming\Blender Foundation\Blender\2.77\scripts\addons\luxrender"
[Lux 2016-Aug-17 17:26:35] Pyluxcore module imported successfully
[Lux 2016-Aug-17 17:26:35] Using pyluxcore version 1.6
[Lux 2016-Aug-17 17:26:35] Installed scene post-update handler
read blend: C:\temp\blender\BMW27.blend\BMW27.blend
skipping driver '90*brake', automatic scripts are disabled
skipping driver '-90*brake', automatic scripts are disabled
skipping driver '100*power', automatic scripts are disabled
skipping driver '-100*power', automatic scripts are disabled
skipping driver '100*power', automatic scripts are disabled
skipping driver '90*brake', automatic scripts are disabled
skipping driver '-90*brake', automatic scripts are disabled
skipping driver '-100*power', automatic scripts are disabled
CUDA error: Launch failed in cuCtxSynchronize()

Refer to the Cycles GPU rendering documentation for possible solutions:
http://www.blender.org/manual/render/cycles/gpu_rendering.html

CUDA error: Launch failed in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)(mem.device_pointer + offset), size)
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
Error: CUDA error: Launch failed in cuCtxSynchronize()

skipping driver '100*power', automatic scripts are disabled
skipping driver '-90*brake', automatic scripts are disabled
skipping driver '90*brake', automatic scripts are disabled
skipping driver '-100*power', automatic scripts are disabled
CUDA error at cuCtxCreate: Launch failed

Refer to the Cycles GPU rendering documentation for possible solutions:
http://www.blender.org/manual/render/cycles/gpu_rendering.html

CUDA error: Invalid value in cuCtxDestroy(cuContext)
Error: CUDA error at cuCtxCreate: Launch failed

I'm seeing similar errors on a Titan X Pascal. It's reproducible on the latest BuildBot builds with the "Victor" benchmark file located here: https://code.blender.org/2016/02/new-cycles-benchmark/
The error varies depending on tile size, I've included both below. GPU-Z indicates the card has VRAM to spare during the error. Windows TDRDelay timeout value set to a generous 120 secs.
Please advise if I should create a separate bug report.

Duplicate lines omitted -
CUDA error: Launch failed in cuCtxSynchronize()
CUDA error: Launch failed in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)(mem.device_pointer + offset), size)
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Launch failed in cuMemAlloc(&device_pointer, size)
CUDA error: Launch failed in cuMemFree(cuda_device_ptr(mem.device_pointer))
Error: CUDA error: Launch failed in cuCtxSynchronize()
CUDA error at cuCtxCreate: Launch failed
Also -
CUDA error: Invalid value in cuCtxDestroy(cuContext)
Error: CUDA error at cuCtxCreate: Launch failed

Blender version: blender-2.77.0-git.d99c513-Win64
OS: Windows 10 64 bit
GPU: Titan X Pascal, driver version 372.54

Sergey Sharybin (sergey) closed this task as Archived. Aug 25 2016, 9:56 AM
Sergey Sharybin (sergey) claimed this task.

The log indicates this is just an out-of-VRAM error, which we can't easily resolve (we are constantly working on lowering memory usage, though).

The thing here is that Cycles itself requires over 1 GB of VRAM just to render the scene; on top of that, the Blender interface is open and the OS itself uses some of the VRAM. Combine all this and take memory fragmentation into account, and you can very well hit the upper limit of the VRAM.

So thanks for the report, but this is nothing we can easily fix; it is more a matter of working on general memory usage improvements.

I tested again with today's BuildBot build and Blender is now crashing during render instead of throwing CUDA errors.

The rendering computer here has 64GB system RAM and 12GB of video RAM on the Titan X Pascal card.

Blender only utilizes 18.4GB system RAM (29%) and 9.1GB video RAM (74%) while rendering the "Victor" Cycles benchmark.
Monitoring tools like GPU-Z and Task Manager indicate Blender is not hardware constrained when it fails.

Are there any assumptions in the Blender code about the amount of video RAM available to Blender?

No assumptions are made; it just allocates memory until all required memory is allocated or an out-of-memory error occurs.

What's the video driver version you're using?

Driver Version 372.54 - WHQL
Driver Release Date Mon Aug 15, 2016

This is then most likely the same error as reported with another 370-series driver in T49113. As you mention, the driver was released before the bug report was filed with NVIDIA, so it can't contain a fix for the bug.

For the time being you can roll back to 367 which is known to be stable.

I can attempt a driver rollback and test again. T49113 describes a Linux driver that causes a crash on F12. The behavior here is on Windows 10, and the crash happens partway through the render.
Let me know if there are instructions for generating more detailed Blender crash information.

OK, so first of all please open a new bug report, since this has to do with a crash, not with cuCtxSynchronize. And then:

  • Check if crash happens with latest builds from builder.blender.org.
  • Check if crash happens with previous release.
  • Attach logs of blender run with --debug-cycles command line argument.

Include all of that as additional information in the new report :)
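
For the last item in the list above, a minimal sketch of capturing a `--debug-cycles` run to a file that can be attached to the report. Only `--debug-cycles` comes from this thread; the background-render flags `-b` and `-f`, the executable name `blender`, and the log file name are assumptions added for convenience. Launching `blender --debug-cycles` from a console and rendering from the UI should work as well.

```python
import subprocess


def debug_cycles_command(blender_exe, blend_file, frame=1):
    """Build a command line that renders one frame with Cycles debug output."""
    # --debug-cycles is placed first so it is active before the file is loaded.
    return [blender_exe, "--debug-cycles", "-b", blend_file, "-f", str(frame)]


def run_and_log(cmd, log_path):
    """Run the command and write stdout and stderr to a single log file."""
    with open(log_path, "w") as log:
        return subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)


cmd = debug_cycles_command("blender", "BMW27.blend")
print(" ".join(cmd))
# On a machine with Blender installed (hypothetical log name):
# run_and_log(cmd, "blender_debug.log")
```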

I'm still getting this problem, does anyone have a working fix yet?

I'm still getting this problem, does anyone have a working fix yet?

I opened this ticket. Please take a look at what I answered to the developers.

Does it depend on the tile size you use (try using much smaller tile size)?
Yes. The tile size was set to 128x128; I reduced it to 16x16 and now it's rendering.

Change your tile size to 8x8 and verify if the error persists.
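
The tile-size change can also be made from a script. A minimal sketch, assuming the Blender 2.7x Python API, where the tile size is, as far as I know, exposed as `scene.render.tile_x` and `scene.render.tile_y`. The helper names are made up, and a `SimpleNamespace` stands in for `bpy.context.scene` so the sketch runs outside Blender. A smaller tile means less work per kernel launch, which may be why it helps here, though the thread does not confirm the cause.

```python
import math
from types import SimpleNamespace


def set_tile_size(scene, size):
    """Set square render tiles on a Blender 2.7x-style scene object."""
    scene.render.tile_x = size
    scene.render.tile_y = size


def tile_count(width, height, size):
    """Number of tiles needed to cover a width x height image."""
    return math.ceil(width / size) * math.ceil(height / size)


# Stand-in for bpy.context.scene (assumption, for running outside Blender).
scene = SimpleNamespace(render=SimpleNamespace(tile_x=128, tile_y=128))
set_tile_size(scene, 16)
print(scene.render.tile_x, scene.render.tile_y)  # 16 16

# Smaller tiles mean many more, but much shorter, units of work.
print(tile_count(1920, 1080, 128))  # 135
print(tile_count(1920, 1080, 16))   # 8160
```

Inside Blender the call would be `set_tile_size(bpy.context.scene, 16)` after `import bpy`.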

Hi,
I was having this problem this week.
I knew my scene was not using all my graphics memory (GTX 970), but Blender kept failing with cuCtxSynchronize (I think it was that and not initialize), or sometimes Blender crashed altogether.

I tried stripping my shader back until I realised it seemed to have something to do with my texture (1600*1600), which got me wondering whether it had something to do with disk access (it didn't crash when I replaced my image textures with noise ones).

I then recalled something I had read years ago about TdrDelay. I increased my TdrDelay setting in regedit. Now my render works.

I am not saying this is your issue, but it might be. I am not sure of the wisdom of changing TdrDelay, but it worked for me.
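
For reference, a minimal sketch of the TdrDelay change as a .reg file generated from Python. TdrDelay is a DWORD, in seconds, under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers`; the Windows default is 2 seconds. The helper name is made up, and the 120-second value is simply the one another poster in this thread mentioned, not a recommendation. Importing the file needs administrator rights, and a reboot is needed before the change takes effect.

```python
def tdr_delay_reg(seconds):
    """Return .reg file text that sets the GPU watchdog timeout (TdrDelay)."""
    if seconds <= 0:
        raise ValueError("TdrDelay must be a positive number of seconds")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        "[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers]\n"
        f'"TdrDelay"=dword:{seconds:08x}\n'  # .reg DWORDs are 8 hex digits
    )


# 120 seconds, the value mentioned earlier in this thread (0x78 in hex).
print(tdr_delay_reg(120))
```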

I have fixed the CUDA error: Launch failed in cuCtxSynchronize(), the Launch error, Blender crashing when using viewport rendering, and many other things by doing one thing: Disabling Branched Path Tracing and using Path Tracing instead.

Not sure if other people will benefit from this, but I lost 64 hours of work trying to fix this. Hope this helps.
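
The integrator switch described above can be scripted as well. A minimal sketch, assuming the Blender 2.7x Cycles setting `scene.cycles.progressive` with the values `'BRANCHED_PATH'` and `'PATH'`. The helper name is made up, and a `SimpleNamespace` stands in for `bpy.context.scene` so the sketch runs outside Blender.

```python
from types import SimpleNamespace


def use_path_tracing(scene):
    """Switch the Cycles integrator from Branched Path Tracing to Path Tracing."""
    if scene.cycles.progressive == 'BRANCHED_PATH':
        scene.cycles.progressive = 'PATH'
    return scene.cycles.progressive


# Stand-in for bpy.context.scene (assumption, for running outside Blender).
scene = SimpleNamespace(cycles=SimpleNamespace(progressive='BRANCHED_PATH'))
print(use_path_tracing(scene))  # PATH
```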

I tried your solution, Faber Miles, and it didn't work. Now it fails as soon as I hit the start button. I had pushed the tile size to 512x512, so it was a bit obvious.