CUDA Error
Open, Incomplete, Public

Description

System Information
Windows 10, NVidia GTX 1060 6GB

Blender Version
Broken: 2.79 release candidate 2

Short description of error
Rendering, or even just switching to rendered view, yields "CUDA error: Illegal address in cuCtxSynchronize(), line 1372"

Exact steps for others to reproduce the error
It's as simple as placing an object in the scene, adding a material and some details, then going into rendered view at 32 samples or rendering at 128 samples.

Details

Type
Bug
LazyDodo (LazyDodo) triaged this task as "Incomplete" priority.Mon, Aug 28, 7:29 PM

When you check the Windows event log around the time you get this error, are there any mentions of Event ID 4101 (Display Driver Timeout)? Also, is this branched path tracing by any chance?

Same GPU, same version of Blender, same error here.
Event ID is 1001, it's named "LiveKernelEvent".
I'm not using branched path tracing.

Please also check those points:

  • Give us your exact Blender, OS and GPU (including drivers) versions, as requested in the template!
  • Ensure both your OS and drivers are fully up-to-date (and use official GPU drivers, not those provided by windows or tablet/laptop maker).
  • Try to start Blender in factory settings (--factory-startup commandline option) (this will ensure whether this is a userpref or addon issue or not).
  • Launch Blender from the command line with --debug-cycles option and attach as text file here any error printed out in the console (do not paste it directly in comment).
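The last two points can be combined into one run whose console output goes straight to a text file that can then be attached. The following is only a minimal sketch: the helper names `build_debug_command` and `run_and_log` and the log file name are made up for illustration, while `--factory-startup` and `--debug-cycles` are the options named in the list above.

```python
import subprocess


def build_debug_command(blender_exe, blend_file=None):
    """Return the argument list for a factory-settings run with Cycles debug output."""
    cmd = [blender_exe, "--factory-startup", "--debug-cycles"]
    if blend_file is not None:
        cmd.append(blend_file)
    return cmd


def run_and_log(cmd, log_path):
    """Run the command and write everything it prints (stdout and stderr) to log_path."""
    with open(log_path, "w", encoding="utf-8", errors="replace") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Example (paths are placeholders, adjust them to your installation):
# run_and_log(build_debug_command(r"path\to\blender.exe"), "blender_debug.txt")
```

Blender still opens normally; reproduce the error, close Blender, and attach the resulting text file.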

Okay, so I waited for the release to test this again.
It's still there. I'll try to find all the points:
Blender version 2.79 (I don't know if there's more to it, it's the full release from today).
Windows 10 Version 1607 (Build 14393.1715)
Nvidia GeForce GTX 1060 6GB (Asus ROG STRIX-GTX1060-O6G-GAMING to be precise) with driver version 385.41
GPU drivers are the official ones. They're up to date, and so is the OS.
Starting in factory settings (well, plus activating CUDA in user preferences) still brings about this error.
Attached is the error log from the console (I only included the stuff that said "error").

Bastien Montagne (mont29) raised the priority of this task from "Incomplete" to "Normal".Wed, Sep 13, 8:11 AM

Thanks, let's see what our Cycles experts have to say here.

Sergey Sharybin (sergey) lowered the priority of this task from "Normal" to "Incomplete".Wed, Sep 13, 9:27 AM

Please attach simple .blend file which demonstrates the issue. From the log i can see it's trying to use denoising, which is definitely not enabled by default and is lacking any mention in steps to reproduce the error.

Sorry, my bad. I tested out the denoiser and forgot to turn it off for the error log. The error was 100% reproducible both with the denoiser on and off, but I tested again today and couldn't reproduce it a single time. I made no updates to my Windows version, drivers or Blender version, so I don't really understand why this is gone. I attached the .blend file anyway.

Update:
I tried to get to the bottom of this since it seemed very strange that this issue suddenly vanished.
Now, since I use this PC mainly for gaming, I have routinely overclocked my GPU as far as I could. I have of course tested the stability of this in games, so my setup should be stable at all times. When I ran the tests earlier today I hadn't noticed that MSI Afterburner had reset my overclocking settings.
Testing out further I found that only the core clock offset seemed to cause this issue, when I raised only the memory clock it rendered just fine.
So there, that's probably where the problem comes from.
I did all my testing with the .blend file from the previous post (denoiser disabled).

I am also getting the "CUDA error: Illegal address in cuCtxSynchronize(), line 1372" error on the majority of scenes I attempt to render via GPU (e.g. BMW Benchmark, Racing Car, etc.). Unfortunately, it's making GPU rendering unusable for me, as I can't tell from one render to another if it's going to crash, and it requires a complete restart of Blender to be able to attempt rendering again.

My setup:

Blender 2.79 (also happens on 2.78c)
Windows 10 Pro (64-bit) v1703 (build 15063.608)
MSI GTX 1080 TI SEA HAWK X (11 GB) - Driver version 385.41

It seems to happen more often with higher tile sizes and resolutions (still happens even as low as 128x128 tile size and lower, though) in sufficiently complex scenes, such as the demo files mentioned above. It also happens regardless of overclocking, no overclocking, or even underclocking of both core clock and memory clock (so no luck compared to Stefan's latest comment above). Various games and burn tests, and overall system, are completely stable. I have even underclocked the core clock by as much as -400 MHz and memory clock by -500 MHz (lowest I can take them) and it still happens. I could not replicate with the above Clouds.blend file, however, regardless of tile size and resolution.

Here is the complete output debug log for the BMW Benchmark (bmw27_gpu.blend, with no settings changed whatsoever) with "--factory-startup --debug-cycles" (seems very similar to Stefan's):

Event logs under System for nvlddmkm with ID of 13 at time of above error:

\Device\UVMLiteProcess12
Graphics SM Warp Exception on (GPC 1, TPC 0): Out Of Range Address

\Device\UVMLiteProcess12
Graphics Exception: ESR 0x50c648=0x100000e 0x50c650=0x0 0x50c644=0xd3eff2 0x50c64c=0x17f

I have tried with and without TDR on (TdrDelay set higher, and also TdrLevel set to 0) with no difference.
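For anyone who wants to confirm which TDR values are actually in effect, here is a small read-only sketch. It assumes the values live under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers, which is where Windows documents TdrDelay and TdrLevel; the function name `read_tdr_settings` is made up for illustration. Values that are not set explicitly are simply left out, in which case Windows uses its defaults.

```python
try:
    import winreg  # standard library, but only available on Windows
except ImportError:
    winreg = None

# Assumed location of the TDR values (per Microsoft's TDR registry documentation).
GRAPHICS_DRIVERS_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def read_tdr_settings(names=("TdrDelay", "TdrLevel")):
    """Return the explicitly set TDR values as a dict; unset values are omitted."""
    settings = {}
    if winreg is None:
        return settings  # not running on Windows
    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, GRAPHICS_DRIVERS_KEY)
    except OSError:
        return settings  # key missing or not readable
    with key:
        for name in names:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                continue  # value not set, Windows default applies
            settings[name] = value
    return settings


print(read_tdr_settings())
```

The script only reads the registry; changes to these values normally need a reboot before they take effect.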

Here are some other event logs I've had for other scenes earlier today:

\Device\UVMLiteProcess4
NVRM: Graphics TEX Exception on (GPC 4, TPC 0): TEX NACK / Page Fault

\Device\UVMLiteProcess4
Variable String too Large

\Device\UVMLiteProcess4
Graphics Exception: ESR 0x524224=0x80000000 0x524228=0x0 0x52422c=0x0 0x524234=0x0

\Device\UVMLiteProcess4
NVRM: Graphics TEX Exception on (GPC 5, TPC 2): TEX NACK / Page Fault
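To compare these entries across crashes, the register/value pairs in the "Graphics Exception: ESR" lines can be pulled apart with a few lines of Python. This is just a convenience sketch (the name `parse_esr_line` is made up); it does not attempt to interpret what the registers mean.

```python
import re

# Matches pairs such as 0x50c648=0x100000e
ESR_PAIR = re.compile(r"(0x[0-9a-fA-F]+)=(0x[0-9a-fA-F]+)")


def parse_esr_line(line):
    """Return {register_offset: value} as integers for one 'Graphics Exception' line."""
    return {int(reg, 16): int(val, 16) for reg, val in ESR_PAIR.findall(line)}


# Line copied from the first event log entry in this comment.
line = "Graphics Exception: ESR 0x50c648=0x100000e 0x50c650=0x0 0x50c644=0xd3eff2 0x50c64c=0x17f"
for reg, val in parse_esr_line(line).items():
    print(f"{reg:#x} = {val:#x}")
```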

Help would be greatly appreciated as I am at a complete loss with what to do about GPU rendering. Please let me know if I can provide any more info or if I should create a separate issue.

@Stefan Eisenreich (Stef1309), we can not guarantee Blender to work on overclocked GPUs. There is a good reason why vendors didn't use higher frequencies to begin with. Fact that games are stable doesn't really mean much here, CUDA program will stress GPU much more than OpenGL/DirectX.

@Maeldor (Maeldor), did you try rendering default cube on GPU? Did you try installing driver downloaded directly from nvidia.com ?

@Brecht Van Lommel (brecht), i'm away from my main desktop currently. Did you happen to have Pascal+Windows configuration handy? :)

Thanks for your quick reply!

@Maeldor (Maeldor), did you try rendering default cube on GPU? Did you try installing driver downloaded directly from nvidia.com ?

I can typically render simple things without issue (so the default cube scene is likely to never cause a problem), but as I mentioned:

It seems to happen more often with higher tile sizes and resolutions (still happens even as low as 128x128 tile size and lower, though) in sufficiently complex scenes

Such a scene being the BMW Benchmark. I've often been able to successfully render it fully at 720p with a tile size of 128x128, but not always guaranteed. The moment I put the tile size or resolution higher, however, it almost guarantees a crash. Other scenes, like the Racing Car, will crash even at settings that low or lower.

Drivers are installed via Geforce Experience, which are latest (385.41).

@Stefan Eisenreich (Stef1309), we can not guarantee Blender to work on overclocked GPUs. There is a good reason why vendors didn't use higher frequencies to begin with. Fact that games are stable doesn't really mean much here, CUDA program will stress GPU much more than OpenGL/DirectX.

This worries me immensely; as I have mentioned, I've tested even with underclocking, but the issue persists while everything else is perfectly stable. Could this suggest a hardware fault? I was hoping it was a bug rather than a problem with my system. I don't even overclock my GPU whatsoever; everything is stock, and despite this, my attempts with underclocks well below stock levels (-400 MHz) still produce the exact same problem.

Are there any Windows utilities that you know of that would let me specifically test the stability of CUDA? I've run several general GPU stability testers and benchmarks (FurMark, 3DMark, Heaven, Superposition, etc.) for extended periods and they are all perfectly fine, but if you suggest CUDA is different, it would be good to be able to test this specifically.

Thanks!

@Maeldor (Maeldor), if you don't overclock, then things should be working stable. Root of your issue might be different from @Stefan Eisenreich (Stef1309).

There is one more thing you can try: render file from console (cmd.exe), something like:

path\to\blender.exe --debug-cycles --factory-startup -y -b path\to\bmw27_cpu.blend -f 1

and see if that works and whether log gives any clues.

Thanks very much! I'll try that when I get home.

I see you've specified the bmw27_cpu.blend file. Is that meant to be CPU or GPU? And I expect --factory-startup will prevent the GPU being picked as the compute device.

Hi.
I have read before that merely having a GPU overclocking application installed can cause conflicts with Blender, even for CPU rendering. I mention this because MSI Afterburner came up above. So, to make the tests reliable, if you have any such application installed (MSI Afterburner, ASUS GPU Tweak, etc.), remove it completely.

@Maeldor (Maeldor), yes, should be bmw27_gpu. And you're right about --factory-startup. Let's do this way then: backup your settings, load factory startup, enable CUDA and then try rendering from terminal.
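An alternative that keeps --factory-startup on the command line is to enable CUDA from a small script passed with -P. This is only a sketch, not something suggested in the comment above: the property names (`user_preferences.addons["cycles"].preferences`, `compute_device_type`, `devices`, `scene.cycles.device`) are written from memory of the 2.79 Python API and should be treated as assumptions, and the file name enable_cuda.py is hypothetical.

```python
def enable_cuda(cycles_prefs, scene):
    """Select CUDA, enable only the CUDA devices, and make the scene render on the GPU."""
    cycles_prefs.compute_device_type = "CUDA"
    if hasattr(cycles_prefs, "get_devices"):
        cycles_prefs.get_devices()  # assumed to refresh the device list
    for device in cycles_prefs.devices:
        device.use = (device.type == "CUDA")
    scene.cycles.device = "GPU"


try:
    import bpy  # only importable when run inside Blender
except ImportError:
    bpy = None

if bpy is not None:
    prefs = bpy.context.user_preferences.addons["cycles"].preferences
    enable_cuda(prefs, bpy.context.scene)
```

It would be used as: path\to\blender.exe --debug-cycles --factory-startup -y -b path\to\bmw27_gpu.blend -P enable_cuda.py -f 1. The -P option has to come after the .blend file and before -f, since Blender processes its arguments in order.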

Testing without overclocking/tweaking software will indeed be really handy. I was never able to reproduce any render problems on Windows with NVidia's GTX 1080.

Hi, I have this too.

Specs: Windows 10 Pro x64, Blender 2.79 x64 official release, GPU for rendering: GTX 1080 Ti 11 GB (no overclocking or anything at all) with driver version 385.28.

So far it only happens on a scene with 5 large trees (about 8.8 million tris). Rendered mode starts but stops at 2/32 samples. Once it has failed, I cannot relaunch a render; I have to close Blender. I also tried deleting 4 of the 5 trees before launching rendered mode, but it failed too (even if I save the file with only one tree and reopen Blender).
If I append one tree into a new scene, it works. If I append all 5 trees, rendered mode stops at 11/32 samples.

I don't know if it helps, but I made screenshots of the console log:


Hi, I can confirm this error too.
Windows 10 64-bit
GTX 1080, overclocked to ~2100 MHz when rendering, with overclocked VRAM
Blender 2.79

(All rendering mentioned here was done with Cycles on the GPU.)

When rendering the BMW benchmark, even at 1440p with 300 samples and 256x256 tiles (without denoising), I get no crash whatsoever.

I can reproduce the crash when rendering these two scenes, even with the samples at 50 and the resolution at 50%:
from https://www.blender.org/download/demo-files/

production benchmark:
https://www.blender.org/download/demo-files/

splash of 2.79 (Agent 327), here with both branched and normal path tracing, with and without denoising:
https://cloud.blender.org/p/gallery/59819ee681191741ad07d283

Scenes that are too complex do not render, even at low settings.
The crash occurs after some tiles have rendered.
After the first crash, every render crashes immediately at the first tile. Restarting Blender allows rendering different files again.

Hope this helps.