CUDA Illegal address errors Windows
Open, Confirmed, Public

Description

System Information
Windows 10, NVidia GTX 1060 6GB

Blender Version
Broken: 2.79 release candidate 2

Short description of error
Rendering or even going into rendered view yields "CUDA error: Illegal address in cuCtxSynchronize(), line 1372"

Exact steps for others to reproduce the error
It's as simple as placing an object in the scene, adding a material and some details, and then going into rendered view at 32 samples or rendering at 128 samples.

Details

Type
Bug
LazyDodo (LazyDodo) triaged this task as Incomplete priority. Aug 28 2017, 7:29 PM

When you check the Windows event log around the time you get this error, are there any mentions of Event ID 4101 (Display Driver Timeout)? Also, is this branched path tracing by any chance?
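
One way to look for these events without clicking through Event Viewer is to query the System log with Windows' built-in wevtutil tool. The sketch below is only an illustration and not part of the original report: the helper names are hypothetical, and the query only runs on Windows.

```python
import subprocess
import sys


def build_event_query(event_id, max_events=20):
    """Build a wevtutil command that lists recent System-log events with this ID."""
    xpath = "*[System[(EventID={})]]".format(event_id)
    return [
        "wevtutil", "qe", "System",
        "/q:" + xpath,                # XPath filter on the event ID
        "/c:{}".format(max_events),   # limit the number of events returned
        "/rd:true",                   # read newest events first
        "/f:text",                    # plain-text output instead of XML
    ]


def query_events(event_id):
    """Run the query on Windows; return an empty string on other systems."""
    if sys.platform != "win32":
        return ""
    result = subprocess.run(build_event_query(event_id),
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout
```

Calling query_events(4101) would print any Display Driver Timeout entries; other IDs reported in this thread can be passed the same way.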

Same GPU, same version of Blender, same error here.
Event ID is 1001, it's named "LiveKernelEvent".
I'm not using branched path tracing.

Please also check those points:

  • Give us your exact Blender, OS and GPU (including drivers) versions, as requested in the template!
  • Ensure both your OS and drivers are fully up-to-date (and use official GPU drivers, not those provided by Windows or the tablet/laptop maker).
  • Try to start Blender in factory settings (--factory-startup commandline option); this will show whether or not this is a userpref or addon issue.
  • Launch Blender from the command line with --debug-cycles option and attach as text file here any error printed out in the console (do not paste it directly in comment).
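
The last two points can be combined in a small launcher that starts Blender with both options and writes the console output to a text file for attaching. This is a minimal sketch: the function names, the default log file name and the "blender" executable name are assumptions; only --factory-startup and --debug-cycles come from the list above.

```python
import subprocess


def debug_command(blender_exe="blender"):
    """Command line for a factory-settings Blender with Cycles debug output."""
    return [blender_exe, "--factory-startup", "--debug-cycles"]


def run_and_log(command, log_path="blender_debug_log.txt"):
    """Run the command, sending stdout and stderr to a text file."""
    with open(log_path, "w") as log_file:
        process = subprocess.run(command, stdout=log_file,
                                 stderr=subprocess.STDOUT)
    return process.returncode
```

On Windows the full path to blender.exe would normally be passed to debug_command(); the resulting text file can then be attached to the report.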

Okay, so I waited for the release to test this again.
It's still there. I'll try to find all the points:
Blender version 2.79 (I don't know if there's more to it, it's the full release from today).
Windows 10 Version 1607 (Build 14393.1715)
Nvidia GeForce GTX 1060 6GB (Asus ROG STRIX-GTX1060-O6G-GAMING to be precise) with driver version 385.41
GPU drivers are the official ones. They're up to date, and so is the OS.
Starting in factory settings (well, plus activating CUDA in user preferences) still brings about this error.
Attached is the error log from the console (I only included the stuff that said "error").

Bastien Montagne (mont29) raised the priority of this task from Incomplete to Normal. Sep 13 2017, 8:11 AM

Thanks, let's see what our Cycles experts have to say here.

Sergey Sharybin (sergey) lowered the priority of this task from Normal to Incomplete. Sep 13 2017, 9:27 AM

Please attach a simple .blend file which demonstrates the issue. From the log I can see it's trying to use denoising, which is definitely not enabled by default and is not mentioned in the steps to reproduce the error.

Sorry, my bad. I tested out the denoiser and forgot to turn it off for the error log. The error was 100% reproducible both with the denoiser on and off, but I tested again today and couldn't reproduce it a single time. I made no updates to my Windows version, drivers or Blender version, so I don't really understand why this is gone. I attached the .blend file anyway.

Update:
I tried to get to the bottom of this since it seemed very strange that this issue suddenly vanished.
Now, since I use this PC mainly for gaming, I have routinely overclocked my GPU as far as I could. I have of course tested the stability of this in games, so my setup should be stable at all times. When I ran the tests earlier today, I hadn't noticed that MSI Afterburner had reset my overclocking settings.
Testing further, I found that only the core clock offset seemed to cause this issue; when I raised only the memory clock, it rendered just fine.
So that's probably where the problem comes from.
I did all my testing with the .blend file from the previous post (denoiser disabled).

I am also getting the "CUDA error: Illegal address in cuCtxSynchronize(), line 1372" error on the majority of scenes I attempt to render via GPU (e.g. BMW Benchmark, Racing Car, etc.). Unfortunately, it's making GPU rendering unusable for me, as I can't tell from one render to another if it's going to crash, and it requires a complete restart of Blender to be able to attempt rendering again.

My setup:

Blender 2.79 (also happens on 2.78c)
Windows 10 Pro (64-bit) v1703 (build 15063.608)
MSI GTX 1080 TI SEA HAWK X (11 GB) - Driver version 385.41

It seems to happen more often with higher tile sizes and resolutions (still happens even as low as 128x128 tile size and lower, though) in sufficiently complex scenes, such as the demo files mentioned above. It also happens regardless of overclocking, no overclocking, or even underclocking of both core clock and memory clock (so no luck compared to Stefan's latest comment above). Various games and burn tests, and overall system, are completely stable. I have even underclocked the core clock by as much as -400 MHz and memory clock by -500 MHz (lowest I can take them) and it still happens. I could not replicate with the above Clouds.blend file, however, regardless of tile size and resolution.

Here is the complete output debug log for the BMW Benchmark (bmw27_gpu.blend, with no settings changed whatsoever) with "--factory-startup --debug-cycles" (seems very similar to Stefan's):

Event logs under System for nvlddmkm with ID of 13 at time of above error:

\Device\UVMLiteProcess12
Graphics SM Warp Exception on (GPC 1, TPC 0): Out Of Range Address

\Device\UVMLiteProcess12
Graphics Exception: ESR 0x50c648=0x100000e 0x50c650=0x0 0x50c644=0xd3eff2 0x50c64c=0x17f

I have tried with and without TDR on (TdrDelay set higher, and also TdrLevel set to 0) with no difference.
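
For anyone repeating this test, the current TDR values can be read back to confirm they actually took effect. The sketch below assumes the standard location of these values under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers; the function name is hypothetical, and the code only reads the registry, it never writes to it.

```python
import sys

# Registry location of the Timeout Detection and Recovery (TDR) settings.
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
TDR_VALUES = ("TdrLevel", "TdrDelay")


def read_tdr_settings():
    """Return the TDR values as a dict; None means 'not set, default applies'.

    On non-Windows systems an empty dict is returned.
    """
    if sys.platform != "win32":
        return {}
    import winreg  # only available on Windows

    settings = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
        for name in TDR_VALUES:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                value = None  # value absent from the registry
            settings[name] = value
    return settings
```

Changes to these values generally only apply after a reboot, so it is worth checking them again after restarting.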

Here are some other event logs I've had for other scenes earlier today:

\Device\UVMLiteProcess4
NVRM: Graphics TEX Exception on (GPC 4, TPC 0): TEX NACK / Page Fault

\Device\UVMLiteProcess4
Variable String too Large

\Device\UVMLiteProcess4
Graphics Exception: ESR 0x524224=0x80000000 0x524228=0x0 0x52422c=0x0 0x524234=0x0

\Device\UVMLiteProcess4
NVRM: Graphics TEX Exception on (GPC 5, TPC 2): TEX NACK / Page Fault

Help would be greatly appreciated as I am at a complete loss with what to do about GPU rendering. Please let me know if I can provide any more info or if I should create a separate issue.

@Stefan Eisenreich (Stef1309), we cannot guarantee that Blender works on overclocked GPUs. There is a good reason why vendors didn't use higher frequencies to begin with. The fact that games are stable doesn't really mean much here; a CUDA program will stress the GPU much more than OpenGL/DirectX.

@Maeldor (Maeldor), did you try rendering the default cube on the GPU? Did you try installing a driver downloaded directly from nvidia.com?

@Brecht Van Lommel (brecht), I'm away from my main desktop currently. Do you happen to have a Pascal + Windows configuration handy? :)

Thanks for your quick reply!

I can typically render simple things without issue (so the default cube scene is unlikely ever to cause a problem), but as I mentioned, it seems to happen more often with higher tile sizes and resolutions in sufficiently complex scenes (it still happens even at a tile size of 128x128 and lower, though).

One such scene is the BMW Benchmark. I've often been able to render it fully at 720p with a tile size of 128x128, but success is not guaranteed. The moment I raise the tile size or resolution, however, a crash is almost guaranteed. Other scenes, like the Racing Car, will crash even at settings that low or lower.

Drivers are installed via Geforce Experience, which are latest (385.41).

The point that a CUDA program stresses the GPU much more than games worries me immensely. As I have mentioned, I've tested even with underclocking, but the issue persists while everything else is perfectly stable. Could this suggest a hardware fault? I was hoping it was a bug rather than a problem with my system. I don't overclock my GPU at all; everything is stock, and even underclocking well below stock levels (-400 MHz) still produces the exact same problem.

Are there any Windows utilities that you know of that would let me specifically test the stability of CUDA? I've run several general GPU stability testers and benchmarks (FurMark, 3DMark, Heaven, Superposition, etc.) for extended periods and they are all perfectly fine, but if you suggest CUDA is different, it would be good to be able to test this specifically.

Thanks!

@Maeldor (Maeldor), if you don't overclock, then things should be stable. The root of your issue might be different from that of @Stefan Eisenreich (Stef1309).

There is one more thing you can try: render file from console (cmd.exe), something like:

path\to\blender.exe --debug-cycles --factory-startup -y -b path\to\bmw27_cpu.blend -f 1

and see if that works and whether log gives any clues.

Thanks very much! I'll try that when I get home.

I see you've specified the bmw27_cpu.blend file. Is that meant to be CPU or GPU? And I expect --factory-startup will prevent the GPU being picked as the compute device.

Hi.
I have read before that merely having a GPU overclocking application installed can cause conflicts with Blender, even for CPU rendering. I mention this because MSI Afterburner came up above. So, to make the tests reliable, if you have any such application installed (MSI Afterburner, ASUS GPU Tweak, etc.), remove it completely.

@Maeldor (Maeldor), yes, it should be bmw27_gpu. And you're right about --factory-startup. Let's do it this way then: back up your settings, load factory startup, enable CUDA and then try rendering from the terminal.
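
An alternative that avoids touching saved settings at all is to keep --factory-startup and switch Cycles to CUDA from the command line with Blender's --python-expr option. This is a sketch under assumptions, not what was suggested above: the function name is hypothetical, the Python expression assumes the 2.79 preferences API, and it has not been verified whether individual devices also need to be toggled on.

```python
# Expression run inside Blender after the file is loaded (assumes the 2.79 API).
ENABLE_CUDA_EXPR = (
    "import bpy; "
    "prefs = bpy.context.user_preferences.addons['cycles'].preferences; "
    "prefs.compute_device_type = 'CUDA'; "
    "bpy.context.scene.cycles.device = 'GPU'"
)


def gpu_render_command(blender_exe, blend_file, frame=1):
    """Build a background-render command that enables CUDA before rendering.

    Blender handles arguments in order, so the file is loaded first (-b),
    then the expression runs, and only then is the frame rendered (-f).
    """
    return [
        blender_exe,
        "--debug-cycles",
        "--factory-startup",
        "-y",
        "-b", blend_file,
        "--python-expr", ENABLE_CUDA_EXPR,
        "-f", str(frame),
    ]
```

The returned list can be passed to subprocess.run() with the real paths to blender.exe and bmw27_gpu.blend substituted for the two arguments.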

Testing without overclocking/tweaking software will indeed be really handy. I was never able to reproduce any render problems on Windows with NVidia's GTX 1080.

Hi, I have this too.

Specs: Windows 10 Pro x64, Blender 2.79 x64 official release, GPU for rendering: GTX 1080 Ti 11 GB (no overclocking or anything at all) with driver version 385.28.

So far, it only happens for me in a scene with 5 large trees (about 8.8 million tris). It starts rendered mode but stops at 2/32 samples. Once it has failed, I cannot relaunch a render; I have to close Blender. I also tried deleting 4 of the 5 trees before launching rendered mode, but it failed too (even if I save the file with only one tree and reopen Blender).
If I append the tree in a new scene, it works. If I append the 5 trees, rendered mode stops at 11/32 samples.

I don't know if it helps, but I made screenshots of the console log:


hi, I can confirm this error too.
win 10 64 bit
gtx 1080 Overclocked at ~2100MHz when rendering with overclocked VRAM
blender 2.79

(all rendering referred to as being rendered with cycles and gpu)

but when rendering the bmw benchmark even on 1440p with 300 samples and 256x256 tiles (without denoise) I have no crash whatsoever

I reproduce the crash when rendering these two scenes, even with the samples at 50, resolution at 50%:
from https://www.blender.org/download/demo-files/

production benchmark:
https://www.blender.org/download/demo-files/

splash of 2.79 (agent 327), here with branched and normal path tracing; with/without denoise
https://cloud.blender.org/p/gallery/59819ee681191741ad07d283

Scenes that are too complex do not render, even at low settings.
The crash occurs after some tiles have rendered.
After the first crash, every render crashes immediately at the first tile. Restarting Blender allows rendering different files again.

Hope this helps.

UPDATE:
Yesterday I installed the newest Nvidia Driver: 385.69
this seemed to fix the crash.
agent 327 renders fine on gpu even with larger tile sizes (though 256x256 led to blender crashing completely -> other topic)

Was hopeful at reading this, but unfortunately it has made absolutely no difference for me. I even used DDU to reinstall from scratch to be safe, but I'm still having the exact same problem, so I guess you're lucky if you're not having the problem anymore!

I ended up giving the suggested command-line render many tries, but got exactly the same error(s) as in the previous log. Do you still want the output?

I have also been doing tests all week to try and replicate the issue outside of Blender and have had no luck. I've tried various benchmarks for CUDA that I could find for Windows (Arion Benchmark, Octane Benchmark, NiceHashMiner, etc.) and they all had absolutely no problems. They all stressed the GPU to its limits via CUDA (particularly the hash miner) and none of them had a single error or crash. If you can suggest any other CUDA applications I can try, I'll be happy to also check those (any video transcoders?).

I also did a completely clean reinstall of Windows onto a separate hard drive to be able to start from scratch without any software installed. I installed Blender 2.79, did nothing but set the GPU as the compute device and the tiles to 512x512. I got the error after just a few seconds on the first tile, so it seems it's not related to my installed software.

This problem definitely seems to be specific to Blender only. Any ideas where to go from here?

Thanks!

For me, it is even worse with the 385.69 NVidia drivers. Blender now crashes instead of giving the error, and a file that worked with the previous driver version crashes now :(.

Well, I have a clue: in my scene, I use the Principled BSDF with Subsurface at 1.000. When I turn Subsurface down to 0, it works.

To be more precise:

  • There are 5 trees, each with one trunk material and 3 leaf materials (so 5 x 3 = 15 leaf materials). The leaf materials use Subsurface at 1.000. In rendered mode, I raise the subsurface from 0 to 1 one material at a time, and when I raise the 7th, it crashes.
  • But in another scene with only one tree (so 3 leaf materials), a single material with subsurface at 1 makes Blender crash. I noticed that the zoom level matters: if I'm far away, it doesn't crash even with all 3 subsurface values at 1.

Hope it helps.

Update:
After some more intensive testing with the other file (production benchmark), I have to withdraw my initial fix report.
The production benchmark is not renderable on the GPU, even with 16x16 tiles; it gives the initial error again.
But if the tiles get too big, the error changes to the "not enough memory" error.
However, I was able to render agent327 successfully with 128x128 tiles (at 100% of the original resolution), which was impossible before.
Sorry.

Brecht Van Lommel (brecht) raised the priority of this task from Incomplete to Confirmed. Edited Sep 29 2017, 11:47 PM

I could reproduce some rather random crashes with the official 2.79 release, a GTX 1080 and driver version 385.69 on Windows 10, for example in classroom_gpu.blend. Most of the time it works, but sometimes it crashes almost immediately.

Rendering the "Production Benchmark" (Victor scene) never works here, due to running out of memory. But it didn't work in earlier Blender versions for me either, 8 GB might just not be enough memory to render this scene.

I haven't been able to repro any crash with the latest daily build (97eefc1) so far, but that might be a coincidence since this crash is quite random for me. Can anyone else confirm if there is still a crash in the latest builds?
https://builder.blender.org/download/
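
Because the crash is random, a single successful render says little about whether a build is fixed. A rough way to compare builds is to repeat the same command-line render several times and count the failures. The sketch below is an assumption about how such a check could be scripted; the function name is hypothetical, and since it is not confirmed that Blender exits with a non-zero code on this error, the output is also searched for the "CUDA error" text.

```python
import subprocess
import sys


def count_failures(command, runs=10, error_marker="CUDA error"):
    """Run `command` repeatedly; count runs that fail or print the marker."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(command, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        # Compare as bytes so undecodable console output cannot raise.
        if result.returncode != 0 or error_marker.encode() in result.stdout:
            failures += 1
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py <blender command and its arguments>
    print(count_failures(sys.argv[1:]), "failed runs")
```

Each run starts a fresh Blender process, which matters here because several reports above note that every render fails after the first error until Blender is restarted.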

Thanks for confirming this, Brecht. I was getting worried it wouldn't be reproduced.

Unfortunately, I have had no luck with the latest build v2.79 e3546a5 (I couldn't find a download link for 97eefc1 specifically). It crashed consistently on the first tile (though typically at a different samples number) in classroom_gpu.blend with my tests:

I also thought I'd try v2.8 (e3fe812) just to compare, but no luck there either. With classroom_gpu.blend it crashes without even needing to click render because it automatically renders in the viewport:

We've now had a couple of similar reports, but currently none of the developers can reproduce this issue. I've tried different scenes, different driver versions, different Windows 10 versions, reinstalling Windows, etc. It all seems stable now, for some reason. It's unclear why there is a problem for some users, while most users seem to have no problem (as evidenced by threads like these).

There are a few possible causes:

  • Bug in Cycles that causes it to access invalid memory, either in the kernel itself or in preparing data for the kernel. So far I have found nothing inspecting the code and running tools like valgrind and cuda-memcheck.
  • Different Nvidia graphics settings. Checking the "clean install" option when installing Nvidia drivers may help.
  • Corrupted Nvidia graphics installation. We've seen a few cases where Blender and any other application using CUDA crashes due to this, but not this type of error.
  • Differences in Windows versions or hardware. So far no pattern is visible here.

For now, all I can suggest is to do a totally clean driver install:

  • Uninstall any NVidia, AMD or Intel graphics drivers
  • Download latest driver from nvidia.com
  • Install, with clean install option checked
  • Reboot computer

Perhaps upgrading to Windows 10 Creators Update (or Fall Creators Update) may indirectly help as well.

I can confirm that the Fall Creators Update (with clean install of latest drivers, 387.92) has unfortunately not helped the issue, nor any of the recommendations by Brecht in the previous post.

One more thing to try would be to disable the NVidia Game Overlay in Geforce Experience (if it's enabled), or to uninstall Geforce Experience (while keeping the driver).

Has anyone tested disabling the Game Overlay? We're working with NVidia to resolve an issue with that, but we don't know yet if it's related to this specific crash.

We've also been making quite a few changes to the CUDA implementation in the last month, so latest daily builds may also help:
https://builder.blender.org/download/

I've disabled that and still get errors. In fact I've bought a new SSD and installed Windows 10 fresh and with nothing else installed except Blender and the latest nVidia drivers I still get errors.

On my machine the problem just disappeared without any (known :)) changes on my side.
I have done thousands of renders with Cycles on the GPU and everything works fine now.
Very strange; I'm keeping my fingers crossed.

Have tried some thorough testing of the benchmarks with the Game Overlay disabled, but it hasn't helped. Using drivers 388.13.

Latest tests over the past couple of days have been with blender-2.79.0-git.8a72be7-windows64 (2017-11-06), and initial results seemed promising. I was able to render bmw27_gpu and pavillon_barcelone_gpu yesterday, and fishy_cat_gpu today, but classroom_gpu then failed just over halfway through straight after :( Unfortunately, such inconsistency has been typical of the past few weeks. I've been able to render all the benchmarks successfully at one point or another, but not consistently, which rules out actual development and production use.

Error with the above version is now line 1507: CUDA error: Illegal address in cuCtxSynchronize(), line 1507
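
The reported line number differs between builds (1372 in the 2.79 release, 1507 here), presumably because the Cycles source changed while the failing call stayed the same. When comparing logs from several builds, it can help to extract the message, call and line number separately. The parser below is a small sketch with hypothetical names, based only on the error format quoted in this thread.

```python
import re

# Matches e.g. "CUDA error: Illegal address in cuCtxSynchronize(), line 1507".
CUDA_ERROR_RE = re.compile(
    r"CUDA error: (?P<message>.+?) in (?P<call>\w+)\(\), line (?P<line>\d+)",
    re.IGNORECASE,
)


def parse_cuda_error(text):
    """Return the first CUDA error found in `text` as a dict, or None."""
    match = CUDA_ERROR_RE.search(text)
    if match is None:
        return None
    return {
        "message": match.group("message"),
        "call": match.group("call"),
        "line": int(match.group("line")),
    }
```

Two logs can then be treated as the same failure when their message and call fields match, even if the line numbers differ.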

I can try again with debug via CLI and provide a report if you wish.