
Cycles render crash GPU RX Vega 56
Open, Confirmed, Medium, Public

Description

System Information
Operating system: Windows-10-10.0.18362 64 Bits
Graphics card: Radeon RX Vega ATI Technologies Inc. 4.5.13559 Core Profile Context 26.20.12001.7006

Blender Version
Broken: version: 2.80 (sub 74), branch: master, commit date: 2019-06-18 23:46, hash: rBe73647bf5b44
Worked: (optional)

The system crashes to a black screen while attempting to render a scene. The Radeon driver is reset to default settings (even if it was already on the default settings).
The crash is random: it does not happen every time and not always at the same tile.

Exact steps for others to reproduce the error
[Please describe the exact steps needed to reproduce the issue]
[Based on the default startup or an attached .blend file (as simple as possible)]

Details

Type
Bug

Event Timeline

Same issue here on my Radeon VII: Cycles crashes after a short render time and the screen turns black. Sometimes the screen comes back on after a short time with Blender turned into a white window; sometimes the whole system crashes. However, rendering on additional graphics cards (3x Radeon VII) that aren't connected to the display seems to work quite well.

EDIT: The file I've added crashes reproducibly every time it is rendered on the GPU connected to the display.

Sebastian Parborg (zeddb) lowered the priority of this task from Needs Triage by Developer to Waiting for Developer to Reproduce. Jun 21 2019, 11:56 AM

@Jeroen Bakker (jbakker) I'm guessing we need to beg AMD for some cards :P

Does updating to 19.6.2 help?

No, that makes it even worse for some scenes.

Does updating to 19.6.2 help?

No, I'm on 19.6.2 and the issue persists.
I did a clean install of the drivers and reset blender to defaults, didn't help.

You know what I just found out? When rendering in the image editor (not in a new window but in the user interface), things work out just fine. Just change that in the header, in the Render > Display Settings menu. I'll continue testing with some other files, but so far it seems to work.

Using the image editor or another render display mode doesn't work for me.
It looks like it only happens with files started in 2.79 and opened in 2.80.

I don't know if it's related, but I do get strange artifacts from time to time, only on the display, not saved to the final image.


Looks like disabling particles/hair resolves the issue.

New Radeon drivers 19.7.1, recent Blender 2.8 build, same issue.
Rendering with hair crashes.

@Jeroen Bakker (jbakker) can you reproduce this on a Vega card?

2.8 RC1, not a crash, a full system freeze.

Not able to reproduce on Linux/Vega64/AMD Pro drivers. My system gets unresponsive, but that is hardware related; after the scene is rendered it becomes responsive again (I used the SaturnVPacked.blend). No crashes or blank screens.
I will set up Windows on my machine and see if I can reproduce it there.

Can anyone test if it renders OK when run from the command prompt? (blender.exe -b <blendfile> -f <frame>)
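For anyone who wants to repeat the command-prompt test several times, here is a minimal sketch that wraps the command in a small Python script and keeps the console output as a log. `-b` (background) and `-f` (render frame, which takes a frame number) are Blender's command-line options; the function names, frame 1 and the log file name are assumptions made for illustration only.

```python
import subprocess


def background_render_command(blender_exe, blend_file, frame=1):
    """Build the argument list for a background render of a single frame."""
    # -b runs Blender without the UI, -f renders the given frame number.
    return [blender_exe, "-b", blend_file, "-f", str(frame)]


def run_background_render(blender_exe, blend_file, frame=1,
                          log_path="render_log.txt"):
    """Run the render and write stdout/stderr to log_path (hypothetical name)."""
    cmd = background_render_command(blender_exe, blend_file, frame)
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    # A non-zero return code suggests Blender exited abnormally.
    return result.returncode
```

Usage would look like `run_background_render("blender.exe", "SaturnVPacked.blend")`. If the whole system freezes, the log may be incomplete, but whatever was written before the freeze can still be attached to this task.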

My gut feeling is that this is an OS/driver-related issue under heavy system load, not something we can do anything about. What would help is a log/stack trace from when the crash happens.

Unable to reproduce with

Note that the OS is freshly installed.

Additional note: after retesting several times I received a "Split kernel error: invalid ray state", which makes Blender halt due to data corruption. It is unclear whether the driver resets the graphics card and thereby creates invalid buffers, or whether the driver writes corrupt data and therefore resets the graphics card. Will check.

In the Event Viewer I see that the driver failed, but not the reason for it:

Display driver amdkmdap stopped responding and has successfully recovered.

When I disable the motion blur pass it seems to work fine.

This is the minimal blend file, based on the default cube. I only changed the samples to 1024 and enabled motion blur. You can also test with 32 samples and a larger tile size; it also crashes for me that way.

Will check tomorrow if there is a recent change that introduced or worsened this.

Also check https://community.amd.com/thread/229164 where a similar issue was solved by:
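For those who prefer to rebuild the test case from the default startup scene instead of downloading the file, the same settings can be applied with a short script. This is only a sketch of the settings described above (Cycles, GPU device, 1024 samples, motion blur on); the function name is made up, and it assumes the OpenCL compute device is already enabled in the preferences.

```python
def configure_repro_scene(scene, samples=1024):
    """Apply the settings from the minimal test file to the given scene."""
    scene.render.engine = 'CYCLES'       # render with Cycles
    scene.cycles.device = 'GPU'          # use the GPU compute device
    scene.cycles.samples = samples       # 1024 in the attached file, 32 also crashed
    scene.render.use_motion_blur = True  # the setting that triggers the reset
    return scene


if __name__ == "__main__":
    try:
        import bpy  # only available when run inside Blender
    except ImportError:
        bpy = None
    if bpy is not None:
        configure_repro_scene(bpy.context.scene)
```

Run it from Blender's text editor or with `blender --python <script>`, then render as usual.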

The TDR did not work for me. It does not restart the driver when it fails, making the system halt. Note that this demonstrates that the blank screen is the OS trying to fix the state of the driver, by restarting it.
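For reference, Windows reads its TDR (Timeout Detection and Recovery) behaviour from the `TdrLevel` and `TdrDelay` values under `HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers`. A minimal, read-only sketch for checking what a machine currently has set is below; the function name is made up, and a value of `None` is used here to mean "not set, so the Windows default applies".

```python
import sys

TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
TDR_VALUES = ("TdrLevel", "TdrDelay")


def read_tdr_settings():
    """Return {value name: registry value or None}. Nothing is modified."""
    if sys.platform != "win32":
        # Not on Windows: there is no registry to query.
        return {name: None for name in TDR_VALUES}
    import winreg
    settings = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
        for name in TDR_VALUES:
            try:
                settings[name], _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                settings[name] = None  # value absent, default behaviour
    return settings


if __name__ == "__main__":
    print(read_tdr_settings())
```

Posting this output together with the Event Viewer message makes it easier to compare machines that freeze with machines that recover.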

For now my idea is that the following happens:

  1. AMD GPU asks for more power from power supply
  2. Driver gets into a stalled state somehow
  3. Windows detects a stalled driver
  4. Windows resets the stalled driver (screen will blank)
  5. When restarting the driver the memory in the GPU gets reset to 0
  6. The running Blender render now detects invalid GPU memory state
  7. Blender will stall as the state machine is in an invalid state.

@Brecht Van Lommel (brecht), @Sebastian Parborg (zeddb) to test whether this is really power supply related, can we try to reproduce the issue on one of your machines?

Jeroen Bakker (jbakker) lowered the priority of this task from Waiting for Developer to Reproduce to Confirmed, Medium.

I have done a few tests and I don't think it is the driver's or the PSU's fault.

I created a file like "t65924_minimal.blend" for version 2.79.

First I rendered in 2.79b (official) and it was OK. Then I rendered it in 2.80RC1 and it didn't crash.


So I disabled motion blur and rendered again in 2.80RC1, and it didn't crash. I enabled motion blur and rendered again, and it froze the system.

I did this several times, and every time I render first in 2.79b (official) and then in 2.80RC1, it renders fine.
If I disable motion blur > render > enable motion blur > render (in 2.80RC1, after rendering in 2.79b), the system freezes.
If I restart or shut down and render first in 2.80RC1, the system freezes.
The same behavior happens with 2.79 master (downloaded a few days ago).

To me it seems like the issue is with changes made to Cycles between 2.79b (official) and the development branches.

@Assaf Negev (AssafNegev) It is more complex than just saying it crashes due to changes in Blender, so it should be Blender's fault :-)
I was also able to reproduce the issue with the AMD Pro Drivers. I haven't been able to reproduce it on Polaris. Polaris uses a different OpenCL implementation in the AMD drivers, but the blender side is identical. We have contacted AMD for their insight.

@Assaf Negev (AssafNegev) It is more complex than just saying it crashes due to changes in Blender, so it should be Blender's fault :-)

I didn't think so :-)
I should work on my powers of deduction (just finished a 'Sherlock' binge) :D

@Jeroen Bakker (jbakker) Thank you for your help & time.
I'll be happy to run any tests if needed.

New AMD driver 19.7.2 still freezes.

I'm having the same issue, to a slightly worse degree:

I recently built a new computer and threw two Vega 56 cards in it with a 750w PSU.
I did some tests in Blender and ran into this problem. I updated my drivers, tried lowering the cards' power draw, tested Blender again, and it completely fried one of my GPUs: massive artifacting everywhere.

So I drop another boatload of money on a 1000w PSU and another Vega 56, lower their power draw (both at -20% power limit, state 6 and 7 set to 1000mV in WattMan), and while going through tests, it black screens/crashes again.
No blender crash logs as far as I can find, and I just get the same "Display driver amdkmdap stopped responding and has successfully recovered." message in event viewer.

I don't want to keep throwing money at it when I truly don't know if getting yet another PSU upgrade will actually fix it, and I'm almost afraid to use blender now for fear that it's gonna fry another one of my cards.
Slightly off-topic but related enough, the reason I upgraded my computer in the first place is because a very similar issue (black screen, crashing) started happening on my old system with a GTX 970 while rendering in blender. I don't know if the cause is the same, but it's wildly frustrating when I can't even escape it with new hardware.

@Matthew (idkartist3d)

At this point I think that it is not hardware related. I tried 2 different motherboards, 2 different CPUs, 3 different PSUs (2x 750W + 1200W), 3 different GPUs, 2 different drivers and 2 different OSes. My gut feeling is that it is an OS/driver-related software error. The crash isn't consistent (it does not always happen when doing the same actions). We are still checking with AMD what the issue could be.

@Jeroen Bakker (jbakker) Dang, thank you for your extensive testing! Guess my PSU upgrade was a bit unneeded, but luckily now I know I don't need to upgrade again!

Hope the folks at AMD can figure it out soon, and again I appreciate your dedication to the issue - support like this is why I love blender :)

I can confirm the motion blur/hair issue on my machine as well. The SaturnVpacked.blend renders just fine with motion blur and hair disabled, and the system freezes reproducibly after a short time when either one of these features is on.
I'm currently rendering on CPU+GPU:
Threadripper 2990WX
4x Radeon VII, running the Adrenalin 19.7.3 driver

AMD has confirmed that it is an issue inside the driver. They have solved the issue, but don't know which driver release the fix will be available in.
For now I will keep this task open for other people to find.