Switching view modes to and from Cycles GPU / starting renders hangs system for a few seconds.
Confirmed, Normal | Public | BUG

Description

System Information
Operating system: Linux-5.4.0-26-generic-x86_64-with-debian-bullseye-sid 64 Bits
Graphics card: TITAN X (Pascal)/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 440.64

Blender Version
Broken: version: 2.80, 2.81, 2.82, 2.83 (c036ef136960)
Worked: Less noticeable in 2.80

Short description of error
When using Blender on Ubuntu / Xubuntu 20.04, whenever you switch to or away from anything related to Cycles GPU rendering, Blender pins one CPU thread at 100% and the system hangs for a few seconds.
Some examples of this are:

  • Switching from Solid to Rendered (and vice versa) in the Viewport
  • Switching to the Render Properties when Cycles and GPU are set
  • Starting a render with Cycles GPU

It seems that this occurs when Blender / Cycles needs to "probe" the GPU(s) in the system.
At least that's the best way I can describe it.
This behavior is absent on Xubuntu 18.04 on the same hardware.

In 2.80 the Blender UI merely freezes for a few seconds; from 2.81 onward the entire desktop freezes and hitches.

Exact steps for others to reproduce the error

  • Set Cycles to render on GPU (CUDA)
  • Switch to rendered viewport / render any scene (this happens even with an empty scene)

Event Timeline

I'm unable to reproduce this issue on my system with a GTX 1050 Ti on Pop!_OS 20.04 (based on Ubuntu 20.04).

It looks like the problem described in T73076.
Could you read the comments there and see if the solution applies here too?

Germano Cavalcante (mano-wii) changed the task status from Needs Triage to Needs Information from User. Tue, May 12, 4:18 PM

I was confused; I thought it was about OptiX, but the description mentions CUDA.
It may not be related to the problem described in T73076.

Yeah, it's with CUDA in my case.

It even happens when I disable the display GPU for CUDA rendering in a multi-GPU system.

Running Blender with --debug-cycles, these are the messages it prints when everything hangs:

CPU flags:

AVX2       : True
AVX        : True
SSE4.1     : True
SSE3       : True
SSE2       : True
BVH layout : BVH8
Split      : False

CUDA flags:

Adaptive Compile : False

OptiX flags:

CUDA streams : 1

OpenCL flags:

Device type    : ALL
Debug          : False
Memory limit   : 0

I0513 00:30:15.230228 4160 device_cuda.cpp:41] CUEW initialization succeeded
I0513 00:30:15.230283 4160 device_cuda.cpp:43] Found precompiled kernels
I0513 00:30:15.312098 4160 device_cuda.cpp:166] Device has compute preemption or is not used for display.
I0513 00:30:15.312129 4160 device_cuda.cpp:169] Added device "TITAN X (Pascal)" with id "CUDA_TITAN X (Pascal)_0000:08:00".
I0513 00:30:15.312203 4160 device_cuda.cpp:166] Device has compute preemption or is not used for display.
I0513 00:30:15.312212 4160 device_cuda.cpp:169] Added device "GeForce GTX 1080 Ti" with id "CUDA_GeForce GTX 1080 Ti_0000:09:00".
I0513 00:30:15.312286 4160 device_cuda.cpp:166] Device has compute preemption or is not used for display.
I0513 00:30:15.312295 4160 device_cuda.cpp:169] Added device "GeForce GTX 1080 Ti" with id "CUDA_GeForce GTX 1080 Ti_0000:43:00".
I0513 00:30:15.312371 4160 device_cuda.cpp:166] Device has compute preemption or is not used for display.
I0513 00:30:15.312381 4160 device_cuda.cpp:169] Added device "GeForce GTX 1080 Ti" with id "CUDA_GeForce GTX 1080 Ti_0000:44:00".
I0513 00:30:15.314888 4160 device_cuda.cpp:166] Device has compute preemption or is not used for display.
I0513 00:30:15.314906 4160 device_cuda.cpp:169] Added device "TITAN X (Pascal)" with id "CUDA_TITAN X (Pascal)_0000:08:00".
I0513 00:30:15.314980 4160 device_cuda.cpp:166] Device has compute preemption or is not used for display.
I0513 00:30:15.314990 4160 device_cuda.cpp:169] Added device "GeForce GTX 1080 Ti" with id "CUDA_GeForce GTX 1080 Ti_0000:09:00".
I0513 00:30:15.315066 4160 device_cuda.cpp:166] Device has compute preemption or is not used for display.
I0513 00:30:15.315078 4160 device_cuda.cpp:169] Added device "GeForce GTX 1080 Ti" with id "CUDA_GeForce GTX 1080 Ti_0000:43:00".
I0513 00:30:15.315152 4160 device_cuda.cpp:166] Device has compute preemption or is not used for display.
I0513 00:30:15.315163 4160 device_cuda.cpp:169] Added device "GeForce GTX 1080 Ti" with id "CUDA_GeForce GTX 1080 Ti_0000:44:00".

Hanging occurs on:

  • Switching to render settings in properties panel for the first time in a new scene.
  • Switching to Rendered view and back to any other view mode in the viewport.
  • Right before a render starts, after scene building, when data needs to be transferred to the GPU.
  • After rendering, while the render is finishing and the image is being displayed in the image editor.

I'm happy to run with other debug settings to figure out what's going on; I just don't know which ones would be best.
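One way to check whether the multi-second stall falls between any two of the logged messages is to compute the gaps between their timestamps. The following is a minimal Python sketch, assuming the glog-style prefix seen in the log above (severity letter, MMDD, time with microseconds, thread id, file:line). The names `parse_glog`, `largest_gaps` and `SAMPLE` are illustrative and not part of Blender.

```python
import re

# glog prefix: severity letter, MMDD, HH:MM:SS.microseconds, thread id, file:line]
GLOG_LINE = re.compile(
    r"^[IWEF]\d{4} (\d{2}):(\d{2}):(\d{2})\.(\d{6})\s+\d+ (\S+:\d+)\] (.*)$"
)


def parse_glog(text):
    """Return (microseconds_since_midnight, source_location, message) per log line."""
    entries = []
    for line in text.splitlines():
        match = GLOG_LINE.match(line.strip())
        if match is None:
            continue  # skip the flag dump and blank lines
        hours, minutes, seconds, micros = (int(g) for g in match.groups()[:4])
        # Integer microseconds keep the arithmetic exact.
        # Assumption: the log does not cross midnight.
        stamp = ((hours * 60 + minutes) * 60 + seconds) * 1_000_000 + micros
        entries.append((stamp, match.group(5), match.group(6)))
    return entries


def largest_gaps(entries, top=3):
    """Return the `top` largest (gap_in_microseconds, before, after) tuples."""
    gaps = [
        (cur[0] - prev[0], prev[1], cur[1])
        for prev, cur in zip(entries, entries[1:])
    ]
    return sorted(gaps, reverse=True)[:top]


# First three log lines from above, used as sample input.
SAMPLE = """\
I0513 00:30:15.230228 4160 device_cuda.cpp:41] CUEW initialization succeeded
I0513 00:30:15.230283 4160 device_cuda.cpp:43] Found precompiled kernels
I0513 00:30:15.312098 4160 device_cuda.cpp:166] Device has compute preemption or is not used for display.
"""

if __name__ == "__main__":
    for gap, before, after in largest_gaps(parse_glog(SAMPLE)):
        print(f"{gap / 1000:.3f} ms between {before} and {after}")
```

Applied to the full log above, the largest gap is roughly 82 ms (between "Found precompiled kernels" and the first device line), and the whole excerpt spans about 85 ms, so the stall itself does not appear between these messages.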

Brecht Van Lommel (brecht) changed the task status from Needs Information from User to Confirmed. Edited Thu, May 14, 12:23 AM

I just upgraded to Ubuntu 20.04 and I'm finding the same issue. System specs:

  • Linux 5.4.0-29-generic x86_64
  • AMD Ryzen Threadripper 2990WX 32-Core Processor
  • Quadro RTX 5000, tested with all of these driver versions:
    • 440.64 (the default Ubuntu drivers, package version 440.82+really.440.64)
    • 440.59 (manually installed)
    • 440.82 (manually installed)

cuCtxCreate / cuGLCtxCreate and cuCtxDestroy are functions that take a long time, about 2s here. Running with perf shows nv_alloc_pages and nv_free_pages in the NVIDIA driver are taking a lot of time.

Since there is a function named on_each_cpu in this call stack, perhaps having a 32-core CPU makes it worse?

I can reproduce a similar hang in the CUDA samples, for example simpleDrvRuntime and cuSolverSp_LinearSolver. They also freeze the entire screen for about 2s.

So I think this is an issue in the NVIDIA driver and/or Linux kernel. I'm quite sure I didn't have this issue on Ubuntu 18.04 with driver version 440.59.
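The cuCtxCreate / cuCtxDestroy timing mentioned above can be checked outside Blender by calling the CUDA driver API directly. The following is a minimal sketch using Python's ctypes, not the measurement method used in this report. It assumes the driver library can be loaded as `libcuda.so.1` (the usual name on Linux) and uses the versioned symbols `cuCtxCreate_v2` and `cuCtxDestroy_v2`, which are what `cuda.h` maps `cuCtxCreate` and `cuCtxDestroy` to in CUDA releases of this period. The function name `time_cuda_context` is hypothetical, and cuGLCtxCreate is not covered.

```python
import ctypes
import time


def time_cuda_context(device_index=0, library="libcuda.so.1"):
    """Return (create_seconds, destroy_seconds), or None if CUDA is unavailable."""
    try:
        cuda = ctypes.CDLL(library)
    except OSError:
        return None  # driver library not found on this machine

    if cuda.cuInit(0) != 0:  # 0 is CUDA_SUCCESS
        return None

    device = ctypes.c_int()
    if cuda.cuDeviceGet(ctypes.byref(device), device_index) != 0:
        return None

    context = ctypes.c_void_p()
    start = time.perf_counter()
    status = cuda.cuCtxCreate_v2(ctypes.byref(context), 0, device)
    create_seconds = time.perf_counter() - start
    if status != 0:
        return None

    start = time.perf_counter()
    cuda.cuCtxDestroy_v2(context)
    destroy_seconds = time.perf_counter() - start
    return create_seconds, destroy_seconds


if __name__ == "__main__":
    result = time_cuda_context()
    if result is None:
        print("CUDA driver not available")
    else:
        print(f"cuCtxCreate: {result[0]:.3f} s, cuCtxDestroy: {result[1]:.3f} s")
```

On an affected system, both numbers would be expected to be on the order of the 2 s reported here. On an unaffected one, they should be far smaller.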

Alaska (Alaska) added a comment. Edited Thu, May 14, 12:56 AM

@Brecht Van Lommel (brecht) It's possibly the kernel. Testing with Pop!_OS 20.04 (based on Ubuntu 20.04), I'm not experiencing this issue.
Kernel version: Linux-5.4.0-7626-generic-x86_64-with-debian-bullseye-sid 64 Bits
GPU: GTX 1050 Ti with driver 440.82
CPU: 12-core Ryzen 9 3900X (locked to a fixed frequency, if that makes a difference)
RAM: 32GB

The only "hitch" I see when changing to Cycles rendered mode is a split-second hitch as the BVH is built and the initial samples are rendered. It's nowhere near the two seconds you're describing.

Brecht Van Lommel (brecht) changed the subtype of this task from "Report" to "Bug". Thu, May 14, 1:21 AM

The fix for this will likely not be in Blender, but I'm keeping the report open to track the status.

For reference, I'm running a 1st-gen Threadripper CPU with 16 cores, which might be why we're experiencing similar behavior.
I'll try the same on Pop!_OS and see if it persists, so we can get some more info, although with it being based on Ubuntu I'm not expecting massively different results.

I did some additional testing with different Linux distros to complete the picture.
The same results occur with Pop!_OS, Solus and Fedora 32, all using NVIDIA driver version 440.82, on the same hardware as in the original report.
At least we know it's not OS-specific; it might even be hardware-related.

If there's anything else I can test, just let me know; I'm happy to help figure it all out.