GPU and CPU not reported correctly and not used under Linux
Closed, Archived (Public)

Description

Testing the nightly build for 2.8 on an AWS server with Ubuntu 18.04, I get the following reported CPU and GPU:

Found device: Intel Core i7-6950X CPU @ 3.00GHz
<bpy_struct, CyclesDeviceSettings("Intel Core i7-6950X CPU @ 3.00GHz")>

Found device: GeForce GTX 1080
<bpy_struct, CyclesDeviceSettings("GeForce GTX 1080")>

The real GPU cards on the server are different, as reported by nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 00000000:00:03.0 Off |                  N/A |
| N/A   29C    P0    44W / 125W |      0MiB /  4037MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GRID K520           Off  | 00000000:00:04.0 Off |                  N/A |
| N/A   33C    P0    43W / 125W |      0MiB /  4037MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GRID K520           Off  | 00000000:00:05.0 Off |                  N/A |
| N/A   31C    P0    44W / 125W |      0MiB /  4037MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GRID K520           Off  | 00000000:00:06.0 Off |                  N/A |
| N/A   32C    P0    38W / 125W |      0MiB /  4037MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

The reported processor (truncated from /proc/cpuinfo):

processor       : 31
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
stepping        : 4
microcode       : 0x42c
cpu MHz         : 2593.805
cache size      : 20480 KB
physical id     : 1
siblings        : 16
core id         : 7
cpu cores       : 8
apicid          : 47
initial apicid  : 47
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti fsgsbase smep erms xsaveopt
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5187.66
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual

We also tested different server configurations under Linux and saw the same wrong GPU reported, even though the server had four K80 cards in it.

The render time is also off: the same render takes five times longer than in 2.79 on the same server (same file, same resolution and sampling settings). According to nvidia-smi, the GPU does not appear to be used at all.

Let me know if you need any other information to sort this out.

Details

Type: Bug
Brecht Van Lommel (brecht) triaged this task as Incomplete priority. Wed, Oct 17, 2:26 PM

I suspect the GPUs are not being detected at all. The GTX 1080 info might be something saved in the user preferences on another computer, not actually referring to the available devices.

  • How are you printing those GPUs? Can you attach the script that does that?
  • Please attach the output log of a test render with --debug-cycles and the GPUs enabled (see the example command after this list).
  • When you refer to 2.79, which version is that exactly? 2.79, 2.79a, 2.79b, daily builds?
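
For reference, a headless debug render from the command line could look like this (scene.blend and render_log.txt are placeholder names; -b runs Blender without the UI and -f 1 renders frame 1):

blender -b scene.blend --debug-cycles -f 1 > render_log.txt 2>&1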

Reporting devices correctly here [but there is obviously something wrong with the AWS cards...]

import _cycles
device_list = _cycles.available_devices()
print(device_list)

(('Intel Core i7-6700 CPU @ 3.40GHz', 'CPU', 'CPU'), ('GeForce GTX 970M (Display)', 'CUDA', 'CUDA_GeForce GTX 970M_0000:01:00'))

@Brecht Van Lommel (brecht): mind having a look?

I changed the script a bit so it prints more information. Here are the script and its output:

import _cycles, bpy
device_list = _cycles.available_devices()
print(device_list)

for device in bpy.context.user_preferences.addons['cycles'].preferences.devices:
    print("DETECTED VIA PREFERENCES: " + device.name)
(('Intel Xeon CPU E5-2650 v2 @ 2.60GHz', 'CPU', 'CPU'), ('GRID K520', 'CUDA', 'CUDA_GRID K520_0000:00:03'))
DETECTED VIA PREFERENCES: Intel Core i7-6950X CPU @ 3.00GHz
DETECTED VIA PREFERENCES: GeForce GTX 1080

Looks like _cycles.available_devices() detects them correctly, while the user_preferences path does not.

So nuking userpref.blend should do it?
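
If you want to try that without deleting the file by hand, a minimal sketch is to load factory settings from Python (bpy.ops.wm.read_factory_settings() is the standard operator; note it discards the current session's preferences and startup file):

import bpy

# Load the factory startup file and factory preferences,
# discarding the stale userpref.blend state for this session.
bpy.ops.wm.read_factory_settings()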

To enable all available CUDA devices, you can use a script like this:

import bpy

# Render this scene on the GPU.
scene = bpy.context.scene
scene.cycles.device = 'GPU'

prefs = bpy.context.user_preferences
cprefs = prefs.addons['cycles'].preferences
cprefs.compute_device_type = 'CUDA'

# get_devices() refreshes the detected device lists.
cuda_devices, opencl_devices = cprefs.get_devices()
for device in cuda_devices:
    # Enable only the CUDA entries (the list may also contain the CPU device).
    device.use = device.type == 'CUDA'
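
As a quick sanity check (a sketch reusing the cuda_devices list from the script above), you can print what ended up enabled:

for device in cuda_devices:
    print("%s: use=%s" % (device.name, device.use))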

_cycles.available_devices is an internal API and should not be accessed directly. preferences.devices lists all devices for which preferences have been stored, even ones that are currently unavailable but might get plugged in again later.

This script seems to detect the devices correctly. Thanks!

The main difference from 2.79 is that the preferences.devices property no longer gets populated automatically. Using get_devices() instead works well.
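
For example, a sketch of the earlier "DETECTED VIA PREFERENCES" loop rewritten against the refreshed lists (same 2.8 user_preferences naming as the scripts above):

import bpy

cprefs = bpy.context.user_preferences.addons['cycles'].preferences
# get_devices() re-enumerates the hardware instead of reading stored entries.
cuda_devices, opencl_devices = cprefs.get_devices()

for device in cuda_devices + opencl_devices:
    print("DETECTED VIA get_devices(): %s (%s)" % (device.name, device.type))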