
Crash when choosing cycles device with python
Closed, Resolved, Public

Description

System Information
Operating system: Ubuntu 20.04.1 LTS
Graphics card: GTX 1070 + GTX 1070 + GTX 1050 Ti, NVIDIA 440.100

Blender Version
Broken: 2.90.1, Commit date: 2020-09-23 06:43, Hash 3e85bb34d0d7
Worked: 2.83.7

Short description of error
Blender crashes when using Python to choose a CUDA GPU for Cycles. I don't know whether this also happens on systems with only one GPU.

Exact steps for others to reproduce the error
Run Blender from the command line with a Python expression that selects a GPU:

blender --factory-startup --python-expr "import bpy; cuda_devices, opencl_devices = bpy.context.preferences.addons['cycles'].preferences.get_devices(); cuda_devices[0].use = True"

More information:
Using get_devices_for_type() instead of get_devices() does not crash.

blender --factory-startup --python-expr "import bpy; cuda_devices = bpy.context.preferences.addons['cycles'].preferences.get_devices_for_type('CUDA'); cuda_devices[0].use = True"

The order in which get_devices() queries device types matters: CUDA, then OPTIX, then OPENCL. Changing the order so that CUDA is queried last does not crash. Querying the OPTIX and OPENCL devices appears to mutate the CUDA devices that were already returned. See the dump of cuda_devices before and after getting the OPENCL devices:

blender --factory-startup --python-expr "import bpy; \
cuda_devices = bpy.context.preferences.addons['cycles'].preferences.get_devices_for_type('CUDA'); \
optix_devices = bpy.context.preferences.addons['cycles'].preferences.get_devices_for_type('OPTIX'); \
print('CUDA devices before getting OPENCL devices', cuda_devices); \
opencl_devices = bpy.context.preferences.addons['cycles'].preferences.get_devices_for_type('OPENCL'); \
print('CUDA devices after getting OPENCL devices', cuda_devices); \
print('OPTIX devices', optix_devices); \
print('OPENCL devices', opencl_devices); \
cuda_devices[0].use = True"

CUDA devices before getting OPENCL devices [<bpy_struct, CyclesDeviceSettings("GeForce GTX 1070")>, <bpy_struct, CyclesDeviceSettings("GeForce GTX 1070")>, <bpy_struct, CyclesDeviceSettings("GeForce GTX 1050 Ti")>, <bpy_struct, CyclesDeviceSettings("AMD Ryzen 9 3900X 12-Core Processor")>]
CUDA devices after getting OPENCL devices [<bpy_struct, CyclesDeviceSettings("")>, <bpy_struct, CyclesDeviceSettings("")>, <bpy_struct, CyclesDeviceSettings("")>, <bpy_struct, CyclesDeviceSettings("")>]
OPTIX devices [<bpy_struct, CyclesDeviceSettings("GeForce GTX 1070")>, <bpy_struct, CyclesDeviceSettings("GeForce GTX 1070")>, <bpy_struct, CyclesDeviceSettings("GeForce GTX 1050 Ti")>]
OPENCL devices [<bpy_struct, CyclesDeviceSettings("AMD Ryzen 9 3900X 12-Core Processor")>]

OPTIX also seems to be involved: removing the OPTIX query from the sequence above avoids the crash as well.
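One explanation consistent with the dump above, where the CUDA names turn blank, is that the CUDA list holds references into storage that a later query re-allocates. The following is a pure-Python model of that hazard under that assumption, not Blender's code: `DeviceCollection`, `add` and `of_type` are hypothetical stand-ins for the Cycles device collection, and clearing the old entries imitates memory being freed.

```python
class DeviceCollection:
    """Hypothetical model of a growable collection such as the Cycles device list.

    Every add() moves the existing entries into a new buffer and wipes the
    old one, imitating a C array reallocation followed by free().
    """

    def __init__(self):
        self._buffer = []

    def add(self, name, device_type):
        new_buffer = [dict(entry) for entry in self._buffer]  # copy into new storage
        for entry in self._buffer:
            entry.clear()  # old storage is "freed": stale references now see nothing
        new_buffer.append({"name": name, "type": device_type, "use": False})
        self._buffer = new_buffer

    def of_type(self, device_type):
        # Returns live references into the current buffer, like bpy_struct handles
        return [entry for entry in self._buffer if entry["type"] == device_type]


devices = DeviceCollection()
devices.add("GeForce GTX 1070", "CUDA")
devices.add("GeForce GTX 1050 Ti", "CUDA")

cuda_devices = devices.of_type("CUDA")
print([d.get("name", "") for d in cuda_devices])
# ['GeForce GTX 1070', 'GeForce GTX 1050 Ti']

# A later query for another device type grows the collection again
devices.add("AMD Ryzen 9 3900X 12-Core Processor", "OPENCL")
print([d.get("name", "") for d in cuda_devices])
# ['', '']

# This write lands in discarded storage; the live entry is unchanged.
# In C the equivalent is a write to freed memory, which may crash.
cuda_devices[0]["use"] = True
print(devices.of_type("CUDA")[0]["use"])
# False
```

In Python the stale write fails silently; in Blender the same pattern goes through raw pointers, which is why it can crash instead.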

Event Timeline

Robert Guetzkow (rjg) changed the task status from Needs Triage to Needs Information from User. Oct 7 2020, 10:41 PM

I can't reproduce this on my single-GPU system. Could you run Blender from the terminal with ./blender --debug --debug-python --debug-gpu > ~/blender_debug_output.txt 2>&1 and upload the output as well as the crash log / stack trace?
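For anyone who prefers to script this, here is a minimal Python sketch of the same capture, where stdout and stderr go into one file like `> file 2>&1` does in the shell. The helper name `run_with_log` is hypothetical, and the `./blender` path in the usage note is an assumption about where the binary lives.

```python
import subprocess


def run_with_log(command, log_path):
    """Run `command` with stdout and stderr merged into one file.

    Equivalent in spirit to the shell form `command > log_path 2>&1`.
    Returns the exit code; on POSIX it is negative if the process was
    killed by a signal (for example -11 after a segfault).
    """
    with open(log_path, "w") as log_file:
        completed = subprocess.run(command, stdout=log_file, stderr=subprocess.STDOUT)
    return completed.returncode
```

Usage, with the flags from the request above: `run_with_log(["./blender", "--debug", "--debug-python", "--debug-gpu"], os.path.expanduser("~/blender_debug_output.txt"))`. The `expanduser` call is needed because, unlike the shell, `open()` does not expand `~`.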

These are pretty short. Do I need a certain build for debugging?

Robert Guetzkow (rjg) changed the task status from Needs Information from User to Needs Triage. Oct 8 2020, 12:37 AM

@Cedar Myers (CedarM) No, the log files are exactly how they are supposed to be.

I have tested the 2.91.0-3bc808ebcbf6 nightly build and can confirm that it fixes get_devices(), but get_devices_for_type() still has the same problem when called in sequence (CUDA, then OPTIX, then OPENCL). Should this be filed as a new issue, or should this issue be re-opened?

Also, I'm not sure whether this is relevant for an LTS backport, since 2.83 does not crash, but I'll leave that to you guys to determine.

@Cedar Myers (CedarM) Since this ticket has already been closed with an associated commit, it's probably a good idea to open another ticket, just to make sure it's not forgotten.

@Cedar Myers (CedarM) Other issues should be reported separately. get_devices_for_type() returns data that gets re-allocated, so we'll need to copy the return values to prevent problems like this from happening entirely.
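A minimal pure-Python sketch of the "copy the return values" direction, assuming the same re-allocation behaviour as above. This is not the actual Cycles fix: `DeviceCollection`, `DeviceInfo`, `devices_for_type`, `set_use`, `is_used` and the device ids are all hypothetical. The query returns immutable copies, and writes look the entry up again by id in the current storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    """Plain copy of one device entry; holds no reference into the collection."""
    id: str
    name: str
    type: str


class DeviceCollection:
    """Hypothetical model: every add() rebuilds the storage, invalidating references."""

    def __init__(self):
        self._buffer = []

    def add(self, device_id, name, device_type):
        new_buffer = [dict(entry) for entry in self._buffer]
        for entry in self._buffer:
            entry.clear()  # imitate freeing the old storage
        new_buffer.append({"id": device_id, "name": name, "type": device_type, "use": False})
        self._buffer = new_buffer

    def devices_for_type(self, device_type):
        # Copy the values out instead of returning live references
        return [DeviceInfo(e["id"], e["name"], e["type"])
                for e in self._buffer if e["type"] == device_type]

    def set_use(self, device_id, use):
        # Re-resolve the entry in the *current* storage at write time
        for entry in self._buffer:
            if entry["id"] == device_id:
                entry["use"] = use
                return
        raise KeyError(device_id)

    def is_used(self, device_id):
        return any(e["id"] == device_id and e["use"] for e in self._buffer)


devices = DeviceCollection()
devices.add("CUDA_0", "GeForce GTX 1070", "CUDA")
cuda_devices = devices.devices_for_type("CUDA")

devices.add("OPENCL_0", "AMD Ryzen 9 3900X 12-Core Processor", "OPENCL")  # storage rebuilt
devices.set_use(cuda_devices[0].id, True)
print(cuda_devices[0].name, devices.is_used(cuda_devices[0].id))
# GeForce GTX 1070 True
```

The trade-off is that a copy can go out of date (for example if a device disappears), which is why `set_use` raises `KeyError` rather than writing anywhere when the id is no longer present.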