
Segfault when accessing Cycles Render Devices preferences with two OpenCL platforms, AMD Mesa Clover and Intel Gen9 NEO
Open, Needs Triage by Developer, Public

Description

System Information
Operating system: Linux-5.1.20-desktop-2.mga7-x86_64-with-mageia-7-Official 64 Bits
Graphics card 1 (integrated): Mesa DRI Intel(R) HD Graphics 630 (Kaby Lake GT2) Intel Open Source Technology Center 4.5 (Core Profile) Mesa 19.1.3
Graphics card 2 (discrete): AMD VEGAM (DRM 3.30.0, 5.1.20-desktop-2.mga7, LLVM 8.0.0) X.Org 4.5 (Core Profile) Mesa 19.1.3

Blender Version
Broken: version: 2.80 (sub 75), branch: master, commit date: 2019-07-29 14:47, hash: rBf6cb5f54494e

2.81 (sub 0), branch: master, commit date: 2019-07-30 20:50, hash: `rB5359b7a03307`

Worked: unknown

Short description of error
My HP Spectre x360 laptop has an Intel HD Graphics 630 IGP and an AMD Radeon RX Vega M GL dGPU.
The AMD card uses Mesa's Clover OpenCL platform, which Blender does not seem to support (cf. T68009 and D2171).

I built and installed https://github.com/intel/compute-runtime for the Intel Gen9 IGP, and confirmed that it works fine for an example OpenCL program.

$ ./clinfo --list
Platform #0: Clover
 `-- Device #0: AMD VEGAM (DRM 3.30.0, 5.1.20-desktop-2.mga7, LLVM 8.0.0)
Platform #1: Intel(R) OpenCL HD Graphics
 `-- Device #0: Intel(R) Gen9 HD Graphics NEO

In Blender, when I access Edit > Preferences > System to review what OpenCL platforms were detected, Blender segfaults:

Thread 1 "blender" received signal SIGSEGV, Segmentation fault.
0x00007fffffffb154 in ?? ()
(gdb) bt
#0  0x00007fffffffb154 in ?? ()
#1  0x00007ffff7f836d7 in __pthread_once_slow () from /lib64/libpthread.so.0
#2  0x00007fffb306cc54 in ?? () from /lib64/libigc.so.1
#3  0x00007fffb2f0dab0 in ?? () from /lib64/libigc.so.1
#4  0x00007fffb2e0cfff in ?? () from /lib64/libigc.so.1
#5  0x00007fffb2e0e2ef in ?? () from /lib64/libigc.so.1
#6  0x00007fffb2dda483 in ?? () from /lib64/libigc.so.1
#7  0x00007fffb2ddb298 in ?? () from /lib64/libigc.so.1
#8  0x00007fffb2e88eb7 in ?? () from /lib64/libigc.so.1
#9  0x00007fffb2e8b873 in ?? () from /lib64/libigc.so.1
#10 0x00007fffb9b3b8b2 in ?? () from /usr/lib64/intel-opencl/libigdrcl.so
#11 0x00007fffb9b1ebcb in ?? () from /usr/lib64/intel-opencl/libigdrcl.so
#12 0x00007ffff7f836d7 in __pthread_once_slow () from /lib64/libpthread.so.0
#13 0x00007fffb9b1f472 in ?? () from /usr/lib64/intel-opencl/libigdrcl.so
#14 0x00007fffb9b92e7d in ?? () from /usr/lib64/intel-opencl/libigdrcl.so
#15 0x00007fffb9af8e9b in ?? () from /usr/lib64/intel-opencl/libigdrcl.so
#16 0x00007fffc603b0ba in ?? () from /lib64/libOpenCL.so
#17 0x00007fffc603d2a4 in clGetPlatformIDs () from /lib64/libOpenCL.so
#18 0x0000000003220776 in ccl::device_opencl_info(ccl::vector<ccl::DeviceInfo, ccl::GuardedAllocator<ccl::DeviceInfo> >&) ()
#19 0x00000000031e87c6 in ccl::Device::available_devices(unsigned int) ()
#20 0x0000000001548699 in ?? ()
#21 0x000000000174f969 in _PyMethodDef_RawFastCallKeywords ()
#22 0x000000000174fa25 in _PyCFunction_FastCallKeywords ()
#23 0x000000000114f33f in _PyEval_EvalFrameDefault ()
#24 0x0000000001803c23 in _PyEval_EvalCodeWithName ()
#25 0x000000000174f4a6 in _PyFunction_FastCallKeywords ()
#26 0x000000000114dfa3 in _PyEval_EvalFrameDefault ()
#27 0x0000000001145530 in ?? ()
#28 0x000000000114dfa3 in _PyEval_EvalFrameDefault ()
#29 0x0000000001145530 in ?? ()
#30 0x000000000114dfa3 in _PyEval_EvalFrameDefault ()
#31 0x0000000001145530 in ?? ()
#32 0x000000000174f3e6 in _PyFunction_FastCallDict ()
#33 0x00000000013e69ce in ?? ()
#34 0x00000000014f2c2f in ?? ()
#35 0x0000000002d48623 in ?? ()
#36 0x0000000002d4ab86 in ED_region_panels_layout_ex ()
#37 0x0000000002d4b06d in ED_region_panels_ex ()
#38 0x0000000002d4ebf1 in ED_region_do_draw ()
#39 0x0000000001517513 in wm_draw_update ()
#40 0x0000000001514c30 in WM_main ()
#41 0x00000000010c0abe in main ()
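The crash happens inside the `clGetPlatformIDs` call made from `ccl::device_opencl_info`. To check whether that call also fails outside the Blender process, the same platform count query can be issued from a small standalone script. This is only a rough sketch: it assumes an OpenCL ICD loader that `ctypes` can locate, and the helper name `count_opencl_platforms` is made up for illustration.

```python
import ctypes
import ctypes.util


def count_opencl_platforms(library_name=None):
    """Query the platform count via clGetPlatformIDs(0, NULL, &n).

    Returns (status, count), or None when no OpenCL loader can be loaded.
    library_name is an optional override, e.g. "libOpenCL.so.1" (assumption).
    """
    name = library_name or ctypes.util.find_library("OpenCL")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None

    # cl_int clGetPlatformIDs(cl_uint, cl_platform_id *, cl_uint *)
    fn = lib.clGetPlatformIDs
    fn.argtypes = [ctypes.c_uint32, ctypes.c_void_p,
                   ctypes.POINTER(ctypes.c_uint32)]
    fn.restype = ctypes.c_int32

    count = ctypes.c_uint32(0)
    status = fn(0, None, ctypes.byref(count))  # status 0 is CL_SUCCESS
    return status, count.value
```

Calling `count_opencl_platforms()` with no argument uses whatever loader the system provides. If it returns a status and a count without crashing, the problem is more likely tied to something loaded in the Blender process than to the runtime alone.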

I tried to reproduce the issue with a self-compiled build of 036312ecff459f7600361d5aba9a0dba896849a1 (current master branch) and it does not crash, though it also does not list any platform for OpenCL.

Note that before I installed intel/compute-runtime, no segfaults happened (but Clover wasn't recognized).
I now also get a segfault trying to get system-info.txt, so here's the report from before installing compute-runtime:

Complete clinfo output with both platforms installed:

Exact steps for others to reproduce the error

  • Install Mesa OpenCL for AMD dGPU + Intel Compute Runtime for Intel IGP on Linux.
  • In Blender 2.80, access Edit > Preferences > System.

Details

Type
Bug

Event Timeline

Note that the bug may very well be in https://github.com/intel/compute-runtime

I'm trying to figure out how to get debug info from libigc.so.1 and libigdrcl.so and I'll post an updated backtrace, and cross-post it on https://github.com/intel/compute-runtime.

Updated backtrace with debug symbols:

Thread 1 "blender" received signal SIGSEGV, Segmentation fault.
0x00007fffffffb164 in ?? ()
Missing separate debuginfos, use: debuginfo-install lib64asyncns0-0.8-11.mga7.x86_64 lib64bsd0-0.9.1-3.mga7.x86_64 lib64clang8.0-8.0.0-1.mga7.x86_64 lib64dbus1_3-1.13.8-4.mga7.x86_64 lib64drm2-2.4.98-1.mga7.x86_64 lib64drm_amdgpu1-2.4.98-1.mga7.x86_64 lib64drm_intel1-2.4.98-1.mga7.x86_64 lib64drm_nouveau2-2.4.98-1.mga7.x86_64 lib64drm_radeon1-2.4.98-1.mga7.x86_64 lib64edit0-3.1-0.20181209.1.mga7.x86_64 lib64elfutils1-0.176-1.mga7.x86_64 lib64expat1-2.2.6-2.mga7.x86_64 lib64ffi6-3.2.1-7.mga7.x86_64 lib64flac8-1.3.2-3.mga7.x86_64 lib64gcrypt20-1.8.4-1.mga7.x86_64 lib64gpg-error0-1.36-1.mga7.x86_64 lib64jack-devel-1.9.12-2.mga7.x86_64 lib64llvm8.0-8.0.0-1.mga7.x86_64 lib64lz4_1-1.8.3-1.mga7.x86_64 lib64lzma5-5.2.4-2.mga7.x86_64 lib64mesaopencl1-19.1.2-1.mga7.x86_64 lib64ncurses6-6.1-20181117.3.mga7.x86_64 lib64numa1-2.0.12-1.mga7.x86_64 lib64pciaccess0-0.14-2.mga7.x86_64 lib64pulseaudio0-12.2-5.mga7.x86_64 lib64pulsecommon12.2-12.2-5.mga7.x86_64 lib64sdl2.0-devel-2.0.9-1.mga7.x86_64 lib64sndfile1-1.0.28-8.mga7.x86_64 lib64systemd0-241-8.1.mga7.x86_64 lib64vorbis0-1.3.6-3.mga7.x86_64 lib64vorbisenc2-1.3.6-3.mga7.x86_64 lib64wrap0-7.6-48.mga7.x86_64 lib64x11-xcb1-1.6.7-1.mga7.x86_64 lib64x11_6-1.6.7-1.mga7.x86_64 lib64xau6-1.0.9-1.mga7.x86_64 lib64xcb-dri2_0-1.13.1-1.mga7.x86_64 lib64xcb-dri3_0-1.13.1-1.mga7.x86_64 lib64xcb-glx0-1.13.1-1.mga7.x86_64 lib64xcb-present0-1.13.1-1.mga7.x86_64 lib64xcb-sync1-1.13.1-1.mga7.x86_64 lib64xcb1-1.13.1-1.mga7.x86_64 lib64xcursor1-1.2.0-1.mga7.x86_64 lib64xdamage1-1.1.5-1.mga7.x86_64 lib64xdmcp6-1.1.3-1.mga7.x86_64 lib64xext6-1.3.4-1.mga7.x86_64 lib64xfixes3-5.0.3-2.mga7.x86_64 lib64xi6-1.7.9-3.mga7.x86_64 lib64xshmfence1-1.3-2.mga7.x86_64 lib64xxf86vm1-1.1.4-3.mga7.x86_64
(gdb) bt
#0  0x00007fffffffb164 in ?? ()
#1  0x00007ffff7f836d7 in __pthread_once_slow () from /lib64/libpthread.so.0
#2  0x00007fffb0a84234 in __gthread_once (__func=<optimized out>, __once=0x7fffb1ffefcc <InitializeCheckInstrTypesPassFlag>) at /usr/include/c++/8.3.1/x86_64-mageia-linux-gnu/bits/gthr-default.h:699
#3  std::call_once<void* (&)(llvm::PassRegistry&), std::reference_wrapper<llvm::PassRegistry> > (
    __f=@0x7fffb0a83e70: {void *(llvm::PassRegistry &)} 0x7fffb0a83e70 <initializeCheckInstrTypesPassOnce(llvm::PassRegistry&)>, __once=...) at /usr/include/c++/8.3.1/mutex:684
#4  llvm::call_once<void* (&)(llvm::PassRegistry&), std::reference_wrapper<llvm::PassRegistry> > (
    F=@0x7fffb0a83e70: {void *(llvm::PassRegistry &)} 0x7fffb0a83e70 <initializeCheckInstrTypesPassOnce(llvm::PassRegistry&)>, flag=...) at /usr/include/llvm/Support/Threading.h:102
#5  initializeCheckInstrTypesPass (Registry=...) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/Compiler/CISACodeGen/CheckInstrTypes.cpp:49
#6  IGC::CheckInstrTypes::CheckInstrTypes (this=<optimized out>, instrList=0x7fffffffb164) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/Compiler/CISACodeGen/CheckInstrTypes.cpp:56
#7  0x00007fffb0924be0 in IGC::unify_opt_PreProcess (pContext=pContext@entry=0x7fffffffb0d0) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/Compiler/CISACodeGen/ShaderCodeGen.cpp:1051
#8  0x00007fffb082401f in IGC::CommonOCLBasedPasses (pContext=0x7fffffffb0d0, BuiltinGenericModule=std::unique_ptr<llvm::Module> = {...}, BuiltinSizeModule=std::unique_ptr<llvm::Module> = {...})
    at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/UnifyIROCL.cpp:186
#9  0x00007fffb082530f in IGC::UnifyIRSPIR(IGC::OpenCLProgramContext*, std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >) ()
    at /usr/include/c++/8.3.1/ext/new_allocator.h:116
#10 0x00007fffb07f1464 in TC::TranslateBuild (pInputArgs=pInputArgs@entry=0x7fffffffb800, pOutputArgs=pOutputArgs@entry=0x7fffffffb7d0, 
    inputDataFormatTemp=inputDataFormatTemp@entry=TC::TB_DATA_FORMAT_LLVM_TEXT, IGCPlatform=..., profilingTimerResolution=<optimized out>) at /usr/include/c++/8.3.1/bits/move.h:74
#11 0x00007fffb07f2278 in TC::TranslateBuild (pInputArgs=pInputArgs@entry=0x7fffffffb800, pOutputArgs=pOutputArgs@entry=0x7fffffffb7d0, inputDataFormatTemp=TC::TB_DATA_FORMAT_LLVM_TEXT, IGCPlatform=..., 
    profilingTimerResolution=<optimized out>) at /usr/include/c++/8.3.1/ext/new_allocator.h:86
#12 0x00007fffb089ffe7 in IGC::IgcOclTranslationCtx<0ul>::Impl::Translate (this=<optimized out>, outVersion=<optimized out>, src=<optimized out>, specConstantsIds=specConstantsIds@entry=0x0, 
    specConstantsValues=specConstantsValues@entry=0x0, options=<optimized out>, internalOptions=<optimized out>, tracingOptions=<optimized out>, tracingOptionsCount=<optimized out>, gtPinInput=<optimized out>)
    at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/ocl_igc_interface/impl/igc_ocl_translation_ctx_impl.h:230
#13 0x00007fffb08a29a3 in IGC::IgcOclTranslationCtx<1ul>::TranslateImpl (this=<optimized out>, outVersion=<optimized out>, src=<optimized out>, options=<optimized out>, internalOptions=<optimized out>, 
    tracingOptions=<optimized out>, tracingOptionsCount=0) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/ocl_igc_interface/igc_ocl_translation_ctx.h:45
#14 0x00007fffb75631ca in IGC::IgcOclTranslationCtx<1ul>::Translate<IGC::OclTranslationOutput<1ul> > (tracingOptionsCount=0, tracingOptions=0x0, internalOptions=<optimized out>, options=<optimized out>, 
    src=<optimized out>, this=<optimized out>) at /usr/include/igc/ocl_igc_interface/igc_ocl_translation_ctx.h:51
#15 NEO::translate<IGC::IgcOclTranslationCtx<3ul> > (internalOptions=<optimized out>, options=<optimized out>, src=<optimized out>, tCtx=<optimized out>)
    at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/compiler_interface/compiler_interface.inl:27
#16 NEO::CompilerInterface::getSipKernelBinary (this=0x7fffc6ef8ee0, kernel=<optimized out>, device=..., retBinary=std::vector of length 0, capacity 0) at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/compiler_interface/compiler_interface.cpp:348
#17 0x00007fffb7547f54 in std::call_once<NEO::BuiltIns::getSipKernel(NEO::SipKernelType, NEO::Device&)::{lambda()#1}&>(std::once_flag&, NEO::BuiltIns::getSipKernel(NEO::SipKernelType, NEO::Device&)::{lambda()#1}&)::{lambda()#2}::_FUN() () at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/built_ins/built_ins.cpp:101
#18 0x00007ffff7f836d7 in __pthread_once_slow () from /lib64/libpthread.so.0
#19 0x00007fffb7548d02 in NEO::BuiltIns::getSipKernel (this=0x7fffc6f21c00, type=<optimized out>, device=...) at /usr/include/c++/8.3.1/x86_64-mageia-linux-gnu/bits/gthr-default.h:699
#20 0x00007fffb75b4c2b in NEO::Platform::initialize (this=0x7fffc6fca700) at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/platform/platform.cpp:181
#21 0x00007fffb7528d38 in clGetPlatformIDs (numEntries=<optimized out>, platforms=<optimized out>, numPlatforms=<optimized out>) at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/api/api.cpp:78
#22 0x00007fffc3a520ba in _find_and_check_platforms (num_icds=2) at ocl_icd_loader.c:451
#23 __initClIcd () at ocl_icd_loader.c:652
#24 _initClIcd_real () at ocl_icd_loader.c:702
#25 0x00007fffc3a542a4 in _initClIcd () at ocl_icd_loader.c:724
#26 clGetPlatformIDs (num_entries=0, platforms=0x0, num_platforms=0x7fffffffc71c) at ocl_icd_loader.c:846
#27 0x0000000003220776 in ccl::device_opencl_info(ccl::vector<ccl::DeviceInfo, ccl::GuardedAllocator<ccl::DeviceInfo> >&) ()
#28 0x00000000031e87c6 in ccl::Device::available_devices(unsigned int) ()
#29 0x0000000001548699 in ?? ()
#30 0x000000000174f969 in _PyMethodDef_RawFastCallKeywords ()
#31 0x000000000174fa25 in _PyCFunction_FastCallKeywords ()
#32 0x000000000114f33f in _PyEval_EvalFrameDefault ()
#33 0x0000000001803c23 in _PyEval_EvalCodeWithName ()
#34 0x000000000174f4a6 in _PyFunction_FastCallKeywords ()
#35 0x000000000114dfa3 in _PyEval_EvalFrameDefault ()
#36 0x0000000001145530 in ?? ()
#37 0x000000000114dfa3 in _PyEval_EvalFrameDefault ()
#38 0x0000000001145530 in ?? ()
#39 0x000000000114dfa3 in _PyEval_EvalFrameDefault ()
#40 0x0000000001145530 in ?? ()
#41 0x000000000174f3e6 in _PyFunction_FastCallDict ()
#42 0x00000000013e69ce in ?? ()
#43 0x00000000014f2c2f in ?? ()
#44 0x0000000002d48623 in ?? ()
#45 0x0000000002d4ab86 in ED_region_panels_layout_ex ()
#46 0x0000000002d4b06d in ED_region_panels_ex ()
#47 0x0000000002d4ebf1 in ED_region_do_draw ()
#48 0x0000000001517513 in wm_draw_update ()
#49 0x0000000001514c30 in WM_main ()
#50 0x00000000010c0abe in main ()

We've had problems in the past where there was a conflict between LLVM symbols, with Blender using a different LLVM version than the driver.

We try to hide these symbols on the Blender side (using source/creator/blender.map), but may have missed some.

If your build did not include OSL / LLVM, that could explain why you couldn't reproduce it.
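One way to look for LLVM symbols that escaped the hiding is to list the symbols a binary exports dynamically and filter for LLVM names. The following is a minimal sketch, assuming binutils `nm` is installed. The helper names are hypothetical, and the substring match on "llvm" is a heuristic that may produce false positives.

```python
import subprocess


def exported_llvm_symbols(nm_output):
    """Return symbol names from `nm -D --defined-only` output that look like LLVM."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name = parts[-1]
        # C++ symbols in namespace llvm mangle to "...4llvm...",
        # and the C API uses an "LLVM" prefix.
        if "llvm" in name.lower():
            symbols.append(name)
    return symbols


def dynamic_symbols(path):
    """Run nm on a binary and return its defined dynamic symbols as text."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `exported_llvm_symbols(dynamic_symbols("/path/to/blender"))` (the path is a placeholder) should come back empty if all LLVM symbols are hidden.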

Thanks, that's quite likely. I'm trying to debug further issues with this OpenCL runtime and Blender upstream and it does sound like having a slightly unexpected LLVM or Clang version can lead to such wild issues: https://github.com/intel/compute-runtime/issues/195

I expect that we can sort out the packaging with the upstream Intel packager in the coming days. We can then revisit this issue, check whether that runtime still makes Blender crash, and if so work out whether more symbols need to be hidden.

> If your build did not include OSL / LLVM, that could explain why you couldn't reproduce it.

That's correct, my original build was with default options and thus WITH_LLVM=OFF and WITH_CYCLES_OSL=OFF.

I just did a new build with both ON and I still can't reproduce the crash. However, the official Blender 2.80 release doesn't appear to link libLLVM dynamically, so I guess it links it statically. If so, there could indeed be a symbol mismatch between Blender's statically linked LLVM and the distro LLVM that IGC loads dynamically.
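Whether a given binary links libLLVM dynamically can be checked from its ELF dynamic section. The sketch below parses the output of `readelf -d` and assumes the usual binutils `(NEEDED) ... Shared library: [name]` line format. The function names are hypothetical.

```python
import re

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libLLVM-8.so]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libraries(readelf_output):
    """Return the DT_NEEDED library names found in `readelf -d` output."""
    return NEEDED_RE.findall(readelf_output)


def links_llvm_dynamically(readelf_output):
    """True if any DT_NEEDED entry names an LLVM shared library."""
    return any("LLVM" in lib for lib in needed_libraries(readelf_output))
```

If this returns False for a build that was configured with LLVM enabled, LLVM was presumably linked statically, which is the case where exported symbols could clash with the LLVM that the driver loads.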