OpenCL kernel build fails on Linux with Mesa drivers (RX 480) #50522
Software Versions
Arch Linux x86_64 4.8.13-1
Mesa OpenCL 13.0.3-1
OCL-ICD 2.2.10-1
Blender 2.78a
Hardware
AMD FX-9590
ASUS Sabertooth 990FX
2x AMD RX 480 4GB, 4th generation GCN, also known as Polaris 10 or Ellesmere. (I've also experienced this with only one card, so I don't think the number of cards is the problem).
Error
I attempted to repair the OpenCL source myself, but not being familiar with compute development, I didn't get very far. Fixing those unary expression errors by inserting bool casts causes a different, more complex error: the output is as before, but at the end of the warnings there is now:
Which is beyond my ability to diagnose or repair.
Looking at the errors and the code that produces them leads me to believe there must be a version mismatch at work: either Blender targets a slightly different version of OpenCL, or Mesa does not provide the correct version on Ellesmere. I can't find any conclusive information either way, though.
(A similar build error is also observed when using Luxrender on GPU instead of Cycles).
Changed status to: 'Open'
Added subscriber: @duooratar
Added subscriber: @Sergey
Changed status from 'Open' to: 'Archived'
Thanks for the report, but we simply don't officially support the Mesa OpenCL implementation yet.
Added subscriber: @nashley
I am also experiencing a similar error with my R9 280X running opencl-mesa 13.0.4-1 and ocl-icd 2.2.10-1 on Arch Linux x86_64 4.9.8-1 (Blender version 2.78.a).
When, if ever, can we expect Blender's OpenCL builds to be compatible with mesa drivers?
With a little dedication, would an amateur OpenGL developer be able to help fix this?
Added subscriber: @ian_bruce
Changed status from 'Archived' to: 'Open'
This same problem (actually, several problems) has been reported here; it's not distribution- or hardware-specific:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=848258
Fortunately, all these problems are now solvable.
The "double precision constant requires cl_khr_fp64" compiler warning has just been addressed here:
https://developer.blender.org/rB2d3c44389ab7
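For readers unfamiliar with this class of warning, here is a minimal sketch in plain host C (not OpenCL C, and not a reproduction of the linked commit) of the most likely cause: a floating constant without an `f` suffix has type `double`, so it pulls the surrounding arithmetic into double precision. The helper names `halve_double` and `halve_float` are hypothetical and exist only for illustration.

```c
#include <assert.h> /* static_assert (C11) */

/* An unsuffixed floating constant is a double; the 'f' suffix makes it a float. */
static_assert(sizeof(1.0) == sizeof(double), "unsuffixed constant is a double");
static_assert(sizeof(1.0f) == sizeof(float), "f-suffixed constant is a float");

/* Hypothetical helper: the constant 0.5 promotes the multiplication to double. */
double halve_double(float x)
{
    return x * 0.5;
}

/* Hypothetical helper: the constant 0.5f keeps the multiplication in float. */
float halve_float(float x)
{
    return x * 0.5f;
}
```

On a device or driver without double-precision support, only the second form avoids touching `double` at all, which is why suffixing constants is a common way to silence this kind of warning.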
I'm not clear on the origin of the compiler warning "implicit declaration of function {lgamma,native_tan} is invalid in C99"; I haven't been able to find out where these openCL functions are supposed to be defined. (Does anybody know?) But it's only a warning, so it may not matter.
There are three actual compile errors of the form "invalid argument type 'X *' to unary expression". This seems to be a problem with using a pointer as a boolean value, under some version of the C language standard; it's usually OK, as far as I know. A patch for this is attached below:
blender-cycles-ptr.diff
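As a rough illustration of the kind of change involved (this is not the content of blender-cycles-ptr.diff, and the function names are hypothetical), the sketch below shows in plain host C the two equivalent ways of testing a pointer: implicitly through unary `!`, and explicitly by comparing against `NULL`. Standard C accepts both forms and they give the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Implicit form: unary '!' applied directly to the pointer. */
int is_missing_implicit(const void *ptr)
{
    return !ptr;
}

/* Explicit form: the same test written as a comparison against NULL. */
int is_missing_explicit(const void *ptr)
{
    return ptr == NULL;
}

/* Sanity helper: both forms must agree for any pointer value. */
void check_equivalence(const void *ptr)
{
    assert(is_missing_implicit(ptr) == is_missing_explicit(ptr));
}
```

Rewriting `!ptr` as `ptr == NULL` does not change behaviour; it only sidesteps a compiler that rejects the implicit form.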
The "unsupported call to function get_local_size" error is a Mesa/LLVM problem. A patch is available upstream; it is recommended that Linux distributions apply it locally:
https://bugs.freedesktop.org/show_bug.cgi?id=99856#c24
Perhaps these changes can be made, and then openCL GPU rendering retested, to see if it then works.
ian_bruce@mail.ru
Changed status from 'Open' to: 'Resolved'
I've applied the patch at 8794a43 just because I prefer explicit NULL comparisons myself.
Keep in mind that this is proper C syntax and there is no reason why OpenCL, which is C99-based, would forbid it. In fact, this code works just fine on AMD/Intel/NVidia drivers.
Since this is valid syntax, there is a very high likelihood of this issue happening again.
So preference is to get this fixed in MESA itself.
Please also note that for such collaboration on an unsupported platform, we expect the mailing list, IRC, or the patch tracker to be used. We do not accept bug reports about platforms we do not officially support.
It turns out that the "invalid argument type 'X *' to unary expression" compile errors are the result of a known bug in LLVM. It is said to be fixed in LLVM-v5; see here:
https://bugs.llvm.org/show_bug.cgi?id=30217
https://reviews.llvm.org/D29038
https://reviews.llvm.org/rL294313
a trivial test case illustrating the bug:
Since LLVM is the cause of the problem, patching completely correct Blender code, as I suggested above, is probably not an appropriate solution.
Instead, distributions should either require LLVM-v5 for openCL, or backport the bugfix into their current LLVM version -- it appears to be a single-line change (see link above).
Alternatively, the LLVM maintainers could be made aware that pre-version-5 LLVM is currently unusable for openCL, and asked to backport the fix into an earlier version, themselves.
ian_bruce@mail.ru
The two separate compiler errors involved here (neither is actually a Blender issue) are now the subject of two Debian bug reports. Patches are available upstream for both problems, as mentioned above.
unsupported call to function get_local_size
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857591
invalid argument type 'X *' to unary expression
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857623
Other distributions may wish to take note of the results, and resolve these issues in a similar way.
ian_bruce@mail.ru
It turns out that these compiler warnings --
implicit declaration of function {lgamma,native_tan} is invalid in C99
occur because the version of libclc (Mesa openCL) being used is not recent enough to have implemented those functions; one has only been available for a few weeks. These are the relevant development commits:
07fa4ae82d
a2593ed8ad
Debian bug report for this issue:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857710
This bug, and the two above, are aggregated here, so these problems can be resolved as quickly as possible:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857718
All three problems are fixable NOW, as I have described. Other distributions should apply these patches and updates in their own archives.
It appears that if this were done, GPU rendering with openCL might very well start working.
People who are interested in this outcome should apply pressure in appropriate places; it's no longer a Blender problem.
ian_bruce@mail.ru
@ian_bruce, I'm not sure what we can do with that knowledge. We are not driver developers and are not involved in any distro development process. I guess this info is better directed to distro maintainers, to ensure they ship proper drivers for OpenCL support (Blender is not the only program on the planet which will benefit from that ;)
Yes, that's exactly what I said --
"Since LLVM is the cause of the problem, patching completely correct Blender code, as I suggested above, is probably not an appropriate solution."
"The two separate compiler errors involved here (neither is actually a Blender issue) are now the subject of two Debian bug reports."
"This bug, and the two above, are aggregated here, so these problems can be resolved as quickly as possible:"
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857718
"Other distributions should apply these patches and updates in their own archives. It appears that if this were done, GPU rendering with openCL might very well start working. People who are interested in this outcome should apply pressure in appropriate places; it's no longer a Blender problem."
I found this discussion when I was searching for information on the same problem; it seems that there are more than fifteen people subscribed to it. Presumably other people will find it as well:
https://www.google.com/search?q=blender+mesa+opencl (second result)
I have now told anybody who cares, that "it's no longer a Blender problem", and that they should "apply pressure in appropriate places", "distributions should apply these patches and updates in their own archives." Along with exactly which patches and updates need to be applied to get openCL working for this or any other application.
It seems to me that this is all useful information for anybody who cares about the issue, and since the question was asked here, this is where I've answered it. Nobody need ask it again; I've told them exactly what their distribution needs to do to fix the problem, and filed appropriate bug reports for Debian, myself. Now other people can file appropriate bug reports with their distributions, and GPU rendering on AMD hardware can finally be made to work.
ian_bruce@mail.ru
Added subscriber: @RafaelRistovski
This comment was removed by @RafaelRistovski
Added subscriber: @rasterizer
I understand this is not a Blender issue, however six months after the various bugs were filed, Blender 2.79 still crashes on compilation under Ubuntu 17.10 Beta 2 with LLVM 5.0 and Mesa 17.2.2, even though it appears to me that all reported bugs were fixed.
Personally, I reported https://bugs.freedesktop.org/show_bug.cgi?id=102009 against Mesa since I didn't see any open issues there and it was unclear to me which part of the stack was causing the crash.
Has anyone else retried compilation and can point out which patches are still missing?
The backtrace in the report indicates a crash somewhere inside Mesa/LLVM. We cannot fix or patch that from the Blender side; the driver itself has to be brought to a state where it can handle the complexity of Cycles kernels.
Well, that's exactly what I wrote.
What I was hoping someone would be able to provide is a list of driver and / or compiler patches that have not been merged yet into the respective code bases, so that I can do a follow-up there. The patches that were linked here all look like they were merged, yet Blender still crashes under the latest Ubuntu with Mesa OpenCL.
Added subscriber: @LazyDodo
There's no un-merged patches on our side.
Added subscriber: @FaresTP
Added subscriber: @imyxhuang
Removed subscriber: @imyxhuang
Added subscriber: @IanHuang
Hi, I'd like to see what I can do regarding open-source driver support in Blender with Mesa and whatnot but I don't see how I would even test it. I have a Raven Ridge APU and Blender doesn't recognize it as a compute device—as expected, I suppose. But given that the CYCLES_OPENCL_SPLIT_KERNEL_TEST=1 environment variable doesn't do anything (I don't see it in the manpage so I assume it doesn't work anymore), I'm really at a loss.
Can anyone get Blender to even attempt using their GPU as a compute device without AMDGPU-PRO? A crash would be better than nothing at this point.
Sorry if it's a little off-topic. IRC and Discord didn't help. Thanks!
Hello, on my side Blender successfully recognizes my Southern Islands GPU and I can select it as a compute device. I have the environment variable exported somewhere in my bashrc, so I am not sure if that's why it detects it.
If you are expecting a crash - you shall get one.
Blender (well, more like llvm/clang) manages to compile some of the kernels but not all. I actually wanted to debug this but I haven't found the time.
I might look into this again some day, in which case I will probably open a new issue, as I reckon it still works on the RX 480.
Hi Rafael, I am still getting kernel compilation crashes with an RX480 on latest Mesa 18.1 under Ubuntu 18.04. Typical errors revolve around incorrectly defined consts in the (generated?) code that is being compiled. Is there anything I can contribute to help getting this fixed?
Hi Markus,
I just tested, and here is the output that gets generated for me:
Note the "OpenCL C version 1.1 does not support the 'static' storage class specifier" message.
As far as I know (according to the GalliumCompute feature matrix), the 'Clover' OpenCL Mesa driver mainly supports OpenCL 1.1 (at least on my GPU).
It would seem that, unless the development of Clover picks up again, the only real alternative would be to use the OpenCL driver from amdgpu-pro.
Added subscriber: @nokipaike
Since we are on the subject, I am curious: in theory, could Cycles OpenCL rendering be compiled for FPGA-based devices like these?
Running OpenCL™ on Intel® FPGAs
https://www.youtube.com/watch?v=nyivIknJaV8
Oh ... oh.
Yeah, it works when I export the variable first rather than just calling Blender on the same line as the assignment. How silly of me.
I'm getting a different error.
The good news is ROCm seems to be under heavy development. This just came out today: https://www.phoronix.com/scan.php?page=news_item&px=AMD-OpenCL-2.0-ROCm-2.0-Work.
I'm not exactly sure how ROCm plays into Mesa or AMDGPU or whatever, but hopefully the stack will be ready for 2.1 sometime soon.
After my graphics card becomes supported by ROCm, that is.