
Handle Mesa3D's OpenCL implementation (Clover) in the platform enumerator
Needs Review (Public)

Authored by Edward O'Callaghan (funfunctor) on Aug 24 2016, 2:24 AM.

Details

Summary

The attached patch fixes up the platform enumerator to pick up 'Clover' as a valid platform. Currently we toggle the same features as we would for the 'KERNEL_OPENCL_AMD' platform, since in Gallium we lower to LLVM IR, which is supported by the AMD Radeon group and the community.

Kind Regards,
Edward O'Callaghan.

Event Timeline

Edward O'Callaghan (funfunctor) retitled this revision to Handle Mesa3D's OpenCL implementation (Clover) in the platform enumerator. Aug 24 2016, 2:24 AM
Edward O'Callaghan (funfunctor) updated this object.
Edward O'Callaghan (funfunctor) updated this revision to Diff 7290.

Several questions:

  • What are the requirements for the Mesa version?
  • What are the requirements for the hardware?
  • Does Cycles manage to render all benchmark files correctly on that required configuration?
  • Is this only supposed to work for the split kernel, or is megakernel compilation also supported?
  • What are the render times (you might want to compare them with the numbers from the spreadsheet)?
intern/cycles/kernel/kernel_types.h
131

This define and the ones below could apparently be replaced with a single #define __KERNEL_ADV_SHADING__.

Use 'KERNEL_ADV_SHADING' instead of explicitly stating each define.

Several questions:

  • What are the requirements for the Mesa version?

12.0.1 seems to correctly parse the CL and lower it to TGSI.

  • What are the requirements for the hardware?

That's a broad question, but Clover is built on top of Gallium, so provided all the TGSI opcodes are lowered and the hardware actually supports the required operations, it's supported. Specifically, I am targeting radeonsi here, so we lower to LLVM IR in our case.

  • Does Cycles manage to render all benchmark files correctly on that required configuration?

Not everything. I am dealing with bugs deep inside the LLVM pass manager at the moment; however, that is out of scope here.

  • Is this only supposed to work for the split kernel, or is megakernel compilation also supported?

Only split.

  • What are the render times (you might want to compare them with the numbers from the spreadsheet)?

I'm not worried about perf so much as stability at the moment. This obviously stresses the LLVM backend, so you will need to be on trunk there. LLVM 3.9 is also about to be released, so keep that in mind.

This change is enough to get the Blender side of things in shape so that I can work on the lower parts of the stack, where I normally work, so further changes will occur there going forward. Keep in mind that the new AMD driver (radeonsi in Mesa userspace and amdgpu in kernel space) is still maturing, being very recent.

Kind Regards,
Edward.

intern/cycles/kernel/kernel_types.h
131

Correct and fixed! Good spot thanks!

When we add functionality to Blender, we need to give users a clear indication of what they can/should expect from it and how/if they can benefit from it.

Here you can at least provide information like:

  • Which benchmark files work correctly
  • Which benchmark files do not work correctly
  • What are the unsupported features
  • What hardware you use for testing (so at least there's some baseline for users to see if they can benefit/use from the change. I bet you can nail it down to terms like vliw4+, GCN+)

While OpenCL is officially only supported on Windows+AMD, we still need to know the bare minimum about the state of things before adding support for something.

Edward O'Callaghan (funfunctor) marked an inline comment as done. Edited Aug 24 2016, 4:09 PM

When we add functionality to Blender, we need to give users a clear indication of what they can/should expect from it and how/if they can benefit from it.
Here you can at least provide information like:

  • Which benchmark files work correctly

This will become clear once the next Blender release is closer to being cut as stable since, as I stated before, the backend components I am working on, such as LLVM, are in flux. However, without handling Clover, as this patch does, the right CL kernel source is not passed down to our stack to be JIT-compiled.

  • Which benchmark files do not work correctly

ditto

  • What are the unsupported features

ditto

  • What hardware you use for testing (so at least there's some baseline for users to see if they can benefit/use from the change. I bet you can nail it down to terms like vliw4+, GCN+)

I did. Further, the ISA is actually irrelevant here, as this is adding support for Clover specifically.

While OpenCL is officially only supported on Windows+AMD, we still need to know the bare minimum about the state of things before adding support for something.

Adding support, as you state in the case of Windows vs. Linux, isn't an all-or-nothing proposition. We have to start somewhere; the first thing is to even generate a valid CL kernel for Clover before talking about benchmarks!

I'm not talking about comparing speed, I'm talking about getting a clear picture of what is supported and what is not supported.

Please understand that Blender is for artists and each and every feature is to be well-communicated in a language understandable by them. In this case we can't give them any clues whether this change will support their configuration or not. And even if their hardware is well supported by the driver and they use latest Mesa we still can't tell them which features they can expect to work correctly.

All we know here is:

  • GCN+ cards are expected to be supported (I'm guessing here, mainly based on the mention of the radeonsi driver).
  • Support is limited to the feature set of the split kernel
  • Some features do not work correctly

Now, you have patched Blender, you have the latest drivers and everything. It is really not that hard to get a benchmark archive, open the .blend files, hit F12 and see if they render correctly. You can lower the number of samples to 16 for that, so the overall test will take 10 minutes, but then you'll have a clear picture of what works and what does not, and be able to communicate it to users.

Just to make it 100% clear: by mentioning benchmark files I do not mean doing any sort of speed comparison, only using prepared .blend files with various features in them to get a clearer picture of what the current status is.

I think this discussion boils down to the question:

Do we want to add support for a currently non-working (from a user perspective) OpenCL implementation?

I think @Sergey Sharybin (sergey) seems to lean towards the answer that we should add support for implementations that actually work.

I would tend to agree. While I love open source and would like Cycles to run on every OpenCL implementation ever created, we are trying to make a tool for users to make great art. I think adding support for implementations that *currently* do not yield usable pictures for our users needs to be behind some kind of flag/toggle so as not to confuse them or raise expectations.

I'm not talking about comparing speed, I'm talking about getting a clear picture of what is supported and what is not supported.

Understood.

Please understand that Blender is for artists and each and every feature is to be well-communicated in a language understandable by them. In this case we can't give them any clues whether this change will support their configuration or not. And even if their hardware is well supported by the driver and they use latest Mesa we still can't tell them which features they can expect to work correctly.

I can appreciate that.

All we know here is:

  • GCN+ cards are expected to be supported (I'm guessing here, mainly based on the mention of the radeonsi driver).
  • Support is limited to the feature set of the split kernel
  • Some features do not work correctly

Correct.

Now, you have patched Blender, you have the latest drivers and everything. It is really not that hard to get a benchmark archive, open the .blend files, hit F12 and see if they render correctly. You can lower the number of samples to 16 for that, so the overall test will take 10 minutes, but then you'll have a clear picture of what works and what does not, and be able to communicate it to users.
Just to make it 100% clear: by mentioning benchmark files I do not mean doing any sort of speed comparison, only using prepared .blend files with various features in them to get a clearer picture of what the current status is.

What would be ideal is a validation suite if you would like to have a check list style is/is-not supported report like we do with piglit for example in Mesa for GL support.

At the moment this is to have Blender generate valid CL kernels *at all* for Clover! Further work is still needed on stabilizing issues in the pass manager inside LLVM, as I said.

In simple language this all means:

From Blender's POV: Mesa/Clover OpenCL is not currently supported in Blender; however, we now provide the required code in Blender to hand Mesa/Clover valid CL. Blender currently recommends using the vendor driver for best results unless you are a developer/tester.

From our (Mesa/LLVM) POV: Blender has enough support that we can fix our own driver problems as we continue to test and find problems (since this is initial support obviously).

I think this discussion boils down to the question:
Do we want to add support for a currently non-working (from a user perspective) OpenCL implementation?
I think @Sergey Sharybin (sergey) seems to lean towards the answer that we should add support for implementations that actually work.
I would tend to agree. While I love open source and would like Cycles to run on every OpenCL implementation ever created, we are trying to make a tool for users to make great art. I think adding support for implementations that *currently* do not yield usable pictures for our users needs to be behind some kind of flag/toggle so as not to confuse them or raise expectations.

Understood totally; the problem is a bit of a chicken-and-egg one, though. My take would be to add something in the release notes.

I should add one thing here: this change makes zero user-visible changes! Before this change the user is still able to select the CL devices, but they just get compile errors from the CL kernel, whereas with this change they _may_ get miscompiles with certain CL kernel combinations, or visual artifacts. So it just makes things go from not working at all when the user clicks on the CL device to "sort of works for the basics".

Understood totally; the problem is a bit of a chicken-and-egg one, though. My take would be to add something in the release notes.

Or hide this behind the experimental flag or some environment variable. There are many possibilities. I do not really believe the chicken-and-egg argument here. While Blender might be important, it is nowhere near crucial to making Clover a good, compliant OpenCL implementation.

Understood totally; the problem is a bit of a chicken-and-egg one, though. My take would be to add something in the release notes.

Or hide this behind the experimental flag or some environment variable. There are many possibilities. I do not really believe the chicken-and-egg argument here. While Blender might be important, it is nowhere near crucial to making Clover a good, compliant OpenCL implementation.

It's already hidden behind 'CYCLES_OPENCL_TEST=ALL' && 'CYCLES_OPENCL_SPLIT_KERNEL_TEST=1', so yeah.

Just note that just because Clover meets the OpenCL minimum requirements of Blender (which it does) does not mean it will work flawlessly. I am not comfortable enough to claim "officially supporting" any Cycles features at this stage, merely that Blender now produces actually valid CL, so that we can fix bugs in our driver that we either see ourselves or that are reported to us.
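
As an illustration of the gating described above, here is a minimal C++ sketch of checking both variables before exposing a test-only platform. The helper names are hypothetical and this is not the actual Cycles code; only the variable names and their values come from the comment above.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not part of Cycles): returns true only when both
// values match what the comment above describes, i.e. "ALL" and "1".
// A null pointer means the corresponding variable is not set.
bool clover_testing_requested(const char *opencl_test, const char *split_kernel_test)
{
  return opencl_test != nullptr && std::strcmp(opencl_test, "ALL") == 0 &&
         split_kernel_test != nullptr && std::strcmp(split_kernel_test, "1") == 0;
}

// Hypothetical wrapper that reads the two variables from the environment.
bool clover_testing_requested_from_env()
{
  return clover_testing_requested(std::getenv("CYCLES_OPENCL_TEST"),
                                  std::getenv("CYCLES_OPENCL_SPLIT_KERNEL_TEST"));
}
```

Keeping the comparison separate from the std::getenv() calls makes the rule easy to check without touching the process environment.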

@Martijn Berger (juicyfruit), for OpenCL we use a whitelist, and currently only Apple and AMD Accelerated Parallel Processing are displayed by default, so there is no real need to hide something behind the Experimental option. But it is still important to communicate the limitations of, and expectations from, the change. That's the only thing.

P.S. The only thing that would concern me a lot is if every simple file (like the default cube scene) fails to render.

@Martijn Berger (juicyfruit), for OpenCL we use a whitelist, and currently only Apple and AMD Accelerated Parallel Processing are displayed by default, so there is no real need to hide something behind the Experimental option. But it is still important to communicate the limitations of, and expectations from, the change. That's the only thing.
P.S. The only thing that would concern me a lot is if every simple file (like the default cube scene) fails to render.

Sure, I can render the default cube on an RX 480 today, but you need LLVM >3.9.

What would be ideal is a validation suite if you would like to have a check list style is/is-not supported report like we do with piglit for example in Mesa for GL support.

We have some coverage with ctest, but that currently only works for CPU. We can try extending that to do GPU rendering as well. That would give some ballpark figures, but will also cause some false-positive failures because SSS/Volume is not supported in split kernel yet.

Sure, I can render the default cube on an RX 480 today, but you need LLVM >3.9.

That is a good crucial detail :)


As for the validation suite thingie: we can simply have a gdoc spreadsheet or wiki page with a list of all files from [1] and [2] with a pass/fail flag next to each.

[1] https://svn.blender.org/svnroot/bf-blender/trunk/lib/tests/cycles/
[2] https://svn.blender.org/svnroot/bf-blender/trunk/lib/benchmarks/cycles/

What would be ideal is a validation suite if you would like to have a check list style is/is-not supported report like we do with piglit for example in Mesa for GL support.

We have some coverage with ctest, but that currently only works for CPU. We can try extending that to do GPU rendering as well. That would give some ballpark figures, but will also cause some false-positive failures because SSS/Volume is not supported in split kernel yet.

Good to know; having a CMake 'make test' target is quite useful for us driver folks when fixing bugs. Reduced test cases are ideal; however, I suspect deriving them is going to be up to me, given how well this patch is going.

Sure, I can render the default cube on an RX 480 today, but you need LLVM >3.9.

That is a good crucial detail :)

Not really, because you need the later LLVM for RX 480 support anyway, and 3.9 is pending release. Further, newer LLVM versions just mean better perf and fewer bugs (like any software). Additionally, this patch is adding support for Clover, not radeonsi exclusively. Clover is the frontend, radeonsi is the backend codegen, and radeonsi is the only Gallium driver that uses LLVM for its code generation. Regardless, it's out of scope here.

To reiterate once again, this:

a.) Does not make any user visible changes!
b.) Does make the CL kernel valid source for the clover CL compiler frontend.

So all we have done here is to fix the preprocessor brokenness in Blender in the case of Clover.

You are bringing up points about how good our CL compiler is and whether it has any bugs. Of course it has bugs; however, that applies just as well to any other vendor's CL compiler! First Blender needs to generate *valid* CL before we can move forward.

I don't really understand such resistance to fixing an actual bug in Blender, instead leaving Blender broken to "work around" a "potentially broken in some way" CL compiler. Critically, it's not like this patch introduces a new menu option!


As for the validation suite thingie: we can simply have a gdoc spreadsheet or wiki page with a list of all files from [1] and [2] with a pass/fail flag next to each.
[1] https://svn.blender.org/svnroot/bf-blender/trunk/lib/tests/cycles/
[2] https://svn.blender.org/svnroot/bf-blender/trunk/lib/benchmarks/cycles/

Kind Regards,
Edward.

There is no resistance. We just need to know the support level and what users might/should expect from the changes. Even if the option is hidden behind an environment variable check, users will try using the newly added Clover, and if they encounter something they will report the issue to us, so it is we who would need to deal with all the incoming frustration. Now, if we have a clear commit log or wiki page or whatever where we explicitly state the support level, that would (a) help us communicate this to users and (b) reduce the frustration level of Blender users.

So now that we approximately know the supported hardware (at least currently supported), the required driver version etc., the only crucial bit of information is which scenes can currently be rendered:

  • Does default cube render?
  • What shaders are rendered correctly?
  • What features are rendered correctly (hair, deformation motion blur, etc etc etc)?

It's really not hard to present your work on such a level.

There is no resistance. We just need to know the support level and what users might/should expect from the changes. Even if the option is hidden behind an environment variable check, users will try using the newly added Clover, and if they encounter something they will report the issue to us, so it is we who would need to deal with all the incoming frustration. Now, if we have a clear commit log or wiki page or whatever where we explicitly state the support level, that would (a) help us communicate this to users and (b) reduce the frustration level of Blender users.

As I said many times now, this adds "support" to (( handle the case of Clover )), whether or not the backend hardware driver of Clover works well remains to be seen.

Actually, critically, nowhere in the commit message did I claim/use the word "support" specifically.

Finally, it is in actual fact us driver vendors that have to put up with the "frustration level of Blender users", because your application produces completely invalid CL code, because you seem not to understand how you used the preprocessor. We are the ones that get the bug reports from the brain damage,
https://bugs.freedesktop.org/show_bug.cgi?id=95265
and we waste our time debugging your application for you, only to waste ample volumes of our time explaining trivial patches to you. Rest assured we won't be pursuing it any further, and will close further bug reports as application-broken and direct the "frustration" squarely back at you.

So now that we approximately know the supported hardware (at least currently supported), the required driver version etc., the only crucial bit of information is which scenes can currently be rendered:

  • Does default cube render?
  • What shaders are rendered correctly?
  • What features are rendered correctly (hair, deformation motion blur, etc etc etc)?

It's really not hard to present your work on such a level.

I have already answered these questions a few times now. If you are looking for a user-facing checklist, I already said there won't be one right now, since this change has nothing to do with that.

The solution to your dilemma is to not claim support at all for the moment, until more testing is done by us developers! I think you're getting a bit carried away with the word "support" and not understanding how Clover fits in here. Clover isn't a driver as such; to give you a conceptual idea, Clover is like Clang (purely the parser of OpenCL), whereas radeonsi is like LLVM (the backend codegen part).

I stumbled on this diff while trying to make my AMD Radeon RX Vega M work with Blender 2.80 / 2.81 alpha on Linux (T68009).

Parts of this diff have been superseded by @Brecht Van Lommel (brecht)'s changes in:
https://git.blender.org/gitweb/gitweb.cgi/blender.git/commit/e17f7af0ce7e045e287b517f775a282a7d7cc8c1
https://git.blender.org/gitweb/gitweb.cgi/blender.git/commit/9873005ecd7c21e3a6d371833a64b3ce722a48ea
https://git.blender.org/gitweb/gitweb.cgi/blender.git/commitdiff/dff88a92a470181c8ae82d1266eb610fca4521e0

__KERNEL_ADV_SHADING__ seems always on now, so unnecessary.
__CL_USE_NATIVE__ was removed, using the same definitions for all platforms. The original code made it so that __CL_USE_NATIVE__ would *prevent* using the native_* definitions (confusing name), which is not what the current code does, but restoring this causes a compilation issue with Clover due to recip() redefinition.
__KERNEL_SHADING__ is replaced by !__KERNEL_AO_PREVIEW__, which is enabled conditionally.

I still had a blocking build issue with Clover though, namely:

source/kernel/svm/svm_math_util.h:120:1: error: OpenCL C version 1.1 does not support the 'static' storage class specifier
source/kernel/kernel_compat_opencl.h:39:29: note: expanded from macro 'ccl_static_constant'

So I wrote this hack, which seems to work around it:

diff --git a/intern/cycles/device/opencl/opencl_split.cpp b/intern/cycles/device/opencl/opencl_split.cpp
index 442b92100bb..db1fa5f2d76 100644
--- a/intern/cycles/device/opencl/opencl_split.cpp
+++ b/intern/cycles/device/opencl/opencl_split.cpp
@@ -1885,6 +1885,9 @@ string OpenCLDevice::kernel_build_options(const string *debug_src)
   else if (platform_name == "AMD Accelerated Parallel Processing")
     build_options += "-D__KERNEL_OPENCL_AMD__ ";
 
+  else if (platform_name == "Clover")
+    build_options += "-D__KERNEL_OPENCL_CLOVER__ ";
+
   else if (platform_name == "Intel(R) OpenCL") {
     build_options += "-D__KERNEL_OPENCL_INTEL_CPU__ ";
 
diff --git a/intern/cycles/kernel/kernel_compat_opencl.h b/intern/cycles/kernel/kernel_compat_opencl.h
index e040ea88d7c..540af168bdc 100644
--- a/intern/cycles/kernel/kernel_compat_opencl.h
+++ b/intern/cycles/kernel/kernel_compat_opencl.h
@@ -36,7 +36,12 @@
 #define ccl_device_forceinline ccl_device
 #define ccl_device_noinline ccl_device ccl_noinline
 #define ccl_may_alias
+/* Clover uses OpenCL C 1.1 which doesn't support static */
+#if defined(__KERNEL_OPENCL_CLOVER__)
+#define ccl_static_constant __constant
+#else
 #define ccl_static_constant static __constant
+#endif
 #define ccl_constant __constant
 #define ccl_global __global
 #define ccl_local __local
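
To make the first hunk easier to follow, the platform-name dispatch can be sketched as a standalone C++ function. The function name platform_build_define is hypothetical; the platform strings and defines are taken from the diff above, and the real kernel_build_options() does more than this.

```cpp
#include <string>

// Hypothetical, simplified stand-in for the dispatch inside
// OpenCLDevice::kernel_build_options(): map the OpenCL platform name to the
// preprocessor define the kernel source uses to detect that platform.
std::string platform_build_define(const std::string &platform_name)
{
  if (platform_name == "AMD Accelerated Parallel Processing")
    return "-D__KERNEL_OPENCL_AMD__ ";
  if (platform_name == "Clover")
    return "-D__KERNEL_OPENCL_CLOVER__ ";
  if (platform_name == "Intel(R) OpenCL")
    return "-D__KERNEL_OPENCL_INTEL_CPU__ ";
  // Platforms not covered by this sketch get no platform-specific define.
  return "";
}
```

With the Clover define on the build line, the second hunk can select the ccl_static_constant variant without the 'static' specifier that the compiler rejected.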

Finally, to be able to test anything, I had to force Blender to accept my GPU, since its renderer string does not match the currently whitelisted devices in get_usable_devices():

diff --git a/intern/cycles/device/opencl/opencl.h b/intern/cycles/device/opencl/opencl.h
index 82b961b8de7..04f07f30365 100644
--- a/intern/cycles/device/opencl/opencl.h
+++ b/intern/cycles/device/opencl/opencl.h
@@ -90,7 +90,7 @@ class OpenCLInfo {
   static bool device_version_check(cl_device_id device, string *error = NULL);
   static string get_hardware_id(const string &platform_name, cl_device_id device_id);
   static void get_usable_devices(vector<OpenCLPlatformDevice> *usable_devices,
-                                 bool force_all = false);
+                                 bool force_all = true);
 
   /* ** Some handy shortcuts to low level cl*GetInfo() functions. ** */

There used to be a custom environment variable to allow using non-whitelisted GPUs as per https://developer.blender.org/D2171#50692, but it's not working anymore in Blender 2.80.
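
For context, the interaction between the whitelist and force_all can be sketched roughly as follows. This is a simplified assumption, not the real get_usable_devices(): the function name platform_is_usable is hypothetical, it matches on platform name only, and the two whitelisted names are the ones mentioned earlier in this thread.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified whitelist check. When force_all is true every
// platform is accepted; otherwise only the whitelisted names pass.
bool platform_is_usable(const std::string &platform_name, bool force_all)
{
  static const std::vector<std::string> whitelist = {
      "Apple",
      "AMD Accelerated Parallel Processing",
  };
  if (force_all) {
    return true;
  }
  for (const std::string &allowed : whitelist) {
    if (allowed == platform_name) {
      return true;
    }
  }
  return false;
}
```

Under this assumption, flipping the default of force_all to true, as the hack above does, lets "Clover" through without adding it to the whitelist.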

With the above 2 diffs, I could use my Vega M dGPU with Clover (Mesa 19.1.3 BTW).

Starting a Cycles render of the default cube with default samples, it went through the build of all Cycles kernels and cached them without any issue.

But then any attempt at actually rendering the image seemed to get stuck on "Waiting for availability of base" (I waited 5 minutes before aborting, which left the GPU in a bad state; a reboot was needed to fix it):

I0801 13:00:03.575182 18723 util_task.cpp:329] Creating pool of 8 threads.
I0801 13:00:03.575222 18723 util_task.cpp:241] Detected 8 processors in active group.
I0801 13:00:03.575228 18723 util_task.cpp:251] Not setting thread group affinity.
I0801 13:00:03.575557 18723 opencl_split.cpp:632] Creating new Cycles device for OpenCL platform Clover, device AMD VEGAM (DRM 3.30.0, 5.1.20-desktop-2.mga7, LLVM 8.0.0).
I0801 13:00:05.221135 18723 util_task.cpp:347] De-initializing thread pool of task scheduler.
I0801 13:00:05.221266 18723 util_task.cpp:329] Creating pool of 8 threads.
I0801 13:00:05.221276 18723 util_task.cpp:241] Detected 8 processors in active group.
I0801 13:00:05.221279 18723 util_task.cpp:251] Not setting thread group affinity.
I0801 13:00:05.221406 18723 opencl_split.cpp:632] Creating new Cycles device for OpenCL platform Clover, device AMD VEGAM (DRM 3.30.0, 5.1.20-desktop-2.mga7, LLVM 8.0.0).
I0801 13:00:05.222925 18758 session.cpp:753] Requested features:
Experimental features: Off
Max nodes group: 0
Nodes features: 0
Use Hair: False
Use Object Motion: False
Use Camera Motion: False
Use Baking: False
Use Subsurface: False
Use Volume: False
Use Branched Integrator: False
Use Patch Evaluation: False
Use Transparent Shadows: False
Use Principled BSDF: True
Use Denoising: False
Use Displacement: False
Use Background Light: True
I0801 13:00:05.222944 18758 opencl_split.cpp:761] Loading kernels for platform Clover, device AMD VEGAM (DRM 3.30.0, 5.1.20-desktop-2.mga7, LLVM 8.0.0).
I0801 13:00:05.223009 18758 opencl_util.cpp:297] OpenCL program base not found in cache.
I0801 13:00:05.259636 18758 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_OPENCL_CLOVER__ -D__KERNEL_CL_KHR_FP16__ '.
I0801 13:00:05.259654 18758 opencl_util.cpp:297] Loaded program from /home/akien/.cache/cycles/kernels/cycles_kernel_base_659158CC2B0506862BCE80A8D3869244_66F686122DEAFAABB0CCA9DBEB37273C.clbin.
I0801 13:00:05.259692 18758 opencl_util.cpp:297] OpenCL program background not found in cache.
I0801 13:00:05.328404 18758 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_OPENCL_CLOVER__ -D__KERNEL_CL_KHR_FP16__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_TRANSPARENT__ -D__NO_SHADOW_TRICKS__ -D__NO_DENOISING__ -D__NO_SHADER_RAYTRACE__'.
I0801 13:00:05.328428 18758 opencl_util.cpp:297] Loaded program from /home/akien/.cache/cycles/kernels/cycles_kernel_background_86B199785CCC7D70A254E4777FEC095A_13AC40ED2FE033D81EB1496FB3C61F4D.clbin.
I0801 13:00:05.328727 18758 opencl_util.cpp:297] OpenCL program split_subsurface_scatter not found in cache.
I0801 13:00:05.396940 18758 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_OPENCL_CLOVER__ -D__KERNEL_CL_KHR_FP16__ -D__SPLIT_KERNEL__ -D__COMPUTE_DEVICE_GPU__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_SHADER_RAYTRACE__'.
I0801 13:00:05.396962 18758 opencl_util.cpp:297] Loaded program from /home/akien/.cache/cycles/kernels/cycles_kernel_split_subsurface_scatter_E9DF69343FB62854D83A9A779155F1C6_AFC1C358A7102AAC941EBA6B743D9D40.clbin.
I0801 13:00:05.397176 18758 opencl_util.cpp:297] OpenCL program split_shadow_blocked_dl not found in cache.
I0801 13:00:05.467972 18758 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_OPENCL_CLOVER__ -D__KERNEL_CL_KHR_FP16__ -D__SPLIT_KERNEL__ -D__COMPUTE_DEVICE_GPU__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_SHADER_RAYTRACE__'.
I0801 13:00:05.467996 18758 opencl_util.cpp:297] Loaded program from /home/akien/.cache/cycles/kernels/cycles_kernel_split_shadow_blocked_dl_E9DF69343FB62854D83A9A779155F1C6_499AD6A20A04F6023F358A7E8353D294.clbin.
I0801 13:00:05.468143 18758 opencl_util.cpp:297] OpenCL program split_shadow_blocked_ao not found in cache.
I0801 13:00:05.524817 18758 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_OPENCL_CLOVER__ -D__KERNEL_CL_KHR_FP16__ -D__SPLIT_KERNEL__ -D__COMPUTE_DEVICE_GPU__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_SHADER_RAYTRACE__'.
I0801 13:00:05.524839 18758 opencl_util.cpp:297] Loaded program from /home/akien/.cache/cycles/kernels/cycles_kernel_split_shadow_blocked_ao_E9DF69343FB62854D83A9A779155F1C6_7B4E802380CA0F712787B3A46F81F936.clbin.
I0801 13:00:05.525000 18758 opencl_util.cpp:297] OpenCL program split_holdout_emission_blurring_pathtermination_ao not found in cache.
I0801 13:00:05.580098 18758 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_OPENCL_CLOVER__ -D__KERNEL_CL_KHR_FP16__ -D__SPLIT_KERNEL__ -D__COMPUTE_DEVICE_GPU__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_SHADER_RAYTRACE__'.
I0801 13:00:05.580117 18758 opencl_util.cpp:297] Loaded program from /home/akien/.cache/cycles/kernels/cycles_kernel_split_holdout_emission_blurring_pathtermination_ao_E9DF69343FB62854D83A9A779155F1C6_00A56D67F66EB2A8DAA826B34CD1482C.clbin.
I0801 13:00:05.580325 18758 opencl_util.cpp:297] OpenCL program split_lamp_emission not found in cache.
I0801 13:00:05.631505 18758 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_OPENCL_CLOVER__ -D__KERNEL_CL_KHR_FP16__ -D__SPLIT_KERNEL__ -D__COMPUTE_DEVICE_GPU__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_SHADER_RAYTRACE__'.
I0801 13:00:05.631531 18758 opencl_util.cpp:297] Loaded program from /home/akien/.cache/cycles/kernels/cycles_kernel_split_lamp_emission_E9DF69343FB62854D83A9A779155F1C6_FCD3357FEDE799BADC2C13AAF136F5F1.clbin.
I0801 13:00:05.631747 18758 opencl_util.cpp:297] OpenCL program split_direct_lighting not found in cache.
I0801 13:00:05.683390 18758 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_OPENCL_CLOVER__ -D__KERNEL_CL_KHR_FP16__ -D__SPLIT_KERNEL__ -D__COMPUTE_DEVICE_GPU__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_SHADER_RAYTRACE__'.
I0801 13:00:05.683410 18758 opencl_util.cpp:297] Loaded program from /home/akien/.cache/cycles/kernels/cycles_kernel_split_direct_lighting_E9DF69343FB62854D83A9A779155F1C6_3AAB79B87E77ABDD79C524EB66211A09.clbin.
I0801 13:00:05.683615 18758 opencl_util.cpp:297] OpenCL program split_indirect_background not found in cache.
I0801 13:00:05.733995 18758 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_OPENCL_CLOVER__ -D__KERNEL_CL_KHR_FP16__ -D__SPLIT_KERNEL__ -D__COMPUTE_DEVICE_GPU__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_SHADER_RAYTRACE__'.
I0801 13:00:05.734020 18758 opencl_util.cpp:297] Loaded program from /home/akien/.cache/cycles/kernels/cycles_kernel_split_indirect_background_E9DF69343FB62854D83A9A779155F1C6_04E625B1F1B29141882922F074D1FB86.clbin.
I0801 13:00:05.734221 18758 opencl_util.cpp:297] OpenCL program split_shader_eval not found in cache.
I0801 13:00:05.784528 18758 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_OPENCL_CLOVER__ -D__KERNEL_CL_KHR_FP16__ -D__SPLIT_KERNEL__ -D__COMPUTE_DEVICE_GPU__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_SHADER_RAYTRACE__'.
I0801 13:00:05.784552 18758 opencl_util.cpp:297] Loaded program from /home/akien/.cache/cycles/kernels/cycles_kernel_split_shader_eval_E9DF69343FB62854D83A9A779155F1C6_53D7EBC487201B23107DDBC50D2FB2D7.clbin.
I0801 13:00:05.784770 18758 opencl_util.cpp:297] OpenCL program split_bundle not found in cache.
I0801 13:00:05.839252 18758 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_OPENCL_CLOVER__ -D__KERNEL_CL_KHR_FP16__ -D__SPLIT_KERNEL__ -D__COMPUTE_DEVICE_GPU__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_SHADER_RAYTRACE__'.
I0801 13:00:05.839273 18758 opencl_util.cpp:297] Loaded program from /home/akien/.cache/cycles/kernels/cycles_kernel_split_bundle_E9DF69343FB62854D83A9A779155F1C6_AD8628FAF1CD777295A0BF20DADFFE13.clbin.
I0801 13:00:05.839458 18758 session.cpp:766] Total time spent loading kernels: 0.616542
I0801 13:00:05.839485 18758 svm.cpp:79] Total 6 shaders.
I0801 13:00:05.839526 18758 constant_fold.cpp:132] Discarding closure emission.
I0801 13:00:05.839542 18758 svm.cpp:64] Compilation summary:
Shader name: default_light
Number of SVM nodes: 3
Peak stack usage:    0
Time (in seconds):
Finalize:            0.000014
  Surface:           0.000004
  Bump:              0.000000
  Volume:            0.000000
  Displacement:      0.000001
Generate:            0.000005
Total:               0.000020
I0801 13:00:05.839581 18754 svm.cpp:64] Compilation summary:
Shader name: default_surface
Number of SVM nodes: 8
Peak stack usage:    4
Time (in seconds):
Finalize:            0.000029
  Surface:           0.000012
  Bump:              0.000000
  Volume:            0.000001
  Displacement:      0.000002
Generate:            0.000015
Total:               0.000044
I0801 13:00:05.839586 18751 svm.cpp:64] Compilation summary:
Shader name: default_empty
Number of SVM nodes: 3
Peak stack usage:    0
Time (in seconds):
Finalize:            0.000012
  Surface:           0.000000
  Bump:              0.000000
  Volume:            0.000004
  Displacement:      0.000004
Generate:            0.000008
Total:               0.000022
I0801 13:00:05.839591 18755 svm.cpp:64] Compilation summary:
Shader name: default_background
Number of SVM nodes: 5
Peak stack usage:    0
Time (in seconds):
Finalize:            0.000020
  Surface:           0.000005
  Bump:              0.000000
  Volume:            0.000005
  Displacement:      0.000003
Generate:            0.000013
Total:               0.000033
I0801 13:00:05.839591 18749 svm.cpp:64] Compilation summary:
Shader name: shader
Number of SVM nodes: 5
Peak stack usage:    0
Time (in seconds):
Finalize:            0.000017
  Surface:           0.000004
  Bump:              0.000000
  Volume:            0.000004
  Displacement:      0.000003
Generate:            0.000011
Total:               0.000029
I0801 13:00:05.839617 18752 svm.cpp:64] Compilation summary:
Shader name: Material
Number of SVM nodes: 28
Peak stack usage:    23
Time (in seconds):
Finalize:            0.000001
  Surface:           0.000044
  Bump:              0.000000
  Volume:            0.000000
  Displacement:      0.000001
Generate:            0.000045
Total:               0.000046
I0801 13:00:05.839681 18758 opencl_split.cpp:1148] Texture allocate: __svm_nodes, 928 bytes. (928)
I0801 13:00:05.839692 18758 opencl_split.cpp:1148] Texture allocate: __shaders, 192 bytes. (192)
I0801 13:00:05.839716 18758 svm.cpp:159] Shader manager updated 6 shaders in 0.000230074 seconds.
I0801 13:00:05.839741 18758 object.cpp:632] Total 1 objects.
I0801 13:00:05.839751 18758 opencl_split.cpp:1148] Texture allocate: __objects, 176 bytes. (176)
I0801 13:00:05.839767 18758 particles.cpp:108] Total 0 particle systems.
I0801 13:00:05.839776 18758 opencl_split.cpp:1148] Texture allocate: __particles, 80 bytes. (80)
I0801 13:00:05.839784 18758 mesh.cpp:2121] Total 1 meshes.
I0801 13:00:05.839798 18758 opencl_split.cpp:1148] Texture allocate: __attributes_map, 96 bytes. (96)
I0801 13:00:05.839807 18758 opencl_split.cpp:1148] Texture allocate: __attributes_float3, 128 bytes. (128)
I0801 13:00:05.839818 18758 mesh.cpp:2253] Objects BVH build pool statistics:
Total time:    0.000003
Tasks handled: 1
I0801 13:00:05.839828 18758 mesh.cpp:1911] Using BVH2 layout.
I0801 13:00:05.839865 18758 bvh_build.cpp:484] BVH build statistics:
  Build time: 3.00407e-05
  Total number of nodes: 5
  Number of inner nodes: 2
  Number of leaf nodes: 3
  Number of unaligned nodes: 0
  Allocation slop factor: 1
  Maximum depth: 3
I0801 13:00:05.839890 18758 opencl_split.cpp:1148] Texture allocate: __bvh_nodes, 128 bytes. (128)
I0801 13:00:05.839895 18758 opencl_split.cpp:1148] Texture allocate: __bvh_leaf_nodes, 48 bytes. (48)
I0801 13:00:05.839900 18758 opencl_split.cpp:1148] Texture allocate: __object_node, 4 bytes. (4)
I0801 13:00:05.839906 18758 opencl_split.cpp:1148] Texture allocate: __prim_tri_index, 48 bytes. (48)
I0801 13:00:05.839911 18758 opencl_split.cpp:1148] Texture allocate: __prim_tri_verts, 576 bytes. (576)
I0801 13:00:05.839916 18758 opencl_split.cpp:1148] Texture allocate: __prim_type, 48 bytes. (48)
I0801 13:00:05.839922 18758 opencl_split.cpp:1148] Texture allocate: __prim_visibility, 48 bytes. (48)
I0801 13:00:05.839927 18758 opencl_split.cpp:1148] Texture allocate: __prim_index, 48 bytes. (48)
I0801 13:00:05.839932 18758 opencl_split.cpp:1148] Texture allocate: __prim_object, 48 bytes. (48)
I0801 13:00:05.839946 18758 opencl_split.cpp:1148] Texture allocate: __tri_shader, 48 bytes. (48)
I0801 13:00:05.839951 18758 opencl_split.cpp:1148] Texture allocate: __tri_vnormal, 128 bytes. (128)
I0801 13:00:05.839957 18758 opencl_split.cpp:1148] Texture allocate: __tri_vindex, 192 bytes. (192)
I0801 13:00:05.839962 18758 opencl_split.cpp:1148] Texture allocate: __tri_patch, 48 bytes. (48)
I0801 13:00:05.839967 18758 opencl_split.cpp:1148] Texture allocate: __tri_patch_uv, 64 bytes. (64)
I0801 13:00:05.839977 18758 opencl_split.cpp:1148] Texture allocate: __object_flag, 4 bytes. (4)
I0801 13:00:05.839987 18758 camera.cpp:509] Camera is outside of the volume.
I0801 13:00:05.839994 18758 tables.cpp:42] Total 1 lookup tables.
I0801 13:00:05.839998 18758 opencl_split.cpp:1148] Texture allocate: __lookup_table, 262,144 bytes. (256.00K)
I0801 13:00:05.840008 18758 light.cpp:901] Total 2 lights.
I0801 13:00:05.840013 18758 light.cpp:226] Background MIS has been disabled.
I0801 13:00:05.840018 18758 light.cpp:886] Number of lights sent to the device: 1
I0801 13:00:05.840021 18758 light.cpp:888] Number of lights without contribution: 1
I0801 13:00:05.840025 18758 opencl_split.cpp:1148] Texture allocate: __lights, 192 bytes. (192)
I0801 13:00:05.840034 18758 light.cpp:307] Total 1 of light distribution primitives.
I0801 13:00:05.840039 18758 opencl_split.cpp:1148] Texture allocate: __light_distribution, 32 bytes. (32)
I0801 13:00:05.842689 18758 opencl_split.cpp:1148] Texture allocate: __sobol_directions, 1,335,552 bytes. (1.27M)
I0801 13:00:05.842778 18758 tables.cpp:42] Total 2 lookup tables.
I0801 13:00:05.842784 18758 opencl_split.cpp:1148] Texture allocate: __lookup_table, 266,240 bytes. (260.00K)
I0801 13:00:05.842800 18758 opencl_split.cpp:922] Buffer allocate: __data, 1,456 bytes. (1.42K)
I0801 13:00:05.842840 18758 scene.cpp:306] System memory statistics after full device sync:
  Usage: 1,894,540 (1.81M)
  Peak: 3,010,924 (2.87M)
I0801 13:00:05.843082 18757 opencl_split.cpp:922] Buffer allocate: memory manager buffer, 1,888 bytes. (1.84K)
I0801 13:00:05.843122 18757 opencl_split.cpp:922] Buffer allocate: memory manager buffer, 928 bytes. (928)
I0801 13:00:05.843140 18757 opencl_split.cpp:922] Buffer allocate: memory manager buffer, 400 bytes. (400)
I0801 13:00:05.843149 18757 opencl_split.cpp:922] Buffer allocate: memory manager buffer, 304 bytes. (304)
I0801 13:00:05.843156 18757 opencl_split.cpp:922] Buffer allocate: memory manager buffer, 368 bytes. (368)
I0801 13:00:05.843164 18757 opencl_split.cpp:922] Buffer allocate: memory manager buffer, 266,432 bytes. (260.19K)
I0801 13:00:05.843616 18757 opencl_split.cpp:922] Buffer allocate: memory manager buffer, 256 bytes. (256)
I0801 13:00:05.843636 18757 opencl_split.cpp:922] Buffer allocate: memory manager buffer, 1,335,792 bytes. (1.27M)
I0801 13:00:05.845316 18757 opencl_split.cpp:922] Buffer allocate: kernel_globals, 1,544 bytes. (1.51K)
I0801 13:00:05.845338 18757 opencl_split.cpp:922] Buffer allocate: RenderBuffers, 131,072 bytes. (128.00K)
I0801 13:00:05.845345 18757 opencl_util.cpp:297] Waiting for availability of base.
I0801 13:00:05.845978 18757 opencl_split.cpp:555] Maximum device allocation size: 3,435,973,836 bytes. (3.20G).
I0801 13:00:05.845988 18757 opencl_split.cpp:922] Buffer allocate: size_buffer, 8 bytes. (8)
I0801 13:00:05.845993 18757 opencl_util.cpp:297] Waiting for availability of base.
I0801 13:00:05.850349 18757 device_split_kernel.cpp:131] Split state element size: 2,500 bytes. (2.44K).
I0801 13:00:05.850383 18757 opencl_split.cpp:564] Global size: (768, 828).
I0801 13:00:05.850394 18757 opencl_split.cpp:922] Buffer allocate: work_pool_wgs, 39,748 bytes. (38.82K)
I0801 13:00:05.850410 18757 opencl_split.cpp:922] Buffer allocate: queue_index, 36 bytes. (36)
I0801 13:00:05.850427 18757 opencl_split.cpp:922] Buffer allocate: use_queues_flag, 1 bytes. (1)
I0801 13:00:05.850441 18757 opencl_split.cpp:922] Buffer allocate: size_buffer, 8 bytes. (8)
I0801 13:00:05.850455 18757 opencl_util.cpp:297] Waiting for availability of base.
I0801 13:00:05.851478 18757 opencl_split.cpp:922] Buffer allocate: split_data, 1,589,760,000 bytes. (1.48G)
I0801 13:00:05.851524 18757 opencl_util.cpp:297] Waiting for availability of base.
I0801 13:00:05.851577 18757 opencl_util.cpp:297] Waiting for availability of base.
I0801 13:00:05.857985 18757 opencl_split.cpp:922] Buffer allocate: ray_state, 635,904 bytes. (621.00K)
I0801 13:00:05.858017 18757 opencl_util.cpp:297] Waiting for availability of base.
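
As a sanity check on the buffer sizes in the log above, the large `split_data` allocation matches the logged global size multiplied by the logged split state element size. The sketch below only re-does that arithmetic with numbers copied from the log. The constant and function names are made up for illustration and are not Cycles identifiers, and reading `ray_state` as one byte per work item is an inference from the matching numbers, not something the log states.

```cpp
#include <cassert>
#include <cstdint>

// All values below are copied from the log lines above.
constexpr std::uint64_t kGlobalWidth = 768;            // "Global size: (768, 828)"
constexpr std::uint64_t kGlobalHeight = 828;
constexpr std::uint64_t kSplitStateElementSize = 2500; // "Split state element size"
constexpr std::uint64_t kMaxDeviceAllocation = 3435973836ULL; // "Maximum device allocation size"

// Number of work items implied by the global size (illustrative helper).
constexpr std::uint64_t num_work_items()
{
  return kGlobalWidth * kGlobalHeight;
}

// Bytes needed if every work item owns one split state element.
constexpr std::uint64_t split_data_bytes()
{
  return num_work_items() * kSplitStateElementSize;
}

// 768 * 828 = 635,904, the same number as the ray_state buffer size.
static_assert(num_work_items() == 635904ULL, "work item count");
// 635,904 * 2,500 = 1,589,760,000 bytes, the logged split_data size.
static_assert(split_data_bytes() == 1589760000ULL, "split_data size");
// The allocation stays below the reported per-allocation limit.
static_assert(split_data_bytes() < kMaxDeviceAllocation, "fits in one allocation");
```

So a single buffer of roughly 1.48G is requested even for a trivial scene, which is below the 3.20G limit the device reports.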

I hope this helps bring us closer to supporting Clover in Blender out of the box.

Regarding the above discussion, I do agree that making Blender *able* to use Clover without having to hack its source would be great so that Mesa developers can then work on their end of the problem.
Re-adding support for an environment variable or a command line switch that enables using non-whitelisted devices would be great.
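
A minimal sketch of what such an opt-in could look like, assuming a simple "whitelisted or explicitly overridden" rule. The environment variable name `CYCLES_OPENCL_ALLOW_UNLISTED` and all function names here are hypothetical and do not refer to existing Blender or Cycles code.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical variable name, chosen for illustration only.
static const char *const kAllowUnlistedEnv = "CYCLES_OPENCL_ALLOW_UNLISTED";

// Treat an unset, empty, or "0" value as disabled; anything else enables the override.
inline bool flag_value_enabled(const char *value)
{
  return value != nullptr && value[0] != '\0' && std::string(value) != "0";
}

// Read the override from the process environment.
inline bool allow_unlisted_devices()
{
  return flag_value_enabled(std::getenv(kAllowUnlistedEnv));
}

// A device is usable if it is on the whitelist, or if the user opted in explicitly.
inline bool device_is_usable(bool whitelisted, bool override_enabled)
{
  return whitelisted || override_enabled;
}
```

With something along these lines, a Clover device would stay hidden by default but could be enabled for testing by setting the variable before launching Blender.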

I've just tried this with LLVM 10 and Mesa git. It now compiles the kernels for the default cube scene in about 30 seconds, so that's good, but it locks up the system completely when trying to render.
With LLVM 8 and Mesa 19.1.3 it also locks up my system.