Page MenuHome

Cycles: Enable OpenCL 2.0 when available
Changes PlannedPublic

Authored by Jeroen Bakker (jbakker) on Nov 14 2019, 8:53 AM.

Details

Summary

The OpenCL 2.0 compiler of AMD improves the render time, even without using
an OpenCL 2.0 feature. Additional performance can be added later when
using specific OpenCL 2.0 features.

Performance impact in seconds Linux AMD Vega 64

**Master**

bmw27: 89
barbershop_interior: 673
classroom: 173
fishy_cat: 196
koro: 207
pavillon_barcelona: 414

**Patch Applied**

bmw27: 84
barbershop_interior: 497
classroom: 153
fishy_cat: 190
koro: 202
pavillon_barcelona: 417

Diff Detail

Repository
rB Blender
Branch
T71479 (branched from master)
Build Status
Buildable 5982
Build 5982: arc lint + arc unit

Event Timeline

Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Nov 14 2019, 8:53 AM
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Nov 14 2019, 9:59 AM
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)

Added comment

intern/cycles/device/opencl/opencl_split.cpp
1870

we don't support opencl 1.1 so perhaps just remove this

Reverted disabling out of process compilation

Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Nov 14 2019, 1:02 PM
Jeroen Bakker (jbakker) marked an inline comment as not done.

Can I ask if you're using ROCm to test this?

I'm thinking particularly of this article: https://www.phoronix.com/scan.php?page=news_item&px=AMD-OpenCL-2.0-ROCm-2.0-Work

I've been testing Blender with a Radeon VII on Linux and I've had disappointing performance with amdgpu-pro. (And few distros supported as well)

No I don’t use ROCm in my default test set as it is a wrapper and added more complexity to the landscape. There is a small user base using it. I don’t have a Radeon 7 to test myself.

It would be helpful to test this patch and send some comparison data before and after. So we can keep track of the current state

It would be helpful to test this patch and send some comparison data before and after. So we can keep track of the current state

I'm happy to do that! Although the current state I had when I tested it about a week ago was waiting and around an hour for Cycles kernels to compile for the default cube and giving up, so I'm not sure that's the best data. I may not have set it up properly, or perhaps Ubuntu 19.10 made it unhappy.

In any case I'm happy to create whatever environment you'd find useful and test it there before and after this patch. The machine I have access to has 32GB of RAM and a R7 2700x.

@Hans Goudey (HooglyBoogly) Please upload your system-info.txt. By default the OpenCL drivers installed by ROCm are not mature enough for running cycles at all... They are optimized for micro kernels, cycles isn't.

Here's that. Yeah I guess the whole situation isn't ideal, but I'm curious to see how it works out.

Here's where it seems to get stuck compiling the kernels (on master):

Cycles: compiling OpenCL program split_subsurface_scatter...
Cycles: compiling OpenCL program split_do_volume...
Cycles: compiling OpenCL program split_shadow_blocked_dl...
Cycles: compiling OpenCL program split_shadow_blocked_ao...
Cycles: compiling OpenCL program split_holdout_emission_blurring_pathtermination_ao...
Cycles: compiling OpenCL program split_lamp_emission...
Cycles: compiling OpenCL program split_direct_lighting...
Cycles: compiling OpenCL program split_indirect_background...
Cycles: compiling OpenCL program split_shader_eval...
Cycles: compiling OpenCL program split_bundle...
Cycles: compiling OpenCL program base...
Cycles: compiling OpenCL program background...
Kernel compilation of base finished in 7.31s.
Kernel compilation of split_do_volume finished in 7.76s.
Kernel compilation of split_indirect_background finished in 8.19s.
Kernel compilation of split_holdout_emission_blurring_pathtermination_ao finished in 8.35s.

I'm happy to run more tests and I'll try with this patch on Monday.

@Hans Goudey (HooglyBoogly)
Sorry, you Driver isn't mature enough. It uses the X11 implementation and that implementation is just not able to run cycles with acceptable performance (compile and execution wise). I wished companies would invest in OpenSource drivers for these kind of technologies.

@Jeroen Bakker (jbakker) Interesting. I was under the impression that AMD was working ROCm as an open source answer to CUDA and a replacement to their proprietary OpenCL driver stack. I also don't have a great understanding of this landscape and I haven't found much material out there about it.

You mentioned there was a small user base using ROCm. Do you have any information about that or about your test set up?

If you're still interested in tests with the Radeon VII, like I said I'm happy to install whichever driver you recommend on whatever distribution to run some tests.

Brecht Van Lommel (brecht) added inline comments.
intern/cycles/device/opencl/opencl_split.cpp
1862

Why use this NVIDIA specific code for all platforms?

1865

As far as I know noinline is not an OpenCL 2.0 feature, but rather something up to the compiler that can work with any OpenCL versions.

Is it OpenCL 2.0 or __CL_NOINLINE__ that makes the biggest difference for performance? And memory usage?

I'd expect noinline to use less memory if it works.

intern/cycles/device/opencl/opencl_split.cpp
1865

You're right, noinline isn't from the OpenCL 2.0 specs. I enabled it only for OpenCL2.0 as currently it was disabled for all OpenCL platforms, and haven't been able to find the reason. I detected that the heavier scenes needed both option to be on par with 2.80. I will split this patch into two separates so we can keep better track to the changes.

I did detect memory changes when enabling OpenCL2.0 the compiler needed much more memory (32GB when doing out of process compilation). OpenCL2.0 has more memory and work sync features we could utilize. We still need to do a lot of benchmark to find out what the actual benefits are.

Jeroen Bakker (jbakker) marked 2 inline comments as done.
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)

Cleaned up the patch

  • removed the inlining
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Wed, Nov 27, 8:49 AM
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Wed, Nov 27, 8:51 AM
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)

Only enables the OpenCL 2.0 context when set in the debug flags.

Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Thu, Dec 12, 8:53 AM
Jeroen Bakker (jbakker) planned changes to this revision.Thu, Dec 12, 8:55 AM
Jeroen Bakker (jbakker) added inline comments.
intern/cycles/blender/addon/properties.py
781

This is a scene setting. Move this option to preferences.

Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Thu, Dec 12, 9:29 AM