Cycles OpenCL kernel-splitting work #44197

Closed
opened 2015-03-30 22:09:23 +02:00 by Sergey Sharybin · 60 comments

This is a design task related to the Cycles kernel split patch (D1200, https://archive.blender.org/developer/D1200), for communication that is not closely tied to the actual code.

The current OpenCL state is quite limited feature-wise. There is no:

  • SSS
  • Volumes (both homogeneous and heterogeneous)
  • Motion blur
  • CMJ

Reports about crashes and render artifacts can go here. Sharing benchmarks and test files that demonstrate issues is also welcome here.

Please keep this thread constructive. Do not report issues with applying the patch or building Blender. That kind of feedback should go via IRC.

Also, do not paste backtraces or logs inline in a comment; attach them as files instead.

All non-constructive comments will be removed. Keep in mind this is not a forum, but a way to communicate with the patch developers.


Changed status to: 'Open'

Sergey Sharybin self-assigned this 2015-03-30 22:09:24 +02:00

Added subscriber: @Sergey

Added subscriber: @GeorgeKyriazis

Added subscriber: @mib2berlin

I get a repeatable crash with OpenCL on Intel with a simple file.
Open the .blend and start a render with F12; it crashes before the last tile finishes.

Hash: 30c689ff7

Opensuse 13.2/64
Intel i5 3770K
GTX 760 4 GB
Driver 349.12
opencl-1.2-5.0.0.57

cpu_crash.blend (https://archive.blender.org/developer/F156554/cpu_crash.blend)

cpu_crash.txt (https://archive.blender.org/developer/F156555/cpu_crash.txt)

Cheers, mib

In #44197#299196, @mib2berlin wrote:
Get repeatable crash with opencl on intel with simple file.
Open .blend and start render with F12, crash before last tile finish.

Hash: 30c689ff7

Opensuse 13.2/64
Intel i5 3770K
GTX 760 4 GB
Driver 349.12
opencl-1.2-5.0.0.57

cpu_crash.blend

cpu_crash.txt

Cheers, mib

Intel has problems. We don't know what the issue is, but Intel does not work on any scenes. They have some issues compiling the kernels.


Added subscriber: @bliblubli

This comment was removed by @bliblubli

Added subscriber: @ideasman42

@bliblubli, this thread is not for you to make value judgments on our work and tell us what you think our priorities should be. This task was set up so users could report how well the patch works, or where it fails.

If you think it is a bad direction or would choose some other priorities for Cycles, this is not the place to say so.

Comments like this are not appreciated and will be deleted in future.

Added subscriber: @MarcClintDion

Added subscriber: @mont29

Added subscriber: @lennyhpc

On Windows 7, the kernel compiles with AMD drivers for OpenCL CPU on Intel processors (without Intel GPU drivers). Blender doesn't crash, but rendering takes ages. (@ideasman42 sorry if you understood it that way; it was meant as a debate, and English is not my mother tongue. I didn't want to tell you what to do.)

In #44197#299276, @bliblubli wrote:
On WIndows 7, the kernel compiles with AMD drivers for OpenCL CPU on Intel processors (without Intel GPU drivers). Blender doesn't crash but rendering takes ages.

One should avoid using CPU OpenCL with the split kernel: most of the optimizations are only effective on a GPU, and the additional kernels and buffers add extra overhead.

Using the CPU as the OpenCL device, we observed the original mega-kernel crashing to the desktop randomly before or after rendering. This appears on both Intel and AMD systems using Blender 2.73/2.74 on Windows. This bug should probably be reported somewhere else if it matters. When CPU OpenCL does work, however, the performance is always slower than native CPU.

In multi-device mode, it is feasible to use the mega-kernel for CPU OpenCL and the split kernel for GPU.
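The recommendation above could be expressed as a small device-to-kernel mapping. This is only an illustrative Python sketch, not code from the patch: `choose_kernel`, `plan_multi_device` and the `"CPU"`/`"GPU"` strings are hypothetical names chosen for the example.

```python
def choose_kernel(device_type):
    """Pick a kernel flavour for one OpenCL device (hypothetical helper)."""
    # Split-kernel optimizations are reported to pay off only on GPUs;
    # on CPU devices the extra kernels and buffers just add overhead.
    if device_type == "GPU":
        return "split"
    if device_type == "CPU":
        return "mega"
    raise ValueError(f"unknown device type: {device_type!r}")


def plan_multi_device(device_types):
    """Map each device type in a multi-device setup to its kernel flavour."""
    return {dev: choose_kernel(dev) for dev in device_types}


print(plan_multi_device(["CPU", "GPU"]))
```

Keeping the choice per device, rather than per render, is what would let a mixed CPU + GPU setup use both kernel flavours at once.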


Added subscriber: @Lockal


I've tested this patch (the original one from D1200) on Ubuntu 14.10 x86-64 with an NVidia GTX 690 and the MikePan BMW scene.

Testing notes:

  • The sample count during rendering goes in the negative direction.
  • Tile preview doesn't work (the viewport updates the active tile only once all samples for that tile have been processed).
  • Big tiles don't work (256x256 works; 512x512 renders a black image without any error message).
  • 2x mode (the NVidia GTX 690 is a dual-GPU card, somewhere between Radeon HD 7990 and Radeon HD 7970 GHz) doesn't work (black image).
  • Progressive refinement does not work (it gives a 1-sample-quality image after each sample).
  • Tiny tiles (8x8) make Blender freeze.

Performance on NVidia (modified BMW benchmark with 256x256 tiles; do not compare the absolute values with results on your system):

| OpenCL (1x) | CUDA (1x) | CUDA (2x) |
| ----------- | --------- | --------- |
| 04:27.48    | 02:33.55  | 01:20.70  |
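To put the three timings in relation to each other, they can be converted to seconds and divided. The snippet below is a small illustrative Python sketch; `to_seconds` is a hypothetical helper, and the only data used are the three times reported above.

```python
def to_seconds(stamp):
    """Convert an 'MM:SS.ss' render time to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


times = {
    "OpenCL (1x)": to_seconds("04:27.48"),
    "CUDA (1x)": to_seconds("02:33.55"),
    "CUDA (2x)": to_seconds("01:20.70"),
}

# How much longer the OpenCL render takes than CUDA on one GPU.
opencl_vs_cuda = times["OpenCL (1x)"] / times["CUDA (1x)"]
# How well CUDA scales from one GPU to both GPUs of the card.
cuda_scaling = times["CUDA (1x)"] / times["CUDA (2x)"]

print(f"OpenCL (1x) takes {opencl_vs_cuda:.2f}x as long as CUDA (1x)")
print(f"CUDA (2x) is {cuda_scaling:.2f}x faster than CUDA (1x)")
```

On these numbers the OpenCL render takes roughly 1.74 times as long as single-GPU CUDA, and CUDA gains about a factor of 1.90 from the second GPU.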

Notes for testers with NVidia GPUs:

  • Use at least the 349.12 driver; earlier versions have no OpenCL 1.2 support.
  • Don't even try to build with Address Sanitizer: just linking any application to libasan makes the ICD loader report that the platform is not supported.

Added subscriber: @thelasthope


I also tested the original patch from D1200 on Windows 7 64bit with an AMD Radeon HD 6950 2 GB, AMD Omega Driver 14.12.

I tried to render the standard cube.

Notes:

I can confirm Sv. Lockal's note that the sample count goes in a negative direction during rendering.

  • Viewport rendering doesn't work for me: Blender freezes and the display goes completely black. The only way out is to restart the computer.
  • Compiling the kernel works on my GPU (Console: "Device init sucess")
  • Rendering the standard cube in Blender ends up with a fully transparent image with black pixels at the corners of the tiles.

@GeorgeKyriazis Is it possible to get Cycles working on non-GCN architecture GPUs?

In #44197#300346, @thelasthope wrote:
I also tested the original patch form D1200 on Windows 7 64bit with an AMD Radeon HD 6950 2 GB, AMD Omega Driver 14.12

I tried to render the standard cube.

Notes:

I can confirm Sv.Lockals notes that the sample count goes into a negative direction during rendering.

  • Viewport Rendering doesn't work for me, Blender freeze and the display goes completely black. The only way to cancel this is to restart the Computer
  • Compiling the Kernel works on my GPU (Console: "Device init sucess")
  • Rendering the Standard Cube in Blender ends up with a fully transparent image with black pixels on the corner of the Tiles.

@GeorgeKyriazis Is it possible to get Cycles working on non GCN Architecture GPUs?

We're still investigating feasibility. We can't make any commitments, though.

Our main focus is on future architectures and APIs. For example, if/when we add OpenCL 2.0 support, this won't be supported on pre-GCN, since pre-GCN HW cannot support OpenCL 2.0 features.


Hi, OpenCL on Nvidia does not work after the latest commits.

Opensuse 13.2/64
Intel i5 3770K
GTX 760 4 GB
Driver 349.12

Blender Hash 7d9412f

After starting a render it begins compiling, but after some kernels it goes into an endless loop.

cuda_endless_compile.txt (https://archive.blender.org/developer/F162998/cuda_endless_compile.txt)

Cheers, mib

Added subscriber: @varunsundar08


In #44197#303154, @mib2berlin wrote:
Hi OpenCL on Nvidia does not work after latest commits.

Opensuse 13.2/64
Intel i5 3770K
GTX 760 4 GB
Driver 349.12

Blender Hash 7d9412f

After start render it start compiling but after some kernel it goes in to endless loop.

cuda_endless_compile.txt

Cheers, mib

Hello mib. We have set up a Linux system with Nvidia, but we are not able to reproduce the error you reported.
The configuration is as follows:
Ubuntu 14.04, 64-bit
Intel i5 4670k
GTX 780Ti
Driver 346.46

Thank you for looking into it.
IIRC Sergey told me to use the latest beta driver to get OpenCL 1.2 support.
I will try to downgrade my driver to 346.46 and report here.

Cheers, mib
EDIT: Tested with 346.47 and 349.16, but it does not work.
Deleted the Intel OpenCL installation, but same result.

Added subscriber: @ThomasBerglund


Any chances of seeing support for the AMD FirePro D-series GPUs that are found in the Apple Mac Pro?
http://www.amd.com/en-us/solutions/professional/d-series

I have tried compiling and running the latest cycles_kernel_split branch, simply trying to render the default cube scene, but it fails to compile the OpenCL kernel.

Mac Pro (2013)
32GB ram
AMD Radeon HD FirePro D-700 6GB (2x) + Intel Xeon CPU E5-1680 v2 @ 3.00GHz

OS X 10.10.3 (build 14D136)
OpenCL 1.2 (Feb 27 2015 01:29:10)

Blender build: 2.74, hash 5ad79b8

$ blender.app/Contents/MacOS/blender --debug-cycles
found bundled python: blender-build/build_darwin/bin/blender.app/Contents/MacOS/../Resources/2.74/python
I0501 09:11:50.033089 1981256448 device_cuda.cpp:1062] CUEW initialization failed: Error opening the library
Device init succes
Device init succes
Device init succes
Compiling OpenCL kernel ...
OpenCL error (AMD Radeon HD - FirePro D700 Compute Engine): [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (517)
OpenCL error (AMD Radeon HD - FirePro D700 Compute Engine): OpenCL Warning : clBuildProgram failed: could not build program for 0x1021c00 (AMD Radeon HD - FirePro D700 Compute Engine) (err:-2)
OpenCL error (AMD Radeon HD - FirePro D700 Compute Engine): [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
Error returned by cvms_element_build_from_source

I have attached the CVMCompiler crash log:
CVMCompiler_2015-05-01-091334.crash (https://archive.blender.org/developer/F168886/CVMCompiler_2015-05-01-091334.crash)

Is there anything else I can provide to help troubleshoot this? Any ideas about what is going wrong?

Added subscriber: @ThomasDinges


Hi,
I compiled the latest cycles_kernel_split branch revision after your commits last night (Windows 7 x64, Geforce 540M) and I get an instant crash when I try to render with OpenCL, both F12 render and Viewport.

Console is showing this:

 Device init succes
 Compiling OpenCL kernel ...
 UNREACHABLE executed!

Hi,
After a fresh build 2 hours ago and a merge with the latest master, everything works fine on AMD cards, and performance is the same as a week ago. Viewport render now works, but it is really slow. It is slower than CPU, presumably because it loads the megakernel for every sample (it shows "loading kernel..." for a bit more than 1 sec every time a sample is done; for 32 samples, that is more than 30 sec lost just loading the kernel).

Another new test, with 5ad79b8.

System:
Win 7 64bit
AMD Radeon HD 6950 2GB

Tried to render the standard cube (F12).
Compiling the OpenCL kernels no longer works.

Console:

Error E013: Insufficient Private Resources

I monitored the VRAM usage of the GPU, and it stayed around 169 MB throughout rendering.

The last build in which the OpenCL kernel compiled was f32fad9.

In #44197#306232, @bliblubli wrote:
Hi,
After a fresh build 2 hours ago and a merge with lateste master, all works fine on AMD Cards, performance are the same compared with a week ago. Viewport render now works, but it is really slow. Slower than CPU, certainly because it loads the megakernel for every sample (it writes "loading kernel..." for a bit more than 1sec every time a sample is done, for 32 samples, it's more than 30sec loosed only to load the kernel.)

Hello bliblubli,
I investigated the viewport render delay. The kernel load does not take much time. The delay is actually caused by the transparent shadows feature (which is still broken on AMD) being turned on. We will take care of it in further commits. The previous revision did not have the transparent shadows feature enabled. The progress bar stays on "loading render kernels" because there is no update to the progress bar after the load_kernels function (will correct it ASAP). Thanks :)

In #44197#306291, @thelasthope wrote:
Again a new test with 5ad79b8

System:
Win 7 64bit
AMD Radeon HD 6950 2GB

Tried to render the standard cube (F12).
The compiling of the OpenCL Kernels didn't work anymore.

Console:

Error E013: Insufficient Private Resources

I monitored the VRAM usage of the GPU and the VRAM Usage was all the time while rendering around 169MB

The latest build where the OpenCL Kernel was able to compile was f32fad9

Hello David,
I will look into it and get back ASAP. Thanks :)

I've checked 5ad79b8 again with NVidia, and it doesn't compile on my computer anymore, even with ADV shading disabled. The compile process looks similar to nvcc: clBuildProgram works for 1 minute or so, then eats up all available RAM and crashes. I guess this is related to @varunsundar08's report.

In #44197#306340, @Lockal wrote:
I've checked 5ad79b8 again with NVidia and it doesn't compile on my computer anymore even with ADV shading disabled. The compile process looks similar to nvcc, clBuildProgram works for 1 minute or so and then eats up all available RAM and crashes. I guess this is related to @varunsundar08 report.

Hello Lockal,
I believe the error you reported is similar to the one that Thomas Dinges mentioned earlier in the day. Please let us know if Cycles OpenCL on the Blender master branch works for you (on NVidia). Sergey mentioned that Cycles OpenCL on master does not work with NVidia since NVidia's driver update (if I remember correctly). Thanks.

Added subscriber: @Lapineige

Added subscriber: @flubba86

In #44197#306291, @thelasthope wrote:
Again a new test with 5ad79b8

System:
Win 7 64bit
AMD Radeon HD 6950 2GB

Tried to render the standard cube (F12).
The compiling of the OpenCL Kernels didn't work anymore.

Console:

Error E013: Insufficient Private Resources

I monitored the VRAM usage of the GPU and the VRAM Usage was all the time while rendering around 169MB

The latest build where the OpenCL Kernel was able to compile was f32fad9

I'm getting the same issue, I think.
I'm using the latest available code from the cycles_kernel_split branch, as of Sat May 9th.
The first time I tried to render, the progress bar showed "Loading render kernels (this may take a few minutes)" and after about a minute it stopped with "OpenCL build failed: errors in console." The error in the console was Error E013: Insufficient Private Resources.

Now, after that, every time I hit render it seems not even to try to compile the kernel; it just fails straight away with "OpenCL build failed: errors in console", but there are no errors in the console at all. I tried with --debug-cycles and --debug-all, and there is nothing to indicate why the compilation of the kernel is not working.

I thought maybe OpenCL was caching the incomplete kernel, so I deleted the bin files stored in ~/.AMD/GLCache. After doing that, it goes back to "Loading render kernels (this may take a few minutes)" and stops on Error E013: Insufficient Private Resources.
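For anyone who wants to repeat the cache-clearing step, here is a minimal Python sketch. The function name `clear_kernel_cache` is hypothetical, and the sketch assumes the cache really lives in the directory named above; it removes everything inside the directory it is given, so point it at the right place.

```python
import shutil
from pathlib import Path


def clear_kernel_cache(cache_dir):
    """Delete every entry inside cache_dir; return how many were removed."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        # Nothing cached yet (or a different driver layout): nothing to do.
        return 0
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed
```

Calling `clear_kernel_cache(Path.home() / ".AMD" / "GLCache")` would then empty the directory mentioned above, so the next render has to rebuild the kernel from source.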

My System:
Linux Debian (sid)
Intel core i7 3820 x4 @ 3.9Ghz
AMD Radeon HD 6870 1GB

EDIT
It looks like the Insufficient Private Resources error has to do with the size of the code cache on the HD6xxx series cards. See http://www.luxrender.net/forum/viewtopic.php?f=16&t=11286 for a similar problem in LuxRender with an HD6950 card.

Added subscriber: @jesterking


I'm also getting E013: Insufficient Private Resources

System:

  • AMD A10-6800K (APU with HD 8670D)
  • R9 270x
  • Windows 7 Ultimate 64bit

I have updated to 14.12 drivers.

Cycles reports:

  1. Pitcairn
  2. Devastator
  3. AMD A10-6800 APU with Radeon(tm) HD Graphics
  4. AMD A10-6800 APU with Radeon(tm) HD Graphics + Devastator + Pitcairn

Compilation fails with devices 2 and 4, aborting with E013.

It'd be great if the multi-device option 4 worked, but I figure it fails because of device 2.

Added subscriber: @joshr


My GPU lacks double-precision compute units, and that is what causes this error about insufficient resources.

GPU: AMD HD 6770
Specifications: http://en.wikipedia.org/wiki/Radeon_HD_6000_Series#Chipset_table

Before "insufficient private resources", the compiler also reports warnings that support this, such as:

line 57744: warning:
double-precision constant is represented as single-precision constant because double is not enabled.
*pdf = 1.0f / M_4PI_F;
               ^ 

Also note that the others who have reported the same issue also lack double-precision support on their graphics chipsets; see the Wikipedia table above for details.

Solution:
The code needs to recognise a GPU's lack of double-precision support and map DP constants to SP.

Or, if that's not possible, display an error saying that support for double precision is required.
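A rough Python sketch of the detection half of this suggestion is below. It assumes the host code already has the device's space-separated extension string (the value OpenCL reports for `CL_DEVICE_EXTENSIONS`); `supports_fp64` and `check_device` are hypothetical names, and the extension strings in the example are made up for illustration, not taken from real hardware.

```python
# Extension names that advertise double-precision support.
FP64_EXTENSIONS = ("cl_khr_fp64", "cl_amd_fp64")


def supports_fp64(device_extensions):
    """Return True if the extension string advertises an fp64 extension."""
    # Split into whole tokens so a longer name cannot match by substring.
    advertised = set(device_extensions.split())
    return any(ext in advertised for ext in FP64_EXTENSIONS)


def check_device(name, device_extensions):
    """Build the message a renderer could show for this device."""
    if supports_fp64(device_extensions):
        return f"{name}: double precision available"
    return f"{name}: double precision is not supported by this device"


# Illustrative extension strings only.
print(check_device("GPU A", "cl_khr_icd cl_khr_fp64"))
print(check_device("GPU B", "cl_khr_icd cl_khr_byte_addressable_store"))
```

This only covers reporting the missing capability up front; whether the kernel actually needs doubles is a separate question.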


I'm not sure why "insufficient private resources" is appearing. I don't have any AMD hardware here, and it all works fine here on Intel OpenCL, GTX 560 and GT 520M.

As for the warning, it's really weird. The constant is explicitly float, and Cycles doesn't actually use doubles anywhere in the kernel.
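For readers unfamiliar with how a double can sneak into a float expression, here is a small sketch in plain C. The macros `FOUR_PI_F` and `FOUR_PI_D` are hypothetical stand-ins and not the Cycles definitions; this is only an illustration of the language rule, not a diagnosis of the warning above. A literal without the `f` suffix is a double, and it promotes the whole expression to double.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants for illustration only. */
#define FOUR_PI_F 12.566370614359172f /* f suffix: float literal */
#define FOUR_PI_D 12.566370614359172  /* no suffix: double literal */

/* sizeof inspects the type of the expression without evaluating it. */
static size_t size_of_float_expr(void)
{
    return sizeof(1.0f / FOUR_PI_F); /* float / float stays float */
}

static size_t size_of_mixed_expr(void)
{
    return sizeof(1.0f / FOUR_PI_D); /* float / double becomes double */
}

/* Compile-time check of the same rule (C11). */
static_assert(sizeof(1.0f / FOUR_PI_D) == sizeof(double),
              "an unsuffixed literal promotes the expression to double");
```

An OpenCL compiler targeting a device without double support cannot keep such a literal as a double, which is what the quoted message "represented as single-precision constant" describes.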

Yes, all of those are able to use double precision. Can anyone confirm a compiler bug on AMD hardware where it is unable to explicitly cast to single precision instead of double precision? That would explain this issue.

Edit:
Looks like a minor addition is needed:
http://stackoverflow.com/questions/7001424/opencl-problem-with-double-type

```
#ifdef cl_khr_fp64
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable
#elif defined(cl_amd_fp64)
    #pragma OPENCL EXTENSION cl_amd_fp64 : enable
#else
    #error "Double precision floating point not supported by OpenCL implementation."
#endif
```

An interesting post about working around this problem from:
http://devgurus.amd.com/message/1282921#1282921

Quote:
"Currently OpenCL users are limited to 25% of device memory,"

I don't know where you get this from, perhaps it's a rumor, but it's certainly not correct.
(there is a 512MB limit per allocation call but you can allocate as much as you like)

I do predominately scientific computing and often need very large and fast memory so I am mostly using the 7970. On the 7970, I often allocate a single contiguous buffer that uses just shy of 3GB, the device limit. It's very simple, all you do is allocate in chunks of 512MB or less and make sure the chunks are rounded to about 0x4000 bytes, then they will be placed contiguously. Example, allocating 2GB you might have kernel buffers like

```
__kernel(global float *A, global float *B, global float *C, global float *D){}
```

Since this is C language and A,B,C,D are memory pointers, you can use A to reference all of memory.
Here is a printout from a typical program start:

```
open:devices 3 gpus, 1 cpu, device(0) = Tahiti
start(cl):ndevs=3 gpus=1 time=57.136
<readback of actual allocation map>
buffer 0 start 01D1E000 to 21D1E000 size=20000000  Gap = 00000
buffer 1 start 21D1E000 to 41D1E000 size=20000000  Gap = 00000
buffer 2 start 41D1E000 to 61D1E000 size=20000000  Gap = 00000
buffer 3 start 61D1E000 to 81D1E000 size=20000000  Gap = 00000
buffer 4 start 81D1E000 to A1D1E000 size=20000000  Gap = 00000
buffer 5 start A1D1E000 to B0E1E000 size=0F100000  Gap = 00000
buffer 6 start B0E1E000 to BF21E000 size=0E400000  Gap = 00000
buffer 7 start BF21E000 to BFE1E000 size=00C00000  Gap = ----  (last address on GPU is BFFFFFFC)
```

The last couple of buffers are different size for an unrelated reason. Note, I have not used GPU_MAX_ALLOC type parameters and have never seen a need to. This also works on Cayman, and Barts devices but I prefer Tahiti because the memory is so large and fast. Sorry, I don't know much about Nvidia devices because I usually choose hardware based on specifications.

Hope it helps.
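The chunking rule described in the quoted post (at most 512MB per allocation, sizes rounded to 0x4000 bytes) can be sketched in plain C as below. The function and macro names are invented for this example. The code only plans the chunk sizes and makes no OpenCL calls, and whether a driver then places the chunks contiguously is the quoted poster's observation, not something this sketch can guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the quoted post; names are hypothetical. */
#define MAX_CHUNK_BYTES ((uint64_t)512 * 1024 * 1024) /* 512MB per allocation */
#define CHUNK_ALIGN     ((uint64_t)0x4000)            /* rounding granularity */

/* Round n up to the next multiple of CHUNK_ALIGN. */
static uint64_t round_up(uint64_t n)
{
    return (n + CHUNK_ALIGN - 1) / CHUNK_ALIGN * CHUNK_ALIGN;
}

/* Split `total` bytes into chunk sizes, each at most MAX_CHUNK_BYTES and
 * a multiple of CHUNK_ALIGN. Writes at most `max_chunks` sizes into
 * `sizes` and returns the number of chunks needed. */
static size_t plan_chunks(uint64_t total, uint64_t *sizes, size_t max_chunks)
{
    assert(sizes != NULL || max_chunks == 0);
    size_t count = 0;
    uint64_t remaining = round_up(total);
    while (remaining > 0) {
        uint64_t chunk = remaining < MAX_CHUNK_BYTES ? remaining : MAX_CHUNK_BYTES;
        if (count < max_chunks) {
            sizes[count] = chunk;
        }
        count++;
        remaining -= chunk;
    }
    return count;
}
```

For the 2GB case mentioned in the quote, this yields four chunks of 512MB each, matching the four kernel arguments A, B, C and D. Each planned size would then be passed to a separate buffer allocation.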

Hi, I'm testing OpenCL on CPU and GPU and get artifacts.
The CPU can't render the Glass shader.

Blender a49534a
openSUSE 13.2/64
Intel i5 3770K
GTX 760 4 GB
Driver 346.47

Intel opencl_runtime_15.1_x64_5.0.0.57

![mix_bmw.png](https://archive.blender.org/developer/F175480/mix_bmw.png)

Thanks, mib
Author
Owner

@mib2berlin, those artifacts seem somewhat similar to what I was getting with NVidia OpenCL when I was looking into object motion, so it could be a bug outside of the kernel. I will check on that.

As for the glass shader on CPU, I don't remember it working here, but it is surely on the todo list to investigate. It works on NVidia though...

Hi, (reporting for users from Blenderartists)
DingTo asked to test some OpenCL features on http://blenderartists.org/forum/showthread.php?254521-A-good-news-for-AMD-ATI-Graphic-cards-owners&p=2871012&viewfull=1#post2871012 . Reproducible bugs (reported by more than one user) are:

- Object Motion Blur freezes the whole PC, even on a simple scene like the default cube moving, and needs a hard reset (at least 3 testers reported this). Note that it seems random, so they say you have to render 2-3 times to make it freeze.

Things that could be enabled by default:

- Render Passes work (note that 1 user reports they are sometimes slow)
- Camera Motion Blur works perfectly
- Hair too

Note that one of them over there has a script to test on 200 scenes of his own and from the community. So the enabled features plus Hair and Camera MB seem to be rock solid.
He reports freezes when memory usage goes above the graphics card's limit (others may too, but it's unclear whether it's the same problem or the object motion blur one).

Regards

Added subscriber: @omar-1


GPU: AMD HD 5450
Error:

```
double-precision constant is represented as single-precision constant because double is not enabled
```

in many lines.
Same as @joshr

Added subscriber: @boxed_9k


It seems to be a hardware limitation.

Info on the recent (April 2015) patches from AMD (see [this wiki](http://wiki.blender.org/index.php/OpenCL) link) states that the supported systems are AMD's Radeon HD 7730 and above.

And they all support double precision.

Added subscriber: @mrdotcoza


Added subscriber: @adapmal


Ya... Please, are there any updates on the AMD driver crashing Cycles on Mac?

I'm on an iMac Retina 5K 2014
Yosemite 10.10.5
AMD Radeon R9 M295X2

Blender Version 2.77 (2.77 2016-03-19)

Crash log:

```
Device init success
Compiling OpenCL kernel ...
Build flags: 
OpenCL error (AMD Radeon R9 M295X Compute Engine): [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (517)
OpenCL error (AMD Radeon R9 M295X Compute Engine): OpenCL Warning : clBuildProgram failed: could not build program for 0x1021c00 (AMD Radeon R9 M295X Compute Engine) (err:-2)
OpenCL error (AMD Radeon R9 M295X Compute Engine): [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
Error returned by cvms_element_build_from_source

OpenCL kernel build output:
Error returned by cvms_element_build_from_source
OpenCL build failed: errors in console
```

Thanks a lot!
Author
Owner

You have to update to at least El Capitan to use OpenCL on OSX. It is actually written in our release notes: https://wiki.blender.org/index.php/Dev:Ref/Release_Notes/2.76/Cycles#OSX

Added subscriber: @TomG

Member

Added subscriber: @MaiLavelle


Added subscriber: @brecht


Changed status from 'Open' to: 'Archived'

Archiving old, out-of-date task.
24 Participants
Reference: blender/blender#44197