Cycles OpenCL kernel-splitting work #44197

Sergey Sharybin · 2015-03-30T22:09:23+02:00

Sergey Sharybin commented

2015-03-30 22:09:23 +02:00

This is a design task related to the Cycles kernel split patch (D1200, https://archive.blender.org/developer/D1200), meant for communication that is not strongly tied to the actual code.

The current OpenCL state is quite limited feature-wise. There is no:

- SSS
- Volumes (both homogeneous and heterogeneous)
- Motion blur
- CMJ

Reports about crashes and render artifacts can be posted here. Sharing benchmarks and test files that demonstrate issues is also welcome here.

Please keep this thread constructive. Do not report issues with applying the patch or building Blender. That kind of feedback should go via IRC.

Also, do not inline backtraces or logs in a comment; attach them as files instead.

All non-constructive comments will be removed. Keep in mind this is not a forum, but a way to communicate with the patch developers.

Sergey Sharybin commented

2015-03-30 22:09:24 +02:00

Changed status to: 'Open'

Sergey Sharybin self-assigned this 2015-03-30 22:09:24 +02:00

Sergey Sharybin commented

2015-03-30 22:09:24 +02:00

Added subscriber: @Sergey

George Kyriazis commented

2015-03-30 22:28:48 +02:00

Added subscriber: @GeorgeKyriazis

Wolfgang Faehnle commented

2015-03-30 22:37:27 +02:00

Added subscriber: @mib2berlin

Wolfgang Faehnle commented

2015-03-30 22:37:27 +02:00

I get a repeatable crash with OpenCL on Intel with a simple file.
Open the .blend and start a render with F12; it crashes before the last tile finishes.

Hash: 30c689ff7

Opensuse 13.2/64
Intel i5 3770K
GTX 760 4 GB
Driver 349.12
opencl-1.2-5.0.0.57

cpu_crash.blend: https://archive.blender.org/developer/F156554/cpu_crash.blend

cpu_crash.txt: https://archive.blender.org/developer/F156555/cpu_crash.txt

Cheers, mib

George Kyriazis commented

2015-03-30 22:50:07 +02:00

> In #44197#299196, @mib2berlin wrote:
> I get a repeatable crash with OpenCL on Intel with a simple file.
> Open the .blend and start a render with F12; it crashes before the last tile finishes.
>
> Hash: 30c689ff7
>
> Opensuse 13.2/64
> Intel i5 3770K
> GTX 760 4 GB
> Driver 349.12
> opencl-1.2-5.0.0.57
>
> cpu_crash.blend
>
> cpu_crash.txt
>
> Cheers, mib

Intel has problems. We don't know what the issue is, but Intel does not work on any scene. They have some issues compiling the kernels.

mathieu menuet commented

2015-03-30 22:57:10 +02:00

Added subscriber: @bliblubli

mathieu menuet commented

2015-03-30 22:57:10 +02:00

This comment was removed by @bliblubli


Campbell Barton commented

2015-03-30 23:10:47 +02:00

Added subscriber: @ideasman42

Campbell Barton commented

2015-03-30 23:10:47 +02:00

@bliblubli, this thread is not for you to make value judgments on our work and tell us what you think our priorities should be. This task was set up so users could report how well the patch works, or where it fails.

If you think it is a bad direction, or would choose other priorities for Cycles, this is not the place to say so.

Comments like this are not appreciated and will be deleted in future.

marc dion commented

2015-03-31 01:26:59 +02:00

Added subscriber: @MarcClintDion

Bastien Montagne commented

2015-03-31 08:08:23 +02:00

Added subscriber: @mont29

Lenny Wang commented

2015-03-31 09:57:08 +02:00

Added subscriber: @lennyhpc

mathieu menuet commented

2015-03-31 09:59:46 +02:00

On Windows 7, the kernel compiles with the AMD drivers for OpenCL CPU on Intel processors (without Intel GPU drivers). Blender doesn't crash, but rendering takes ages. (@ideasman42, sorry if you understood it that way; it was meant as a debate, and English is not my mother tongue. I didn't want to tell you what to do.)

Lenny Wang commented

2015-03-31 12:23:47 +02:00

> In #44197#299276, @bliblubli wrote:
> On Windows 7, the kernel compiles with the AMD drivers for OpenCL CPU on Intel processors (without Intel GPU drivers). Blender doesn't crash, but rendering takes ages.

One should avoid using CPU OpenCL with the split kernel; most of the optimizations are only effective on GPU. The additional kernels and buffers also add extra overhead.

Using the CPU as the OpenCL device, we observed that the original mega-kernel randomly crashes to the desktop before or after rendering. This happens on both Intel and AMD systems using Blender 2.73/2.74 on Windows. This bug should probably be reported somewhere else if it matters. When CPU OpenCL does work, its performance is nevertheless always slower than native CPU.

In multi-device mode, it is feasible to use the mega-kernel for CPU OpenCL and the split kernel for GPU.

Sv. Lockal commented

2015-04-01 13:50:04 +02:00

Added subscriber: @Lockal

Sv. Lockal commented

2015-04-01 14:47:16 +02:00

I've tested this patch (the original one from D1200) on Ubuntu 14.10 x86-64 with an NVidia GTX 690, using the MikePan BMW scene.

Testing notes:

* The sample count during rendering goes in the negative direction.
* Tile preview doesn't work (the viewport updates the active tile only when all samples for that tile have been processed).
* Big tiles don't work (256x256 works; 512x512 renders a black image without any error message).
* 2x mode doesn't work (black image). The NVidia GTX 690 is a dual-GPU card, somewhere between Radeon HD 7990 and Radeon HD 7970 GHz.
* Progressive refinement does not work (it gives a 1-sample-quality image after each sample).
* Tiny tiles (8x8) make Blender freeze.

Performance on NVidia (modified BMW benchmark with 256x256 tiles; do not compare absolute values with results on your system):

| OpenCL (1x) | Cuda (1x) | Cuda (2x) |
| ----------- | --------- | --------- |
| 04:27.48 | 02:33.55 | 01:20.70 |

Note for testers with NVidia GPUs:

* Use at least the 349.12 driver; earlier versions have no OpenCL 1.2 support.
* Don't even try to build with Address Sanitizer: just linking any application against libasan makes the ICD loader report that the platform is not supported.
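To make the benchmark timings above easier to compare, the `mm:ss.ff` values can be converted to seconds and expressed relative to the single-GPU CUDA run. This is only a small reading aid, not part of the patch: the helper name `to_seconds` is made up for this sketch, and the numbers are the ones from this single modified BMW run.

```python
def to_seconds(stamp: str) -> float:
    """Convert a 'mm:ss.ff' render time into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# Timings as reported in the comment above (256x256 tiles, GTX 690).
times = {
    "OpenCL (1x)": "04:27.48",
    "Cuda (1x)": "02:33.55",
    "Cuda (2x)": "01:20.70",
}

baseline = to_seconds(times["Cuda (1x)"])
for name, stamp in times.items():
    elapsed = to_seconds(stamp)
    print(f"{name}: {elapsed:.2f} s, {elapsed / baseline:.2f}x the Cuda (1x) time")
```

On these numbers, the split-kernel OpenCL run takes roughly 1.7 times as long as the single-GPU CUDA run.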

David Schrott commented

2015-04-04 12:20:32 +02:00

Added subscriber: @thelasthope

David Schrott commented

2015-04-04 12:20:32 +02:00

I also tested the original patch from D1200 on Windows 7 64bit with an AMD Radeon HD 6950 2 GB, AMD Omega Driver 14.12.

I tried to render the standard cube.

Notes:

I can confirm Sv. Lockal's note that the sample count goes in the negative direction during rendering.

- Viewport rendering doesn't work for me: Blender freezes and the display goes completely black. The only way out is to restart the computer.
- Compiling the kernel works on my GPU (Console: "Device init sucess").
- Rendering the standard cube ends up with a fully transparent image with black pixels at the corners of the tiles.

@GeorgeKyriazis Is it possible to get Cycles working on non-GCN architecture GPUs?

George Kyriazis commented

2015-04-06 17:45:19 +02:00

> In #44197#300346, @thelasthope wrote:
> I also tested the original patch from D1200 on Windows 7 64bit with an AMD Radeon HD 6950 2 GB, AMD Omega Driver 14.12.
>
> I tried to render the standard cube.
>
> Notes:
>
> I can confirm Sv. Lockal's note that the sample count goes in the negative direction during rendering.
>
> - Viewport rendering doesn't work for me: Blender freezes and the display goes completely black. The only way out is to restart the computer.
> - Compiling the kernel works on my GPU (Console: "Device init sucess").
> - Rendering the standard cube ends up with a fully transparent image with black pixels at the corners of the tiles.
>
> @GeorgeKyriazis Is it possible to get Cycles working on non-GCN architecture GPUs?

We're still investigating feasibility. We can't make any commitments, though.

Our main focus is on future architectures and APIs. For example, if/when we add OpenCL 2.0 support, it won't be supported on pre-GCN, since pre-GCN hardware cannot support OpenCL 2.0 features.

Wolfgang Faehnle commented

2015-04-16 10:47:18 +02:00

Hi, OpenCL on Nvidia does not work after the latest commits.

Opensuse 13.2/64
Intel i5 3770K
GTX 760 4 GB
Driver 349.12

Blender Hash 7d9412f

After starting a render it begins compiling, but after some kernels it goes into an endless loop.

cuda_endless_compile.txt: https://archive.blender.org/developer/F162998/cuda_endless_compile.txt

Cheers, mib

varunsundar08@gmail.com commented

2015-04-23 07:09:31 +02:00

Added subscriber: @varunsundar08

varunsundar08@gmail.com commented

2015-04-23 07:09:31 +02:00

> In #44197#303154, @mib2berlin wrote:
> Hi, OpenCL on Nvidia does not work after the latest commits.
>
> Opensuse 13.2/64
> Intel i5 3770K
> GTX 760 4 GB
> Driver 349.12
>
> Blender Hash 7d9412f
>
> After starting a render it begins compiling, but after some kernels it goes into an endless loop.
>
> cuda_endless_compile.txt
>
> Cheers, mib

Hello mib. We have set up a Linux system with Nvidia, but we are not able to reproduce the error you reported.
The configuration is as follows:
Ubuntu 14.04, 64-bit
Intel i5 4670k
GTX 780Ti
Driver 346.46

Wolfgang Faehnle commented

2015-04-24 10:25:07 +02:00

Thank you for looking into it.
IIRC, Sergey told me to use the latest beta driver to get OpenCL 1.2 support.
I will try to downgrade my driver to 346.46 and report here.

Cheers, mib
EDIT: Tested with 346.47 and 349.16, but it does not work.
Deleted the Intel OpenCL installation, but same result.

Thomas Berglund commented

2015-05-01 09:36:14 +02:00

Added subscriber: @ThomasBerglund

Thomas Berglund commented

2015-05-01 09:36:14 +02:00

Any chance of seeing support for the AMD FirePro D-series GPUs that are found in the Apple Mac Pro?
http://www.amd.com/en-us/solutions/professional/d-series

I have tried compiling and running the latest cycles_kernel_split branch, simply trying to render the default cube scene, but it fails to compile the OpenCL kernel.

Mac Pro (2013)
32GB ram
AMD Radeon HD FirePro D-700 6GB (2x) + Intel Xeon CPU E5-1680 v2 @ 3.00GHz

OS X 10.10.3 (build 14D136)
OpenCL 1.2 (Feb 27 2015 01:29:10)

Blender build: 2.74, hash 5ad79b8

```
$ blender.app/Contents/MacOS/blender --debug-cycles
found bundled python: blender-build/build_darwin/bin/blender.app/Contents/MacOS/../Resources/2.74/python
I0501 09:11:50.033089 1981256448 device_cuda.cpp:1062] CUEW initialization failed: Error opening the library
Device init succes
Device init succes
Device init succes
Compiling OpenCL kernel ...
OpenCL error (AMD Radeon HD - FirePro D700 Compute Engine): [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (517)
OpenCL error (AMD Radeon HD - FirePro D700 Compute Engine): OpenCL Warning : clBuildProgram failed: could not build program for 0x1021c00 (AMD Radeon HD - FirePro D700 Compute Engine) (err:-2)
OpenCL error (AMD Radeon HD - FirePro D700 Compute Engine): [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
Error returned by cvms_element_build_from_source
```

I have attached the CVMCompiler crash log:
CVMCompiler_2015-05-01-091334.crash: https://archive.blender.org/developer/F168886/CVMCompiler_2015-05-01-091334.crash

Is there anything else I can provide to help troubleshoot this? Any ideas about what is going wrong?

Thomas Dinges commented

2015-05-01 09:51:14 +02:00

Added subscriber: @ThomasDinges

Thomas Dinges commented

2015-05-01 09:51:14 +02:00

Hi,
I compiled the latest cycles_kernel_split branch revision after your commits last night (Windows 7 x64, Geforce 540M) and I get an instant crash when I try to render with OpenCL, both F12 render and viewport.

The console shows this:

```
Device init succes
Compiling OpenCL kernel ...
UNREACHABLE executed!
```

mathieu menuet commented

2015-05-01 15:55:45 +02:00

Hi,
After a fresh build 2 hours ago and a merge with the latest master, everything works fine on AMD cards, and performance is the same as a week ago. Viewport render now works, but it is really slow. It is slower than CPU, certainly because it loads the megakernel for every sample: it shows "loading kernel..." for a bit more than 1 second every time a sample is done, so for 32 samples more than 30 seconds are lost just loading the kernel.

David Schrott commented

2015-05-01 18:09:23 +02:00

Another test, this time with 5ad79b8.

System:
Win 7 64bit
AMD Radeon HD 6950 2GB

I tried to render the standard cube (F12).
Compiling the OpenCL kernels no longer works.

Console:

```
Error E013: Insufficient Private Resources
```

I monitored the GPU's VRAM usage, and it stayed at around 169MB the whole time while rendering.

The last build where the OpenCL kernel compiled was f32fad9.

varunsundar08@gmail.com commented

2015-05-01 19:42:33 +02:00

> In #44197#306232, @bliblubli wrote:
> Hi,
> After a fresh build 2 hours ago and a merge with the latest master, everything works fine on AMD cards, and performance is the same as a week ago. Viewport render now works, but it is really slow. It is slower than CPU, certainly because it loads the megakernel for every sample: it shows "loading kernel..." for a bit more than 1 second every time a sample is done, so for 32 samples more than 30 seconds are lost just loading the kernel.

Hello bliblubli,
I investigated the viewport render delay. The kernel load does not take much time. The delay is actually caused by the transparent shadows feature (which is still broken on AMD) being turned on. We will take care of it in further commits. The previous revision did not have the transparent shadows feature enabled. The progress bar stays on "loading render kernels" because there is no update to the progress bar after the load_kernels function (will correct it ASAP). Thanks :)

varunsundar08@gmail.com commented

2015-05-01 19:44:55 +02:00

> In #44197#306291, @thelasthope wrote:
> Another test, this time with 5ad79b8.
>
> System:
> Win 7 64bit
> AMD Radeon HD 6950 2GB
>
> I tried to render the standard cube (F12).
> Compiling the OpenCL kernels no longer works.
>
> Console:
>
> ```
> Error E013: Insufficient Private Resources
> ```
>
> I monitored the GPU's VRAM usage, and it stayed at around 169MB the whole time while rendering.
>
> The last build where the OpenCL kernel compiled was f32fad9.

Hello David,
I will look into it and get back ASAP. Thanks :)

Sv. Lockal commented

2015-05-01 21:35:33 +02:00

I've checked 5ad79b8 again with NVidia, and it no longer compiles on my computer, even with ADV shading disabled. The compile process looks similar to nvcc: clBuildProgram works for a minute or so, then eats up all available RAM and crashes. I guess this is related to @varunsundar08's report.

varunsundar08@gmail.com commented

2015-05-01 21:57:32 +02:00

> In #44197#306340, @Lockal wrote:
> I've checked 5ad79b8 again with NVidia, and it no longer compiles on my computer, even with ADV shading disabled. The compile process looks similar to nvcc: clBuildProgram works for a minute or so, then eats up all available RAM and crashes. I guess this is related to @varunsundar08's report.

Hello Lockal,
I believe the error you reported is similar to the one that Thomas Dinges mentioned earlier in the day. Please let us know if Cycles OpenCL on the Blender master branch works for you (on Nvidia). Sergey mentioned that Cycles OpenCL on master has not worked with Nvidia since Nvidia's driver update (if I remember correctly). Thanks.

Lapineige commented

2015-05-02 11:21:24 +02:00

Added subscriber: @Lapineige

Ashley Sommer commented

2015-05-09 07:32:16 +02:00

Added subscriber: @flubba86

Ashley Sommer commented

2015-05-09 07:32:16 +02:00

> In #44197#306291, @thelasthope wrote:
> Another test, this time with 5ad79b8.
>
> System:
> Win 7 64bit
> AMD Radeon HD 6950 2GB
>
> I tried to render the standard cube (F12).
> Compiling the OpenCL kernels no longer works.
>
> Console:
>
> ```
> Error E013: Insufficient Private Resources
> ```
>
> I monitored the GPU's VRAM usage, and it stayed at around 169MB the whole time while rendering.
>
> The last build where the OpenCL kernel compiled was f32fad9.

I think I'm getting the same issue.
I'm using the latest available code from the cycles_kernel_split branch, as of Sat May 9th.
The first time I tried to render, the progress bar showed "Loading render kernels (this may take a few minutes)" and after about a minute it stopped with "OpenCL build failed: errors in console." The error in the console was `Error E013: Insufficient Private Resources`.

Now, after that, every time I hit render it seems not even to try compiling the kernel; it just fails straight away with "OpenCL build failed: errors in console", but there are no errors in the console at all. I tried with `--debug-cycles` and `--debug-all`, and there is nothing to indicate why the compilation of the kernel is not working.

I thought maybe OpenCL was caching the incomplete kernel, so I deleted the bin files stored in ~/.AMD/GLCache. After doing that, it goes back to "Loading render kernels (this may take a few minutes)" and stops on `Error E013: Insufficient Private Resources`.

My System:
Linux Debian (sid)
Intel core i7 3820 x4 @ 3.9Ghz
AMD Radeon HD 6870 1GB

EDIT
It looks like the `Insufficient Private Resources` error has to do with the size of the code cache on the HD6xxx series cards. See http://www.luxrender.net/forum/viewtopic.php?f=16&t=11286 for a similar problem in luxrender with a HD6950 card.

Nathan Letwory commented

2015-05-12 12:10:42 +02:00

Added subscriber: @jesterking

Nathan Letwory commented

2015-05-12 12:10:43 +02:00

I'm also getting `E103:Insufficient Private Resources`

System:

* AMD A10-6800K (APU with HD 8670D)
* R9 270x
* Windows 7 Ultimate 64bit

I have updated to the 14.12 drivers.

Cycles reports:

1. Pitcairn
2. Devastator
3. AMD A10-6800 APU with Radeon(tm) HD Graphics
4. AMD A10-6800 APU with Radeon(tm) HD Graphics + Devastator + Pitcairn

Compilation fails with 2. and 4., aborting with E103.

It'd be great if the multi-device entry 4. worked, but I figure it fails because of 2.

josh commented

2015-05-12 14:27:20 +02:00

Added subscriber: @joshr

josh commented

2015-05-12 14:27:20 +02:00

My GPU lacks double precision compute units and that is what causes this error about insufficient resources

GPU: AMD HD 6770
Specifications: http://en.wikipedia.org/wiki/Radeon_HD_6000_Series#Chipset_table

Before "insufficient private resources", the compiler also emits warnings that support this, such as:

line 57744: warning:
double-precision constant is represented as single-precision constant because double is not enabled.
*pdf = 1.0f / M_4PI_F;
               ^

Also note that the others who have reported the same issue also lack double precision support on their graphics chipsets; see the Wikipedia table above for details.

Solution:
The code needs to recognise a GPU's lack of double precision support and map DP constants to SP.

Or, if that's not possible, display an error saying that double precision support is required.
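A minimal sketch of the detection step proposed above, assuming the host already holds the device's extension string (the space-separated list that `clGetDeviceInfo` returns for `CL_DEVICE_EXTENSIONS`). The helper names `has_extension` and `device_has_fp64` are hypothetical and not part of Cycles; this only illustrates the check and says nothing about whether missing double support is really the cause of the error.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole space-separated token in
 * `extensions` (the format of the CL_DEVICE_EXTENSIONS string). */
static bool has_extension(const char *extensions, const char *name)
{
    assert(extensions != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0) {
        return false;
    }
    const char *p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the match only if it is delimited by spaces or string ends. */
        bool starts = (p == extensions) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends) {
            return true;
        }
        p += len;
    }
    return false;
}

/* Hypothetical helper: does the device advertise either fp64 extension? */
static bool device_has_fp64(const char *extensions)
{
    return has_extension(extensions, "cl_khr_fp64") ||
           has_extension(extensions, "cl_amd_fp64");
}
```

A caller could use the result either to pick single-precision-only build options or to show a clear "double precision required" message instead of a raw compiler error.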


Sergey Sharybin commented

2015-05-12 14:36:15 +02:00

I'm not sure why "insufficient private resources" is appearing. I don't have any AMD hardware here, and it all works fine here on Intel OpenCL, a GTX 560 and a GT 520M.

As for the warning, it's really weird. The constant is explicitly float, and Cycles doesn't actually use doubles anywhere in the kernel.


josh commented

2015-05-12 14:39:12 +02:00

Yes, all of those are able to use double precision. Can anyone confirm a compiler bug on AMD hardware where a constant cannot be explicitly treated as single precision instead of double precision? That would explain this issue.

Edit:
Looks like a minor addition is needed:
http://stackoverflow.com/questions/7001424/opencl-problem-with-double-type

#ifdef cl_khr_fp64
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable
#elif defined(cl_amd_fp64)
    #pragma OPENCL EXTENSION cl_amd_fp64 : enable
#else
    #error "Double precision floating point not supported by OpenCL implementation."
#endif


josh commented

2015-05-12 16:21:50 +02:00

An interesting post about working around this problem from:
http://devgurus.amd.com/message/1282921#1282921

Quote:
Currently OpenCL users are limited to 25% of device memory,

I don't know where you get this from, perhaps it's a rumor, but it's certainly not correct.
(there is a 512MB limit per allocation call but you can allocate as much as you like)

I do predominately scientific computing and often need very large and fast memory so I am mostly using the 7970. On the 7970, I often allocate a single contiguous buffer that uses just shy of 3GB, the device limit. It's very simple, all you do is allocate in chunks of 512MB or less and make sure the chunks are rounded to about 0x4000 bytes, then they will be placed contiguously. Example, allocating 2GB you might have kernel buffers like

__kernel(global float *A, global float *B, global float *C, global float *D){}

Since this is C language and A,B,C,D are memory pointers, you can use A to reference all of memory.
Here is a printout from a typical program start:

open:devices 3 gpus, 1 cpu, device(0) = Tahiti
start(cl):ndevs=3 gpus=1 time=57.136
<readback of actual allocation map>
buffer 0 start 01D1E000 to 21D1E000 size=20000000  Gap = 00000
buffer 1 start 21D1E000 to 41D1E000 size=20000000  Gap = 00000
buffer 2 start 41D1E000 to 61D1E000 size=20000000  Gap = 00000
buffer 3 start 61D1E000 to 81D1E000 size=20000000  Gap = 00000
buffer 4 start 81D1E000 to A1D1E000 size=20000000  Gap = 00000
buffer 5 start A1D1E000 to B0E1E000 size=0F100000  Gap = 00000
buffer 6 start B0E1E000 to BF21E000 size=0E400000  Gap = 00000
buffer 7 start BF21E000 to BFE1E000 size=00C00000  Gap = ----  (last address on GPU is BFFFFFFC)

The last couple of buffers are different size for an unrelated reason. Note, I have not used GPU_MAX_ALLOC
type parameters and have never seen a need to. This also works on Cayman, and Barts devices but I prefer
Tahiti because the memory is so large and fast. Sorry, I don't know much about Nvida devices because I
usually choose hardware based on specifications.

Hope it helps.
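The chunking rule from the quoted post (allocations of at most 512MB, sizes rounded to 0x4000 bytes) can be sketched as plain arithmetic. This is only an illustration under those two stated numbers; `plan_chunks`, `CHUNK_LIMIT` and `CHUNK_ALIGN` are hypothetical names, and nothing here guarantees that a driver will actually place the buffers contiguously.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Both values are taken from the quoted post, not from any OpenCL header. */
#define CHUNK_LIMIT ((uint64_t)512 * 1024 * 1024) /* 512MB per allocation */
#define CHUNK_ALIGN ((uint64_t)0x4000)            /* rounding granularity */

/* Round n up to the next multiple of align. */
static uint64_t round_up(uint64_t n, uint64_t align)
{
    return (n + align - 1) / align * align;
}

/* Fills sizes[] with chunk sizes that together cover `total` bytes.
 * Returns the number of chunks, or 0 if max_chunks is too small
 * (a total of 0 also yields 0 chunks). */
static size_t plan_chunks(uint64_t total, uint64_t *sizes, size_t max_chunks)
{
    size_t n = 0;
    while (total > 0) {
        if (n == max_chunks) {
            return 0;
        }
        /* Full-size chunks first; the last one is rounded up to the alignment. */
        uint64_t chunk = (total < CHUNK_LIMIT) ? round_up(total, CHUNK_ALIGN)
                                               : CHUNK_LIMIT;
        sizes[n++] = chunk;
        total -= (chunk < total) ? chunk : total;
    }
    return n;
}
```

For a 2GB request this yields four chunks of 0x20000000 bytes each, which matches the size of the first buffers in the printout above.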


Wolfgang Faehnle commented

2015-05-16 14:49:12 +02:00

Hi, I'm testing OpenCL on CPU and GPU and get artifacts.
The CPU can't render the Glass shader.

Blender a49534a
openSUSE 13.2/64
Intel i5 3770K
GTX 760 4 GB
Driver 346.47

Intel opencl_runtime_15.1_x64_5.0.0.57

![mix_bmw.png](https://archive.blender.org/developer/F175480/mix_bmw.png)

Thanks, mib

Sergey Sharybin commented

2015-05-17 15:53:55 +02:00

@mib2berlin, those artifacts seem to be somewhat similar to what I was getting with NVidia OpenCL when I was looking into object motion, so it could be a bug outside of the kernel. Will check on that.

As for the glass shader on CPU, I don't remember it working here, but it is surely on the todo list to investigate. It works on NVidia though...


mathieu menuet commented

2015-05-20 07:34:56 +02:00

Hi, (reporting for users from Blenderartists)
DingTo asked to test some OpenCL features on http://blenderartists.org/forum/showthread.php?254521-A-good-news-for-AMD-ATI-Graphic-cards-owners&p=2871012&viewfull=1#post2871012 . Reproducible bugs (reported by more than one user) are:

- Object motion blur freezes the whole PC, even on a simple scene like the default cube moving, and needs a hard reset (at least 3 testers reported this). Note that it seems random, so they say you have to render 2-3 times to make it freeze.

Things that could be enabled by default:

- Render passes work (note that 1 user reports they are sometimes slow)
- Camera motion blur works perfectly
- Hair too

Note that one of the testers over there has a script to test on 200 scenes of his own and from the community. So the enabled features plus hair and camera motion blur seem to be rock solid.
He reports freezes when memory usage goes above the graphics card limit (others may too, but it's unclear whether that is the same problem or the object motion blur one).

Regards


test commented

2015-07-01 22:29:05 +02:00

Added subscriber: @omar-1

test commented

2015-07-01 22:29:06 +02:00

GPU: AMD HD 5450
Error:
double-precision constant is represented as single-precision constant because double is not enabled

in many lines.
Same as @joshr


boxd commented

2015-08-09 19:49:56 +02:00

Added subscriber: @boxed_9k

boxd commented

2015-08-09 19:49:56 +02:00

It seems to be a hardware limitation.

The information on the recent (April 2015) patches from AMD, see [this wiki](http://wiki.blender.org/index.php/OpenCL) link, states that the supported systems are AMD's Radeon HD 7730 and above.

And they all support double precision.

Nick Hill commented

2016-02-05 12:58:37 +01:00

Added subscriber: @mrdotcoza

Luiz Paulo Iazzetti Santos commented

2016-04-28 17:13:58 +02:00

Added subscriber: @adapmal

Luiz Paulo Iazzetti Santos commented

2016-04-28 17:13:58 +02:00

Ya... Please, any updates on the AMD driver crashing Cycles on Mac?

I'm on an iMac Retina 5K 2014
Yosemite 10.10.5
AMD Radeon R9 M295X2

Blender Version 2.77 (2.77 2016-03-19)

Crashing log:

Device init success
Compiling OpenCL kernel ...
Build flags: 
OpenCL error (AMD Radeon R9 M295X Compute Engine): [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (517)
OpenCL error (AMD Radeon R9 M295X Compute Engine): OpenCL Warning : clBuildProgram failed: could not build program for 0x1021c00 (AMD Radeon R9 M295X Compute Engine) (err:-2)
OpenCL error (AMD Radeon R9 M295X Compute Engine): [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
Error returned by cvms_element_build_from_source

OpenCL kernel build output:
Error returned by cvms_element_build_from_source
OpenCL build failed: errors in console

Thanks a lot!


Sergey Sharybin commented

2016-04-28 17:47:31 +02:00

You have to update to El Capitan at least to use OpenCL on OSX. It is actually written in our release logs: https://wiki.blender.org/index.php/Dev:Ref/Release_Notes/2.76/Cycles#OSX