Cycles OpenCL kernel-splitting work
Closed, Archived, Public

Description

This is a design task related to the Cycles kernel split patch (D1200), intended for communication that is not strongly tied to the actual code.

Current OpenCL state is quite limited feature-wise. There's no:

  • SSS
  • Volumes (both homogeneous and heterogeneous)
  • Motion blur
  • CMJ

Crashes and render artifacts can be reported here. Sharing benchmarks and test files that demonstrate issues is also welcome.

Please keep this thread constructive. Do not report issues with applying the patch or building Blender; that kind of feedback should go through IRC.

Also, do not paste backtraces or logs inline in a comment; attach them as files instead.

All non-constructive comments will be removed. Keep in mind this is not a forum, but a way to communicate with the patch developers.

Details

Type
Design

Event Timeline

Sergey Sharybin (sergey) set Type to Design.
Sergey Sharybin (sergey) updated the task description.
Sergey Sharybin (sergey) created this task.
Sergey Sharybin (sergey) claimed this task.
Sergey Sharybin (sergey) raised the priority of this task from to Needs Triage by Developer.

I get a repeatable crash with OpenCL on Intel with a simple file.
Open the .blend and start a render with F12; it crashes before the last tile finishes.

Hash: rB30c689ff7f10

Opensuse 13.2/64
Intel i5 3770K
GTX 760 4 GB
Driver 349.12
opencl-1.2-5.0.0.57

Cheers, mib

Intel has problems. We don't know what the issue is, but Intel does not work on any scenes. They have some issues compiling the kernels.

@mathieu menuet (bliblubli), This thread is not for you to make value judgment on our work and tell us what you think our priorities should be. This task was set up so users could report how well the patch works, or where it fails.

If you think it is a bad direction or would choose some other priorities for Cycles, this is not the place to say so.

Comments like this are not appreciated and will be deleted in the future.

On Windows 7, the kernel compiles with the AMD drivers for OpenCL CPU on Intel processors (without Intel GPU drivers). Blender doesn't crash, but rendering takes ages. (@Campbell Barton (campbellbarton) sorry if you understood it that way; it was meant as a debate, and English is not my mother tongue. I didn't want to tell you what to do.)

One should avoid using CPU OpenCL with the split kernel; most of the optimizations are only effective on GPU. The additional kernels and buffers also add extra overhead.

Using the CPU as an OpenCL device, we observed the original mega-kernel crashing to the desktop randomly before or after rendering. This happens on both Intel and AMD systems using Blender 2.73/2.74 on Windows. This bug should probably be reported somewhere else if it matters. When CPU OpenCL does work, performance is always slower than native CPU.

In multi-device mode, it's doable to use mega-kernel for CPU OpenCL and split-kernel for GPU.
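
The per-device choice described above could be sketched as follows. This is only an illustration of the idea, not code from the patch: `choose_kernel_variant`, `plan_multi_device`, and the strings "megakernel" and "split" are hypothetical names.

```python
def choose_kernel_variant(device_type: str) -> str:
    """Pick a kernel variant for one OpenCL device type (hypothetical helper)."""
    kind = device_type.strip().upper()
    if kind == "GPU":
        # The split-kernel optimizations are reported to pay off on GPUs only.
        return "split"
    if kind == "CPU":
        # On CPU the extra kernels and buffers are mostly overhead.
        return "megakernel"
    raise ValueError(f"unsupported device type: {device_type!r}")


def plan_multi_device(device_types):
    """Map each device in a multi-device setup to its kernel variant."""
    return [(kind, choose_kernel_variant(kind)) for kind in device_types]


if __name__ == "__main__":
    for kind, variant in plan_multi_device(["CPU", "GPU"]):
        print(f"{kind}: {variant}")
```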

I've tested this patch (the original one from D1200) on Ubuntu 14.10 x86-64 and NVidia GTX 690 with MikePan BMW scene.

Testing notes:

  • The sample count goes in the negative direction during rendering.
  • Tile preview doesn't work (the viewport updates the active tile only when all samples for that tile have been processed).
  • Big tiles don't work (256x256 works, 512x512 does not: it renders a black image without any error message).
  • 2x mode (NVidia GTX 690 is a dual-GPU card, somewhere between Radeon HD 7990 and Radeon HD 7970 GHz) doesn't work (black image).
  • Progressive refinement does not work (gives a 1-sample-quality image after each sample).
  • Tiny tiles (8x8) make Blender freeze.

Performance on NVidia (modified BMW benchmark with 256x256 tiles: do not compare absolute values with results on your system):

OpenCL (1x)    CUDA (1x)    CUDA (2x)
04:27.48       02:33.55     01:20.70
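
To make the timings above easier to compare, they can be converted to seconds and expressed relative to the single-GPU CUDA result. This is just a small arithmetic sketch using the three values from the table; `to_seconds` is a hypothetical helper name. With these numbers OpenCL (1x) takes roughly 1.74 times as long as CUDA (1x).

```python
def to_seconds(stamp: str) -> float:
    """Convert a 'mm:ss.ff' render time into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# Values copied from the benchmark table in this comment.
timings = {
    "OpenCL (1x)": "04:27.48",
    "CUDA (1x)": "02:33.55",
    "CUDA (2x)": "01:20.70",
}

seconds = {name: to_seconds(stamp) for name, stamp in timings.items()}
baseline = seconds["CUDA (1x)"]

if __name__ == "__main__":
    for name, value in seconds.items():
        print(f"{name}: {value:.2f} s, {value / baseline:.2f}x the CUDA (1x) time")
```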

Note for testers with NVidia GPUs:

  • Use at least the 349.12 driver; earlier versions have no OpenCL 1.2 support.
  • Don't even try to build with Address Sanitizer: just linking any application to libasan makes the ICD loader report that the platform is not supported.
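
A tester could check the driver requirement from the note above with a simple version comparison. This is a minimal sketch under the assumption that driver versions are dotted numeric strings such as "349.12"; `parse_version` and `driver_at_least` are hypothetical names, and the 349.12 threshold is simply the one stated in the note.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string like '349.12' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_at_least(installed: str, minimum: str = "349.12") -> bool:
    """Return True if the installed driver is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for version in ("346.47", "349.12", "349.16"):
        print(version, driver_at_least(version))
```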

I also tested the original patch from D1200 on Windows 7 64-bit with an AMD Radeon HD 6950 2 GB, AMD Omega Driver 14.12.

I tried to render the standard cube.

Notes:

I can confirm Sv. Lockal's note that the sample count goes in the negative direction during rendering.

  • Viewport rendering doesn't work for me: Blender freezes and the display goes completely black. The only way to cancel this is to restart the computer.
  • Compiling the Kernel works on my GPU (Console: "Device init sucess")
  • Rendering the Standard Cube in Blender ends up with a fully transparent image with black pixels on the corner of the Tiles.

@George Kyriazis (kyriazis) Is it possible to get Cycles working on non GCN Architecture GPUs?

We're still investigating feasibility. We can't make any commitments, though.

Our main focus is on future architectures and APIs. For example, if/when we add OpenCL 2.0 support, this won't be supported on pre-GCN, since pre-GCN HW cannot support OpenCL 2.0 features.

Hi, OpenCL on Nvidia does not work after the latest commits.

Opensuse 13.2/64
Intel i5 3770K
GTX 760 4 GB
Driver 349.12

Blender Hash rB7d9412f0d4a8

After starting a render it begins compiling, but after some kernels it goes into an endless loop.

Cheers, mib

Hello mib. We have set up a Linux system with Nvidia, but we are not able to reproduce the error you reported.
The configuration is as follows:
Ubuntu 14.04, 64-bit
Intel i5 4670k
GTX 780Ti
Driver 346.46

Thank you for looking into it.
IIRC Sergey told me to use the latest beta driver to get OpenCL 1.2 support.
I will try to downgrade my driver to 346.46 and report here.

Cheers, mib
EDIT: Tested with 346.47 and 349.16, but it still does not work.
Deleted the Intel OpenCL installation, but got the same result.

Any chances of seeing support for the AMD FirePro D-series GPUs that are found in the Apple Mac Pro?
http://www.amd.com/en-us/solutions/professional/d-series

I have tried compiling and running the latest cycles_kernel_split branch, simply trying to render the default cube scene, but it fails to compile the OpenCL kernel.

Mac Pro (2013)
32GB ram
AMD Radeon HD FirePro D-700 6GB (2x) + Intel Xeon CPU E5-1680 v2 @ 3.00GHz

OS X 10.10.3 (build 14D136)
OpenCL 1.2 (Feb 27 2015 01:29:10)

Blender build: 2.74, hash rB5ad79b83afae

$ blender.app/Contents/MacOS/blender --debug-cycles
found bundled python: blender-build/build_darwin/bin/blender.app/Contents/MacOS/../Resources/2.74/python
I0501 09:11:50.033089 1981256448 device_cuda.cpp:1062] CUEW initialization failed: Error opening the library
Device init succes
Device init succes
Device init succes
Compiling OpenCL kernel ...
OpenCL error (AMD Radeon HD - FirePro D700 Compute Engine): [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (517)
OpenCL error (AMD Radeon HD - FirePro D700 Compute Engine): OpenCL Warning : clBuildProgram failed: could not build program for 0x1021c00 (AMD Radeon HD - FirePro D700 Compute Engine) (err:-2)
OpenCL error (AMD Radeon HD - FirePro D700 Compute Engine): [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
Error returned by cvms_element_build_from_source

I have attached the CVMCompiler crash log:

Anything else I can provide to help troubleshoot this? Any ideas about what is going wrong?
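
When reading logs like the one above, it can help to translate the numeric `err:` value into its symbolic name. The sketch below does that for a few codes from the standard OpenCL header; the table is deliberately partial, and `error_names` is a hypothetical helper, not part of Blender. For the log above, `err:-2` maps to `CL_DEVICE_NOT_AVAILABLE`, which matches the bracketed name the driver printed.

```python
import re

# Partial table of OpenCL status codes as defined in the Khronos cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
}


def error_names(log_text: str) -> list:
    """Find '(err:N)' markers in a log and return their symbolic names."""
    codes = re.findall(r"\(err:(-?\d+)\)", log_text)
    return [OPENCL_ERRORS.get(int(code), f"unknown ({code})") for code in codes]


if __name__ == "__main__":
    line = ("OpenCL Warning : clBuildProgram failed: could not build program "
            "for 0x1021c00 (AMD Radeon HD - FirePro D700 Compute Engine) (err:-2)")
    print(error_names(line))
```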

Hi,
I compiled the latest cycles_kernel_split branch revision after your commits last night (Windows 7 x64, GeForce 540M) and I get an instant crash when I try to render with OpenCL, both F12 render and viewport.

Console is showing this:

Device init succes
Compiling OpenCL kernel ...
UNREACHABLE executed!

Hi,
After a fresh build 2 hours ago and a merge with the latest master, everything works fine on AMD cards, and performance is the same as a week ago. Viewport render now works, but it is really slow. It is slower than CPU, probably because it loads the megakernel for every sample (it shows "loading kernel..." for a bit more than 1 second every time a sample is done; for 32 samples, that is more than 30 seconds lost just loading the kernel).

Again a new test with rB5ad79b83afae

System:
Win 7 64bit
AMD Radeon HD 6950 2GB

Tried to render the standard cube (F12).
Compiling the OpenCL kernels no longer works.

Console:

Error E013: Insufficient Private Resources

I monitored the GPU's VRAM usage, and it stayed around 169MB the whole time while rendering.

The latest build where the OpenCL Kernel was able to compile was rBf32fad99c4e4

Hello bliblubli,
I investigated the viewport render delay. The kernel load does not take much time. The delay is actually caused by the transparent shadows feature (which is still broken on AMD) being turned on. We will take care of it in further commits. The previous revision did not have the transparent shadows feature enabled. The progress bar stays on "loading render kernels" because there is no update to the progress bar after the load_kernels function (will correct it ASAP). Thanks :)

Hello david,
I will look into it and get back ASAP. Thanks :)

I've checked rB5ad79b83afae again with NVidia and it doesn't compile on my computer anymore, even with ADV shading disabled. The compile process looks similar to nvcc: clBuildProgram works for 1 minute or so, then eats up all available RAM and crashes. I guess this is related to the report by @varunsundar08@gmail.com (varunsundar08).

Hello lockal,
I believe the error you reported is similar to the one that Thomas Dinges mentioned earlier in the day. Please let us know if Cycles OpenCL on the Blender master branch works for you (on Nvidia). Sergey mentioned that Cycles OpenCL on master has not worked with Nvidia since Nvidia's driver update (if I remember correctly). Thanks.

I think I'm getting the same Insufficient Private Resources issue.
I'm using the latest available code from the cycles_kernel_split branch, as of Sat May 9th.
The first time I tried to render, the progress bar showed "Loading render kernels (this may take a few minutes)" and after about a minute it stopped with "OpenCL build failed: errors in console." The error in the console was Error E013: Insufficient Private Resources.

Now, after that, every time I hit render it seems to not even try to compile the kernel; it just bails out straight away with "OpenCL build failed: errors in console", but there are no errors in the console at all. I tried with --debug-cycles and --debug-all and there is nothing to indicate why the compilation of the kernel is not working.

I thought maybe OpenCL was caching the incomplete kernel, so I deleted the bin files stored in ~/.AMD/GLCache. After doing that, it goes back to "Loading render kernels (this may take a few minutes)" and stops on Error E013: Insufficient Private Resources.

My System:
Linux Debian (sid)
Intel Core i7 3820 x4 @ 3.9GHz
AMD Radeon HD 6870 1GB

EDIT
It looks like the Insufficient Private Resources has got to do with the size of the code cache on the HD6xxx series cards. See here for a similar problem in luxrender with a HD6950 card.

I'm also getting E103: Insufficient Private Resources

System:

  • AMD A10-6800K (APU with HD 8670D)
  • R9 270x
  • Windows 7 Ultimate 64bit

I have updated to 14.12 drivers.

Cycles reports:

  1. Pitcairn
  2. Devastator
  3. AMD A10-6800 APU with Radeon(tm) HD Graphics
  4. AMD A10-6800 APU with Radeon(tm) HD Graphics + Devastator + Pitcairn

Failure to compile with 2. and 4., aborting with E103

It'd be great if the multi device 4. worked, but I'm figuring that it fails because of 2.

josh (joshr) added a subscriber: josh (joshr). Edited May 12 2015, 2:27 PM

My GPU lacks double precision compute units and that is what causes this error about insufficient resources

GPU: AMD HD 6770
Specifications: http://en.wikipedia.org/wiki/Radeon_HD_6000_Series#Chipset_table

It also reports errors which support this before "insufficient private resources"

such as:

line 57744: warning:
double-precision constant is represented as single-precision constant because double is not enabled.
*pdf = 1.0f / M_4PI_F;
               ^

Also note that the others who have reported the same issue also lack double precision support on their graphics chipsets; see the Wikipedia table above for details.

Solution:
The code needs to recognise a GPU's lack of double precision support and map DP constants to SP

Or if that's not possible display an error saying support for double precision is required.
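
The detection step proposed above could look roughly like this on the host side. It is a sketch, not Cycles code: it assumes the caller already has the space-separated extension string that `clGetDeviceInfo` returns for `CL_DEVICE_EXTENSIONS`, and `has_fp64` and `build_options` are hypothetical names. `-cl-single-precision-constant` is a standard OpenCL build option that treats double-precision constants as single precision; whether passing it would resolve the error discussed here is not established in this thread.

```python
# Extension names that advertise double-precision support.
FP64_EXTENSIONS = ("cl_khr_fp64", "cl_amd_fp64")


def has_fp64(extensions: str) -> bool:
    """Check a CL_DEVICE_EXTENSIONS string for double-precision support."""
    names = set(extensions.split())
    return any(ext in names for ext in FP64_EXTENSIONS)


def build_options(extensions: str, base: str = "") -> str:
    """Append the single-precision-constant flag when fp64 is missing."""
    options = base.split()
    if not has_fp64(extensions):
        # Assumption: mapping DP constants to SP is acceptable for the kernel.
        options.append("-cl-single-precision-constant")
    return " ".join(options)


if __name__ == "__main__":
    print(build_options("cl_khr_byte_addressable_store cl_khr_icd"))
    print(build_options("cl_khr_fp64 cl_khr_icd", "-cl-fast-relaxed-math"))
```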

I'm not sure why insufficient private resources is appearing, don't have any AMD hardware here, and here it all works fine on intel opencl, gtx560 and gt520m.

As for the warning -- it's really weird. The constant is explicitly float, and Cycles doesn't actually use doubles anywhere in the kernel.

josh (joshr) added a comment. Edited May 12 2015, 2:39 PM

Yes, all of those are able to use double precision. Can anyone confirm a compiler bug on AMD hardware where it is unable to explicitly cast to single precision instead of double precision? That would explain this issue.

Edit:
Looks like a minor addition is needed:
http://stackoverflow.com/questions/7001424/opencl-problem-with-double-type

#ifdef cl_khr_fp64
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable
#elif defined(cl_amd_fp64)
    #pragma OPENCL EXTENSION cl_amd_fp64 : enable
#else
    #error "Double precision floating point not supported by OpenCL implementation."
#endif

josh (joshr) added a comment. Edited May 12 2015, 4:21 PM

An interesting post about working around this problem from:
http://devgurus.amd.com/message/1282921#1282921

Quote:
Currently OpenCL users are limited to 25% of device memory,

I don't know where you get this from, perhaps it's a rumor, but it's certainly not correct.
(there is a 512MB limit per allocation call but you can allocate as much as you like)

I do predominately scientific computing and often need very large and fast memory so I am mostly using the 7970. On the 7970, I often allocate a single contiguous buffer that uses just shy of 3GB, the device limit. It's very simple, all you do is allocate in chunks of 512MB or less and make sure the chunks are rounded to about 0x4000 bytes, then they will be placed contiguously. Example, allocating 2GB you might have kernel buffers like

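__kernel void example(__global float *A, __global float *B, __global float *C, __global float *D) {}  /* "example" is a placeholder name; the quoted post omitted the return type and kernel name */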

Since this is C language and A,B,C,D are memory pointers, you can use A to reference all of memory.
Here is a printout from a typical program start:

open:devices 3 gpus, 1 cpu, device(0) = Tahiti
start(cl):ndevs=3 gpus=1 time=57.136
<readback of actual allocation map>
buffer 0 start 01D1E000 to 21D1E000 size=20000000  Gap = 00000
buffer 1 start 21D1E000 to 41D1E000 size=20000000  Gap = 00000
buffer 2 start 41D1E000 to 61D1E000 size=20000000  Gap = 00000
buffer 3 start 61D1E000 to 81D1E000 size=20000000  Gap = 00000
buffer 4 start 81D1E000 to A1D1E000 size=20000000  Gap = 00000
buffer 5 start A1D1E000 to B0E1E000 size=0F100000  Gap = 00000
buffer 6 start B0E1E000 to BF21E000 size=0E400000  Gap = 00000
buffer 7 start BF21E000 to BFE1E000 size=00C00000  Gap = ----  (last address on GPU is BFFFFFFC)

The last couple of buffers are different size for an unrelated reason. Note, I have not used GPU_MAX_ALLOC
type parameters and have never seen a need to. This also works on Cayman, and Barts devices but I prefer
Tahiti because the memory is so large and fast. Sorry, I don't know much about Nvidia devices because I
usually choose hardware based on specifications.

Hope it helps.
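
The chunking rule described in the quoted post (chunks of 512MB or less, each rounded to a multiple of 0x4000 bytes) can be sketched as plain arithmetic. This is a minimal illustration, not code from the post or from Cycles: `plan_chunks` and its parameters are hypothetical names, and it only computes sizes without calling any OpenCL API. Whether a driver actually places such buffers contiguously is the poster's observation, not something this sketch can guarantee.

```python
MAX_CHUNK = 0x20000000  # 512MB per allocation, as stated in the quoted post
ALIGNMENT = 0x4000      # rounding granularity suggested in the quoted post


def plan_chunks(total_bytes: int, max_chunk: int = MAX_CHUNK,
                align: int = ALIGNMENT) -> list:
    """Split a requested size into aligned chunk sizes of at most max_chunk."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if max_chunk % align != 0:
        raise ValueError("max_chunk must be a multiple of align")
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        size = min(remaining, max_chunk)
        # Round the chunk size up to the next multiple of the alignment.
        size = -(-size // align) * align
        chunks.append(size)
        remaining -= min(remaining, size)
    return chunks


if __name__ == "__main__":
    for size in plan_chunks(2 * 1024**3):
        print(hex(size))
```

For a 2GB request this yields four chunks of 0x20000000 bytes, matching the four-buffer kernel signature in the quoted example.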

Hi, I'm testing OpenCL on CPU and GPU and get artifacts.
The CPU can't render the Glass shader.

Blender a49534a
Opensuse 13.2/64
Intel i5 3770K
GTX 760 4 GB
Driver 346.47

Intel opencl_runtime_15.1_x64_5.0.0.57

Thanks, mib

@Wolfgang Faehnle (mib2berlin), those artifacts seem somewhat similar to what I was seeing with NVidia OpenCL when looking into object motion, so it could be a bug outside of the kernel. Will check on that.

As for the glass shader on CPU -- I don't remember it working here, but it is surely on the todo list to investigate. It works on NVidia though...

Hi, (reporting for users from Blenderartists)
DingTo asked to test some OpenCL features on http://blenderartists.org/forum/showthread.php?254521-A-good-news-for-AMD-ATI-Graphic-cards-owners&p=2871012&viewfull=1#post2871012 . Reproducible bugs (reported by more than one) are:

  • Object motion blur freezes the whole PC, even on a simple scene like a moving default cube, and needs a hard reset (reported by at least 3 testers). Note that it seems random: they say you have to render 2-3 times to make it freeze.

Things that could be enabled by default:

  • Render passes work (note that one user reports it is sometimes slow)
  • Camera Motion Blur works perfectly
  • Hair too

Note that one of the testers there has a script to test 200 scenes, both his own and from the community. So the enabled features plus Hair and Camera MB seem to be rock solid.
He reports freezes when memory usage goes above the graphics card's limit (others may too, but it is unclear whether that is the same problem or the object motion blur one).

Regards

GPU: AMD HD 5450
Error:
double-precision constant is represented as single-precision constant because double is not enabled

in many lines.
Same as @josh (joshr)

It seems to be a hardware limitation.

Info on the recent (April 2015) patches from AMD (see this wiki link) states that the supported systems are AMD's Radeon HD 7730 and above.

And they all support double precision.

Ya... Please, any updates on the AMD driver crashing Cycles on Mac?

I'm on an iMac Retina 5K 2014
Yosemite 10.10.5
AMD Radeon R9 M295X2

Blender Version 2.77 (2.77 2016-03-19)

Crashing log:

Device init success
Compiling OpenCL kernel ...
Build flags: 
OpenCL error (AMD Radeon R9 M295X Compute Engine): [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (517)
OpenCL error (AMD Radeon R9 M295X Compute Engine): OpenCL Warning : clBuildProgram failed: could not build program for 0x1021c00 (AMD Radeon R9 M295X Compute Engine) (err:-2)
OpenCL error (AMD Radeon R9 M295X Compute Engine): [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
Error returned by cvms_element_build_from_source

OpenCL kernel build output:
Error returned by cvms_element_build_from_source
OpenCL build failed: errors in console

Thanks a lot!

You have to update to El Capitan at least to use OpenCL on OSX. It is actually written in our release logs: https://wiki.blender.org/index.php/Dev:Ref/Release_Notes/2.76/Cycles#OSX

Brecht Van Lommel (brecht) closed this task as Archived.

Archiving old, out-of-date task.