Page MenuHome

Need more info when failed to compile OpenCL code
Closed, InvalidPublic

Description

When I getting message like this (blender --verbose 99 --debug-all)

ED_screen_refresh: set screen
bpy.ops.wm.open_mainfile(filepath="/media/Compressed/data/scenes/classroom/untitled.blend", load_ui=True, use_scripts=False)
WM_operator_last_properties_store: storing properties for 'WM_OT_open_mainfile'
Thread 0: update object OBCamera
recalcob Camera
Thread 0: update object OBLamp
recalcob Lamp
Thread 3: update object OBSuzanne
recalcob Suzanne
recalcdata Suzanne
Thread 0: total 2 objects in 0.000032 sec.
  Camera in 0.000009 sec
  Lamp in 0.000023 sec
Thread 1: total 0 objects in 0.000000 sec.
Thread 2: total 0 objects in 0.000000 sec.
Thread 3: total 1 objects in 0.000115 sec.
  Suzanne in 0.000115 sec
Scene update in 0.001901 sec
wm_window_make_drawable: set drawable 1
ui_item_menutype_func: opening menu "INFO_MT_render"
Evaluate all animation - 1.000000
	No Actions, so no animation needs to be evaluated...
Device init success
Compiling OpenCL kernel ...
Build flags: -D__NODES_MAX_GROUP__=3 -D__NODES_FEATURES__=0 -D__MAX_CLOSURE__=1 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__
OpenCL build failed: errors in console
bpy.ops.info.reports_display_update()
WM_operator_last_properties_store: storing properties for 'INFO_OT_reports_display_update'

I still have two unanswered questions:
Compiling which OpenCL kernel?
Which errors, in which console?
May be knowing those answers someone (even may be I) may try to fix this kernel and even possible it's template and send some patches. Also very needed at least one thing: save unsuccessful kernel sources (one source if no includes were used) to temporary directory and print it's path.

Event Timeline

Try --debug-cycles

@Sergey Sharybin (sergey): --debug-all does not include Cycles, Libmw and Memory. Intentional?

@Thomas Dinges (dingto), i've run into --debug-all not including Libmv another day and was surprised actually. I don't remember any discussion about this and personally i would just enable Cycles and Libmv with --debug-all.

Also agree we should add note which exact kernel is being compiled in. Should be a simple thing to replace "Compiling OpenCL kernel ..." with "Compiling {Bake,Init,SceneIntersect...} OpenCL kernel ..."

However, it is important to note, that in this particular situation we can only add info about which kernel failed to compile. Actual compilation error is reported by OpenCL and always printed to stderr, regardless of debug flags set. The reason why there is nothing printed is because driver did not provide any information about what exactly went wrong. That we can't fix.

Ilia (inferrna) added a comment.EditedJul 14 2016, 9:59 AM

Actual compilation error is reported by OpenCL and always printed to stderr, regardless of debug flags set. The reason why there is nothing printed is because driver did not provide any information about what exactly went wrong. That we can't fix.

In pure C I can do such a blackmagic trick:

// Create the compute program from the source buffer
//
program = clCreateProgramWithSource(context, 1, (const char **) & KernelSource, NULL, &err);
if (!program)
{
    printf("Error: Failed to create compute program!\n");
    return EXIT_FAILURE;
}

// Build the program executable
//
err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
if (err != CL_SUCCESS)
{
    size_t len;
    char buffer[2048];

    printf("Error: Failed to build program executable!\n");
    clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &len);
    printf("<OpenCL compiler error message>\n%s\n</OpenCL compiler error message>\n", buffer);
    exit(1);
}

Gives me output:

Error: Failed to build program executable!
<OpenCL compiler error message>
"/tmp/OCL27420T1.cl", line 2: error: identifier "class" is undefined
  class Test                 
  ^

"/tmp/OCL27420T1.cl", line 3: error: expected a ";"
  {                  
  ^

At end of source: warning: parsing restarts here after previous syntax error

2 errors detected in the compilation of "/tmp/OCL27420T1.cl".
Frontend phase failed compilation.

</OpenCL compiler error message>

I very hope you now also can too ) Little hint: /tmp/OCL27420T1.cl - is created by driver and will be deleted after compilation. To provide kernel source for some curious users, you need to save kernel source at another path.

This is almost the same as we do, with difference in the following:

First we call clGetProgramBuildInfo with param_value of NULL in order to get real size of buffer we need to retrieve the error. This is needed because error message might be way longer than buffer allocated on stack. This is also a legit situation according to OpenCL specs to get parameter size.

Then we allocate buffer of the returned param_value_size_ret.

After that we call clGetProgramBuildInfo() with proper buffer size and param_value to get actual error message.

See [1] for some more details in actual code :)

Can you switch your test application to do similar thing and see if it gives proper
error message to you?

[1] https://developer.blender.org/diffusion/B/browse/master/intern/cycles/device/device_opencl.cpp;b00bc3cbc17b938a90f996b6a5206a876bf898a9$862

Changed to:

// Build the program executable
//
err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
if (err != CL_SUCCESS)
{
    size_t len;

    printf("Error: Failed to build program executable!\n");
    clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, 0, NULL, &len);
    printf("Got 1st len = %lu\n", len);
    char* buffer = (char*) malloc(len);
    clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, len, buffer, &len);
    printf("Got 2nd len = %lu\n", len);
    printf("</OpenCL compiler error message>\n%s\n</OpenCL compiler error message>\n", buffer);
    exit(1);
}

Gives:

Error: Failed to build program executable!
Got 1st len = 359
Got 2nd len = 359
</OpenCL compiler error message>
"/tmp/OCL28625T1.cl", line 2: error: identifier "class" is undefined
  class Test                 
  ^

"/tmp/OCL28625T1.cl", line 3: error: expected a ";"
  {                  
  ^

At end of source: warning: parsing restarts here after previous syntax error

2 errors detected in the compilation of "/tmp/OCL28625T1.cl".
Frontend phase failed compilation.

</OpenCL compiler error message>

It seems to be a proper way. Currently I'm on 2.77-a-1459879952-0thomas~xenial0 from this ppa https://launchpad.net/~thomas-schiex/+archive/ubuntu/blender?field.series_filter=xenial (compiled 2016-04-08) Is this code already there?

The code was there since the beginning of device_opencl.cpp, which makes it weird why Cycles does not print same error message. Would need to have a closer look.

At the meantime, please attach system-info.txt generated by Help -> Save System Info.

Sergey Sharybin (sergey) lowered the priority of this task from 90 to Normal.

Very strange, but after switching render resolution to 640x480 I 've got no more error. And even switching back to 1920x1080 too.. -) Seems to be possible an strange driver bug and error was happened without any message(?)
Nevertheless it would be nice to have *.cl source file if something went wrong.

Rather than providing access to the temporary combined .cl source file, I would try to use #line preprocessor directives to point to the original .cl files.

Rather than providing access to the temporary combined .cl source file, I would try to use #line preprocessor directives to point to the original .cl files.

Instead I'd prefer to see the cooked source. Having this I can try to compile it another way (with pyopencl for example) on another platform with another options. Or attach it to some bugreport (for example final source may depend on some render parameters and it will be more easy - to provide just cl source rather than all cycles parameters and option )

First of all, OpenCL source does not depend on any of settings. It is compilation parameters which depends on that.

Second of all, you can set CYCLES_OPENCL_DEBUG enironment variable to enable some extra OpenCL debugging things which includes dump of exact source being compiled. Same think you can access by setting Debug Value to 256 and enable Debug in OpenCL section of Debug panel within Render buttons.

Third of all, i've committed tweaks to code which makes it so --debug-cycles and --debug-libmv are included into --debug-all. I've committed change which prints which kernel is currently being compiled.

Forth of all, i've tested OpenCL kernel on NVidia cards here and they're happily compiled the code. Additionally, after adding some code which can not be compiled i've got proper error message printed to the console.

And finally, i can see you've got two OpenCL devices. Are you sure you're using same device in test application? Could be so CPU backend reports error properly, and GPU backend does not.

At first it seemed to be a kind of debug magic.

- without "--debug"

- with "--debug"

But then

- finally "--debug-memory" solo
"fully guarded memory allocator." - solved my problem (as for now, and if no more hidden surprise errors exists)
You may add "--debug-memory" to possible solution for cases like this - when no compiler errors produced.

And finally, i can see you've got two OpenCL devices. Are you sure you're using same device in test application? Could be so CPU backend reports error properly, and GPU backend does not.

Yes, I tried both devices.

I steal amazing err_code function from here https://github.com/HandsOnOpenCL/Exercises-Solutions/blob/master/Exercises/C_common/err_code.h
And with magic power of it now I'm able to print error code right into stderr. Below is what I got:

Device init success
Compiling base_kernel OpenCL kernel ...
Build flags: -D__NODES_MAX_GROUP__=3 -D__NODES_FEATURES__=0 -D__MAX_CLOSURE__=1 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__
OpenCL build failed: errors in console
Got build error: 'CL_INVALID_BUILD_OPTIONS'
File saved to '/tmp/failed_program.cl'

Saving file to '/tmp/failed_program.cl' is my own invention - nothing with it, successful compiled using pyopencl with flags provided.

That is _REALLY_ weird that any of the debug flags makes difference here. Especially the --debug-memory. Does it mean we've got some major memory issues (writing to outsides of the arrays)?

In any case, that's a good point that it's handy to dump error code to the console as well. We have clewErrorString() for that tho.

Since i can not replicate this issue, do you mind updating your code to use clewErrorString() and provide a patch? :)

P.S. Build options issue we'll also need to look into, but those i'll need to have a closer look. Maybe it's just a known double-space issue or so :S

diff --git a/intern/cycles/device/device_opencl.cpp b/intern/cycles/device/device_opencl.cpp
index 50490f3..16e4de8 100644
--- a/intern/cycles/device/device_opencl.cpp
+++ b/intern/cycles/device/device_opencl.cpp
@@ -19,6 +19,8 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+#include <fstream>
+#include <iostream>
 
 #include "clew.h"
 
@@ -875,6 +877,8 @@ public:
 
 		if(ciErr != CL_SUCCESS) {
 			opencl_error("OpenCL build failed: errors in console");
+            fprintf(stderr, "Got build error: '%s'\n", clewErrorString(ciErr));
+            fprintf(stderr, "Really used build options: '%s'\n", build_options.c_str());
 			return false;
 		}
 
@@ -917,10 +921,15 @@ public:
 		/* TODO(sergey): Report which kernel is being compiled
 		 * as well (megakernel or which of split kernels etc..).
 		 */
-		printf("Build flags: %s\n", custom_kernel_build_options.c_str());
 
-		if(!build_kernel(kernel_program, custom_kernel_build_options, debug_src))
+		if(!build_kernel(kernel_program, custom_kernel_build_options, debug_src)) {
+            string filename ("/tmp/" + kernel_name + ".cl");
+            std::ofstream out(filename.c_str());
+            out << source_str;
+            out.close();
+            fprintf(stderr, "Source saved to '%s'\n", filename.c_str());
 			return false;
+        }
 
 		printf("Kernel compilation finished in %.2lfs.\n", time_dt() - starttime);

What done:

  1. Move printing build options inside build_kernel function because it modifies build_options variable.
  2. Print build error with clewErrorString function
  3. Saving failed program source.

You may remove saving stuff or change it to be working on non-having "/tmp/" platforms.

- I also attached program source - it's very large (about 2.9 MB) and full of duplicates. For example - are we really need 6 copies of

ccl_device_inline int __float_as_int(float f)
{
    union { int i; float f; } u;
    u.f = f;
    return u.i;
}

and other functions?

That is _REALLY_ weird that any of the debug flags makes difference here. Especially the --debug-memory. Does it mean we've got some major memory issues (writing to outsides of the arrays)?

It is very possible that I was wrong about it - just now I've got successful render without this option and unsuccessful with.

I've committed chaneg which prints human-readable description of error code when compilation fails.

For example - are we really need 6 copies of and other functions?

Those copies are a result if inline-ing header files since OpenCL had difficulties using #include statements. We don't do any real pre-processing when expanding #include statements and leaving this to the compiler, which then expands #ifdef blocks and strips unneeded parts of the code.

All this is quite harmless actually.

It is very possible that I was wrong about it - just now I've got successful render without this option and unsuccessful with.

This is getting really weird. For me compilation always successful. Are the latest builds from builder.blender.org still flackey for you?

Sergey Sharybin (sergey) changed the task status from Unknown Status to Unknown Status.Sep 6 2016, 2:20 PM

Ok, so few things.

First of all, didn't see reply in more that 5 days. So due to policy of the tracker archiving this task since it lacks feedback from the reporter.

Second of all, the original issue with lack of extra error print in OpenCL has been already fixed in git.

Third of all, if there are some other issues here to be handled please make a separate report. Will make this much easier to keep track on.