Page MenuHome

[Cycles/MSVC] compile-time regression.

Authored by Ray molenkamp (LazyDodo) on May 20 2016, 3:52 AM.



4a4f043bc4235c046d2b58e00f2b80665ded11bf introduced a compile-time increase of cycles kernel.cpp with msvc from 1:07.86 on this box to 00:03:41.00
3c85e1ca1a916fe2ded9ab508f4cd55a2ee22549 increased this to 00:08:37.29

at ~9 minutes per kernel, cycles_kernel now takes about 50 minutes to compile.

this patch brings this back down to 00:01:12.92 (for kernel.cpp)

Given the method is marked ccl_device_noinline i expected a perf hit, but to my surprise I got about a 1% speedup out of it on pavillon_barcelone_cpu.

I tested the patch with the cpu render, it broke compiling with osl and i haven't tested gpu rendering at all. Posting this incomplete patch cause i'm out of time for the night hoping dingto will pickup where i left of tomorrow morning, if not ... no big deal, i'll finish the patch over the weekend.

Diff Detail

rB Blender

Event Timeline

Ray molenkamp (LazyDodo) retitled this revision from to [Cycles/MSVC] compile-time regression..
Ray molenkamp (LazyDodo) updated this object.
Ray molenkamp (LazyDodo) set the repository for this revision to rB Blender.

Just commandeering this for a fix for the OSL problem. Anybody interested: Fell free to commandeer back :)

Well, the problem with the current approach is that ccl_device_noinline is defined to be static, which is why OSL complains when compiling - the _impl functions are local to the kernel_X.cpp files, they can't be linked into the OSL parts.
Now, the obvious solution would be to move them to kernel.cpp and not make them static, but that would disable any SSE/AVX optimization for the texture interpolation code.
So, the only real option is to have these _impl functions as static in some kernel header, where they get included in every compilation unit, both kernels and OSL code. However, they can't be in kernel_compat_cpu.h as that one's included before KernelGlobals is defined. So, for now, I just put it into kernel_globals.h, since KernelGlobals gets defined there and it's always included before any use of the functions. The downside there is that a few files (osl_shader.cpp, osl_closures.cpp and device_cpu.cpp) now complain about defined but unused functions...

Maybe make a new kernel_cpu_image.h in kernels/cpu/, and include it from kernel_cpu_impl.h?

Thomas Dinges (dingto) requested changes to this revision.May 20 2016, 11:49 AM
Thomas Dinges (dingto) edited edge metadata.
Thomas Dinges (dingto) added inline comments.
149 ↗(On Diff #6714)

Would prefer else if, then else.

168 ↗(On Diff #6714)

interpolation is a int, not float.

This revision now requires changes to proceed.May 20 2016, 11:49 AM
Ray molenkamp (LazyDodo) updated this revision to Diff 6721.
Ray molenkamp (LazyDodo) edited edge metadata.
Ray molenkamp (LazyDodo) removed rB Blender as the repository for this revision.

Updated with feedback.

Ray molenkamp (LazyDodo) set the repository for this revision to rB Blender.

Looks ok apart from code style.


We don't have a gap between if and the brace. Correct:


Put return into a new line here too.

Updated with feedback.

Thomas Dinges (dingto) edited edge metadata.
This revision is now accepted and ready to land.May 20 2016, 4:26 PM
This revision was automatically updated to reflect the committed changes.