Page MenuHome

Cycles: Split kernel - sort shaders

Authored by Hristo Gueorguiev (nirved) on Apr 4 2017, 12:30 PM.



This patch reduces thread divergence in kernel_shaders_sort.

Rays are sorted in packs of 2048 according to sd->shader.

The execution time of kernel_shader_sort is halved, giving performance boost in Clasroom (~30%) and Pabellon Barcelone (~8%).

Diff Detail

rB Blender

Event Timeline

Nice speedups!

Needs few style issues fixed and maybe some renaming, see inline comments. Also instead of


can we have this?


Can't seem to get it working with CUDA for some reason, kernels fail to run, maybe something to do with local memory?

22 ↗(On Diff #8536)

This needs to be char *kg.

30 ↗(On Diff #8536)

Should have brackets around the return.

20 ↗(On Diff #8536)

Cant we use ccl_local_size?

27 ↗(On Diff #8536)

This should probably be moved to kernel_types.h. Also I don't really like the use of an acronym here, can we just call this SORTING_SIZE or similar?

47 ↗(On Diff #8536)

We can't use ccl_local here, these variables will have to be defined in the __kernel functions and passed in via pointer, see how local variables are done for other kernels.

50 ↗(On Diff #8536)

Style issue, put spaces around operators like uint i = 0; Also in other places.

81 ↗(On Diff #8536)

Should be ccl_barrier and CCL_LOCAL_MEM_FENCE

Hristo Gueorguiev (nirved) marked 7 inline comments as done.Apr 12 2017, 10:45 AM
Hristo Gueorguiev (nirved) added inline comments.
20 ↗(On Diff #8536)

Has to be a constant in define, otherwise compiler doesn't unroll loops, which lowers performance.

Hristo Gueorguiev (nirved) marked an inline comment as done.
  • Renamed KSES_* defines to more meaningful ones
  • Moved SHADER_EVAL_SORT_SIZE to kernel_types.h
  • Moved local variables to __kernel function
  • Fixed style issues

Only minor stuff, don't want to be too picky. I think it will be fine to commit, but maybe would be nice if @Sergey Sharybin (sergey) or someone else could test CUDA split first (I would test it but I've been having issues with CUDA lately).


Maybe define a struct to hold the local variables like is done for QueueEnqueueLocals so we have less code duplicated?


No space between if and (, also brackets are nice even for one line. A few other places as well.


Would still rather see these in kernel_types, but not a huge deal.


Still a few spacing issues around operators here and some other places.

This revision is now accepted and ready to land.Apr 18 2017, 10:58 AM
Hristo Gueorguiev (nirved) marked 4 inline comments as done.

Updated all minor stuff.

Rebased on master. No shader sorting for CUDA split kernel.

This revision was automatically updated to reflect the committed changes.