Cycles: Split kernel - sort shaders
ClosedPublic

Authored by Hristo Gueorguiev (nirved) on Apr 4 2017, 12:30 PM.

Details

Summary

This patch reduces thread divergence in kernel_shaders_sort.

Rays are sorted in packs of 2048 according to sd->shader.

The execution time of kernel_shader_sort is halved, giving performance boost in Clasroom (~30%) and Pabellon Barcelone (~8%).

Diff Detail

Repository
rB Blender

Nice speedups!

Needs few style issues fixed and maybe some renaming, see inline comments. Also instead of

shader_eval_1
shader_eval_sort
shader_eval_2

can we have this?

shader_setup
shader_sort
shader_eval

Can't seem to get it working with CUDA for some reason, kernels fail to run, maybe something to do with local memory?

intern/cycles/kernel/kernels/opencl/kernel_shader_eval_1.cl
22 ↗(On Diff #8536)

This needs to be char *kg.

intern/cycles/kernel/split/kernel_shader_eval_2.h
30 ↗(On Diff #8536)

Should have brackets around the return.

intern/cycles/kernel/split/kernel_shader_eval_sort.h
20 ↗(On Diff #8536)

Cant we use ccl_local_size?

27 ↗(On Diff #8536)

This should probably be moved to kernel_types.h. Also I don't really like the use of an acronym here, can we just call this SORTING_SIZE or similar?

47 ↗(On Diff #8536)

We can't use ccl_local here, these variables will have to be defined in the __kernel functions and passed in via pointer, see how local variables are done for other kernels.

50 ↗(On Diff #8536)

Style issue, put spaces around operators like uint i = 0; Also in other places.

81 ↗(On Diff #8536)

Should be ccl_barrier and CCL_LOCAL_MEM_FENCE

Hristo Gueorguiev (nirved) marked 7 inline comments as done.Apr 12 2017, 10:45 AM
Hristo Gueorguiev (nirved) added inline comments.
intern/cycles/kernel/split/kernel_shader_eval_sort.h
20 ↗(On Diff #8536)

Has to be a constant in define, otherwise compiler doesn't unroll loops, which lowers performance.

Hristo Gueorguiev (nirved) marked an inline comment as done.
  • Renamed KSES_* defines to more meaningful ones
  • Moved SHADER_EVAL_SORT_SIZE to kernel_types.h
  • Moved local variables to __kernel function
  • Fixed style issues

Only minor stuff, don't want to be too picky. I think it will be fine to commit, but maybe would be nice if @Sergey Sharybin (sergey) or someone else could test CUDA split first (I would test it but I've been having issues with CUDA lately).

intern/cycles/kernel/kernels/cpu/kernel_cpu_impl.h
187

Maybe define a struct to hold the local variables like is done for QueueEnqueueLocals so we have less code duplicated?

intern/cycles/kernel/split/kernel_shader_eval.h
30

No space between if and (, also brackets are nice even for one line. A few other places as well.

intern/cycles/kernel/split/kernel_shader_sort.h
20

Would still rather see these in kernel_types, but not a huge deal.

71

Still a few spacing issues around operators here and some other places.

This revision is now accepted and ready to land.Apr 18 2017, 10:58 AM
Hristo Gueorguiev (nirved) marked 4 inline comments as done.

Updated all minor stuff.

Rebased on master. No shader sorting for CUDA split kernel.

This revision was automatically updated to reflect the committed changes.