Cycles: Speed up split kernel builds by splitting svm_eval_nodes out of direct_emission functions
Needs RevisionPublic

Authored by Mai Lavelle (maiself) on Nov 15 2017, 10:09 AM.

Details

Reviewers
Brecht Van Lommel (brecht)
Group Reviewers
Cycles
Summary

This removes the indirect call to svm_eval_nodes, which is a significantly taxing function for the compiler, from the kernel_direct_lighting kernel, and speeds up kernel build times by 4-5 seconds. Render time on AMD GPU is slower by ~1%, CPU render times seem unaffected.

Check the split-kernel-faster-building branch for individual commits.

The branch also has kernel_indirect_background split in the same manner, with a 2-3 second improvement in build times on top of this patch. It should be possible to continue with this for a few other kernels, but things become problematic for kernels containing functions that call svm_eval_nodes in a loop. I'm not quite sure what we could do about that just yet.

Diff Detail

Brecht Van Lommel (brecht) requested changes to this revision.Nov 16 2017, 6:29 PM

I don't have any easy solutions for the loop cases.

  • For lamp MIS I would like to see lamps as geometry in the BVH, and basically treat them the same as triangle emission. Mainly we miss a sphere geometry primitive for that.
  • For the shadow catcher we should stop using sample all lights by moving the shadowed/unshadowed division of out the path tracing kernel, and doing it on the final buffers.
  • Branched path tracing may go away eventually if we have better direct light sampling.
  • For transparent shadows not sure what to do. Merging shadow_blocked_transparent_stepped_loop and shadow_blocked_transparent_all_loop may help, but maybe the compiler is already smart enough to not inline the shader evaluation function.

The background stuff seems good. We could go even further there to treat the background as a regular surface, merging kernel_indirect_background into kernel_shader_setup, etc.

intern/cycles/kernel/kernel_types.h
1044

I don't think we need to store path_flag and max_closure. If we add a few extra intents like SHADER_EVAL_INTENT_SHADOW and SHADER_EVAL_INTENT_EMISSION we can figure out the right values in shader_eval.

To me it seems like this struct should be used for the split kernel only. shader_eval can have only an intent parameter. A new shader_eval_get_emission(sd, intent) function could to get the result, which the megakernel could use directly and the split kernel could store it in the task.

This revision now requires changes to proceed.Nov 16 2017, 6:29 PM

Updated to current branch state:

  • ShaderEvalTask is split kernel only again.
  • ShaderEvalIntent is passed to shader eval functions instead of path_flag and max_closures.
  • Minor cleanup
Mai Lavelle (maiself) marked an inline comment as done.Nov 26 2017, 10:28 AM

OSL can't build with this patch on windows.

Note that the patch make arcanist to crash the console on windows and there is one rejection on kernel_path.h.

kernel build time with this patch on the barbershop scene goes from 98sec (split kernel only) to 89sec. Using the branch, it even goes down to 69sec. Together with kernel_base, it goes from nearly 2 minutes to a bit less than 1 and a half minutes. Still pretty high, but already a good speedup.

Brecht Van Lommel (brecht) requested changes to this revision.Nov 28 2017, 5:52 AM

Code structure looks good to me now, but the branch is crashing various ctests here (Linux + Clang), and rendering some incorrectly.

intern/cycles/kernel/kernel_emission.h
106

This function doesn't seem to be used anywhere, can be removed?

intern/cycles/kernel/kernel_shader.h
1014

Missing a } here.

intern/cycles/kernel/split/kernel_split_common.h
57

{ on new line.

This revision now requires changes to proceed.Nov 28 2017, 5:52 AM

with latest version of the branch, barbershop renders correctly now, but the whole PC hangs after the third tile and only a hard reset is possible (windows 7, driver 17.11.4 on Vega64). It would be good to report the bug to AMD.