The algorithm averages normals from nearby surfaces. It uses the same
sampling strategy as BSSRDFs, casting rays along the normal and two
orthogonal axes, and combining the samples with multiple importance
sampling (MIS). The code still needs some cleanup to untangle it from
the BSSRDF code.
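
To illustrate the sampling scheme described above, here is a minimal
standalone sketch in Python. It is not the Cycles kernel code. The toy
scene (EDGE_FACES), the function name bevel_normal, the 0.5/0.25/0.25
axis probabilities, the uniform disk sampling, and the restriction of
hits to a sphere of the bevel radius are all assumptions made for the
example. Each sample picks one of the three axes, casts a ray along it
through a point on the perpendicular disk, and weights every hit normal
by the balance heuristic over the three axes.

```python
import math
import random

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)

def orthonormal_basis(n):
    """Return two unit vectors that form an orthonormal frame with n."""
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = normalize(cross(helper, n))
    return t, cross(n, t)

# Hypothetical toy scene: the convex edge of a box occupying x <= 0, y <= 0.
# Each face is (point on plane, unit normal, predicate bounding the plane).
EDGE_FACES = [
    ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), lambda p: p[0] <= 0.0),  # top face
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), lambda p: p[1] <= 0.0),  # side face
]

# Assumed probabilities of picking the normal, tangent and bitangent axis.
PICK = (0.5, 0.25, 0.25)

def bevel_normal(P, N, radius, faces, num_samples, rng):
    """Estimate the area-weighted average normal within `radius` of P."""
    t, b = orthonormal_basis(N)
    axes = (N, t, b)
    disk_pdf = 1.0 / (math.pi * radius * radius)
    total = [0.0, 0.0, 0.0]
    for _ in range(num_samples):
        # Pick an axis, then a uniform point on the disk perpendicular to it.
        k = rng.choices((0, 1, 2), weights=PICK)[0]
        axis, u_dir, v_dir = axes[k], axes[(k + 1) % 3], axes[(k + 2) % 3]
        r = radius * math.sqrt(rng.random())
        phi = 2.0 * math.pi * rng.random()
        u, v = r * math.cos(phi), r * math.sin(phi)
        # The ray segment spans the sphere of the given radius around P.
        h = math.sqrt(max(0.0, radius * radius - r * r))
        origin = tuple(P[i] + u * u_dir[i] + v * v_dir[i] + h * axis[i]
                       for i in range(3))
        direction = (-axis[0], -axis[1], -axis[2])
        for point, normal, inside in faces:
            denom = dot(direction, normal)
            if abs(denom) < 1e-12:
                continue  # ray is parallel to this face
            rel = tuple(point[i] - origin[i] for i in range(3))
            dist = dot(rel, normal) / denom
            if dist < 0.0 or dist > 2.0 * h:
                continue  # hit lies outside the sphere around P
            hit = tuple(origin[i] + dist * direction[i] for i in range(3))
            if not inside(hit):
                continue
            # Balance heuristic: divide by the summed area-measure pdfs of
            # generating this hit from any of the three axes.
            pdf_sum = sum(PICK[j] * disk_pdf * abs(dot(axes[j], normal))
                          for j in range(3))
            for i in range(3):
                total[i] += normal[i] / pdf_sum
    if dot(total, total) == 0.0:
        return N  # no hits: fall back to the unmodified normal
    return normalize(total)
```

Calling bevel_normal((-0.2, 0.0, 0.0), (0.0, 1.0, 0.0), 1.0, EDGE_FACES,
4000, random.Random(7)) returns a normal tilted towards the side face,
while a point further than the radius from the edge keeps its original
normal.
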

The main concern here is that we are introducing ray tracing inside
shader evaluation, which could be quite bad for GPU performance and
stack memory usage. A similar issue exists if we want to add an AO
node that can output values instead of closures.

For CUDA with a GTX 1080, stack memory usage goes from 23408 to 24560
bytes (an increase of 1152 bytes, about 5%), and render time on
benchmark scenes seems unchanged. I have not tested AMD OpenCL yet.
Adaptive compilation there means scenes that do not use the bevel
shader are unaffected, but it's still a concern if one small object
with a bevel shader slows down an entire scene.

Moving the ray tracing outside of the shader evaluation is difficult.
It would make linking parameters or having multiple nodes with
different parameters impossible, since splitting up shader evaluation
into multiple kernels seems impractical.