Improved triangle sampling for mesh lights

Authored by Stefan Werner (swerner) on Jun 27 2017, 11:19 PM.



This implements Arvo's "Stratified sampling of spherical triangles". Similar to how we sample rectangular area lights, it samples triangles over their solid angle. This significantly improves sampling close to the triangle but does little for more distant triangles, so I added a simple heuristic to switch between solid angle sampling and the existing area sampling. Unfortunately, I expect this to add render time in every case, even when it makes no visible difference. It will take some benchmarking with various scenes and hardware to estimate how severe the impact is and whether the change is worth it.
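For readers unfamiliar with the technique, here is a minimal sketch of Arvo's spherical triangle sampling in plain C++. It is not the code from this patch: `Vec3`, `DirectionSample`, `sample_spherical_triangle` and the helpers are hypothetical names, it uses double precision, and it assumes a non-degenerate triangle that does not contain the shading point `P`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b)
{
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 normalize(Vec3 a)
{
  const double len = std::sqrt(dot(a, a));
  assert(len > 0.0);
  return (1.0 / len) * a;
}
static double clamp_unit(double v) { return std::max(-1.0, std::min(1.0, v)); }
static double safe_sqrt(double v) { return std::sqrt(std::max(v, 0.0)); }
// Arvo's [x | y]: the normalized part of x that is orthogonal to the unit vector y.
static Vec3 orthonormal_part(Vec3 x, Vec3 y) { return normalize(x - dot(x, y) * y); }

struct DirectionSample { Vec3 dir; double pdf; };

// Returns a unit direction from P towards the triangle, uniform in solid angle.
static DirectionSample sample_spherical_triangle(
    Vec3 P, Vec3 v0, Vec3 v1, Vec3 v2, double xi1, double xi2)
{
  const double kPi = 3.14159265358979323846;
  // Project the vertices onto the unit sphere around P.
  const Vec3 A = normalize(v0 - P), B = normalize(v1 - P), C = normalize(v2 - P);
  // Normals of the great-circle planes through each pair of vertices.
  const Vec3 n_ab = normalize(cross(A, B));
  const Vec3 n_ac = normalize(cross(A, C));
  const Vec3 n_bc = normalize(cross(B, C));
  // Interior angles of the spherical triangle; their excess over pi is the solid angle.
  const double alpha = std::acos(clamp_unit(dot(n_ac, n_ab)));
  const double beta = std::acos(clamp_unit(-dot(n_ab, n_bc)));
  const double gamma = std::acos(clamp_unit(dot(n_ac, n_bc)));
  const double solid_angle = alpha + beta + gamma - kPi;
  assert(solid_angle > 0.0);

  // Pick C_hat on arc AC so that triangle (A, B, C_hat) has area xi1 * solid_angle.
  const double phi = xi1 * solid_angle - alpha;
  const double s = std::sin(phi), t = std::cos(phi);
  const double u = t - std::cos(alpha);
  const double v = s + std::sin(alpha) * dot(A, B);
  const double q = clamp_unit(((v * t - u * s) * std::cos(alpha) - v) /
                              ((v * s + u * t) * std::sin(alpha)));
  const Vec3 C_hat = q * A + safe_sqrt(1.0 - q * q) * orthonormal_part(C, A);

  // Pick the final direction on arc (B, C_hat), uniform in the cosine of the angle to B.
  const double z = 1.0 - xi2 * (1.0 - dot(C_hat, B));
  const Vec3 dir = z * B + safe_sqrt(1.0 - z * z) * orthonormal_part(C_hat, B);
  return {dir, 1.0 / solid_angle};
}
```

Because the pdf is constant over the solid angle subtended by the triangle, the variance from the inverse-square and cosine terms that area sampling suffers near the triangle disappears. The cost is the three `acos` calls and several normalizations per sample.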

I just noticed a bug in this myself - it doesn't work properly with instanced triangles, as it does not apply the transform. Will try and fix this.

I added support for instancing and object motion blur. Deformation motion blur is still missing, and I don't think it was supported before either.

Deformation motion blur is now supported too. This makes quite a difference, since previously objects with deformation motion blur were not sampled as light sources at all.

Before/after video:

Brecht Van Lommel (brecht) requested changes to this revision. Jun 29 2017, 10:38 AM

This is great. Just like quad solid angle sampling, the improvement will be especially noticeable inside volumes.


You can pass NULL to avoid computing itfm.


This might not be a great estimate for shading points near the middle of long thin triangles. Using the distance from the point to the plane instead would give some false positives, but I expect it would still avoid most of the cost.

distance_to_plane = abs(dot(N, A)) / len(N)

The triangle area computation could be optimized since it's 0.5f * len(N), and we already computed len_squared(N).
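The two suggestions above share one cross product. A minimal sketch, assuming `N` is the unnormalized triangle normal and `A` is the vector from the shading point to a triangle vertex (the review does not define `A`, so that reading is an assumption, and `Vec3`, `PlaneInfo` and `triangle_plane_info` are hypothetical names):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b)
{
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct PlaneInfo { double distance_to_plane; double area; };

static PlaneInfo triangle_plane_info(Vec3 P, Vec3 v0, Vec3 v1, Vec3 v2)
{
  // Unnormalized normal; its length is twice the triangle area.
  const Vec3 N = cross(v1 - v0, v2 - v0);
  const double len_sq = dot(N, N);
  assert(len_sq > 0.0);  // degenerate triangles are not handled in this sketch
  // One square root serves both the distance and the area.
  const double len_N = std::sqrt(len_sq);
  const Vec3 A = v0 - P;  // assumed meaning of A: shading point to a vertex
  return {std::fabs(dot(N, A)) / len_N, 0.5 * len_N};
}
```

For the triangle (0,0,0), (2,0,0), (0,2,0) and a point at height 3 above its plane, this yields a distance of 3 and an area of 2.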


Spaces after =.


I think this formula is wrong. There are two pdfs here that need to be multiplied together. One for sampling a point in the triangle:

pdf_triangle = t*t/(cos_pi * area_post)

And the other for picking a triangle in light_distribution_sample, which is the same as the solid angle case.

pdf_distribution = area_pre * kernel_data.integrator.pdf_triangles

triangle_light_pdf_area assumes area_pre and area_post cancel out, which they don't for motion blur. So I think the code here should just be this:

pdf *= area_pre / area_post;
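The argument can be checked numerically. The sketch below uses hypothetical function names; `pdf_area_cancelled` is a stand-in for a pdf that assumes the two areas cancel, not a copy of `triangle_light_pdf_area`. It shows that the product of the two pdfs equals the cancelled form scaled by `area_pre / area_post`.

```cpp
#include <cassert>
#include <cmath>

// pdf of a uniformly sampled point on the (possibly motion-blurred) triangle,
// converted from area measure to solid angle: t^2 / (cos_pi * area_post).
static double pdf_triangle(double t, double cos_pi, double area_post)
{
  assert(cos_pi > 0.0 && area_post > 0.0);
  return t * t / (cos_pi * area_post);
}

// pdf of picking this triangle from the light distribution, which is
// proportional to the triangle's area without motion applied.
static double pdf_distribution(double area_pre, double pdf_triangles)
{
  return area_pre * pdf_triangles;
}

// Hypothetical stand-in for a pdf that assumes area_pre == area_post.
static double pdf_area_cancelled(double t, double cos_pi, double pdf_triangles)
{
  assert(cos_pi > 0.0);
  return t * t * pdf_triangles / cos_pi;
}

// Full pdf: the cancelled form corrected by the area ratio.
static double pdf_with_motion(double t, double cos_pi, double pdf_triangles,
                              double area_pre, double area_post)
{
  double pdf = pdf_area_cancelled(t, cos_pi, pdf_triangles);
  pdf *= area_pre / area_post;
  return pdf;
}
```

Without motion blur the two areas are equal and the correction factor is 1, so the static case is unaffected.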

Use safe_sqrtf().
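A plausible reason for this request: expressions such as `1 - q*q` can dip slightly below zero through rounding, and a plain square root then returns NaN. The sketch below assumes `safe_sqrtf()` clamps its argument at zero before taking the root; `safe_sqrt_sketch` and `sin_from_cos` are hypothetical names used for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a clamped square root; not copied from the Cycles sources.
static float safe_sqrt_sketch(float f)
{
  assert(!std::isnan(f));
  return std::sqrt(std::max(f, 0.0f));
}

// Example use: recover a sine from a cosine that rounding pushed slightly past 1.
static float sin_from_cos(float q)
{
  return safe_sqrt_sketch(1.0f - q * q);
}
```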


We can immediately return here.


I think it would be simpler and faster to share the computation of ls->Ng and ls->shader with the solid angle case, moving that to the start of this function. ls->P can always be computed like the has_motion case.

This revision now requires changes to proceed. Jun 29 2017, 10:38 AM

A new update, taking Brecht's comments into account and a few other improvements.

I'm not sure if there is a perfect heuristic to switch between the two sampling strategies - in certain cases, I could make the line where the switch happens visible as a sudden change in noise - still, at an overall better quality than before the patch.

Stefan Werner (swerner) marked 8 inline comments as done. Jun 30 2017, 11:48 PM

Looks good to me now.

> I'm not sure if there is a perfect heuristic to switch between the two sampling strategies - in certain cases, I could make the line where the switch happens visible as a sudden change in noise - still, at an overall better quality than before the patch.

It's possible to make a transition region where it chooses between the two sampling methods, but I'm not sure if that's worth it in practice.
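One way such a transition region could look is a stochastic choice between the two methods with a mixture pdf. This is only a sketch of the idea, not something from the patch: the linear ramp, the `near_dist` and `far_dist` parameters and all function names are assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical blend weight: 1 close to the triangle, 0 far away, linear in between.
static double solid_angle_probability(double distance, double near_dist, double far_dist)
{
  assert(far_dist > near_dist);
  const double x = (far_dist - distance) / (far_dist - near_dist);
  return std::min(1.0, std::max(0.0, x));
}

// Use solid angle sampling when the random number falls below the weight.
static bool choose_solid_angle(double p_solid, double xi)
{
  return xi < p_solid;
}

// pdf of the combined strategy. Both input pdfs must be expressed in the same
// measure (for example solid angle) and evaluated for the same sampled direction.
static double mixture_pdf(double p_solid, double pdf_solid, double pdf_area)
{
  return p_solid * pdf_solid + (1.0 - p_solid) * pdf_area;
}
```

The price is that both pdfs have to be evaluated for every sample inside the transition region, which adds to the cost that the hard switch is meant to avoid.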

This one's not quite ready yet. I'm seeing odd artifacts when using this on CUDA hardware that don't show up when rendering on the CPU. I can't say yet what's causing them. This needs some investigation.

Changing the sample and pdf calls to forceinline made things work on CUDA.

One more round of improvements: a few optimisations, and a change to the heuristic for switching between sampling strategies. It now looks at the triangle's edge lengths instead of its area, which should hopefully help with long, thin triangles.
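A sketch of what an edge-length heuristic might look like. The comparison of the squared longest edge against the squared distance to the triangle's plane is an assumption for illustration; the comment above does not state the threshold the patch uses, and `Vec3` and `use_solid_angle_sampling` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b)
{
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Returns true when the triangle is large relative to its distance from P,
// which is where solid angle sampling is expected to pay off.
static bool use_solid_angle_sampling(Vec3 P, Vec3 v0, Vec3 v1, Vec3 v2)
{
  const Vec3 e0 = v1 - v0, e1 = v2 - v0, e2 = v2 - v1;
  const double longest_edge_sq = std::max(dot(e0, e0), std::max(dot(e1, e1), dot(e2, e2)));
  const Vec3 N = cross(e0, e1);
  const double len_sq = dot(N, N);
  assert(len_sq > 0.0);  // degenerate triangles are not handled in this sketch
  // Squared distance to the plane, computed without a square root.
  const double d = dot(N, v0 - P);
  const double distance_sq = d * d / len_sq;
  return longest_edge_sq > distance_sq;  // assumed threshold
}
```

Unlike an area-based test, a long sliver with a tiny area still has a long edge, so nearby shading points keep using the solid angle path.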

About the cost of the improved sampling:
On my machine, the BMW benchmark scene, which is lit entirely by large mesh lights, goes from 11m 6s to 11m 50s (roughly 7% slower). Since those are large mesh lights, pretty much every pixel in the scene uses the more expensive sampling path. The benefits of the better sampling are not visible there, however, since most of the noise in that scene comes from glossy reflections, not the area light.

Most of the performance penalty appears to come from the trig functions. I did experiment with using SSE to vectorise the calls to normalize(), but that didn't do much other than make the code less readable. If anyone else has suggestions for improvements, I'm all ears.

I don't immediately have a good suggestion to optimize this. It's possible in principle to use SIMD for the cross product, normalize and fast_acos (which is likely the slowest part), but that's not so simple to implement. To me, performance seems acceptable.

This revision is now accepted and ready to land. Jul 26 2017, 5:01 PM

To be clear, this should be committed after 2.79 is released since we are in bugfix only mode.

This revision was automatically updated to reflect the committed changes.