
Cycles: Add optional Blue-Noise Dithered Sobol Sampling
Needs Revision, Public

Authored by Lukas Stockner (lukasstockner97) on Aug 10 2016, 6:01 PM.

Details

Summary

This patch implements precomputed dithered sampling as described in the paper "Blue-noise Dithered Sampling".

Mainly for low sample counts, such as the first few seconds of viewport rendering, the improvement in quality is substantial; in particular, the brightness flickering that regular Sobol shows with HDRIs is pretty much gone.

For now, the dithering matrix included in sobol.cpp isn't perfect yet. I'm currently running a simulated annealing process with 400 million iterations, but it will still take about a day to finish. I'll update the patch as soon as the result is in.
Because of the temporary matrix, you can still see some tiling happening at low sample counts, but with a good dithering matrix that shouldn't be as visible.
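For readers unfamiliar with the paper, the core idea can be sketched as a per-pixel toroidal shift (Cranley-Patterson rotation): every pixel uses the same Sobol value for a given sample and dimension, then adds an offset looked up from a small dither mask tiled over the image and wraps the result back into [0, 1). This is a minimal sketch of the general technique, not the code in kernel_random.h; `DitherMask` and `dithered_sample` are hypothetical names, and the mask contents are assumed to be precomputed elsewhere.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical tiled dither mask: width * height per-pixel offsets in [0, 1),
// stored row-major. In the patch the real table lives in sobol.cpp.
struct DitherMask {
  int width, height;
  std::vector<float> offsets;

  float at(int x, int y) const {
    assert(width > 0 && height > 0);
    // Wrap pixel coordinates so the mask repeats across the image.
    const int tx = ((x % width) + width) % width;
    const int ty = ((y % height) + height) % height;
    return offsets[ty * width + tx];
  }
};

// Cranley-Patterson rotation: every pixel shares the Sobol value for this
// sample and dimension, but shifts it by its own mask offset and wraps the
// result back into [0, 1).
float dithered_sample(float sobol_value, const DitherMask &mask, int x, int y) {
  const float u = sobol_value + mask.at(x, y);
  return u - std::floor(u);
}
```

Because the offsets repeat every `width` × `height` pixels, any structure left in an imperfect mask repeats too, which matches the tiling described above.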

Diff Detail

Repository
rB Blender
Branch
bluenoise
Build Status
Buildable 98
Build 98: arc lint + arc unit

Event Timeline

Lukas Stockner (lukasstockner97) retitled this revision to Cycles: Add optional Blue-Noise Dithered Sobol Sampling.
Thomas Dinges (dingto) requested changes to this revision. Aug 10 2016, 7:21 PM

Nice work, I can see some visibility improvements in low sample counts. Some minor things inline.

intern/cycles/kernel/kernel_textures.h, line 180

Reference needs to be removed from svm_image.h as well.

intern/cycles/kernel/kernel_types.h, line 1116

The new variable here changes the padding; remove pad1 to keep the struct aligned.

This revision now requires changes to proceed. Aug 10 2016, 7:21 PM
Lukas Stockner (lukasstockner97) edited edge metadata.

Updated svm_image.h with the correct texture amount.
As for the padding, it turns out that it was already wrong in master. I've fixed it now.
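The padding remark can be illustrated with a trimmed, hypothetical struct (not the real KernelIntegrator). It assumes that kernel data structs are copied to the device as raw memory and must therefore stay a multiple of 16 bytes, so a new member has to take over an existing padding slot instead of being appended next to it. A `static_assert` turns a forgotten adjustment into a compile error.

```cpp
#include <cassert>

// Hypothetical, heavily trimmed stand-in for a kernel data struct.
// Assumption: it is uploaded to the device as raw memory, so its size must
// stay a multiple of 16 bytes.
struct KernelIntegratorSketch {
  int existing_a;        // hypothetical existing members
  int existing_b;
  int existing_c;
  int sampling_pattern;  // hypothetical new member: it takes over the slot
                         // that an `int pad1;` used to occupy
  // int pad1;           // keeping this as well would make the struct 20 bytes
};

// Fails to compile if a member is added without adjusting the padding.
static_assert(sizeof(KernelIntegratorSketch) % 16 == 0,
              "kernel data structs must be padded to a multiple of 16 bytes");
```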

  • Missing tex_free for __sobol_dither.
  • Add some comments about what all those bitwise operations do?
  • Put the simulated annealing code somewhere in the repo?
Lukas Stockner (lukasstockner97) edited edge metadata.
  • Added device_free call
  • Made bitwise operations in kernel_random.h a bit clearer
  • Added simulated annealing tool in intern/cycles/app/ (not in CMake, just as a single file)
  • Updated matrix with better one, still not the "final" one
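As background for the bitwise operations in kernel_random.h, this is a minimal sketch of the generic way one Sobol dimension is evaluated: the direction numbers for the set bits of the sample index are XOR-ed together, and the resulting 32-bit fixed-point value is mapped to [0, 1). It is not a copy of the Cycles code; `sobol_dimension_bits` and `sobol_to_unit` are hypothetical names, and the direction numbers are assumed to come from a precomputed table such as the one in sobol.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Evaluate one Sobol dimension for a given sample index. `directions` holds
// the 32 direction numbers of that dimension, left-aligned in 32-bit words.
uint32_t sobol_dimension_bits(uint32_t index, const uint32_t directions[32]) {
  uint32_t result = 0;
  // Walk the bits of the index from least to most significant and XOR in
  // the matching direction number for every bit that is set.
  for (int bit = 0; index != 0; ++bit, index >>= 1) {
    if (index & 1u)
      result ^= directions[bit];
  }
  return result;
}

// Interpret the 32-bit result as a fixed-point fraction in [0, 1).
double sobol_to_unit(uint32_t bits) {
  return bits * (1.0 / 4294967296.0);
}
```

With direction numbers `1u << (31 - i)` this reproduces the van der Corput sequence used as the first Sobol dimension (0.5, 0.25, 0.75, ... for indices 1, 2, 3).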

The updated simulated annealing tool now also comes with some approximate math and SSE4.1 code, which makes it about 5 times faster.

I can't review the sampling algorithms because I'm too familiar with the original implementation, so I don't have much more to add here.

But before merging this kind of change we should definitely see test renders of various scenes to verify it's all working as intended.

Hi, I would like to test this, but the patch does not apply to master da77d987.

Opensuse Leap 42.1 x86_64
Intel i5 3570K
RAM 16 GB
GTX 760 4 GB /Display card
GTX 670 2 GB
Driver 367.35
gcc (SUSE Linux) 4.8.5

/blender_build/blender> patch -p1 < D2149.diff
patching file intern/cycles/app/cycles_dithering.cpp
patching file intern/cycles/blender/addon/properties.py
patching file intern/cycles/blender/blender_sync.cpp
patching file intern/cycles/kernel/kernel_bake.h
patching file intern/cycles/kernel/kernel_path_branched.h
patching file intern/cycles/kernel/kernel_path_surface.h
patching file intern/cycles/kernel/kernel_path_volume.h
patching file intern/cycles/kernel/kernel_random.h
patching file intern/cycles/kernel/kernel_textures.h
patching file intern/cycles/kernel/kernel_types.h
Hunk #1 succeeded at 1117 with fuzz 2 (offset 4 lines).
patching file intern/cycles/kernel/kernel_volume.h
patching file intern/cycles/kernel/svm/svm_image.h
patching file intern/cycles/render/integrator.h
patching file intern/cycles/render/integrator.cpp
patching file intern/cycles/render/scene.h
patching file intern/cycles/render/sobol.h
patching file intern/cycles/render/sobol.cpp
patching file intern/cycles/util/util_texture.h
Hunk #1 FAILED at 37.
1 out of 1 hunk FAILED -- saving rejects to file intern/cycles/util/util_texture.h.rej

Mib

Since util_texture.h is the only file that fails to apply, you can ignore it. Just compile, and it should run. The changes there are only about texture limits for Fermi.
Alternatively, install arcanist and use the command "arc patch D2149" to apply the patch.

The first tests I made with dithered Sobol show slight improvements in very simple scenes (some cubes), but not in the benchmark files at low sample counts (at least nothing visible to the naked eye at 8, 16 or 64 samples). The noise patterns are clearly different, but neither seems better than the other.
By the way, enabling dithered Sobol with the CPU device and then switching to the GPU works well with OpenCL, so the option can be made available for GPU/OpenCL as well.

Sergey Sharybin (sergey) requested changes to this revision. Apr 24 2017, 9:42 PM

The patch clearly needs an update against latest master.

Once that's done, I have the following issues:

  • There are correlation artifacts in the following file, which look like this:

I'm not sure whether it's a merge conflict I resolved wrongly or a real problem with this particular dither.

The volume part can be solved by restoring the re-hashing in the volume code; however, this doesn't seem to be a proper solution: ideally, the distribution of path_rng_1D should be good enough on its own.
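To make the re-hashing workaround concrete, here is a generic sketch under one assumption: that re-hashing means deriving a fresh seed from the pixel seed and a per-use counter before computing the per-pixel shift, so extra draws inside the volume code stop reusing the same offsets as the surface code. `rehash_seed` and `rotate_sample` are hypothetical helpers, not the code that was removed from kernel_volume.h; the hash is Thomas Wang's 32-bit integer hash.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Thomas Wang's 32-bit integer hash. Every step is invertible on 32-bit
// integers, so distinct inputs always give distinct outputs.
uint32_t wang_hash(uint32_t x) {
  x = (x ^ 61u) ^ (x >> 16);
  x *= 9u;
  x ^= x >> 4;
  x *= 0x27d4eb2du;
  x ^= x >> 15;
  return x;
}

// Hypothetical re-hashing step: derive a fresh seed for each extra use
// (e.g. each step of a volume loop) from the original per-pixel seed.
uint32_t rehash_seed(uint32_t pixel_seed, uint32_t use_index) {
  return wang_hash(pixel_seed ^ wang_hash(use_index));
}

// Cranley-Patterson rotation driven by a hashed seed: the top 24 bits of the
// seed become a shift in [0, 1), and the sum is wrapped back into [0, 1).
float rotate_sample(float u, uint32_t seed) {
  const float shift = float(seed >> 8) * (1.0f / 16777216.0f);
  const float r = u + shift;
  return r - std::floor(r);
}
```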

You also mentioned some work being done on getting a better dither matrix; any luck with that?

This revision now requires changes to proceed. Apr 24 2017, 9:42 PM

Thanks for the link; I just tried his tool and it seems to work reasonably well.
However, from what I can see so far, it doesn't perform as well as intern/cycles/app/cycles_dithering.cpp, the tool that's included in this patch.
The two main problems I see with it are:

  • It just runs straight into the first local minimum because it doesn't do simulated annealing: it simply rejects every swap that increases the total energy.
  • It recalculates the energy of the entire system after every swap, which is extremely wasteful: because the energy term is local, only a handful of terms have to be recalculated. Doing so greatly reduces the computational complexity (my tool used to perform billions of swaps, iirc, while his tool currently takes 30 minutes to do 4096 iterations) and also reduces numerical issues.
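The two points above, annealing instead of greedy descent and a local energy update instead of a full recomputation, can be sketched together. This is not cycles_dithering.cpp: it assumes a 1D mask on a torus, a Gaussian energy in the form used by the blue-noise dithered sampling paper with σ_i = 2.1 and σ_s = 1 (for 1D values the |p_s − q_s|^(d/2) term becomes a square root), a truncated (2R+1)² neighbourhood and a geometric cooling schedule. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Simulated annealing for a w x h toroidal mask with one value per pixel.
// Requires 2 * radius + 1 <= min(w, h) so no neighbour is visited twice.
struct DitherAnnealer {
  int w, h, radius;
  std::vector<float> values;  // row-major
  double sigma_i = 2.1, sigma_s = 1.0;

  // Sum of the energy terms that involve pixel (x, y).
  double local_energy(int x, int y) const {
    const float v = values[y * w + x];
    double e = 0.0;
    for (int dy = -radius; dy <= radius; ++dy) {
      for (int dx = -radius; dx <= radius; ++dx) {
        if (dx == 0 && dy == 0) continue;
        const int qx = (x + dx + w) % w, qy = (y + dy + h) % h;  // wrap
        const double dv = std::fabs(v - values[qy * w + qx]);
        const double d2 = double(dx * dx + dy * dy);
        e += std::exp(-d2 / (sigma_i * sigma_i) - std::sqrt(dv) / (sigma_s * sigma_s));
      }
    }
    return e;
  }

  // Full energy; each pair is seen from both ends, hence the factor 0.5.
  double total_energy() const {
    double e = 0.0;
    for (int y = 0; y < h; ++y)
      for (int x = 0; x < w; ++x) e += local_energy(x, y);
    return 0.5 * e;
  }

  // Swap pixels a and b and return the energy change. Only terms touching a
  // or b can change, so two local sums replace a full recomputation. A term
  // linking a and b directly is unchanged by the swap and cancels out.
  double swap_delta(int a, int b) {
    const int ax = a % w, ay = a / w, bx = b % w, by = b / w;
    const double before = local_energy(ax, ay) + local_energy(bx, by);
    std::swap(values[a], values[b]);
    return local_energy(ax, ay) + local_energy(bx, by) - before;
  }

  // Metropolis acceptance with geometric cooling; returns accepted swaps.
  long anneal(long iterations, double t_start, double t_end, uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, w * h - 1);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    const double cooling = std::pow(t_end / t_start, 1.0 / double(iterations));
    double t = t_start;
    long accepted = 0;
    for (long it = 0; it < iterations; ++it, t *= cooling) {
      const int a = pick(rng), b = pick(rng);
      if (a == b) continue;
      const double delta = swap_delta(a, b);
      // Downhill moves are always kept; uphill moves survive with probability
      // exp(-delta / t), which lets the search climb out of local minima.
      if (delta <= 0.0 || unit(rng) < std::exp(-delta / t))
        ++accepted;
      else
        std::swap(values[a], values[b]);  // reject: undo the swap
    }
    return accepted;
  }
};

// Start from evenly spaced values in random order.
DitherAnnealer make_annealer(int w, int h, int radius, uint32_t seed) {
  DitherAnnealer d{w, h, radius, std::vector<float>(w * h)};
  for (int i = 0; i < w * h; ++i) d.values[i] = (i + 0.5f) / float(w * h);
  std::mt19937 rng(seed);
  std::shuffle(d.values.begin(), d.values.end(), rng);
  return d;
}
```

With R = 3, `swap_delta` evaluates 2 × 48 terms per swap no matter how large the mask is, whereas a full recomputation grows with the number of pixels.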