Page MenuHome

Adaptive Sampling for Cycles.
Needs ReviewPublic

Authored by Stefan Werner (swerner) on Apr 15 2019, 1:49 PM.
Tokens
"100" token, awarded by Schamph."Love" token, awarded by Dangry."Burninate" token, awarded by MetinSeven."Love" token, awarded by lopoIsaac."Love" token, awarded by Dragon.Studio."Love" token, awarded by Shimoon."Love" token, awarded by intracube."Love" token, awarded by Jk."Like" token, awarded by Maged_afra."Love" token, awarded by leonumerique."Love" token, awarded by ponomarovmax."Love" token, awarded by 1seby."Love" token, awarded by monio."Party Time" token, awarded by sam_vh."Love" token, awarded by Oskar."Love" token, awarded by mistajuliax."Love" token, awarded by Brandon777."Love" token, awarded by 245."Love" token, awarded by lordodin."Love" token, awarded by andruxa696."Love" token, awarded by amonpaike."Burninate" token, awarded by cerebral_malfunction."Love" token, awarded by deadpin."Love" token, awarded by Ristovski."Love" token, awarded by bnzs."Love" token, awarded by franMarz."Burninate" token, awarded by hitrpr."Love" token, awarded by Treki26."Love" token, awarded by tuqueque."Love" token, awarded by Sam200."Like" token, awarded by Unreleased75."Like" token, awarded by evantryan."Yellow Medal" token, awarded by MarcelQ."Yellow Medal" token, awarded by symstract."Love" token, awarded by Forshu."Love" token, awarded by zanqdo."Love" token, awarded by jonathanl."Hungry Hippo" token, awarded by Tvaroog."Love" token, awarded by Alex108."Yellow Medal" token, awarded by duarteframos."Love" token, awarded by LucaRood."Love" token, awarded by thornydre."Burninate" token, awarded by kioku."Mountain of Wealth" token, awarded by marcog."Love" token, awarded by ogierm."Love" token, awarded by mandeep."Mountain of Wealth" token, awarded by ace_dragon."Love" token, awarded by rbx775."Like" token, awarded by helloidonthaveanyideaformyusername."The World Burns" token, awarded by filibis."Like" token, awarded by jtheninja."Love" token, awarded by julperado."Love" token, awarded by SteffenD."Love" token, awarded by irfan."Like" token, awarded by TheCharacterhero."Like" token, awarded by YAFU."Like" token, awarded by MJunk."Love" token, awarded by wo262."Party Time" token, awarded by brecht.

Details

Summary

This feature takes some inspiration from
"RenderMan: An Advanced Path Tracing Architecture for Movie Rendering" and
"A Hierarchical Automatic Stopping Condition for Monte Carlo Global Illumination"

In addition, it implements the sampling pattern described in
"Progressive Multi-Jittered Sample Sequences"

The basic principle is as follows:
While samples are being added to a pixel, the adaptive sampler writes half
of the samples to a separate buffer. This gives it two separate estimates
of the same pixel, and by comparing their difference it estimates convergence.
Once convergence drops below a given threshold, the pixel is considered done.

When a pixel has not converged yet and needs more samples than the minimum,
its immediate neighbors are also set to take more samples. This is done in order
to more reliably detect sharp features such as caustics. A 3x3 box filter that
is run peridoically over the tile buffer is used for that purpose.

After a tile is finished rendering, the values of all passes are scaled as if
they were rendered with the full number of samples. This way, any code operating
on these buffers, for example the denoiser, does not need to be changed for
per-pixel sample counts.

Diff Detail

Repository
rB Blender
Branch
cycles_adaptive_sampling
Build Status
Buildable 6232
Build 6232: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Does noise detection is done on linear or is useing the color managment we have choosen.. because if it works at linear we are getting huge performance loss.. it should use same LUT/OCIO that is set in color managment... or the best would be if u could choose OCIO just for adaptive. + gamma.

Just chiming in to say that after the big OIDN patch, getting this one in master ASAP will make Cycles render performance compared to 2.80 feel like night and day. I can verify that idea from (OIDN) tests I've done in today's midday buildbot build for Win64.

I hope to see this for version 2.81.

I gave this patch a try last night because I was curious if the tile border filtering grid pattern would interfere with OIDN at all.

Good news: It does not seem to! Even in areas where there's a pretty stark sampling differences between tiles, OIDN didn't seem to mind. I was worried it might pick up some underlying pattern.
Less good news: I came across 1, subjective, instance where I think the current sobol pattern yields better OIDN results on a given image when sampling count differences are removed from equation. OIDN may just not be trained well enough for PMJ? Take this as a myth until someone else spots a decent difference :)
General news: Patch needs rebasing. I had to fix things up locally to address some conflicts and padding differences.

This should be added in conjunction with Scramble Distance and Blue Dithering, without those two the performance gain with this adaptive sampling is negligible IMHO in conjunction it is very very powerful.

Oskar (Oskar) added a subscriber: Oskar (Oskar).
Seby (1seby) added a subscriber: Seby (1seby).

@Brecht Van Lommel (brecht) Did you get a chance to take a look at this? Should I commit this for 2.82?

Brecht Van Lommel (brecht) requested changes to this revision.Dec 10 2019, 8:07 PM

It would be nice to get this in 2.82 still, even if there are some limitations still. Better not let perfect be the enemy of the good.

If we want to be conservative, we could make this an experimental feature for GPU rendering, though not sure it will make much practical difference since users just enable it then anyway. The release notes should at least clearly explain limitations regarding mixed device rendering.

This also will not work with rendering different sample ranges and then combining, which should also be mentioned as a limitation. That could be made to work when the aux buffer is saved in the EXR and then available as input for another rendering adding more samples, but that has deeper implications for render farms and is something to solve later if at all.

Optix support is missing, probably that is a matter of copy-pasting parts from CUDA (we would like to unify these implementations more in the future, but should be ok for now). I would not consider that a blocker for the December 12 deadline for bcon1.

The comments I made are relatively minor code structuring things, hopefully not difficult to make. You'll have to take full responsibility for handling any breaking or bug reports coming from this though, as I'll be away with little internet connection between December 13-25.

TODO:

  • Restarting sampling on a previously stopped pixel will make it pick up at a later point in the sample sequence. It should continue sampling as if it was never stopped instead.
  • Pixels that were stopped and then get restarted later by neighboring pixels may end up not getting the full sample count. This is because the main loop over a tile is still for(1...num_samples) instead of a > do...while(samples_needed). Adding this on the CPU should not be hard, it will be trickier for GPU kernels.

These should be mentioned as comments in the code somewhere.

intern/cycles/blender/addon/properties.py
366

This is missing a description about what it is, and what a lower or higher value accomplishes.

373

This description could be extended to describe in which cases you might want to increase this, which problem it solves.

374

soft_max could be 4096 here, but hard max should be the same as samples I think ((1 << 24)).

intern/cycles/blender/blender_sync.cpp
665

Ideally adding this pass should not be done in the Blender integration, but somehow automatically when adaptive sampling is enabled in the integrator.

It could be done similar to the denoising passes that do not need to be output I think.

intern/cycles/device/device_cpu.cpp
863–864

No need to use bitmask here for effiency. Better to be clear with something like:

const int adaptive_sample_step = 4;
(sample % adaptive_samples_step ==  adaptive_samples_step - 1)

Also would prefer to have this code in a separate function.

if (task.integrator_adaptive && adaptive_sample_stop(tile, sample)) {
  tile.sample = end_sample;
  break
}

Same goes for other devices.

962–984

Put in a separate function, like,

if (task.integrator_adaptive) {
  adaptive_sample_scale(tile, kg);
}

Further there seems to be some duplication of logic between devices here. I think kernel_adaptive_post_adjust could be kept private to the kernel, and instead add this in kernel_adaptive_sampling.h and use it for all devices.

ccl_device_inline void kernel_adaptive_scale_sample(KernelGlobals *kg, ccl_global float *buffer, int start_sample, int sample)
{
  if (buffer[kernel_data.film.pass_sample_count] < 0.0f) {
    buffer[kernel_data.film.pass_sample_count] = -buffer[kernel_data.film.pass_sample_count];
    float sample_multiplier = sample / max((float)start_sample + 1.0f,
                                           buffer[kernel_data.film.pass_sample_count]);
    if (sample_multiplier != 1.0f) {
      kernel_adaptive_post_adjust(&kg, buffer, sample_multiplier);
    }
  }
  else {
    kernel_adaptive_post_adjust(&kg, buffer, sample / (sample - 1.0f));
  }
}
intern/cycles/device/device_cuda.cpp
1834–1839

It's a problem that different GPUs will have a different number of samples at which it checks the stopping criterion. That makes both CPU + GPU rendering, or rendering on a farm with CPU and GPUs not work well.

Solving this is tricky though, the bigger the tile size the more fine grained adaptive stopping we can do. Rendering multiple small tiles at once could help to some extent, but there are performance and memory usage reasons to prefer small tiles too.

I think we could commit this as is, but mention it in the release notes at least, and try to find a solution after this is committed.

intern/cycles/kernel/kernel_adaptive_sampling.h
29

In path termination we use sqrt() to approximate a typical view transform, or alternatively interpreted to be more conservative in dark areas.

I guess it's worth trying something simple like that. Doesn't have to be done for the first commit.

58

This is ok for now, but later on we should probably do some refactoring so this doesn't duplicate so much logic from kernel_passes.h, I made a note about that in T72293.

intern/cycles/kernel/kernel_random.h
53–54

Put these in kernel_types.h for use in integrator.c.

Also, knowing that the max number of samples is 4096, sample_is_even could be optimized by leaving out most of the terms, though not sure if that's performance critical.

62–73

Can you add a CMJ feature and #ifdef __PMJ__ for code like this?

Convenient for testing and maybe adaptive compilation in the future.

63–72

Put this in a pmj_sample_1D utility function.

117–130

Put this in a pmj_sample_2D utility function.

intern/cycles/render/buffers.cpp
262

Why is this specific to the combined pass, not sure why that would need to be treated different than other passes?

intern/cycles/render/integrator.cpp
71–72

I don't these should be a use_adaptive_sampling boolean here. This would force the PMJ pattern to be used, which I think is better than assuming that PMJ means adaptive sampling.

236–237

Use constants from kernel_types.h instead here.

It would be good to have a comment here explaining why this uses a fixed number of samples, and perhaps what might be done to improve it in the future.

This revision now requires changes to proceed.Dec 10 2019, 8:07 PM

Impossible (for me) to build this against current master code

If it helps @Juan Gea (juang3d), I was able to compile adaptive sampling support on the fairly latest patches of Cycles here:
https://github.com/boberfly/cycles/commit/8d378f97c2dd6c0bb21ce0f44a05637a02e5565f

I hope this one gets to squeeze into Blender 2.82! Not sure if @Stefan Werner (swerner) is on the case, but the change requests look like something I could tackle here perhaps if he hasn't got the time?

Cheers

The 2.82 ship has sailed, no more new feature commits are supposed to happen at this stage. My plan is to polish this some more, add OptiX support, maybe split it into two diffs (PMJ sampler can be separated) and then get this in properly for 2.83.

I think it's a great plan.

And with a bit of luck @Brecht Van Lommel (brecht) can review the @Lukas Stockner (lukasstockner97) patches for dithering, scrambling distance and light groups, and you can directly make it compatible with them :)

For the time being, if you rebase it let me know, anyways I'll try to apply the fix @Alex Fuller (mistaed) told me, to see if I can re-apply it to our build :)

Thanks!

In the end I've been able to apply this, I had to resolve some conflicts but it's working :)

  • Merge branch 'master' of git.blender.org:blender into cycles_adaptive_sampling
  • Merge remote-tracking branch 'origin/master' into cycles_adaptive_sampling
  • Merge remote-tracking branch 'upstream/master' into cycles_adaptive_sampling
  • Cycles: Fix Adaptive Sampling passes after AOV introduction.

This time diffing against the correct master, hopefully.

I want to get this into 2.83, hopefully I'll have time tomorrow for review.

@Stefan Werner (swerner), the comments from my previous code review have not been addressed yet.

Do you have time for that in the next ~2 weeks? Otherwise I can do it.