adaptive sampling v01
Open, NormalPublic

Description

Hi,
this is a simple adaptive sampling within the TileManager / Session / Tracer.
Session will mark tiles without much progress as "NOPPED", TileManager will forward this flag to RenderTile, Tracer recognizes NOPPED tiles and passes a special flag (-samples) to the CPU/GPU tracer. The tracer recognizes the -samples value and NOPs the trace by adding the average sample.
This way the film_convert() needs no rewrite.
However, this distorts the sampling weight, as past samples get a larger weight (NOPPING = duplicating them).
It works on top of trace/branched-trace, and I don't see anything against Metropolis.
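
To make the weighting issue concrete, here is a small Python sketch (not the patch code; all function names are made up for illustration). It assumes each pixel stores a running sum that is divided by one global sample count, and that a NOP adds the current average to that sum.

```python
def trace_sample(pixel_sum, sample):
    """Normal path: accumulate one freshly traced sample."""
    return pixel_sum + sample


def nop_sample(pixel_sum, samples_so_far):
    """NOP path: add the current average instead of tracing."""
    return pixel_sum + pixel_sum / samples_so_far


def film_convert(pixel_sum, global_samples):
    """Display value, assuming one global sample count for all tiles."""
    return pixel_sum / global_samples


def demo():
    pixel_sum, n = 0.0, 0
    for s in (0.2, 0.4, 0.6):          # three real samples, average 0.4
        pixel_sum, n = trace_sample(pixel_sum, s), n + 1
    for _ in range(2):                 # two NOPped passes
        pixel_sum, n = nop_sample(pixel_sum, n), n + 1
    while_nopped = film_convert(pixel_sum, n)
    pixel_sum, n = trace_sample(pixel_sum, 1.0), n + 1   # tile resumed
    after_resume = film_convert(pixel_sum, n)
    true_average = (0.2 + 0.4 + 0.6 + 1.0) / 4
    return while_nopped, after_resume, true_average
```

While the tile is NOPPED the displayed value stays at the old average. After resuming, the displayed value lags behind the true average of the real samples, because the duplicated samples carry extra weight.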

A better version (v02) would require static tiling, with the tiles remembering their number of samples. film_convert() should work with tile.num_samples instead of tile_manager.num_samples (and thus would need a rewrite). That way tiles could have different numbers of samples. The TileManager should grow into a "prioritized tile manager", like an OS task manager.
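
A minimal Python sketch of the v02 idea, under the assumption that each tile owns its pixel sums and its own sample counter. The `Tile` class and its fields are hypothetical and do not mirror the real RenderTile.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    pixel_sums: list          # one accumulated value per pixel
    num_samples: int = 0      # samples this tile actually received

    def add_pass(self, samples):
        """Accumulate one sample per pixel and count the pass."""
        for i, s in enumerate(samples):
            self.pixel_sums[i] += s
        self.num_samples += 1


def film_convert(tile):
    """Normalize by the tile's own count, not a global one."""
    return [s / tile.num_samples for s in tile.pixel_sums]
```

With this layout a skipped tile simply does not call `add_pass`, so no average has to be duplicated and the statistics stay undistorted.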

Right now tiling is recomputed every time during next() -> settiles() -> gen_tiles_xxx() . Is this necessary?
Making tiling static opens up a whole new world. I don't see the necessity for retiling (unless dimensions change interactively).

adaptive sampling v03 would perform all this on a pixel-level. film_convert() would process a "map of samples" (pixelwise) together with the samples.
I don't know whether this is too low-level compared to 64x64 tiles.

Rgds,
Patrick

Details

Type
Patch

May I ask on whether you're using a different technique on adaptive sampling compared to what Lukas Stockner is doing?

The reason is that depending on what you're doing, you might be able to borrow some of his adaptive sampling code and speed up your progress.

His evaluation function can directly replace the one in the patch. It is open to any progress/convergence evaluation function.
I just don't feel right about converting to grayscale - it disregards the spectral information. I'm using fake dispersion, so there are lots of rays of similar brightness but totally different colors. Apart from that, IMO correlation is a good measure.

Structure-wise I think this approach is more versatile (and faster) than rewriting the tracer code to jitter x/y. Plus this works on top of Metropolis and not side-by-side.

If you mean creating the evaluation map based on brightness + variance, then that would indeed make more sense for a number of cases.

I think a lot of people will want to try a VC2013 build eventually (once the implementation is far enough along to work well for most scenes anyway).

Background: I am working with vs2008. Frankly I never got blender to run, and I also do not intend to unless absolutely necessary. I converted Cycles to a DLL (requires MD builds of all libraries) and plug it into our software.
As a common platform (so we have something to talk about) I suggest the Cycles standalone app driven by XML files.

First I'd like to make sure the path is good and sound.
Adaptive sampling v01 vs v02 vs v03, whatever makes the most sense I am happy to help the Cycles project out.

How has Luxrender (or any other renderer) solved this problem? global vs tile-wise vs pixel-wise sample-counts?

I believe the Luxcore rewrite also uses a tile-based approach for adaptive sampling (it does adaptive sampling within the tile and stops it when any visual difference is below a certain threshold), I really don't know the specific details though.

So did I get that right in that you're developing in just Cycles itself without Blender attached to it (because this would be very significant as it shows developer interest from other vendors, I wish you luck in fostering interest in its usage with your software)?

Adaptive sampling _within_ the tile? That would mean pixel-wise sample-count variation, and when finished, a scaling step to mimic homogeneous sampling over the entire tile. That is basically what Lukas is doing. In v18 I couldn't find the sampling homogenisation, but I assume this is little work for him.

Proposed v02 would evenly sample the entire tile and remember the number of samples. That enables progressive sampling, coming back and adding more samples without distorting statistics/average, plus on top of metropolis or any other tracing method.
IMO that is fine on 64x64 pixel tiles - that should be small enough to encage difficult areas.

Yes, cycles only. It is node based, very versatile, CUDA-enabled, fast enough for live rendering, etc pp. See T40773 and I have also a nodewrangler script 80% ready to export materials from blender to XML. Our DLL-API is crappy, but might be worth a patch proposal at some point.

Here is a new patch, cleaned up. It checks every 5th sample whether the maximum progress on a tile is below 5% (set parameters in Session::nop_tiles()).
If so, it disables the entire tile.
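
The check described above could look roughly like the following Python sketch. It is not the code from Session::nop_tiles(); the function names are made up, and it assumes "progress" means the largest absolute change of a pixel's average (in a 0..1 range) since the previous check.

```python
def max_progress(current_avg, last_avg):
    """Largest per-pixel change of the average since the last check."""
    return max(abs(c - l) for c, l in zip(current_avg, last_avg))


def should_nop(sample, current_avg, last_avg, nth_sample=5, eps=0.05):
    """Disable the whole tile if no pixel moved by eps or more."""
    if sample % nth_sample != 0:
        return False                      # only evaluate every Nth sample
    return max_progress(current_avg, last_avg) < eps
```

Using the maximum rather than the mean means a single pixel that is still moving keeps the entire tile active.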

EDIT: fixed issue when scene is reset (same dimensions etc, just reset):

Hi, I'm quite interested in these adaptive solutions.
How do you manage to avoid visible tile boundaries in a low sampled render?
Using an (entire-)tile stop condition easily leads to adjacent tiles with equal threshold but a visible difference in terms of noise, as I experienced with the other experimental method developed by likas.

Imo the only way to avoid this and have a real and useful adaptive noise awareness is to have the stop condition working on a per-pixel basis

Anyway. How can Blender users have a try at your patch?

*edit: sorry i meant "Lukas" in prev post.

v01: Even worse, because when NOPPED with visible noise present, it will preserve that noise forever.

The threshold is currently a 5% change, checked every 5 samples. I assume it won't be triggered in low sample renders, except for simple smooth surfaces.
Those 5% changes are not gamma-corrected BTW. Probably makes sense to include the lightness-factor.

But it works surprisingly well for our purposes I must say. We have simple, plain surfaces and some very difficult spots (nothing in between). The plain surfaces level out after a few samples, then are NOPPED - effectively doubling the sampling speed on the difficult spots.

I'm trying to provide a windows x64 build, but it crashes saying:

C:\Projekt\blender\blender-2.71-RC2-windows64>blender_ASv01.exe
Color management: using fallback mode for management
Read new prefs: C:\Users\CUDA\AppData\Roaming\Blender Foundation\Blender\2.70\config\userpref.blend
Warning! bundled python not found and is expected on this platform. (if you built with CMake: 'install' target may have not been built)
Fatal Python error: Py_Initialize: unable to load the file system codec
ImportError: No module named 'encodings'

I believe there's a Blender IRC channel where you can ask about compiling issues. Though they might recommend that you switch over to VC2013 as VC2008 is slowly being phased out and is only still around for legacy purposes (what with WinXP no longer being supported by Microsoft and the fact that the express edition of the new version allows for the creation of 64 bit builds with OpenMP).

Figured it out shortly after my message, just had to build INSTALL. I am adjusting the code so it can be used for blender-hosted cycles as well ( it uses the "Session::tile_buffers" variable to store stuff instead of "buffers" ).

Here is HEAD with the tile-nopper:
http://cycles.patrickrosendahl.de/ASv01_win64_rev3.zip

It is required to enable
Performance - Progressive Refine

I made good experiences with
Performance - Tiles 64x64
That is, at least for the scenes I checked, a good tile size to enable/disable

I am a big fan of "branched path trace". Good convergence just a little slower. Use
blender_avg5frames.exe
for this. It checks the progress after 5 samples.

If you use "path trace", then the progress is much slower and the tile-nopper should wait longer until evaluating the progress:
blender_avg10frames.exe

Only managed to test adaptive_sampling_via_NOP_rev1.diff so far, it gives crashes in nop_tiles. rev2 failed to apply here because of malformed patch at line 349. Also you're mentioning rev3 here, this patch i didn't see.

Main concern for now (which was actually raised already) is the visible noise difference on the tile boundaries. This is going to be even worse when rendering animation, I'm afraid.

@Sergey Sharybin (sergey): I am manually editing the patches to clean them up. In rev2 nop_tiles() was designed to work in non-background mode only. Rev3 fixes that, working with the Session::tile_buffers variable (not with "buffers").

@lopata (lopataasdf): did you use "branched path tracing"? The simple "path tracing" progresses so slowly that a tile can easily be mistaken for a converged one.

The session does not stop even if all tiles are nopped. I guess in this case the threshold should be lowered. I'm looking into this and will upload rev3 diff.

Don't edit them, or make sure they still apply after the edits. It would help to share a fully working rev3 patch for review.

Also, am i right you're intending this for the viewport only?

Yes, I used the settings from a text file

@Sergey Sharybin (sergey): correction: rev3 works with Session::buffers _AND_ Session::tile_buffers , depending on whether you run in background or non-background.

@lopata (lopataasdf): it takes about 700 samples to bring the noise to an acceptable level. Until then it stays very (visually) grainy.
The maximum change is way below 1% even after 10 samples. I am tweaking the parameters and check against my other test cases.

maybe we can change seed every N samples for different noise

Nah, a 1% change is just too much. Speaking in the RGB 255-value range, this is a change of 2.55 in R, G, or B (or all together).
To stay below visible changes we need about 1/255; and then, to account for rounding, we should be around 1/510 \approx .2% .

I ended up using a threshold of 0.01% every 10 samples.
So the tracer needs 40 x 10 = 400 samples to reach 40 x 0.01% = 0.4%, which is about 1/255 or one RGB-value step.
Or 200 samples to accumulate a .5 RGB-value step.

Mind that it gets harder and harder to perform a .01% change in average, because we have more samples.
Basically that would mean that all previous samples were way off...
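
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the back-of-the-envelope calculation; the function name and parameters are hypothetical, and it assumes the worst case where the average drifts by the full threshold in the same direction at every check.

```python
def samples_to_drift(rgb_steps, eps_per_check=0.0001, check_interval=10):
    """Samples needed for the average to drift by `rgb_steps` 8-bit steps
    if it moves by exactly eps_per_check (0.01%) at every check."""
    target = rgb_steps / 255.0            # drift as a fraction of full range
    checks = target / eps_per_check       # number of checks to get there
    return checks * check_interval


full_step = samples_to_drift(1.0)   # roughly 392, i.e. "about 400"
half_step = samples_to_drift(0.5)   # roughly 196, i.e. "about 200"
```

So a tile that stays just under the threshold would need on the order of 400 further samples, all biased the same way, before its 8-bit value changes by one step.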

The compiled win64 is here (you need that other package for the resource files and stuff):
http://cycles.patrickrosendahl.de/blender_avg10frames_01perc.zip
(give it 5 minutes to upload)

EDIT: 1/510 \approx .2% (not .02%)

These are the settings I like most. They don't kill the noise entirely (e.g. if you zoom in), but IMO they are a good balance between nopping and noise:
0.02% every 5 samples
http://cycles.patrickrosendahl.de/blender_avg5frames_02perc.zip

Working on the diff file now.

While writing this, I realize that there is a memory leak of Session::last_tile_buffers , they need to be freed in ~Session()...

rev3:

The parameters are in Session::nop_tiles() first couple of lines.

Regarding noise levels between stopped tiles:

Could the sample-count be interpolated inside the tile? This means rendering a different sample count on each side of the tile - when one of the neighbouring tiles is stopped, the surrounding tiles don't render more samples on the connecting side, and the sample count increases towards the other non-stopped tiles.

It could also be done the other way - also tiles that are below the stopping condition render some extra samples towards the borders of non-stopped tiles.

In v01: the idea was to stay below a visible progress in order to avoid all those (visible) noise problems.
At the time it stops, 200 samples are needed for a .5 RGB-step ; 400 samples for a 1.0 RGB-step. And all those samples have to be biased from the present average (either all more or all less, otherwise they partially cancel out).

v03 works on a pixel-basis, which would allow for spatial sample-count variation. Lukas has a "fuzzy cloud" approach towards the tile-borders. IMO interpolation sounds reasonable as well. However, there is no implementation yet.

Maybe someone can post a good counter-example to v01, so we have something to discuss in detail.

Looks really interesting, the "nopping" approach is probably cleaner than my "just skipping the rendering" approach.

A few other things I thought of:

  • As far as I see, you copy the whole buffer to the "last buffer". With many passes enabled, just copying the Combined pass might give a memory usage improvement.
  • The error heuristic seems rather empirical: The perceived noise depends on the brightness (the difference between 0.01 and 0.02 is far more noticeable than between 0.99 and 1.00). Also, consider the 1/sqrt(N) convergence of MC methods.
  • You mentioned that it would also work with Metropolis. However, that's quite unlikely, since my Metro currently bypasses the tiling system (by just providing one image-wide tile per thread, all sharing the same buffer).
  • Considering color error seems like a good idea, most likely I'll add something in this direction as well.
  • As you already said, the "heavy" changes like the CDF sampling, x/y-jittering etc. are only required for in-tile distribution, not for convergence checking.

I have to agree with you, your approach seems less hacky when compared to mine. Do you think we could combine them or work together somehow? Developing two separate adaptive samplers seems quite wasteful...

I'm always in for cooperations. The goal is to bring Cycles forward. I have seen other products with _very nice_ adaptive sampling results.

v01 works via nopping. But I think v02 would be better, saves _lots_ of time (nopping is still iterating and adding the average).

Your approach would be v03, which I thought could be overdone, but is definitely the most accurate way. It is like storing another pass, except that it holds the number of samples per pixel. Since the entire field is passed to film_convert(), the changes in that function should suddenly become trivial.
Let me know if you find a bug in those thoughts.

You mean I have to show that the above convergence criterion (threshold on the difference/progress of averages) is valid for {x1,x2}-Bernoulli distributions with x1,x2 \in [0,255], account for gamma, and show that we have like 90 or 95% confidence that we won't step in RGB-space, right? (Since any other distribution would converge faster, compare Berry-Esseen)

I think the worst case of the gamma function is the linear part with a slope of 12.92. That gives us a scaling of the convergence criterion (following the 400/200 samples calculation somewhere above) of 1/12.92 for "dark pixels". Basically baking the gamma response into the threshold.
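
For reference, a short Python sketch of that scaling, assuming the display transform is the standard sRGB curve (the thread only says "gamma", so this is an assumption). `dark_pixel_threshold` is a made-up helper name.

```python
def linear_to_srgb(c):
    """Standard sRGB encoding of a linear value in [0, 1]."""
    if c <= 0.0031308:
        return 12.92 * c                       # linear toe, steepest part
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def dark_pixel_threshold(eps_display):
    """Linear-space threshold that keeps the displayed change below
    eps_display even on the steepest part of the curve."""
    return eps_display / 12.92
```

A change of `dark_pixel_threshold(eps)` in linear space maps to at most `eps` after encoding, since no part of the sRGB curve is steeper than the linear toe.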

I always hated stochastic, sigh... I see what I can do with it.

OK, to bring this forward

  • added a PASS_SAMPLE_COUNT that counts the samples per pixel
  • film_convert() uses that pass-data
  • adaptive nopping/resuming of tiles depending on maximum-progress-per-tile
  • max 90% of all tiles can be nopped, after that the threshold is lowered
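
A rough Python sketch of the last three points in the list above. It is not the patch itself: the names are hypothetical, the per-pixel count stands in for PASS_SAMPLE_COUNT, and halving the threshold is an assumption (the thread only says it "is lowered" here).

```python
def film_convert(pixel_sums, sample_counts):
    """Normalize each pixel by its own sample count."""
    return [s / n if n > 0 else 0.0 for s, n in zip(pixel_sums, sample_counts)]


def select_nopped_tiles(progress, eps, max_nop=0.9, shrink=0.5, max_rounds=32):
    """Flag tiles whose maximum progress is below eps. If more than
    max_nop of all tiles would be flagged, lower eps and try again.
    Returns (flags, eps actually used)."""
    for _ in range(max_rounds):
        flags = [p < eps for p in progress]
        if sum(flags) <= max_nop * len(progress):
            return flags, eps
        eps *= shrink
    # Give up lowering the threshold: keep every tile active.
    return [False] * len(progress), eps
```

The `max_rounds` guard is only there so the sketch terminates when every tile reports zero progress.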

Will create the diff the next days.

http://cycles.patrickrosendahl.de/ASv02_win64.zip
Give it 5 minutes to upload.

Lukas, you can now vary the sample-count as you like.
Session::nop_tiles() will evaluate the progress as described above.
tile_manager will pass nopped tiles, and you can ignore the flags.

I must say I cannot see noise-gradients in my test files when working with tile nop/resume.

Hi again.

I think there might be a case where the code so far produces messy results unless you have a high sample count set, and that is situations where much of the scene is easier to light and has a large area that's nothing but background.

Here's an example where messy results are seen.

It seems like one of the issues might be that it literally sees nopping occur on the first pass (which leaves a mess in darker areas that takes a lot of passes to clean up). I would advise for one thing to have the nopping only take place after X number of passes so the code has decent information as to whether nopping should take place right away.

As I said, the tile resuming means that everything would eventually clear up, but the way it works seems like it would take more passes for these types of scenes than it could be (which I understand that this is still very WIP code and that it's likely things will improve).

To clarify, if this is going to be what Cycles does by default, then the case I posted above shows that there might be plenty of room for optimizing the strategy as of now (which is understandable since this is a WIP patch).

I did initial tests with two more difficult scenes and I didn't seem to see any problems, and the scene above does clear out with enough samples. Still, I think improvements in the strategy for these types of cases will be needed if, as I said, this becomes the default mode of sampling.

Thank you for the counter example! I was starting to download stuff from blendswap to check it out...

Yes, I added nopping starting at sample #2, which works very well for our scenes (immediately turns off the background sampling). But this probably should not be the default case.

All this must be made parameters:

  • min number of samples before any nopping _can_ occur
  • check every Nth sample for nop/unnop
  • initial threshold of progress, tiles below that will be nopped

Can you suggest a good min-number? Then I will change the defaults and upload a new version. Some files were compiled with optimization off anyways...

Yes, resuming tiles will probably clear the scene. It is concentrating on the bad spots first. But then again, the current threshold is very high

I think in this case, you can set the defaults to (my opinion).

Min number before nopping - 5
Samples before nopping again - 2 to 5
Initial threshold - around the way it is now

Also, I would suggest that you obtain a copy of VC2013 so you can build your patch against the latest revisions of Master if you haven't already, support for VC2008 has been dropped and the express version of the 2013 edition allows for 64 bit building and OpenMP support. Otherwise, it might become more difficult to keep the patch up to date with development.

Patching against master. Trying to keep VS2008 as VS2013 gives me eye cancer. Fortunately the changes do not require openmp (yet).

This is a min_samples=5 ; nth_sample=5 ; eps=1/500 build:
http://cycles.patrickrosendahl.de/blender_min5_every5_eps0002.zip

And here is the patch, did not clean it up. It contains a patch to cycles_xml (normals, embedded_files, bump-node, and reading from a string (not a file)) and maybe other stuff I hacked:

FYI, this one purely sorts by progress. So worst tiles are processed first until they catch up, nopping the other tiles. (eps=1)
This one should give very bad noise transitions:
http://cycles.patrickrosendahl.de/blender_min5_every5_sortByProgress.zip

You need that full package for resource files:
http://cycles.patrickrosendahl.de/ASv02_win64.zip

Ah yes, the tweaked code makes things a lot cleaner in less samples (and would overall be faster), no more mess here.

Also, I'm not sure if you're getting the latest revisions (or git for that matter), since the splash still shows it as a pre 2.71 build and the hash number is labeled 'unknown' (new builds should have an enhancement in the Cycles anisotropic node with sampling type options and a better appearance).

You might want to go to the Blender dev. channel in IRC for a better guide on compiling and the usage of VC2013, as they tend to be more than happy to help with such issues (I know the one known as Juicyfruit can help you a bit in terms of the new compiler version).

On the hash unknown thing, Sergey replied on the mailing list to someone who had a similar issue.
http://lists.blender.org/pipermail/bf-committers/2014-June/043802.html

Right, it fails to find my git executable. Will fix that, probably just an environment variable. Thank you!

Just for the heck of it, I implemented pixel-wise nopping. Here is a version that samples pixels by performance (good performance = nopped).
If every pixel is nopped, the entire tile is nopped. Otherwise the tile is active, with pixels marked nopped/unnopped.
It remembers the last performance until sampled again (for 10 samples), then reevaluation.

  • min_samples = 5
  • nth_sample = 10
  • CPU only, PT only

The difficult spots are sampled first now... quite the opposite of the normal renderings. Interesting experience......
http://cycles.patrickrosendahl.de/blender_min5_every10_sortByProgress_pixelwise.zip
(CUDA kernel missing)

This one gives nopped pixels a little penalty for being so lazy - just in case if statistics were playing a trick on us:
http://cycles.patrickrosendahl.de/blender_min5_every10_sortByProgress_pixelwise_penalty.zip
(CUDA kernel missing)

Good stuff, have you ever considered adding Lukas' adaptive map idea for adaptive sampling within the tile on top of the nopping stuff?

So if a tile is not nopped at the interval, it will generate the difference map and use that to guide samples away from the clearer regions to the noisier ones (which would mean a cleaner tile once the initial threshold is reached).

So we'd have your method at the tile level and Lukas' method at the pixel level, seems like it could bring the best of both worlds to me.

Tried your new builds, the pixel-wise nopping gives the best results yet for that simple scene (in 256 samples), but I really do not like the penalty idea because it also gives a big penalty in quality (so you might want to remove it).

My opinion anyway, I personally would try to keep things more focused on how to get this to a state that's ready to be formally committed to Master.

I took some time to review. It does not make sense to NOP single pixels when using GPUs. It would be a waste of cycles, since all "threads" (or whatever the term is) run in sync. Nopping a single thread (pixel) just makes it wait for the others.
Lukas' code diverts the current x/y to another (more needing) x/y position. That is the way to go _within_ the tile for GPU.

On the CPU, nopping single pixels makes sense, because each thread can run totally independently of other tiles/threads. Diverting x/y only introduces another layer of calculation (a slowdown).

Nopping tiles should be done as implemented, though the evaluation could hop to "tile_manager" as well - though "session" seems more obvious to me.

Eventually Lukas' evaluation (importance map calculation) could hop to "nop_tiles()", which would then become an "adaptive sampling driver with optional nopping of tiles". That one is re-evaluated every 10 frames. The infrastructure with current/last PASS_COMBINED (and all other passes) is present.

So the infrastructure is provided. It kinda depends on the community what to do with it. Will provide the v03_rev1 patch soon:

  • PASS_SAMPLE_COUNT : int num_samples ; float pixel_nopped (nopped=1, active=0) ; float last_evaluation_value (= progress in last 10 frames)

A word on the *_penalty.exe : when a pixel is nopped it receives a penalty on its last evaluation (making the evaluation worse), so the pixel is considered for sampling much earlier than without the penalty. This activates good pixels for sampling that would otherwise remain dormant for a longer time. I'd expect better quality, but didn't check yet.

Anyways, thank yall for the feedback and discussion.

Regarding the distribution: I don't think you can assume a certain distribution for pixel values. Consider, for example, a plane and two pointlights: Every sample will be lit by one lamp, so there are just two states that a sample can have, instead of a continuous distribution. My first approach was to rely on the Central Limit Theorem and assume a normal distribution as the scene gets more complex, but this model also has a problem: A quite big number of samples will be just black, which is highly unlikely with a normal distribution.
This is why I rely on the O(1/sqrt(N)) convergence of MC methods to estimate the progress: By having the difference between O(1/sqrt(N)) and O(1/sqrt(N/2)) (the even-samples pass), you can estimate the constant hidden in the O() and therefore know the difficulty of the pixel. Another approach would be to sum up the (L - Lavg)^2 to get a variance estimate, this is what I did before. I might even switch back to this from the current LuxRender approach using even samples, since I think it might be more robust. One downside, however, is that Lavg isn't known in advance, so the whole estimation is consistent, but not unbiased.
Another thing I just noticed: The E(x^2) - E(x)^2 form of variance should work better by just recording L^2 in a separate pass. Then, in convergence checking, the variance of each color channel could be measured and averaged (weighted by the channel luminance).

[My idea to the "worst case distribution" was that the 0-1-distribution gives me extreme values for each sample, maximizing the variance.
Or the other way around, if any sample lies in between (say, a continuous rectangular distribution), it converges faster to the expected value.
So when I can prove something for this worst-case distribution (like the above series-of-averages converging theory), it holds true for "friendlier" distributions.]

I see your very valid point of N vs N/2. My thoughts to that:

  • those 2 series are completely dependent - why not divide the series into 1..N/3 and N/3+1...N (ratio 1:2 still given)
  • the difference between Lavg(N) and Lavg(N/2) gives you a good measure of how good the series are. CLT converges fastest around true Lavg, getting worse along the axes. So if Lavg(N) and Lavg(N/2) are "far" away from each other, at least one of them is far away from true Lavg, and we have a problem (ie normal distribution and all related rules are not valid)
  • (L-Lavg)^2 seems much more robust than any method that depends on moments. moments potentiate the above problem of "farness from true Lavg"
  • numerically it is better to sum up separately, then subtract, to avoid cancellation errors

I come from fluid dynamics, PDE, so my approach to a gradient is N vs N+10 - I'd never calculate a gradient from x and x/2 ;)
Using the "gradient" of averages method, we can estimate the (maximum) change of the next 100 samples.
By re-evaluating every n steps, we can correct that estimate. Converging points will nop along the way - unnopping when we reach a certain threshold.
This way "bad" pixels get the attention they need. The flat convergence curve is recognized, while disregarding any statistical properties like variance.
It is a non-statistical approach, dynamically evaluating the statistical progress/convergence and driving correction steps (nop/unnop).
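
A minimal Python sketch of this "gradient of averages" idea, assuming the pixel average is compared at sample N and N+10 and extrapolated linearly over the next 100 samples. The function names and the threshold handling are hypothetical.

```python
def predicted_change(avg_prev, avg_now, interval=10, horizon=100):
    """Extrapolate the recent slope of the running average linearly
    over the next `horizon` samples."""
    slope = (avg_now - avg_prev) / interval
    return abs(slope) * horizon


def is_flat(avg_prev, avg_now, threshold, interval=10, horizon=100):
    """A pixel (or tile) is a nop candidate if the extrapolated change
    stays below the threshold; re-evaluate at the next interval."""
    return predicted_change(avg_prev, avg_now, interval, horizon) < threshold
```

Because the running average normally flattens as samples accumulate, a linear extrapolation tends to err on the cautious side, which fits the "(maximum) change" reading above.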

Well, your thoughts on Distributions seem reasonable, I agree with you.
My N and N/2 approach was based on the fact that for MC, we actually know the gradient up to a constant: const/sqrt(N), so const can be estimated from the two values. However, it doesn't really matter anymore: I just tried the E[x^2] - E[x]^2 approach, divided by sqrt(N) to get remaining error (the actual formula is a bit more involved due to the perceptual weighting). The advantages are:

  • (In all my tests) remarkably accurate results
  • Simple kernel code (just sum up L*L)
  • Contrary to the (L - Lavg)^2 approach, it doesn't need Lavg until actually evaluating the error
  • Based on statistical quantities (this probably isn't a real advantage, but I generally like approaches based on a solid theory better)
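
As a sketch of the E[x^2] - E[x]^2 approach described above: the kernel would only accumulate L and L*L per channel, and the error is evaluated later. The Python below is an illustration only. The Rec. 709 luminance weights are an assumption, and the real formula is said above to be more involved because of the perceptual weighting.

```python
LUMA = (0.2126, 0.7152, 0.0722)   # Rec. 709 weights (assumed)


def pixel_error(sum_rgb, sum_sq_rgb, n):
    """Luminance-weighted standard error of the pixel average, computed
    from per-channel sums of L and L*L over n samples."""
    err = 0.0
    for w, s, s2 in zip(LUMA, sum_rgb, sum_sq_rgb):
        mean = s / n
        var = max(s2 / n - mean * mean, 0.0)   # clamp rounding noise
        err += w * (var / n) ** 0.5            # sqrt(var) / sqrt(n)
    return err
```

The clamp guards against the small negative values that the subtraction can produce through floating-point cancellation.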

However, I have to admit that I don't have any academic background yet since I still go to school :D

How about this

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

"Incremental algorithm" ? It could be evaluated along with adding the new value for each channel.

This version precomputes the scaling (1/sample_count) to speed up subsequent calculations.

  • min_samples=10
  • nth_sample=10
  • eps=0.002
  • maximum of 30% tiles nopped
  • tile-wise nopping, as pixel-wise does not make sense for CUDA

Includes proper CUDA kernel.

http://cycles.patrickrosendahl.de/ASv02_win64_precompute.zip

And the diff (contains some other hacks):

@Lukas Stockner (lukasstockner97) I think your model is correct, there is an arbitrary color-distribution around the pixel (in space) c(x). And then you have a distribution-density-function to randomly scan that space phi(x). The distribution might be evenly distributed (very diffuse material), or a very sharp bump (eg reflective smooth material, incoming angle=outgoing angle).
Then you need to integrate c(x)*phi(x). You do that by sampling c(xi), where the generation of xi is determined by its density phi. The mean value is the desired color for that pixel. At least that is how I would model this.
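
In Python, that model amounts to a plain Monte Carlo estimate: draw xi according to phi and average c(xi). The sketch below uses made-up names and toy choices for c and phi; it only illustrates the estimator, not anything in Cycles.

```python
import random


def estimate_pixel(c, draw_x, n, rng):
    """Average c(x_i) over n points x_i drawn from the density phi,
    where draw_x(rng) returns one such point."""
    total = 0.0
    for _ in range(n):
        total += c(draw_x(rng))
    return total / n


rng = random.Random(1)
# Very diffuse case: phi is uniform on [0, 1].
diffuse = estimate_pixel(lambda x: x, lambda r: r.random(), 20000, rng)
# Sharp bump: phi is a narrow normal distribution around 0.5.
glossy = estimate_pixel(lambda x: x, lambda r: r.gauss(0.5, 0.01), 2000, rng)
```

Both estimates approach 0.5 here, but the sharp-bump case gets there with far fewer samples because c(xi) barely varies.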

Been testing the newest build, I'm starting to get the feeling like the nopping is not being aggressive enough now. The settings seem pretty loose and more complex scenes will need to go a long time before any type of nopping occurs.

Perhaps you can ask Lukas to take your patch and build it against the latest post-2.71 revisions using VC2013, he can also then add a UI with those options exposed so the community can figure out the best settings.

Oh, when I mean more complex scenes, I mean scenes like interiors and those with complex materials. This would also mainly concern the long wait to see the nopping of tiles that are over actual geometry and not just the background.

The settings are pretty conservative. This will sample the entire scene until even dark noise is barely visible. Then it will start nopping stuff.
So unless you have very "simple" areas that fill an entire tile, there will be no nopping.

It makes sense from the angle, that you need to continue sampling the image anyways in order to get rid of the dark noise.
Or the other way around, it does not bring everything to the same "noisiness" by nopping not-so-noisy tiles until they are all about even.

Here is the diff against current master:

Hi, I tested this and an older patch but can't see any difference between patched and unpatched results.
Does anybody have a test file to show the advantage of the patch?

Thanks, mib

@Wolfgang Faehnle (mib2berlin) It is designed to turn off tiles without affecting the resulting image. By turning off tiles, it saves calculation time. I did some benchmarks, but let me first tweak stuff.

I moved the nop-options to SessionParameters. Now how do I add these options to blender UI?

EDIT: If you want to see the most aggressive nopping, choose

  • min_samples=5
  • nth_sample=5
  • eps=1
  • max_NOP=.5

Nopping will start at sample 11, nopping about 50% of the scene. When the pixels start to converge, it will slowly continue to sample the dormant tiles, keeping everyone to about the same noiselevel.
Noiselevel threshold is halved when more than 50% of the tiles were about to be nopped. This could be tweaked as well, like eps*=0.75 or so. That should give smoother tile borders in terms of noise.

Note that the sample count is totally wrong, as about 50% of the tiles are nopped. It should be approximately halved. So when you set 512 samples, you won't get 512 samples everywhere (roughly 512 samples only at the difficult spots).

Once the noiselevel is at .0002 I'd consider that "very low noise" - everywhere. So you see it is a long way from 1.0... and a lot of samples that the tile_manager counts but that don't actually reach the tiles.

To add a UI Option, add a property in intern/cycles/blender/addon/properties.py, add an UI entry in intern/cycles/blender/addon/ui.py and sync it in intern/cycles/blender/blender_sync.cpp

Thank you, I wish I asked earlier. Here it is:

And here:
http://cycles.patrickrosendahl.de/blender_NOP_v02_rev4.zip
Give it 5 mins to upload.

Here is a blend with aggressive NOP settings. Not sure how much sense that makes. It will sample at least 50% of the scene, working only on the bad parts (at that time), lowering the threshold if necessary:
http://cycles.patrickrosendahl.de/blends/Cycles_physTonemap__GPU.blend
Set it to max_nop=0.9 for bad results :)

Sensible settings would be eps=0.01, to start nopping when there is barely visible noise. I don't see the point of nopping above 0.01 - you need to sample those spots anyway because they don't look good.
Use nopping to disable low-noise tiles or backgrounds. Then progress on high-noise regions until they are low-noise as well. Then it will automatically resume to bring down noise levels on all tiles.

mib2berlin; as PRosendahl said, I don't think he's actually using an adaptive map to drive the random number values themselves so as to move more samples to noisier areas within the tile. This patch right now is basically a system of stopping/resuming tiles so as to speed up the rendering by way of decreasing the time between passes (which is done by not processing cleaner areas).

Trying the new build, the UI seems a bit rough, but it does its job. I am able to obtain nopping for more complex and interior scenes now.

Thank you both for clarification, blend and the new diff.

Cheers, mib.

I decided to put the NOP parameters below the existing "Performance" parameters. They are not aligned perfectly as they are just 2 independent columns...

The current code could be changed in the following way:

  • evaluate progress on _all_ pixels in the tile (currently stops evaluating as soon as eps is violated to save time)
  • generate a dx/dy re-routing map (could be an {ushort/ushort}-pair in the PASS_SAMPLE_COUNT, default value {0/0}):
  • based on performance route best to worst pixels
  • changes to the kernel would be minimal (x += dx; y += dy)

Not sure how much sense that makes. There is a lot of processing power connected to that, and I seriously doubt that the time can be made up.
But the infrastructure is there...

When you talk about the processing power needed to redirect samples, do you mean the idea of redirecting samples on an image-wide basis or on a per-tile level (like Lukas' patch does)?

It would seem to make sense for me that the best chance to further cut time would be by doing it in a per-tile way (because you would only need to redirect samples within that space), but I'm no expert in how the code works so the overhead might be higher than I imagine.

I would not break the tiling. IMO it is the best way of dividing work into tasks.

Redirecting could be done within the tile. But current image quality does not force me to add this feature.

A)
There is an important shortcut: if one pixel of the tile exceeds the threshold, no further pixel is checked. For bad tiles, this means that after checking one or two pixels we already bail out of the loop, knowing the entire tile must _not_ be nopped. At 64x64 pixels per tile we easily have a speedup of 100-1000 (within the evaluation loop).

B)
Once the _entire_ tile is evaluated, we need to look for good and bad pixels; sort them or divide them into good/bad; then create a re-routing map.

With A and B running in addition to the current code, frankly I don't see the performance gain. Without being able to prove it, I claim that it is faster to "just sample over the entire tile" than to do all of the above.
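The shortcut in A) can be sketched in Python as follows. This is a simplified illustration, not the patch code: the function name, the per-channel comparison and the list-of-tuples tile layout are assumptions.

```python
def tile_may_be_nopped(prev, curr, eps):
    """Decide whether a tile is quiet enough to be nopped.

    prev, curr: per-pixel (r, g, b) averages of one tile, taken some
                samples apart (hypothetical layout).
    Returns (ok, pixels_checked) so the early exit is visible.
    """
    checked = 0
    for p, c in zip(prev, curr):
        checked += 1
        # Compare per channel (an assumption; no grayscale conversion).
        if max(abs(a - b) for a, b in zip(p, c)) > eps:
            return False, checked  # bail out at the first violating pixel
    return True, checked


flat = [(0.2, 0.2, 0.2)] * (64 * 64)
noisy = [(0.7, 0.2, 0.2)] + flat[1:]
print(tile_may_be_nopped(flat, flat, 0.01))
print(tile_may_be_nopped(flat, noisy, 0.01))
```

A quiet tile costs a full pass over its 4096 pixels, while a tile whose first pixel violates eps is rejected after a single check. Building a re-routing map as in B) would need the full pass for every tile.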

One more thing on comparing nopped / non-nopped images: mind that if you set max_nop to 50%, only 50% of the samples will reach the pixels.
So you need to either compare "512 non-nopped samples" to "1024 nopped samples at 50%", _OR_ abort the rendering after the same time limit.
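As a back-of-the-envelope check of that comparison, here is a tiny Python sketch. It assumes the nopped fraction is constant over the whole render, which is only approximately true since nopping starts after min_samples and the fraction varies; both function names are made up.

```python
def effective_samples(nominal_samples, nopped_fraction):
    """Approximate samples that reach an average tile."""
    return nominal_samples * (1.0 - nopped_fraction)


def nominal_for(target_samples, nopped_fraction):
    """Sample count to set so an average tile gets target_samples."""
    return target_samples / (1.0 - nopped_fraction)


print(effective_samples(1024, 0.5))
print(nominal_for(512, 0.5))
```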

Hey, I know you might be busy working on the Node Wrangler addon, but I've been trying to use this on a more serious scene and I have some observations to point out.

It turns out in some scenes, I can't be real aggressive here because I might run into an issue where tiles in certain situations have areas that remain noisy despite convergence going around them. This would happen in situations where only a small part of a tile lies in a region where it takes a lot more samples to converge and the Nsamples number otherwise simply is too small to ensure a sample gets in that area so the tile would not be nopped. The same thing with areas needing several passes before another sample is taken there can also lead to tiles remaining noisy under the nopped list while easier tiles get samples.

I have a few ideas on how more aggressive nopping can be fully doable and only have it increase rendering performance and quality as a result.

1). If a tile gets a sample that makes its max error a certain amount above the current threshold, re-activate the tiles surrounding it for the next Cycle.
2). If a tile gets a sample that makes its max error a moderate to significant amount above the threshold, don't nop it at all for a few Cycles under any circumstances.
3). If a tile's max error is significantly above that of the majority of other tiles, execute a true adaptive sampling routine for that tile for the next few cycles (as in sample re-routing not unlike Lukas' patch).
4). Once enough tiles are nopped compared to the total number, create a random selection of tiles to get double the samples based on the ratio of nopped/un-nopped.

Just some ideas, I hope you don't mind.

Eh, don't think I'm yelling, I have no idea what happened to make some of my text so big :/

I don't mind at all :-)

I have checked the nopping criterion and the values are going "wild" compared to theory. That is because it is based on the current performance, which suffers from the statistical spread of the data.
But a little smoothing makes things much better. This ensures the values won't drop through the floor immediately, causing a nop.

If anyone is interested I am attaching the Matlab/Octave script I used to verify this.
In the beginning you can choose between uniform, normal, and Bernoulli distribution with the variable "r". Further down there is "n_th" which is set to 10 by default.
The script calculates mean and variance via the "online algorithm" which is found here:
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
(It is suitable for implementation in cycles.)
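For reference, the online (Welford) update from that Wikipedia page looks like this in Python. This is a generic sketch of the textbook algorithm, not a port of the attached Matlab/Octave script; the class name is made up.

```python
class RunningStats:
    """Online mean/variance (Welford), one value at a time."""

    def __init__(self):
        self.n = 0       # number of samples seen
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the updated mean on purpose; this keeps m2 numerically stable.
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance M2/(N-1); undefined for fewer than 2 samples.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(value)
print(stats.mean, stats.variance())
```

Only three numbers per channel need to be stored (N, mean, M2), which is why it fits a per-pixel render buffer.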

Also attached is a sample output (for the Bernoulli worst case). In the top left plot you can see the absolute error and the MC theoretical error, as well as the first derivative of the MC theoretical error and the n_th-difference of the mean value (already smoothed).
Bottom left you can see what theory and what n_th-difference predict on the number of samples needed for a +/-0.01 value step in mean.


@Adam Friesen (ace_dragon): The smoothing will take care of most "accidental" noppings.
But it is really not designed to be aggressive at all. In the end you need to come back and unnop those regions anyway, so why not render them straight from the start? I'd really bring them down to 0.002 or at least 0.01 before attempting to nop anything.
You can force the processing of the worst tile by setting the "max nop" to something very high like .99. This allows nopping 99% of all tiles. It will be evenly sampled though.

EDIT: I forgot to comment on the "adjacent tiles": currently the tiles have no reference to a grid. Makes sense because gridding might not be uniform in the future. They are just kept in a list.
A tile's (bad) performance could bleed into the adjacent tiles, I think that would be the best model to do this. Like 50%, but only if it makes the performance of the adjacent tile worse.
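A minimal Python sketch of that bleeding model, under loud assumptions: the tiles are arranged in a regular 2D grid (which the current tile list does not provide), only the four direct neighbours bleed, and the factor is 50%. None of these names exist in the patch.

```python
def bleed(errors, factor=0.5):
    """Let each tile inherit `factor` of its worst neighbour's error.

    errors: 2D list (rows of tiles) of per-tile error values,
            a hypothetical grid layout.
    A tile's value only ever increases, i.e. bleeding is applied
    only if it makes the tile's performance worse.
    """
    rows, cols = len(errors), len(errors[0])
    out = [row[:] for row in errors]
    for y in range(rows):
        for x in range(cols):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols:
                    # Read from the original grid so bleeding does not chain.
                    out[y][x] = max(out[y][x], factor * errors[ny][nx])
    return out


print(bleed([[1.0, 0.1],
             [0.1, 0.6]]))
```

In the example the two quiet tiles are raised to 0.5 by the noisy tile next to them, while the noisy tiles keep their own values.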

Do you have a post 2.71 build then with the smoothing that I and anyone else tracking this could try?

I ask because as you say, the smoothing will probably take care of those cases and thus prevent noise retention, thus making the nopping system far more useful and accurate.

Here you go:
http://cycles.patrickrosendahl.de/blender_NOP_v02_rev5.zip
(it is just the EXE)

Patch in session.cpp / nop_tiles() :
tile->priority = max( maxVariation, (0.5f*maxVariation + 0.5f*tile->priority) );
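To see what this one-liner does, here is the same update as a small Python sketch. The function name and the starting value of 0 are assumptions for illustration.

```python
def smooth_priority(priority, max_variation):
    """Rise immediately with a worse reading, decay by halves otherwise."""
    return max(max_variation, 0.5 * max_variation + 0.5 * priority)


priority = 0.0  # assumed starting value
history = []
for max_variation in [0.1, 0.0, 0.0, 0.2]:
    priority = smooth_priority(priority, max_variation)
    history.append(priority)
print(history)
```

A single accidental reading of 0 only halves the stored value instead of dropping it to 0, while a new larger variation is taken over at once.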

I tried it, and the nopping seems to be broken in this build.

I copied all of the dll's and such from the rev. 4 build file to rev. 5 and I got Blender working. However, when I did the rendering, I found out that there's no nopping going on at all.

Instead, even on very simple scenes, the threshold just keeps going down on every update until it runs out of precision and breaks the sampling. Something about the smoothing appears to be keeping any sort of nopping from happening now.

Right... priority should be initialized with 0:
http://cycles.patrickrosendahl.de/blender_NOP_v02_rev6.zip

Patch tile.h / Tile() constructor:
: index(index_), x(x_), y(y_), w(w_), h(h_), device(device_), rendering(false), flags(flags_), priority(0) {}

and Session::nop_tiles() remove the following lines:

    // priority 0 means terminal death
    if ( tile->priority==0 ) {
        noppedtiles++;
        tile->flags |= Tile::TILE_FLAGS_NOP;
        continue;
    }

EDIT: fixed the upload

I don't think the file uploaded successfully, I get a file not found error even after waiting a little bit to see if it had to finish uploading.

Okay, got the file working, it's doing what it should be doing now.

First impressions look to be pretty darn good right now. Once you also note to disable the Russian roulette by way of making Min bounces = Max bounces, I am noticing that the overall noise measurement now seems to be a lot more accurate for the scene I'm working on, and thus leads to no premature nopping for any of the tiles.

This means the amount of new tiles nopped per evaluation is a bit lower, but especially when you combine it with a tile size that makes for a semi-coarse noise map, it seems to me like the quality (along with just how much extra performance you can get) may be at the point where it is about ready to be formally committed to master, barring some possible code cleaning and UI improvements (depends on what comes up during code review).

Still I can't see benefits in my daily usage... probably because it's not clear to me what this patch is actually doing!

As exposed in the other adaptive patch message list, I thought of a workflow like this:

  • Each pixel gets evaluated as usual, but now every x-samples* a variable gets stored (maybe in a dedicated pass?). Let's call it Y
  • Y is the absolute difference from the previous pixel computed and the new one. (Being that the sampling iteration brings every pixel to be averaged at every iteration, Y value will be decreasing more and more)
  • Here we could set the threshold: if Y < threshold then stop sampling this pixel. Or, NOP it!

(* this could be every single pass or user-defined number of passes. Not sure what would be better.)

Not sure if i was clear. (And not sure if this is a totally absurd concept). This way we could get the variation per pixel and not by neighbors, contrast, entropy, etc...
At the end, if every pixel has been stopped at 0.05 threshold we should have uniform noise level.
What is wrong?

From what I understand, the description is correct, except for

  • the difference of the averaged sample (that is the resulting linear pixel color, not the accumulated value) is compared every x-samples
  • the nopping is tile-wise
  • once all tiles are under the threshold, the threshold is lowered and some tiles are reactivated
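Put together, the three corrections can be sketched like this in Python. It is an illustration under assumptions, not the patch: the buffer layout, the function names and the halving factor used for "lowered" are all made up or borrowed from earlier comments.

```python
def averaged(accumulated, num_samples):
    """Turn accumulated pixel sums into linear pixel colors."""
    return [tuple(ch / num_samples for ch in px) for px in accumulated]


def tile_error(acc_prev, n_prev, acc_curr, n_curr):
    """Largest per-channel change of the averaged color within a tile."""
    prev = averaged(acc_prev, n_prev)
    curr = averaged(acc_curr, n_curr)
    return max(abs(a - b) for p, c in zip(prev, curr) for a, b in zip(p, c))


def update_nop_flags(tile_errors, eps):
    """Tile-wise nop flags; lower eps once every tile is below it."""
    flags = [err < eps for err in tile_errors]
    if all(flags):
        eps *= 0.5  # assumed factor; reactivates the noisier tiles
        flags = [err < eps for err in tile_errors]
    return flags, eps


# One-pixel tile: average goes from 1.0 to 1.05 in the red channel.
print(tile_error([(10.0, 10.0, 10.0)], 10, [(21.0, 20.0, 20.0)], 20))
print(update_nop_flags([0.3, 0.6], 1.0))
```

Only one lowering step is done per evaluation here, so it can take several evaluations before tiles wake up again.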

Isscpp; the proper way to make comparisons with this patch is to render based on time and not samples.

When the nopping starts, the subsequent samples get done a bit faster (how much depends on how much of the image you allow to be nopped at once). That means more samples in certain areas, which means more convergence.

Also, I think the only way to really do pixel-wise nopping (even if limited to the CPU if needed) is if the evaluation map gets a dilation of the pixels followed by a smoothing, as otherwise you might get specks of noise retention in areas more difficult to sample.

Also, I'm wanting to know something about the evaluation map.

Say you were able to see the evaluation map: would it look more like a grid of tiles at different intensities, or like a more detailed pixel map where the highest color value within the tile is used as the total error?

I ask because if the evaluation map was calculated using the latter method (with a little edge dilation and smoothing on top), you could perhaps all but eliminate any remaining noise retention caused by tiny regions on the edge that take much longer to converge because the error map in neighboring tiles will 'bleed' into the tile containing the region, raising its max estimated error and keeping from being nopped.

When I talk about these corner cases, I mean in some way they literally look like a corner case on the image (like if most of the tile is over a surface that converges quickly with a corner in an area dominated by indirect or hard-to-sample lighting and thus taking longer to converge, but then is more difficult to constantly get the samples needed to keep its error up).

Personally, I think the smoothing already made good headway into making the noise map more accurate, but allowing this type of value bleeding will ensure it's about all the way there.

In v02 the evaluation takes place on a pixel-basis, but only the tile stores the maximum "value" of the contained pixels.

I understand your reasoning for "bleeding", but mind that this will just cause noise gradients on larger scales. Suppose the tile-borders are looking good, now look at a stretch of tiles...

v03 works completely on a pixel-basis but there is no implementation.

And then there is the "proper" evaluation by going through the variance, see the MATLAB file above. I believe Lukas implemented that with good results.

Here is the current patch and EXE:
http://cycles.patrickrosendahl.de/blender_NOP_v02_rev7.zip

Thank you for the opportunity for us to play with these frequent updates, I don't know if I have anything substantial to add until we start getting the V03 builds (which may or may not be needed before going into code review at the rate things are being tweaked and enhanced).

All I can say is that I'm getting the most promising results since this patch first came up, so it's made some serious progress.

I would request that you stop using the patch review system as a forum.

For sure interaction with users is crucial, but it's to be handled outside of the tracker. The BA forum is a much more appropriate place for such communication.

As for the patch itself here are the concerns from quick check of rev7 patch:

  • It exposes loads of settings to the interface that are rather obscure and unclear for users. I.e. what are the units for NOP Threshold? Why does the NOP Max Nopped tooltip say it's a percentage when the slider is set to 0.5? What are the best values for an average animation?
  • Viewport rendering still crashes like mad, I have no idea how one could have used this patch for tests.
  • Final rendering seems to be unaffected by this? I don't see any difference in render time comparing blender with and without the patch.
  • The approach is really questionable, there are going to be some visible artifacts in animations.

I didn't check the code deeply enough because I don't really see a reason to until the stuff above is solved. Concerns from the quick glance:

  • You're mixing other patches in there. I.e. it seems patch T40773 is also in there.
  • You don't follow Cycles code style conventions, mixing your own style into files which used to be really consistent.

Here is a build that disables tiles that show little progress, working on the noisy tiles only.
When the noisy tiles become settled, it continues to sample all tiles depending on the noise level.

This is for cycles rendering, "progressive" only. The settings for "NOPping" tiles are found under "Performance".

nop min samples: run at least some samples on the entire image before attempting to nop anything
nop nth sample: check every x-samples whether we should NOP some tiles and work on "bad tiles" only for a while
nop threshold: should be less than 0.01 which is still noisy in dark regions, I suggest 0.002
nop max nop: maximum amount of tiles NOPped (0.9 = 90% tiles can be nopped)

Suggested values are:
min samples: 10
nth sample: 20
threshold: 0.002
max nop: 0.9

You need all support files from here:
http://cycles.patrickrosendahl.de/blender_NOP_v02_rev4.zip
and the latest EXE is here:
http://cycles.patrickrosendahl.de/blender_NOP_v02_rev8.zip

The patch is here:
http://cycles.patrickrosendahl.de/adaptive_sampling_v02_rev8.diff
Be aware that there is some other stuff for cycles in this patch like enhancing the XML.
To build for blender, define BLENDER_APP , otherwise you get crashes.


Here is a package with .blend that shows how to save rendering time:
http://cycles.patrickrosendahl.de/blends/perf2.zip
Normal sampling: 2048 samples : 11:17 min (reference image)
NOP settings: min10 nth20 threshold0.01 max0.9 : 5:35 min (with visual diff to reference image)
NOP settings: min10 nth20 threshold0.002 max0.9 : 8:16 min (w/o visual diff to reference image)

EDIT: new benchmark: 2048 samples, PT

tilesize | 2.71RC2, not progressive | NOPPING 10/20/.002/.9 | NOPPING 10/20/.004/.9
32       | 11:04                    | 8:11                  | 6:02
64       | 4:00                     | 4:01                  | 3:15
128      | 2:52                     | 3:16                  | 2:54
256      | 2:29                     | 3:05                  | 2:54

image quality degrades slightly, but visibly in closeups
too much overhead, even with simple nopping...


The unit of the threshold is "linear rgb" (0...1). The fraction 1/512 \approx 0.002 is half of a 1/256 step, which I consider a "very low noise level, barely visible".


The current patch comes in 3 flavors, selected via #define:
CYCLES_DLL optimized for live rendering performance (eg camera movement must be applied within about 1s)
CYCLES_EXE optimized for performance (might pause screen update for 20 frames)
BLENDER_APP not optimized, using user-supplied values, no adaptivity

Just for the heck of it, I implemented the theoretical approach, estimating the remaining error via variance.
From the MATLAB file above, I derive that we need at least 100 samples to determine the variance, better 500 samples, for hard pixels of variance 0.01.

PT (and BPT) is enhanced by calculating the variance (via the online algorithm mentioned above):
[RGBA] [int numSamples (N)] [float 1/N] [float 1/(N-1)] [float errEstimate] [RGBA "M2" with VAR=M2/(N-1)]

Following the error/confidence estimate P(|X/n-p| < \eps ) > 1-VAR/(n^2 * \eps^2), the error estimate is \eps^2 = 1/(1-P)*VAR/N^2
No need to calculate the square root, we will just square the threshold for \eps as well.
Just leave 1/(1-P) = 1, so \eps becomes a pixel-wise measure for error/confidence.
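The per-pixel test described above can be sketched in Python as follows. It takes the formula literally as written (\eps^2 = VAR/N^2 with 1/(1-P) = 1, and VAR = M2/(N-1) from the stored buffer layout); the function names are made up and a real kernel would do this per channel.

```python
def squared_error_estimate(m2, n):
    """eps^2 = VAR / N^2, with VAR = M2 / (N - 1)."""
    if n < 2:
        return float("inf")  # no variance estimate yet: never converged
    variance = m2 / (n - 1)
    return variance / (n * n)


def pixel_converged(m2, n, eps):
    # Compare squared quantities so no square root is needed.
    return squared_error_estimate(m2, n) < eps * eps


# Hard pixel with variance 0.01, threshold 0.002:
print(pixel_converged(m2=0.09, n=10, eps=0.002))
print(pixel_converged(m2=0.99, n=100, eps=0.002))
```

With variance 0.01, the estimate is 1e-4 after 10 samples and 1e-6 after 100 samples, against a squared threshold of 4e-6.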

There is a loop in session (nop_tiles()) that checks performance and lowers the threshold for \eps.
Somehow I must not NOP tiles - there must be a race condition somewhere although I am explicitly waiting for the task to finish. Seems NOPping tiles is based on old performance data, although I should have latest buffers at hand (?!?).
Well, so no NOPPing of tiles this time, but PT will skip calculating if \eps is under threshold. So tiles are acquired and released, x/y are scanned, even for "good" tiles.

Performance on CUDA is surprisingly good, but I can hear the fan going on/off. So apparently we are wasting lots of cycles here, as pixel-wise nopping leaves some/most threads idle.

With NOPping working (read above), this should perform well for CPU.

Suggested settings:
min samples = 100 (at least, better ...500)
nth sample = 50 / 100
threshold doesn't really matter, quickly adjusts by itself

max nop = how much you expect to be nopped, and then a bit more
For example: with 1/3 static black background, set max_nop at least 50%. Otherwise it will try to activate the background. (Absolute lowest boundary: allow 30% nopping)

Contains CUDA kernel:
http://cycles.patrickrosendahl.de/blender_NOP_v03_rev1.zip

This is similar to the v03 attempt way above (https://developer.blender.org/T40774#56), and should perform similarly.
However, this v03 attempt uses more memory, and needs way more samples to get working properly.
I guess blurring of the performance "image" and so on could help on low-sample situations, but then again statistics need a certain data basis.
All this, just to get theory working. Not sure if it is worth it.

Patch is here, #define BLENDER_APP for a blender build, CYCLES_EXE for a cycles exe build, CYCLES_DLL for a cycles DLL build:


And yes, there are some other hacks in it.


EDIT: BPT (or any "slow" settings) doesn't work well with it. It takes forever to wait for 100 samples to finish, and there is no chance of applying theory before that, like at 10 or 20 samples. Variance estimates are _far_ off then.
Big plus for the old approach of "observed performance".


EDIT2: cannot really say why |X/n-p| works much better than |X-\mu| (at least for me) introducing another 1/n into the error term. So I am not pressing the abs error below threshold, but rather the "goodness" of the samples, ie how well probability is approximated.

josh (joshr) added a subscriber: josh (joshr). Edited May 22 2015, 12:58 PM

Hi Patrick

I tried to get this to work against the current master and had problems with the key variables for "nopping" not being found and various "RNA" errors and was also unable to get the options under the performance panel to appear in blender.

When I was reading the code I have to admit it could really use a bit more commenting and, most importantly, variable names which make sense. Most people don't write in assembler these days, so NOP for instance doesn't mean much; eps is another example, and there are many more.