
Metropolis Sampling
Closed, Invalid · Public · PATCH



I recently decided to try adding a Metropolis sampler (no BPT, just the sampler) to Cycles to get some experience with the Cycles code layout, but it turned out to work so well that I decided to post it here. Due to some problems, it's not releasable yet, but these problems (more below) should not be too hard to fix.
The code is based on the SmallLuxGPU Metropolis sampler; I first tried the PBRT code, but it didn't seem to work nearly as well.
Basically, it works by bypassing the RNG functions: when they are called and Metropolis sampling is selected, they just return a value from a sample array that is stored in the RNG pointer. This allows the sampler to make only minimal changes to the kernel; it even uses the standard kernel_path_integrate. The sampler itself runs in the CPUDevice thread.
My current test scene is a simple pool with a modifier-displaced water surface, an absorption volume in the water (works great, by the way), a glass pane on one side of the water, and a Sun/HDRI combination lighting the scene.

The upper image is the 2.69 release, while the lower one is with the metropolis patch. Both were rendered in equal time, but the patched version is a debug build.
From the Metropolis image, the biggest problem is obvious: Tiles. The current system requires one sampler per tile, so the seams are easily visible. Also, if one tile has large bright surfaces, the noise outside of those is worse than in other tiles. The solution for this would be one sampler per thread, all working on the whole image (which would require atomics for the buffer writes).
Another problem is passes, since they're currently written directly from the kernel. In Metropolis sampling, however, they need to be weighted, and this weight is only available after the kernel is done with the ray. A solution would be for the kernel to return the values to be written, so that the Device is responsible for storing them. The depth, normal, ObjectID and Alpha passes could be done in a single-pass regular pathtrace.
Also, there is currently a bug that causes standard pathtracing to crash; I still have to find the source of that one.
However, this seems like a promising feature that might be worth the work of fixing the problems above.

PS: A one-hour-render of the pool looks like this:

The patch is here:

Event Timeline

Lapineige added a comment. Edited Jul 15 2014, 9:19 AM

@Callum Cooke (jemonn): ok, the new build works! Thanks!
This time, rendering with MLT and render border enabled no longer crashes Blender, but it still fails.

Lukas, testing the VC2013 build I have with your patch: there's a chance it may just be MinGW being unstable compared to the official platforms (MinGW being the only platform Holy Enigma will build for).

I couldn't reproduce the crashes I got earlier with the other build in this case, so it may be MinGW's fault and not the fault of your code.

If I do indeed find it a false alarm, then I'm sorry for that.

EDIT: Okay, even the use of 64 threads doesn't crash adaptive sampling with the VC2013 build, so it really is just MinGW doing its thing as the least stable platform you can possibly build with. Sorry for that.

Another question.

Do you by any chance know what the unit is for the adaptive map update value? I ask because I couldn't really obtain a difference say, between a value of 5 and one of 50, but then I found that much larger values seem to be necessary if I wanted to see a difference when using a lower error tolerance along with the adaptive map.

Perhaps the value would be easier for the user to understand if, say, the adaptive map were updated every N passes instead, if it's not already done like that.

MLT 19 (Holy Enigma MinGW build) gives really good, fast CPU renders (once I tweaked it a little and followed Lukas' suggestions) on a "physically" simple scene with very complex "real" spectral/caustic lighting.
A usable "draft" in around 10 minutes at 1920x1080, 50 passes, adaptive sampling at 2.5.

Before using MLT 19 I could not get a reasonably noise-free render of this scene any other way; I tried many combinations of both PT and BPT, at least 200 times. MLT 18 crashed a lot on other scenes; maybe I'll try it on this scene just to see...

Image and .blend file here
BA Thread: Cycles MLT patch
i7 2600 3.4ghz, Windows 7 64-bit

I rendered the scene above at 4 times the samples. It was at 50, but once it gets to about 70 passes it starts showing "black fireflies". I even tried it on Suzanne with the same spectral node and lots of ambient light: same problem.

I also tried @Callum Cooke (jemonn)'s build of 19 with GPU (GTX 580 3 GB), and GPU mode is about 30% slower on this file than the CPU (i7 2600 3.4 GHz, Windows 7 64-bit, 12 GB).

Just an FYI: applying the patch out of the box from master to metropolis_19 yields "patch does not apply" and "trailing whitespace" errors, and when compiling:

blender\intern\cycles\kernel\kernel_compat_cpu.h(260): error C3861: 'InterlockedCompareExchange64': identifier not found
blender\intern\cycles\kernel\kernel_compat_cpu.h(272): error C3861: 'InterlockedCompareExchange': identifier not found
blender\intern\cycles\device\..\kernel\kernel_compat_cpu.h(260): error C3861: 'InterlockedCompareExchange64': identifier not found (blender\intern\cycles\device\device_cpu.cpp)
blender\intern\cycles\device\..\kernel\kernel_compat_cpu.h(272): error C3861: 'InterlockedCompareExchange': identifier not found (blender\intern\cycles\device\device_cpu.cpp)
blender\intern\cycles\device\device_cuda.cpp(716): error C2143: syntax error : missing ')' before '.'
blender\intern\cycles\device\device_cuda.cpp(716): error C2228: left of '.w' must have class/struct/union type is 'int'
blender\intern\cycles\device\device_cuda.cpp(716): error C2059: syntax error : ')'

My machine (using the term lightly):
Windows 7
VS 2013 Win32 (builds master just fine)
portablegit (Github) with tortoisesvn (idk if it makes a difference)

Hope this helps

I've just tried the test file @Craig Richaredson (craigar) linked above, and something is really wrong with importance equalisation (but we all knew that anyway).

However, I get slightly different results with my CPU and GPU:

i7 4500U - 400 samples, importance equalisation off, 4 chains. (04:53:63)

Geforce GT 750M - 400 samples, importance equalisation off, 16000 chains. (08:53:28)

Also, GPU rendering is really slow for me. It might just be my build, as @Craig Richaredson (craigar) said it was slower for GPU too, but there usually isn't this much difference in the speeds.

With the same exact test file I posted here, it takes 2.5 times as long to render with MLT 18, but it is much higher quality, and there are no black fireflies. MLT 18 also shows way more of the Translucent BSDF that I slightly mixed into the prism to give it a whitish glow, so I brought the "fac" down from 0.05 to 0.03 in the mix shader and left it there in all of these renders so you can see the difference.

BA - "Prism test" MLT build 18 slower - but better than Build 19

New patch version 20 is done, the two big changes are:

  • Progressive adaptive mode: If you enable both adaptive stopping and progressive rendering, it will render every tile for one mapping interval in a prepass. Afterwards, the tiles are sorted by error, and the tiles with the highest errors are rendered and then reinserted into the list. Once the worst tile is below the error threshold, it stops. The currently worst error is displayed in the status line. This works on both CPU and GPU; in theory it should also work with Metro, but Metro error estimation is still quite unstable (variance estimation of weighted means is really annoying). The performance seems fine for me: rendering without Progressive took 1:31 min, with Progressive it was 1:33 min (BMW scene). The only thing missing is the orange markers around the current tiles, but I still have no clue how the currently active tiles are passed to the Render API, so right now, when I just enable the highlighting, every tile gets highlighted :/
  • Metro is now working with tiles again! The basic problem (different mean intensities and therefore visible tile borders) shows up more slowly, but is still there. So, how to solve it? The solution I chose is quite simple: run a warmup prepass to calculate it in advance instead of calculating it on the fly and therefore using wrong values in the beginning. This introduces a new parameter: the number of samples to run for warmup. For quite simple scenes (BMW, Pavillon etc.), the default of 10 is just fine. If you get visible borders (in the first few seconds these are expected, since the chains might not have found the high-contribution paths yet, so give them a few seconds), increase the warmup samples: for the Villa scene (from PBRT), I ended up with 3000 samples at 32x32 tiles, but I modified this scene to be a worst-case example: high occlusion, lamps in glass, three glass panes in the windows (!). Also, the bigger the tile, the fewer warmup samples you need. Note that this also eliminates the black fireflies and allows multi-GPU (one tile per GPU).

Smaller changes are:

  • Instead of a separate filter dimension, the Metro sampler now uses the fractional part of the pixel coordinate. IMO it gives cleaner filtering.
  • I added the algorithm from the paper "Automatic Parameter Control for Metropolis Light Transport", which automatically chooses a large step probability. This is especially helpful now since every tile might have a different optimum value. In the (distant) future, this might also be a good criterion for automatically choosing between Metro and PT: A value > 0.9 suggests that there is no real advantage in Metro (for example, this happens in the BMW scene for every tile except for the glossy reflection in the lower left corner and the headlights).
  • The unit for image mutation range was changed from "percentage of the image" to pixels. This means that the old default value of 0.1 will give extremely noisy results, I recommend something around 20-30. Remember to always change this from the old default!
  • You can't really compare render times to older patches, since I changed the "work per sample".
  • The MSVC missing underscore was fixed.
  • The util_hash.h problem was fixed, it occurred when Cycles tried to build the CUDA kernel during runtime. I found this one by accident when my compile-time kernel was still at sm_20 from fixing the texture problem.
  • Speaking of it: The texture limit problem is also fixed.
  • Viewport rendering is broken (again). I'll try to fix this asap.
  • Importance equalisation is still broken.

So, now for the bad news: since I have a lot to do in the coming weeks, I won't be able to work on new features until around mid-September. If there is some bug, of course I'll try to fix it (like the viewport bug). I'm really sorry for this, but I just won't have the time to work much on the patch. So, it would be great if you could extensively test the new version to find any remaining bugs, and also if any developers could take a look at the code to find remaining problems (apart from bad coding style and whitespace errors :D).

Attached are some images I rendered while testing.

(Note the visible tiles, I didn't use enough warmup)

So, here's the patch, created against 508e0a:

It won't build for me:

Error	1	error C2664: 'long _InterlockedCompareExchange(volatile long *,long,long)' : cannot convert argument 1 from 'volatile __int64 *' to 'volatile long *'	c:\dev\blender\blender\intern\cycles\kernel\kernel_compat_cpu.h	274	1	cycles_kernel
Error	2	error C2065: '__INT_MAX__' : undeclared identifier	c:\dev\blender\blender\intern\cycles\kernel\kernel_random.h	108	1	cycles_kernel
Error	3	error C2065: '__INT_MAX__' : undeclared identifier	c:\dev\blender\blender\intern\cycles\kernel\kernel_random.h	146	1	cycles_kernel
Error	4	error C2065: '__INT_MAX__' : undeclared identifier	c:\dev\blender\blender\intern\cycles\kernel\kernel_path.h	1207	1	cycles_kernel
Error	5	error C2664: 'long _InterlockedCompareExchange(volatile long *,long,long)' : cannot convert argument 1 from 'volatile __int64 *' to 'volatile long *'	C:\Dev\Blender\blender\intern\cycles\kernel\kernel_compat_cpu.h	274	1	cycles_device
Error	6	error : identifier "__INT_MAX__" is undefined	c:\dev\blender\blender\intern\cycles\kernel\kernel_random.h	108	1	cycles_kernel_cuda
Error	7	error : identifier "__INT_MAX__" is undefined	c:\dev\blender\blender\intern\cycles\kernel\kernel_random.h	146	1	cycles_kernel_cuda
Error	9	error : identifier "__INT_MAX__" is undefined	c:\dev\blender\blender\intern\cycles\kernel\kernel_path.h	1207	1	cycles_kernel_cuda

Looked in the .patch and __INT_MAX__ isn't defined anywhere.

Sorry, my fault. I forgot that __INT_MAX__ is GCC-specific; just replace it with 0xffffffff. Regarding the InterlockedCompareExchange: in atomicAddF, "volatile long long int" needs to be replaced by "volatile long int".

In progressive render mode, the rendered images just get more and more transparent, even when saved out as PNGs. This also affects the viewport render, and isn't limited to Metro; it affects all renders.

Progressive render - mostly transparent (10 samples)

Tiled render - how it should look (10 samples)

Here's a link to v.20 VC2013 with CUDA and GPU SSS/Volumes. Everything should be included in the zip this time!

Any recommendations for settings for this scene?

To add more detail to Jemonn's bug report: the increasing-transparency bug happens when using progressive Metropolis sampling; tiled Metropolis sampling and adaptive sampling are unaffected.

Also, having the Importance Equalisation sampling option checked now crashes Blender in all cases.

OK, I found the viewport bug; it was quite an easy fix. Also included are the MSVC fixes from my last post.
@lopata (lopataasdf) From what I can see, the problem isn't actually the warmup, but the brightness contrast. If you look at the Villa rendering I uploaded, the tiles containing the lamps also have quite a high error, since the samples are directed to where the image is brighter. The fix for this would be the importance equalisation, but as I said, it's currently broken. Just to be safe, in this bugfix version I also removed the UI element, since it crashes anyway.

On the 20.1 patch, I got an issue with GPU progressive mode. CPU progressive works fine, as do GPU and CPU tiled.

GPU Prog:

GPU Tiled:

GPU Prog (smaller render dimensions):

All settings were at their defaults. I also tried changing the number of Metro chains, progressive samples, and render samples, with no change.

Well, I've been testing the adaptive progressive sampling mode on a scene of mine that has a region covered in a volume and a bit of indirect lighting.

In theory, the adaptive sampling mode should lead to a massive decrease in render times; in practice though...

Now, as I said, it starts out promising: it immediately starts working on areas that are more difficult to sample, which means those areas converge a lot faster than in vanilla Cycles builds. However, hours in, the advantage of the adaptive system starts to disappear and then becomes a disadvantage.

You might wonder how it becomes a disadvantage. Well, after an hour or so, the convergence rate in some notable areas slows down so dramatically that the image actually ends up less converged after many hours than if you just continually threw samples at it, as in the vanilla master builds. It's almost like the sampler gets stuck on tiles in more indirectly lit areas and rarely gets around to adding more samples to the areas that visually need them most.

This is the image of the scene that I'm talking about (rendered in a vanilla Cycles build).

The biggest issue with what the adaptive sampler does is in the blue region: it's the first area to get additional samples after the initial pass, but later on the convergence rate there comes to a virtual standstill. I've tried increasing and decreasing the firefly emphasis with the exponent setting, I've tried adaptive steps from 10 to 50 samples, I've tried changing samplers; I just couldn't get convergence anywhere near a notable performance increase compared to what it otherwise would be.

So all in all, I don't think the adaptive sampler is ready for prime time yet, but I do anticipate that it will continue to be improved (eventually, since you said yourself that you don't have a lot of time now).

Well, after a while, I found out how I could get much better convergence speed in the blue area of that image while maintaining the advantages of this patch, and that is to set the exponent as high as 50 while disabling the adaptive part of the sampling (as in, the actual shuffling of samples within the tile).

I noticed on another scene what appears to be a major weakness of the adaptive part of the sampler: in various cases you get a good amount of clumpiness, with spots that have lots of samples interspersed with spots that have few. It seems to me that the adaptive sampling code focuses the samples a bit too tightly, as the clumping can also distort the perceived error and cause a bit of inaccuracy compared to what is seen visually, along with very slow convergence. I've also noticed that true adaptivity in various cases gives inferior quality compared to just having adaptive stopping and the prioritization of tiles based on error.

You might wonder what not having the in-tile adaptivity does to convergence where the focusing of samples works well; it actually isn't much of a disadvantage in the big picture, provided the tiles are somewhat on the small side (I use a size of 40 by 40 pixels).

A final note: Metropolis sampling still seems to crash on large scenes when the Sobol distribution is used (it doesn't crash with Multi-Jitter), and Sobol adaptive sampling may crash on large scenes after many hours of rendering (Multi-Jitter just seems more stable overall).

Also, it would be nice if you could find the time to periodically rebase the patch (if it doesn't build with new revisions) so we can at least have up-to-date builds (it shouldn't take too long unless there are massive changes to the Cycles code).

OK, I rebased it to 7b83e3d.
@Adam Friesen (ace_dragon) OK, that's strange. In all my test scenes, my "subjective" by-eye error estimation agreed quite well with the computed error. Could you attach/send the scene in the example (or any other scene showing this behaviour)?
The clumping with the in-tile adaptive mode is to be expected, unfortunately. The reason it appears is that sometimes a firefly lands in a single pixel, so the error of that pixel is way higher than its neighbours'. For this reason I added a smoothing step to the sampling, but if there is only one high-error pixel in the tile, you can't smooth enough to get rid of the error. This is a statistical problem: you can only estimate the variance (the basis of the current error estimation) within some confidence interval, not exactly; the variance is itself an approximation.
Regarding the crashes: "it crashes" is pretty hard to debug, especially since for me it never crashes, even on scenes like the Lego bulldozer, which I consider quite complex. Could you please post your system data (mainly OS), build info (compiler, master version on which you applied the patch...) and, most importantly, the crash log (if possible, generated by a debug build)? Also, does it crash randomly or always at the same tile? Could you post an example scene?

About the issue of fireflies messing with the error estimation, couldn't this be something that one could resolve by running a filter similar to what the compositor's despeckle node does before applying the smoothing?

Also, maybe the crashing is just me setting the exponent too high, because I do notice that the initial pass can start to get bogged down a bit if you do (left to wonder if it has a negative effect; I haven't tested, though). A test render I'm currently doing also seems to do a much better job of maintaining a constant noise level and accurate prioritization with the value lowered (I know you said you don't recommend values over 10).

That filtering idea is actually quite good! I think the despeckle node is basically a median filter, which usually excels in removing single-pixel noise. I'll look into that!
Regarding the high exponents: the problem here is numerical, since the code takes the p'th power of all the errors, sums them up, and then takes the p'th root. However, even with a p of around 4, the sum is often around 0.00001, which already gives precision errors with floats. Considering that the loop is no bottleneck and only stores a single value, I'll change it to use doubles in the next version.

While looking at the render process, I got an idea that might reduce render time, so I'm sharing it.
It involves adaptive rendering (I read in the thread that this can be done at pixel scale as well), though that might not even be required.
I used a camera with focus and an empty as the focus object (a Z-map might work as well for this idea).

Now to the point: a little noise doesn't have to be bad; real digital or analog cameras have some noise too.
However, noise is more noticeable when it's on the in-focus subject than on something in the background.
So maybe it's an idea to combine depth/camera focus into the error calculation of the adaptive renderer.
Whatever is out of focus would then be allowed a bit more noise, just as with real cameras (where it is often a bit blurred anyway; that could be done in post-processing with Blender's bokeh).

Actually, noise in Cycles tends to be more visible in out-of-focus areas, especially when something is really close to the camera. Also, a post-processing filter will never achieve a good-quality defocus for complex scenes. Believe me, I've tried to squeeze the maximum out of it in some projects, and while it's a great feature, it will never look close to what Cycles does. Remember too that in Cycles you don't get a consistent Z-buffer because of the sampling, and you need that for the post-processing. The exact feature you describe exists in Blender Internal, which lets you lower sample counts in out-of-focus and fast-motion areas for precisely the reason you give, but it relies on that post-processing. Sorry if I'm getting off-topic here.

This comment was removed by Ellwood Zwovic (gandalf3).

Okay, I'm back again, and with news: since the Metropolis redesign helped separate the adaptive and Metro parts, I decided to split the patches, and for the last 3 weeks I have polished the adaptive sampling to be ready for code review, for which I just submitted it as D808.
This means that for now, my main focus will be on getting adaptive sampling approved, but I'll also work on Metro in the meantime. I still don't like some parts of it, especially the sometimes ridiculously high required warmup sample counts and how inefficient GPUs are at it (this might be improved by sampling the first dimensions and then sorting based on image coordinates to improve coherency, for example).
So, this patch is far from dead, and I'll continue working on it. Sorry, no up-to-date Metro patch yet, that will be the next step.

This comment was removed by Peter Boos (PGTART).

What happened to this patch?

This patch is very interesting. As I see it, it could transform Cycles into something like Maxwell Render, and with GPU power on top. I very much hope the developers are paying due attention. Maybe there is some news about it?

Well, this is so sad. This could change the entire Blender project. I hope it comes back to life.

Well this now is dead, no hope.

For all people who are interested in this, here are some quotes:

Lukas Stockner wrote:
"I decided to split the patches and have now, for the last 3 weeks, polished the adaptive sampling to be ready for code review, for which I just submitted it at D808. This means that for now, my main focus will be on getting adaptive sampling approved"

And quote from D808, most recent message (Aug 18 2015) by Lukas Stockner:

"There was quite a long silence around this patch, so here's an update and the way I see it:
This approach kind of works, but not nearly as good as it should. It's not robust at all,
sometimes it just gets stuck on single pixels, sometimes the result is quite blotchy, sometimes
tile borders are extremely visible. I'm looking into many different approaches currently and it looks like there is a lot of potential. Cycles' design makes some of them impossible, but still, a big improvement should be possible. Parts of this patch will be reusable, but the approach of just taking the variance is too unstable."

Since that message, just about two months have passed, which is not too long considering how hard and complicated the task at hand is, so it is way too soon to say things like "there is no hope", "it is dead", etc. This is not helpful. In case some of you do not know: Lukas already did some great work that was finished and accepted into Blender (light portals), not to mention he added other new features and fixed some issues (his most recent patch was committed to master on 2015-10-08).

I believe there is still a chance that metropolis sampling may be added to Blender eventually.

There is hope. It is alive.

Well, light portals are kind of useless due to the poor skylight code. It's not physically plausible nor realistic.

Portals are very useful, and in many cases they make it easier to get a plausible and realistic result with Cycles by reducing render time without loss of quality. I was impressed by the great speedup for some of my own scenes. But this is off-topic here. My original point was that the developer is still active and has already made many useful contributions to Blender. Saying that a developer's work is "useless" just because some other code is not yet good enough for you is unlikely to motivate anyone to work faster. Before that, you said that "this now is dead, no hope" just three weeks after the developer said "I'm looking into many different approaches currently and it looks like there is a lot of potential" in another thread. You have to understand that if you want something to be done faster, you either have to hire a developer or do it yourself; otherwise you have to wait and hope. Complaining many times in a row and spreading negativity is not going to help anyone.

Inverse psychology.

You shouldn't pay attention to those kinds of comments. But people cannot avoid them, so this was my way to get an answer. Thank you, Boris.

I very much hope that this method will appear in the next versions. It is a good method if you do things with light and caustics. Portals are very helpful. Thank you for making improvements. (Google Translate)

OK, I see everyone using this, but how do I add it to Cycles myself?

@Julien Peeters (AssAsIN10556) Using this requires applying D808 and building Blender yourself.

@Lukas Stockner (lukasstockner97) It seems that the patch fails, either building or applying.

I can make a Windows 64-bit build if the patch works.


D808? I have no clue what that is... could you explain it, or send a link with an explanation?

D808 is a link to the patch; just click it to grab the code. However, like I said, the patch no longer works and needs to be updated.


Hmm... are there any older patches that would work? (With version 2.76 maybe? Let's hope so.)

@Lukas Stockner (lukasstockner97) Can we close this? Since this will be a patch at some point, I can't see why we should keep it as a Maniphest task in the meantime.

Lukas Stockner (lukasstockner97) changed the task status from Unknown Status to Unknown Status.Jan 4 2017, 2:06 PM

Yes, that's definitely a good idea. Even if this were revived at some point, it would be a completely rewritten patch anyway.