
Design Doc: Sample based compositor
Open, Normal, Public

Description

This is a design doc for the compositor.

Principles
Sample based: The compositor works with samples. By changing the number of samples, the artist can trade speed against quality at any moment: when speed is needed the artist can lower the number of samples for fast feedback, and when actually rendering the number of samples can be increased for a better result.
Relative: Currently, changing the resolution (or render percentage) affects the behaviour of several nodes; even the default settings of the Blur node need to be adjusted. With relative parameters, all parameters are aware of the resolution they are calculated in (see the sketch after this list).
PixelSize aware: Currently the compositor is tied to the perspective and orthographic camera models. When using panoramic cameras the blurs are not accurate, and in cases like dome rendering and VR/AR a lot of trickery and workarounds are needed to composite correctly. When the compositor takes the actual camera data of the scene (or image) into account, it can calculate more accurately in these cases.
Canvas: Being able to put images in the compositor and align/transform them visually.
GPU support: The current compositor only supports the GPU for a limited set of nodes. Other nodes are calculated on the CPU, so huge amounts of data are loaded and unloaded, which costs a lot of resources. The new design should be able to run fully on the GPU.
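A minimal sketch of what a resolution-relative parameter could look like (the names and struct are illustrative, not the actual API):

```cpp
/* Illustrative sketch: a blur size stored as a fraction of the image width,
 * so the visual result stays the same when the resolution or render
 * percentage changes. */
struct RenderContext {
  int width;   /* output width in pixels */
  int height;  /* output height in pixels */
};

struct RelativeSize {
  float fraction;  /* e.g. 0.05 = 5% of the image width */

  float to_pixels(const RenderContext &ctx) const
  {
    return fraction * float(ctx.width);
  }
};

/* A 5% blur resolves to 96 px at 1920x1080 and to 48 px at a 50% preview
 * resolution of 960x540, so the image looks the same at both sizes. */
```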

Samples
In the sample-based compositor, the X,Y coordinate of the output image is transformed into a ray (position, direction, up vector, sample size). This ray is evaluated through the node tree. When the ray reaches an input node (Image, Render Layers), it is transformed into that image's space. The image is then sampled into a resulting color/value, which is passed back to the node that requested it.
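A rough sketch of the ray and the pull-style evaluation described above (all names are illustrative; the implementation in the branch may differ):

```cpp
#include <array>

using float3 = std::array<float, 3>;
using float4 = std::array<float, 4>;  /* RGBA result */

/* A sample request: where the sample is taken and how large its footprint is. */
struct Ray {
  float3 position;
  float3 direction;
  float3 up_vector;
  float sample_size;
};

/* Every node answers a ray with a color; input nodes sample their image,
 * other nodes forward (possibly altered) rays to their own inputs. */
class Node {
 public:
  virtual ~Node() = default;
  virtual float4 evaluate(const Ray &ray) const = 0;
};

class ImageInputNode : public Node {
 public:
  float4 evaluate(const Ray &ray) const override
  {
    /* Transform the ray into this image's space and sample the image. */
    float u, v;
    project_to_image_space(ray, &u, &v);
    return sample_image(u, v, ray.sample_size);
  }

 private:
  void project_to_image_space(const Ray &ray, float *u, float *v) const;
  float4 sample_image(float u, float v, float footprint) const;
};
```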
Nodes will be able to alter these rays and to select which input socket receives which ray. For example, a blur node can 'bend' the incoming ray (or change its sample size).
As samples are slightly randomized, every sample hits a different part of the same pixel, which gives sub-pixel sampling. This leads to very crisp images compared to the current compositor.
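Building on the sketch above, a blur node could look roughly like this: it jitters ('bends') the incoming ray per sample, so that the average over many samples converges to the blurred result (again purely illustrative):

```cpp
#include <random>

/* Illustrative blur node that 'bends' the incoming ray: each sample is
 * offset within +/- radius in X and Y (a simple box filter); averaging many
 * samples converges to the blurred image. */
class BlurNode : public Node {
 public:
  BlurNode(const Node *input, float radius) : input_(input), radius_(radius) {}

  float4 evaluate(const Ray &ray) const override
  {
    thread_local std::mt19937 rng(0x5eed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);

    Ray bent = ray;
    bent.position[0] += dist(rng) * radius_;
    bent.position[1] += dist(rng) * radius_;
    bent.sample_size += radius_;  /* widen the footprint for downstream samplers */
    return input_->evaluate(bent);
  }

 private:
  const Node *input_;
  float radius_;
};
```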

(Figure: current compositor)

(Figure: sample-based compositing)

Viewports and filtering
For all buffers, such as input/output images and render layer nodes, the user will be able to select a viewport that identifies where the image is in the scene and with what kind of camera it was created (e.g. planar vs. spherical). These viewports can be added in the 3D scene, but for canvas compositing we will also allow them to be visible in the backdrop of the node editor (feat: canvas compositing).

Using viewports it will be easier to composite planes into your scene when the camera is moving.
Also, at all image/render layer input nodes, the filter (nearest, linear, cubic, smart and others) and clipping mode (Clip, Extend, Repeat) used when sampling the image can be selected (a small sketch of the clipping modes follows below).
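To make the clipping modes concrete, a sketch of what they could mean when a sample falls outside the image (illustrative only, not the actual implementation):

```cpp
#include <algorithm>
#include <cmath>

enum class ClipMode { Clip, Extend, Repeat };

/* Map a possibly out-of-range coordinate (in pixels) back into the image,
 * or report that the sample should be transparent black (Clip). */
static bool resolve_coordinate(float coord, int size, ClipMode mode, float *out)
{
  switch (mode) {
    case ClipMode::Clip:
      if (coord < 0.0f || coord >= float(size)) {
        return false;  /* outside the image */
      }
      *out = coord;
      return true;
    case ClipMode::Extend:
      *out = std::clamp(coord, 0.0f, float(size) - 1.0f);
      return true;
    case ClipMode::Repeat:
      *out = coord - std::floor(coord / float(size)) * float(size);
      return true;
  }
  return false;
}
```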

Open questions
Compositors are normally image based, and many algorithms are designed for image-based compositors. As this design is totally different there are risks, such as whether we are able to implement all current features in this new architecture. IMO we will get to 95% of the old features, but some features are very hard to implement (or will come with huge penalties). Should we implement this as a second compositor and let the user decide which compositor to use?

Next steps
Discuss/Approve this design
Find support for this design

Resources
Bitbucket source repository: https://bitbucket.org/atmind/blender-compositor-2016 branch compositor-2016
Nodes that have an implementation (not guaranteed to be 100% the same):
Viewer, RGB, Value, Mix, AlphaOver, Image (no OpenEXR), Render layers, Movieclip, Blur (only relative and bokeh), Color Matte, Chroma matte, Math, Value to Color, Value to Vector, Vector to Value, Color to Value, Color to Vector, RGB to BW, Separate RGBA/HSVA/YUVA, Combine RGBA/HSVA/YUVA, Hue Sat, Bright contrast, Gamma, Color balance, Color spill, Set Alpha, Channel matte, Difference Matte, Distance Matte, Luma Matte
Note: still very much WIP (e.g. it crashes a lot); it is intended as a proof of concept.

Currently the rays have no width, hence many samples are needed when blurring. In the actual implementation I want to create cone-shaped rays, which will need less sampling to get to better results. The difficulty is the need for a mask to support elliptical blurs.
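One way such cone-shaped rays could work (purely an illustration of the idea, not the planned implementation): the ray carries a footprint radius that a blur node widens, and the input node answers a wide footprint with a single prefiltered (mipmapped) lookup instead of many narrow samples:

```cpp
#include <algorithm>
#include <cmath>

/* Hypothetical cone ray: a sample position plus an explicit footprint
 * radius, measured in pixels of the target image. */
struct ConeRay {
  float x, y;       /* sample position in image space */
  float footprint;  /* radius covered by this single sample */
};

/* Pick the mip level whose texel size roughly matches the footprint, so a
 * wide cone is resolved with one prefiltered lookup. */
static int mip_level_for_footprint(float footprint_in_pixels, int max_level)
{
  const int level = int(std::floor(std::log2(std::max(footprint_in_pixels, 1.0f))));
  return std::clamp(level, 0, max_level);
}
```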

Technical design doc: https://docs.google.com/document/d/1L-BWs_XbX_BgZ0F0vJC9-fxbM6KWQyrIO6qFEOfPSXA/edit#heading=h.ontenmvswzw8

Details

Differential Revisions
D3003: PoC: Sample based compositor
Type
Design

Related Objects

Event Timeline

So the user can add a viewport "object" to the scene, which is an image buffer "dump":

  1. buffer of an input node (image, movie, etc.)?
  2. buffer of a texture node?
  3. buffer of an output node, i.e. a different compositing output?
  4. buffer of another scene (OpenGL / Eevee / Cycles Render)?

And then move, rotate and scale them in x, y, (z) and maybe use array modifiers?
In the compositor the user sees them through the active camera?

What are the "features very hard to implement"?

Thank you.

So if I understand this correctly, the idea is to handle one sample at a time, using Monte Carlo sampling for operations like blur. The advantage of that is that it simplifies the implementation, as every sample can be handled individually, in a single kernel execution. As a result it may also be easier to optimize than a more complicated implementation that theoretically could be faster but isn't. It also naturally provides a progressive preview. "Monte Carlo compositing" is a very interesting idea and I hadn't realised this was the plan.
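To make that concrete, one per-sample pass could look roughly like this (an illustrative sketch reusing the Ray/Node idea from the description, not the actual patch):

```cpp
#include <cstddef>
#include <vector>

/* Progressive, sample-at-a-time compositing: one pass evaluates the whole
 * tree for a single jittered sample per pixel and accumulates a running
 * average, which doubles as the progressive preview. */
Ray make_jittered_ray(int x, int y, int sample);  /* hypothetical helper */

struct Accumulator {
  std::vector<float4> sum;  /* running per-pixel sum */
  int samples = 0;
};

void composite_one_sample(const Node &output_node, int width, int height, Accumulator &acc)
{
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
      const Ray ray = make_jittered_ray(x, y, acc.samples);
      const float4 color = output_node.evaluate(ray);
      float4 &dst = acc.sum[static_cast<std::size_t>(y) * width + x];
      for (int c = 0; c < 4; c++) {
        dst[c] += color[c];
      }
    }
  }
  acc.samples++;
  /* Displaying sum / samples gives the progressive preview. */
}
```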

However, I have major doubts about this approach in practice. Hopefully I'll be proven wrong, but I see these problems:

  • Just like path tracing, there will be slow convergence, noise and fireflies. Monte Carlo sampling is notoriously inefficient, especially in high dimensions (every blur-like node adds a dimension like a bounce in path tracing), and mostly it is used in rendering because it is the least bad solution. With importance sampling, adaptive sampling and QMC results can be improved, but this is expensive as well, and in general does not save you from the slow 1/sqrt(N) convergence.
  • With the graph flattened into a tree, nodes with outputs linked to multiple nodes will be computed multiple times. How much it slows things down depends on the specific graph, I don't have a good estimate how that would work out in production setups.
  • Another extra cost is that noise-free operations will now be repeated for each sample, whereas otherwise they could be executed only once.
  • Memory access is usually the number one bottleneck on modern CPUs and GPUs. If you add a blur node, any nodes feeding into that will now have incoherent memory access to random parts of the buffer. This will lead to cache misses on the CPU and non-coalesced memory access on the GPU. Processing all pixels in one node at a time isn't ideal either, with the entire image going in and out of the caches every time. And if the blur radius isn't too crazy high maybe the cache misses aren't too bad, but still I expect it to be far from optimal.
  • This is more a detail about the current implementation, but the way it works on the CPU seems to involve a lot of branching, in a way that I would expect to be about as expensive as the virtual function calls that we currently have. This branching would be avoided when working on blocks of pixels at a time, or dynamically generating OpenCL kernel code, both of which would also allow more SIMD optimization.
  • As you mention, some operations are impossible to implement in this framework. I'm guessing Fast Gaussian blur, Normalize, Inpaint, Vector Blur, Denoising, .. ?
  • And in a way I think this is the biggest problem: operations like mix RGB, RGB curves and color ramps will not give the same result, because they work on individual noisy samples instead of the final result. For example if you blur a black and white checkerboard, the individual samples will always be 0 and 1, and never any grayscale values in between, so applying a color ramp to that will not give good results. For unbiased Monte Carlo sampling you can in principle only do linear combinations; non-linear operations on the individual samples just can't work correctly (see the small numeric example below).
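A tiny numeric illustration of that bias: applying a non-linear operation per sample and then averaging is not the same as applying it to the converged value (hypothetical numbers; squaring stands in for a curve/ramp):

```cpp
#include <cstdio>

/* Stand-in for a non-linear operation such as a color ramp or curve. */
static float ramp(float v) { return v * v; }

int main()
{
  /* A blurred black/white checkerboard: the converged pixel value is 0.5,
   * but every individual sample is either 0.0 or 1.0. */
  const float samples[4] = {0.0f, 1.0f, 0.0f, 1.0f};

  float per_sample = 0.0f;  /* ramp applied per sample, then averaged */
  for (float s : samples) {
    per_sample += ramp(s) / 4.0f;
  }
  const float converged = ramp(0.5f);  /* ramp applied to the converged value */

  std::printf("per-sample: %.3f  converged: %.3f\n", per_sample, converged);
  /* Prints 0.500 vs 0.250: no number of extra samples makes the per-sample
   * result converge to the image-based result. */
  return 0;
}
```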

Here's a .blend to demonstrate two problems. Note I haven't tried compiling the patch and testing this yet.

  • The ground has a checkerboard, which goes through a blur node followed by an RGB curves operation, which should give a different result than the current compositor due to bias.
  • Depth of field with sharp specular highlights in Monte Carlo path tracing is notoriously noisy, which is why it is often done in compositing instead. Now the compositor might have a similar problem, although it's not really the same because each individual sample is massively cheaper to compute since it doesn't involve any raytracing. The .blend contains an image with specular highlights and blur nodes, which I would expect to require many samples.
Brecht Van Lommel (brecht) triaged this task as Normal priority. Jan 16 2018, 4:46 AM

Hi Brecht!

Thanks for your feedback. We will align our plans with your comments and check what the best strategy and next steps are for the compositor.

I think you missed an important point regarding the editing performance:

As documented in the "Keep Buffers" section of https://wiki.blender.org/index.php/Dev:Ref/Proposals/Compositor, the main reason the current compositor is so slow while editing nodes is that it recomputes the full node tree after each edit (e.g. even if only the last node before the viewer has been modified).

This proposal does not solve that more general problem. The sample-based progressive mode will lead to faster feedback, but to me it seems like this will scale poorly. Since the whole tree is regenerated each time, fewer samples can be calculated in the same amount of time as the node tree becomes more complex, and fewer samples means lower quality of the displayed result. In consequence the user may have to wait a moment before the result can be assessed, which would rather ruin the idea of fast feedback.

As already described in the wiki proposal, a possible solution is to introduce smart caching of intermediate results (like the result of a whole tree branch, or all inputs of the node currently being edited), as well as introducing the possibility to work at a lower resolution. Both would make sense for a pure GPU compositing engine as well, I guess. While fast feedback is important, it is also important that the final result is available as fast as possible.
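A very rough sketch of what such caching of intermediate results could look like (illustrative only; buffers keyed by a hash of the upstream sub-tree, so that an edit only invalidates downstream branches):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

/* Hypothetical cache of intermediate buffers, keyed by a hash of the
 * sub-tree (nodes plus parameters) that produced them. Editing a node only
 * changes the keys of the branches downstream of that node, so everything
 * upstream is reused instead of recomputed. */
struct Buffer {
  int width = 0, height = 0;
  std::vector<float> pixels;  /* RGBA */
};

class IntermediateCache {
 public:
  const Buffer *lookup(uint64_t subtree_hash) const
  {
    const auto it = cache_.find(subtree_hash);
    return it == cache_.end() ? nullptr : &it->second;
  }

  void store(uint64_t subtree_hash, Buffer buffer)
  {
    cache_[subtree_hash] = std::move(buffer);
  }

 private:
  std::unordered_map<uint64_t, Buffer> cache_;
};
```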

With recent advances in Monte Carlo ray tracing (and specifically the VCM type of renderers), I suggest using a bidirectional path tracer (with MIS?) instead of the currently proposed forward path tracer. Or, at the very least, implement a reverse path tracer where rays start at the source images (the emitters), where important samples (highlights) are easily identified before tracing commences (also enabling Metropolis tracing). This is all the more efficient since users will typically blur the input a lot more than they will want to sharpen the output.

Furthermore, in contrast with Monte-Carlo in 3D rendering, compositing can have deterministic traversal of all paths (unfortunately preventing the beautiful crispy antialiasing). Shuffling (hashing) all possible bends and then progressively processing them will appear pseudo-random, but converges to the ground truth after at most #pixels^M paths with M being the length of the longest chain through the node-tree. No additional work is done compared to the traditional compositor. Also, a node that performs one of the impossible operations should trigger a ‘full render’ up until that node (which has deterministic time-complexity).

Yet another advantage is that when node N (having multiple connected outputs) has completed some reverse paths, with N being the length of the longest chain through the node-tree up until that node (N <= M), the intermediate result can already be buffered.

  • This also works for the forward paths at node K from the camera (the sensor).
  • When a user interrupts the compositor, it can resume from those buffers.
  • When a user modifies a node somewhere in the middle, the buffers at node N and K are still valid and compositing can continue thereafter (clearing only the buffers between nodes N and K and the ones between each non-linear node and N and K).
  • Buffering is not needed for nodes that have a single connected output.

Level-of-detail:

  1. Mipmapping the buffers (a strategy akin to the aforementioned cone tracing) could yield a significant performance gain for blur-like nodes with a high radius, in return for some loss of precision; a node-tree analyzer could decide to insert additional buffers at the inputs to those blur-like nodes. Remember that mipmap addressing reduces to bit-shifting the address when the pixels are ordered along a Hilbert or Lebesgue space-filling curve (see the sketch after this list).
  2. One can favor SIMD and GPU-like processors by working in (say) 4x4 pixel groups for more coherent memory access; again the space-filling curves help with addressing here. Applying pixel grouping up until, but not including, the output image prevents blockiness in the viewport. As a further micro-optimization, we could compute only a single (average) bend per pixel group, which translates to mipmapping on the ray bending.
  3. MIS might help to balance between methods 1 and 2.
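For illustration, this is what the bit-shifting trick looks like with a Lebesgue (Morton/Z-order) curve: interleave the x/y bits to get the address, and shifting the address right by two bits gives the covering texel one mip level up (a sketch, not part of the proposal):

```cpp
#include <cstdint>

/* Spread the bits of a 16-bit value so they occupy the even bit positions. */
static uint32_t spread_bits(uint32_t v)
{
  v &= 0x0000ffff;
  v = (v | (v << 8)) & 0x00ff00ff;
  v = (v | (v << 4)) & 0x0f0f0f0f;
  v = (v | (v << 2)) & 0x33333333;
  v = (v | (v << 1)) & 0x55555555;
  return v;
}

/* Morton (Z-order / Lebesgue) address: the bits of x and y interleaved. */
static uint32_t morton_address(uint32_t x, uint32_t y)
{
  return spread_bits(x) | (spread_bits(y) << 1);
}

/* With pixels stored in Morton order, the index (within the next level) of
 * the covering texel one mip level up is the address shifted right by two. */
static uint32_t parent_mip_address(uint32_t address)
{
  return address >> 2;
}
```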

@Ivo van der Lans (Ivo) please note this proposal is about the compositor, not Cycles.