
Design Doc: Sample based compositor
Open, Normal, Public


This is a design doc for the compositor.

Sample based: The artist can use the number of samples to trade speed against quality at any given moment. When speed is needed, the number of samples can be lowered for fast feedback. When actually rendering, the number of samples can be increased for a better result.
Relative: Currently, changing the resolution (or percentage) affects the behaviour of several nodes; even the default settings of the Blur node need to be adjusted. With Relative, all parameters should be aware of the resolution they are calculated in.
PixelSize aware: Currently the compositor is fixed to the perspective and orthographic camera models. When using the panoramic cameras, the blurs are not accurate. In cases like dome rendering or VR/AR, a lot of trickery and workarounds are needed in order to composite correctly. When the compositor takes the actual camera data of the scene (or image) into account, it can calculate more accurately in these cases.
Canvas: Being able to place images in the compositor and align/transform them visually.
GPU support: The current compositor only supports the GPU for a limited number of nodes. Other nodes are calculated on the CPU, so huge amounts of data are loaded/unloaded, which takes a lot of resources. The new design should be able to run fully on the GPU.

In the sample based compositor the X,Y coordinate of the output image is transformed into a ray (position, direction, up-vector, sample size). This ray is evaluated against the node tree. When the ray arrives at an input node (Image, RenderLayer), it is transformed into that node's image space. The image is then sampled to produce a result color/value, which is passed back to the node that requested it.
Nodes will be able to alter these rays, and select which input socket will receive which ray. For example, a blur node can 'bend' the incoming ray (or change the sample size).
As samples are slightly randomized, every sample hits a different part of the same pixel, which leads to sub-pixel sampling. This produces very crisp images compared to the current compositor.
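A minimal sketch of this evaluation model (all names here, like Ray, BlurNode and render_pixel, are illustrative, not the branch's actual API): the output pixel pulls many jittered rays through the tree; a blur node "bends" each ray before forwarding it to its input, and an image node finally samples a pixel.

```python
import random
from dataclasses import dataclass

@dataclass
class Ray:
    x: float            # position in output image space
    y: float
    sample_size: float  # footprint of the sample

class ImageNode:
    """Input node: maps the ray into image space and samples a pixel."""
    def __init__(self, pixels, width, height):
        self.pixels, self.width, self.height = pixels, width, height

    def evaluate(self, ray):
        ix = min(max(int(ray.x), 0), self.width - 1)   # nearest-neighbour for brevity
        iy = min(max(int(ray.y), 0), self.height - 1)
        return self.pixels[iy * self.width + ix]

class BlurNode:
    """'Bends' the incoming ray by a random offset within the blur radius."""
    def __init__(self, input_node, radius):
        self.input_node, self.radius = input_node, radius

    def evaluate(self, ray):
        jittered = Ray(ray.x + random.uniform(-self.radius, self.radius),
                       ray.y + random.uniform(-self.radius, self.radius),
                       ray.sample_size)
        return self.input_node.evaluate(jittered)

def render_pixel(node, x, y, samples):
    """Average many slightly randomized rays into one output pixel."""
    total = 0.0
    for _ in range(samples):
        # Sub-pixel jitter: each sample hits a different part of the pixel.
        ray = Ray(x + random.random(), y + random.random(), 1.0)
        total += node.evaluate(ray)
    return total / samples
```

Because each sample is independent, lowering `samples` directly trades quality for speed, which is the progressive-feedback property the design is after.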

Current compositor

Sample based compositing

Viewports and filtering
On all buffers, such as input/output image and render-layer nodes, the user will be able to select a viewport that identifies where the image is in the scene and with what kind of camera it was created (e.g. planar vs. spherical). These viewports can be added in the 3D scene, but for canvas compositing we will also allow them to be visible in the backdrop of the node editor (feature: canvas compositing).

Using viewports it will be easier to composite planes into your scene when the camera is moving.
Also, on all input image/render-layer nodes the filter (nearest, linear, cubic, smart and others) and clipping mode (Clip, Extend, Repeat) can be selected that will be used when sampling the image.
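The clipping modes can be sketched as a coordinate-mapping step performed before the filter reads the buffer (a simplified illustration with hypothetical helper names, using nearest filtering for brevity):

```python
def wrap(i, n, mode):
    """Map an integer coordinate into [0, n) per the selected clipping mode.
    Returns None for coordinates outside the image when mode is 'Clip'."""
    if mode == "Repeat":
        return i % n
    if mode == "Extend":
        return min(max(i, 0), n - 1)
    if mode == "Clip":
        return i if 0 <= i < n else None
    raise ValueError(mode)

def sample_nearest(pixels, width, height, x, y, mode="Extend"):
    ix = wrap(int(round(x)), width, mode)
    iy = wrap(int(round(y)), height, mode)
    if ix is None or iy is None:
        return 0.0  # Clip: outside the image reads as transparent black
    return pixels[iy * width + ix]
```

Linear or cubic filtering would apply the same wrapping to each of the taps it blends.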

Open questions
Normally compositors are image based, and many algorithms are also designed for image-based compositors. As this design is totally different, there are risks: will we be able to implement all current features in this new architecture? IMO we will get to 95% of the old features, but some features are very hard to implement (or will come with huge penalties). Should we implement this as a second compositor and let the user decide which compositor to use?

Next steps
Discuss/Approve this design
Find support for this design

Bitbucket source repository: branch compositor-2016
Nodes that have an implementation (not guaranteed to be 100% the same):
Viewer, RGB, Value, Mix, AlphaOver, Image (no OpenEXR), Render layers, Movieclip, Blur (only relative and bokeh), Color Matte, Chroma matte, Math, Value to Color, Value to Vector, Vector to Value, Color to Value, Color to Vector, RGB to BW, Separate RGBA/HSVA/YUVA, Combine RGBA/HSVA/YUVA, Hue Sat, Bright contrast, Gamma, Color balance, Color spill, Set Alpha, Channel matte, Difference Matte, Distance Matte, Luma Matte
Note: still very WIP, e.g. it crashes a lot; it is intended to be a proof of concept.

Currently the rays have no width, hence many samples are needed when blurring. In the actual implementation I want to create cone-shaped rays, which will need fewer samples to reach better results. The difficulty is the need for a mask to support elliptical blurs.
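A rough sketch of the cone idea (hypothetical helper names, under the assumption that prefiltered mip levels of the input buffer are available): a blur widens the cone's footprint instead of firing many thin rays, and a wide footprint can be served by a prefiltered level whose texel size roughly matches it.

```python
import math

def widen_cone(sample_size, blur_radius):
    """A blur node widens the cone's footprint rather than adding thin rays."""
    return math.hypot(sample_size, blur_radius)

def mip_level(sample_size):
    """Pick the prefiltered level whose texel size matches the footprint:
    level 0 is full resolution, each level halves the resolution."""
    return max(0, int(math.log2(max(sample_size, 1.0))))
```

One such wide sample replaces many thin-ray samples, though as noted above an isotropic footprint does not cover elliptical blurs.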

Technical design doc:


Differential Revisions
D3003: PoC: Sample based compositor


Event Timeline

So the user can add a viewport "object" to the scene, which is an image buffer "dump":

  1. buffer of an input node (image, movie, etc.)?
  2. buffer of a texture node?
  3. buffer of an output node, i.e. a different compositing output?
  4. buffer of another scene (OpenGL / Eevee / Cycles Render)?

And then move, rotate and scale them in x, y, (z) and maybe use array modifiers?
In the compositor the user sees them through the active camera?

What are the "features very hard to implement"?

Thank you.

So if I understand this correctly, the idea is to handle one sample at a time, using Monte Carlo sampling for operations like blur. The advantage of that is that it simplifies the implementation, as every sample can be handled individually, in a single kernel execution. As a result it may also be easier to optimize than a more complicated implementation that theoretically could be faster but isn't. It also naturally provides a progressive preview. "Monte Carlo compositing" is a very interesting idea and I hadn't realised this was the plan.

However, I have major doubts about this approach in practice. Hopefully I'll be proven wrong, but I see these problems:

  • Just like path tracing, there will be slow convergence, noise and fireflies. Monte Carlo sampling is notoriously inefficient, especially in high dimensions (every blur-like node adds a dimension like a bounce in path tracing), and mostly it is used in rendering because it is the least bad solution. With importance sampling, adaptive sampling and QMC results can be improved, but this is expensive as well, and in general does not save you from the slow 1/sqrt(N) convergence.
  • With the graph flattened into a tree, nodes with outputs linked to multiple nodes will be computed multiple times. How much this slows things down depends on the specific graph; I don't have a good estimate of how that would work out in production setups.
  • Another extra cost is that noise-free operations will now be repeated for each sample, whereas otherwise they could be executed only once.
  • Memory access is usually the number one bottleneck on modern CPUs and GPUs. If you add a blur node, any nodes feeding into that will now have incoherent memory access to random parts of the buffer. This will lead to cache misses on the CPU and non-coalesced memory access on the GPU. Processing all pixels in one node at a time isn't ideal either, with the entire image going in and out of the caches every time. And if the blur radius isn't too crazy high maybe the cache misses aren't too bad, but still I expect it to be far from optimal.
  • This is more a detail about the current implementation, but the way it works on the CPU seems to involve a lot of branching, in a way that I would expect to be about as expensive as the virtual function calls that we currently have. This branching would be avoided when working on blocks of pixels at a time, or dynamically generating OpenCL kernel code, both of which would also allow more SIMD optimization.
  • As you mention, some operations are impossible to implement in this framework. I'm guessing Fast Gaussian blur, Normalize, Inpaint, Vector Blur, Denoising, .. ?
  • And in a way I think this is the biggest problem: operations like mix RGB, RGB curves, color ramps will not give the same result, because they work on individual noisy samples instead of the final result. For example if you blur a black and white checkerboard, the individual samples will always be 0 and 1, and never any grayscale values in between, so applying a color ramp to that will not give good results. For unbiased Monte Carlo sampling in principle you can only do linear combinations, non-linear operations on the individual samples just can't work correctly.
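The bias in the last point can be shown numerically (a sketch; the gamma curve here merely stands in for RGB curves). For a 50% grey region of a blurred 0/1 checkerboard, each individual sample is only ever 0 or 1, so applying a non-linear curve per sample gives a different answer than applying it to the converged result: E[f(X)] ≠ f(E[X]) for non-linear f.

```python
import random

def gamma_curve(v, g=2.2):
    # Stand-in for a non-linear node such as RGB curves.
    return v ** (1.0 / g)

random.seed(1)
# Samples of a 50% grey region of a blurred black/white checkerboard:
samples = [random.choice([0.0, 1.0]) for _ in range(100_000)]

per_sample = sum(gamma_curve(s) for s in samples) / len(samples)  # biased
on_result = gamma_curve(sum(samples) / len(samples))              # correct
# per_sample stays near 0.5, since the curve leaves 0 and 1 unchanged,
# while on_result converges to 0.5 ** (1 / 2.2), roughly 0.73.
```

No amount of extra sampling closes this gap; the per-sample estimate converges to the wrong value, which is why only linear combinations are unbiased in this framework.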

Here's a .blend to demonstrate two problems. Note I haven't tried compiling the patch yet and testing this.

  • The ground has a checkerboard, which goes through a blur node followed by an RGB curves operation, which should give a different result than the current compositor due to bias.
  • Depth of field with sharp specular highlights in Monte Carlo path tracing is notoriously noisy, which is why it is often done in compositing instead. Now the compositor might have a similar problem, although it's not really the same because each individual sample is massively cheaper to compute since it doesn't involve any raytracing. The .blend contains an image with specular highlights and blur nodes, which I would expect to require many samples.
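The slow-convergence concern from the first bullet can be illustrated with a toy experiment (a sketch, not a measurement of the patch): estimate the RMS error of a Monte Carlo "blur" over a 0/1 checkerboard, whose true average is 0.5, at different sample counts. Quadrupling the samples only halves the error, matching the 0.5/sqrt(N) prediction.

```python
import math
import random

def rms_error(num_samples, trials=2000):
    """RMS error of averaging num_samples random checkerboard taps (0 or 1),
    whose true mean is 0.5, measured over many independent trials."""
    random.seed(42)
    sq = 0.0
    for _ in range(trials):
        estimate = sum(random.choice([0.0, 1.0])
                       for _ in range(num_samples)) / num_samples
        sq += (estimate - 0.5) ** 2
    return math.sqrt(sq / trials)
```

For instance, `rms_error(16)` comes out close to 0.5/sqrt(16) = 0.125, and `rms_error(64)` close to half of that, so driving the noise down by another factor of 10 costs 100x more samples.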
Brecht Van Lommel (brecht) triaged this task as Normal priority. Jan 16 2018, 4:46 AM

Hi Brecht!

Thanks for your feedback. We will align our plans with your comments and work out the best strategy and next steps for the compositor.

I think you missed an important point regarding the editing performance:

As documented here (section "Keep Buffers"), the main reason why the current compositor is so slow while editing nodes is that it re-evaluates the full node tree after each edit (e.g. even if only the last node before the viewer has been modified).

This proposal does not solve that more general problem. The sample-based progressive mode will lead to faster feedback, but to me it seems like this will scale poorly. Since the whole tree is re-evaluated each time, fewer samples can be calculated in the same feedback interval as the node tree becomes more complex. And fewer samples means lower quality of the displayed result. In consequence the user may then have to wait a short moment before assessing the result becomes possible, and that would rather defeat the idea of fast feedback.

As already described in the wiki proposal, a possible solution is to introduce smart caching of intermediate results (like the result of a whole tree branch or all inputs of the node currently being edited), as well as introducing the possibilty to work in a lower resolution. Both would make sense for a pure GPU compositing engine, as well, I guess. While fast feedback is important, it is also important that the final result is available as fast as possible.