
Design Doc: Sample based compositor
Open, Normal, Public

Description

This is a design doc for the compositor.

Principles
Sample based: The compositor works with samples. By changing the number of samples, the artist can trade speed against quality at any moment: when speed is needed the artist can lower the number of samples for fast feedback, and when actually rendering the number of samples can be increased for a better result.
Relative: Currently, changing the resolution (or render percentage) affects the behaviour of several nodes; even the default settings of the Blur node need to be adjusted. With relative parameters, all parameters are aware of the resolution they are calculated in (see the sketch after this list).
PixelSize aware: Currently the compositor is tied to the perspective and orthographic camera models. When using panoramic cameras the blurs are not accurate, and in cases like dome rendering and VR/AR a lot of trickery and workarounds are needed to composite correctly. When the compositor takes the actual camera data of the scene (or image) into account, it can calculate more accurately in these cases.
Canvas: Being able to put images in the compositor and align/transform them visually.
GPU support: The current compositor only supports the GPU for a limited set of nodes. Other nodes are calculated on the CPU, so huge amounts of data are loaded and unloaded, which costs a lot of resources. The new design should be able to run fully on the GPU.
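A minimal sketch of what a resolution-relative parameter could look like (the names and struct are illustrative, not the actual API):

```cpp
/* Illustrative sketch: a blur size stored as a fraction of the image width,
 * so the visual result stays the same when the resolution or render
 * percentage changes. */
struct RenderContext {
  int width;   /* output width in pixels */
  int height;  /* output height in pixels */
};

struct RelativeSize {
  float fraction;  /* e.g. 0.05 = 5% of the image width */

  float to_pixels(const RenderContext &ctx) const
  {
    return fraction * float(ctx.width);
  }
};

/* A 5% blur resolves to 96 px at 1920x1080 and to 48 px at a 50% preview
 * resolution of 960x540, so the image looks the same at both sizes. */
```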

Samples
In the sample-based compositor, the X,Y coordinate of the output image is transformed into a ray (position, direction, up vector, sample size). This ray is evaluated through the node tree. When the ray reaches an input node (Image, Render Layers), it is transformed into that image's space. The image is then sampled into a resulting color/value, which is passed back to the node that requested it.
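A rough sketch of the ray and the pull-style evaluation described above (all names are illustrative; the implementation in the branch may differ):

```cpp
#include <array>

using float3 = std::array<float, 3>;
using float4 = std::array<float, 4>;  /* RGBA result */

/* A sample request: where the sample is taken and how large its footprint is. */
struct Ray {
  float3 position;
  float3 direction;
  float3 up_vector;
  float sample_size;
};

/* Every node answers a ray with a color; input nodes sample their image,
 * other nodes forward (possibly altered) rays to their own inputs. */
class Node {
 public:
  virtual ~Node() = default;
  virtual float4 evaluate(const Ray &ray) const = 0;
};

class ImageInputNode : public Node {
 public:
  float4 evaluate(const Ray &ray) const override
  {
    /* Transform the ray into this image's space and sample the image. */
    float u, v;
    project_to_image_space(ray, &u, &v);
    return sample_image(u, v, ray.sample_size);
  }

 private:
  void project_to_image_space(const Ray &ray, float *u, float *v) const;
  float4 sample_image(float u, float v, float footprint) const;
};
```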
Nodes will be able to alter these rays and to select which input socket receives which ray. For example, a blur node can 'bend' the incoming ray (or change its sample size).
As samples are slightly randomized, every sample hits a different part of the same pixel, which gives sub-pixel sampling. This leads to very crisp images compared to the current compositor.
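Building on the sketch above, a blur node could look roughly like this: it jitters ('bends') the incoming ray per sample, so that the average over many samples converges to the blurred result (again purely illustrative):

```cpp
#include <random>

/* Illustrative blur node that 'bends' the incoming ray: each sample is
 * offset within +/- radius in X and Y (a simple box filter); averaging many
 * samples converges to the blurred image. */
class BlurNode : public Node {
 public:
  BlurNode(const Node *input, float radius) : input_(input), radius_(radius) {}

  float4 evaluate(const Ray &ray) const override
  {
    thread_local std::mt19937 rng(0x5eed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);

    Ray bent = ray;
    bent.position[0] += dist(rng) * radius_;
    bent.position[1] += dist(rng) * radius_;
    bent.sample_size += radius_;  /* widen the footprint for downstream samplers */
    return input_->evaluate(bent);
  }

 private:
  const Node *input_;
  float radius_;
};
```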

(Figure: current compositor)

(Figure: sample-based compositing)

Viewports and filtering
For all buffers, such as input/output images and render layer nodes, the user will be able to select a viewport that identifies where the image is in the scene and with what kind of camera it was created (e.g. planar vs. spherical). These viewports can be added in the 3D scene, but for canvas compositing we will also allow them to be visible in the backdrop of the node editor (feat: canvas compositing).

Using viewports it will be easier to composite planes into your scene when the camera is moving.
Also, at all image/render layer input nodes, the filter (nearest, linear, cubic, smart and others) and clipping mode (Clip, Extend, Repeat) used when sampling the image can be selected (a small sketch of the clipping modes follows below).
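To make the clipping modes concrete, a sketch of what they could mean when a sample falls outside the image (illustrative only, not the actual implementation):

```cpp
#include <algorithm>
#include <cmath>

enum class ClipMode { Clip, Extend, Repeat };

/* Map a possibly out-of-range coordinate (in pixels) back into the image,
 * or report that the sample should be transparent black (Clip). */
static bool resolve_coordinate(float coord, int size, ClipMode mode, float *out)
{
  switch (mode) {
    case ClipMode::Clip:
      if (coord < 0.0f || coord >= float(size)) {
        return false;  /* outside the image */
      }
      *out = coord;
      return true;
    case ClipMode::Extend:
      *out = std::clamp(coord, 0.0f, float(size) - 1.0f);
      return true;
    case ClipMode::Repeat:
      *out = coord - std::floor(coord / float(size)) * float(size);
      return true;
  }
  return false;
}
```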

Open questions
Compositors are normally image based, and many algorithms are designed for image-based compositors. As this design is totally different there are risks, such as whether we are able to implement all current features in this new architecture. IMO we will get to 95% of the old features, but some features are very hard to implement (or will come with huge penalties). Should we implement this as a second compositor and let the user decide which compositor to use?

Next steps
Discuss/Approve this design
Find support for this design

Resources
Bitbucket source repository: https://bitbucket.org/atmind/blender-compositor-2016 branch compositor-2016
Nodes that have an implementation (not guaranteed to be 100% the same):
Viewer, RGB, Value, Mix, AlphaOver, Image (no OpenEXR), Render layers, Movieclip, Blur (only relative and bokeh), Color Matte, Chroma matte, Math, Value to Color, Value to Vector, Vector to Value, Color to Value, Color to Vector, RGB to BW, Separate RGBA/HSVA/YUVA, Combine RGBA/HSVA/YUVA, Hue Sat, Bright contrast, Gamma, Color balance, Color spill, Set Alpha, Channel matte, Difference Matte, Distance Matte, Luma Matte
Note: still very much WIP (e.g. it crashes a lot); it is intended as a proof of concept.

Currently the rays have no width, hence many samples are needed when blurring. In the actual implementation I want to create cone-shaped rays, which will need less sampling to get to better results. The difficulty is the need for a mask to support elliptical blurs.
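One way such cone-shaped rays could work (purely an illustration of the idea, not the planned implementation): the ray carries a footprint radius that a blur node widens, and the input node answers a wide footprint with a single prefiltered (mipmapped) lookup instead of many narrow samples:

```cpp
#include <algorithm>
#include <cmath>

/* Hypothetical cone ray: a sample position plus an explicit footprint
 * radius, measured in pixels of the target image. */
struct ConeRay {
  float x, y;       /* sample position in image space */
  float footprint;  /* radius covered by this single sample */
};

/* Pick the mip level whose texel size roughly matches the footprint, so a
 * wide cone is resolved with one prefiltered lookup. */
static int mip_level_for_footprint(float footprint_in_pixels, int max_level)
{
  const int level = int(std::floor(std::log2(std::max(footprint_in_pixels, 1.0f))));
  return std::clamp(level, 0, max_level);
}
```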

Technical design doc: https://docs.google.com/document/d/1L-BWs_XbX_BgZ0F0vJC9-fxbM6KWQyrIO6qFEOfPSXA/edit#heading=h.ontenmvswzw8

Details

Differential Revisions
D3003: PoC: Sample based compositor
Type
Design

Related Objects

Event Timeline

So the user can add a viewport "object" to the scene, which is an image buffer "dump":

  1. buffer of an input node (image, movie, etc.)?
  2. buffer of a texture node?
  3. buffer of an output node, i.e. a different compositing output?
  4. buffer of another scene (OpenGL / Eevee / Cycles Render)?

And then move, rotate and scale them in x, y, (z) and maybe use array modifiers?
In the compositor the user sees them through the active camera?

What are the "features very hard to implement"?

Thank you.

So if I understand this correctly, the idea is to handle one sample at a time, using Monte Carlo sampling for operations like blur. The advantage of that is that it simplifies the implementation, as every sample can be handled individually, in a single kernel execution. As a result it may also be easier to optimize than a more complicated implementation that theoretically could be faster but isn't. It also naturally provides a progressive preview. "Monte Carlo compositing" is a very interesting idea and I hadn't realised this was the plan.
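To make that concrete, one per-sample pass could look roughly like this (an illustrative sketch reusing the Ray/Node idea from the description, not the actual patch):

```cpp
#include <cstddef>
#include <vector>

/* Progressive, sample-at-a-time compositing: one pass evaluates the whole
 * tree for a single jittered sample per pixel and accumulates a running
 * average, which doubles as the progressive preview. */
Ray make_jittered_ray(int x, int y, int sample);  /* hypothetical helper */

struct Accumulator {
  std::vector<float4> sum;  /* running per-pixel sum */
  int samples = 0;
};

void composite_one_sample(const Node &output_node, int width, int height, Accumulator &acc)
{
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
      const Ray ray = make_jittered_ray(x, y, acc.samples);
      const float4 color = output_node.evaluate(ray);
      float4 &dst = acc.sum[static_cast<std::size_t>(y) * width + x];
      for (int c = 0; c < 4; c++) {
        dst[c] += color[c];
      }
    }
  }
  acc.samples++;
  /* Displaying sum / samples gives the progressive preview. */
}
```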

However, I have major doubts about this approach in practice. Hopefully I'll be proven wrong, but I see these problems:

  • Just like path tracing, there will be slow convergence, noise and fireflies. Monte Carlo sampling is notoriously inefficient, especially in high dimensions (every blur-like node adds a dimension like a bounce in path tracing), and mostly it is used in rendering because it is the least bad solution. With importance sampling, adaptive sampling and QMC results can be improved, but this is expensive as well, and in general does not save you from the slow 1/sqrt(N) convergence.
  • With the graph flattened into a tree, nodes with outputs linked to multiple nodes will be computed multiple times. How much it slows things down depends on the specific graph, I don't have a good estimate how that would work out in production setups.
  • Another extra cost is that noise-free operations will now be repeated for each sample, whereas otherwise they could be executed only once.
  • Memory access is usually the number one bottleneck on modern CPUs and GPUs. If you add a blur node, any nodes feeding into that will now have incoherent memory access to random parts of the buffer. This will lead to cache misses on the CPU and non-coalesced memory access on the GPU. Processing all pixels in one node at a time isn't ideal either, with the entire image going in and out of the caches every time. And if the blur radius isn't too crazy high maybe the cache misses aren't too bad, but still I expect it to be far from optimal.
  • This is more a detail about the current implementation, but the way it works on the CPU seems to involve a lot of branching, in a way that I would expect to be about as expensive as the virtual function calls that we currently have. This branching would be avoided when working on blocks of pixels at a time, or dynamically generating OpenCL kernel code, both of which would also allow more SIMD optimization.
  • As you mention, some operations are impossible to implement in this framework. I'm guessing Fast Gaussian blur, Normalize, Inpaint, Vector Blur, Denoising, .. ?
  • And in a way I think this is the biggest problem: operations like mix RGB, RGB curves and color ramps will not give the same result, because they work on individual noisy samples instead of the final result. For example if you blur a black and white checkerboard, the individual samples will always be 0 and 1, and never any grayscale values in between, so applying a color ramp to that will not give good results. For unbiased Monte Carlo sampling you can in principle only do linear combinations; non-linear operations on the individual samples just can't work correctly (see the small numeric example below).
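A tiny numeric illustration of that bias: applying a non-linear operation per sample and then averaging is not the same as applying it to the converged value (hypothetical numbers; squaring stands in for a curve/ramp):

```cpp
#include <cstdio>

/* Stand-in for a non-linear operation such as a color ramp or curve. */
static float ramp(float v) { return v * v; }

int main()
{
  /* A blurred black/white checkerboard: the converged pixel value is 0.5,
   * but every individual sample is either 0.0 or 1.0. */
  const float samples[4] = {0.0f, 1.0f, 0.0f, 1.0f};

  float per_sample = 0.0f;  /* ramp applied per sample, then averaged */
  for (float s : samples) {
    per_sample += ramp(s) / 4.0f;
  }
  const float converged = ramp(0.5f);  /* ramp applied to the converged value */

  std::printf("per-sample: %.3f  converged: %.3f\n", per_sample, converged);
  /* Prints 0.500 vs 0.250: no number of extra samples makes the per-sample
   * result converge to the image-based result. */
  return 0;
}
```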

Here's a .blend to demonstrate two problems. Note I haven't tried compiling the patch and testing this yet.

  • The ground has a checkerboard, which goes through a blur node followed by an RGB curves operation, which should give a different result than the current compositor due to bias.
  • Depth of field with sharp specular highlights in Monte Carlo path tracing is notoriously noisy, which is why it is often done in compositing instead. Now the compositor might have a similar problem, although it's not really the same because each individual sample is massively cheaper to compute since it doesn't involve any raytracing. The .blend contains an image with specular highlights and blur nodes, which I would expect to require many samples.
Brecht Van Lommel (brecht) triaged this task as Normal priority. Jan 16 2018, 4:46 AM

Hi Brecht!

Thanks for your feedback. We will align our plans with your comments and check what the best strategy and next steps are for the compositor.

I think you missed an important point regarding the editing performance:

As documented in the "Keep Buffers" section of https://wiki.blender.org/index.php/Dev:Ref/Proposals/Compositor, the main reason the current compositor is so slow while editing nodes is that it recomputes the full node tree after each edit (e.g. even if only the last node before the viewer has been modified).

This proposal does not solve that more general problem. The sample-based progressive mode will lead to faster feedback, but to me it seems like this will scale poorly. Since the whole tree is regenerated each time, fewer samples can be calculated in the same amount of time as the node tree becomes more complex, and fewer samples means lower quality of the displayed result. In consequence the user may have to wait a moment before the result can be assessed, which would rather ruin the idea of fast feedback.

As already described in the wiki proposal, a possible solution is to introduce smart caching of intermediate results (like the result of a whole tree branch, or all inputs of the node currently being edited), as well as introducing the possibility to work at a lower resolution. Both would make sense for a pure GPU compositing engine as well, I guess. While fast feedback is important, it is also important that the final result is available as fast as possible.
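A very rough sketch of what such caching of intermediate results could look like (illustrative only; buffers keyed by a hash of the upstream sub-tree, so that an edit only invalidates downstream branches):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

/* Hypothetical cache of intermediate buffers, keyed by a hash of the
 * sub-tree (nodes plus parameters) that produced them. Editing a node only
 * changes the keys of the branches downstream of that node, so everything
 * upstream is reused instead of recomputed. */
struct Buffer {
  int width = 0, height = 0;
  std::vector<float> pixels;  /* RGBA */
};

class IntermediateCache {
 public:
  const Buffer *lookup(uint64_t subtree_hash) const
  {
    const auto it = cache_.find(subtree_hash);
    return it == cache_.end() ? nullptr : &it->second;
  }

  void store(uint64_t subtree_hash, Buffer buffer)
  {
    cache_[subtree_hash] = std::move(buffer);
  }

 private:
  std::unordered_map<uint64_t, Buffer> cache_;
};
```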

With recent advances in Monte Carlo ray tracing (and specifically the VCM type of renderers), I suggest using a bidirectional path tracer (with MIS?) instead of the currently proposed forward path tracer. Or, at the very least, implement a reverse path tracer where rays start at the source images (the emitters), where important samples (highlights) are easily identified before tracing commences (also enabling Metropolis tracing). This is all the more efficient since users will typically blur the input a lot more than they will want to sharpen the output.

Furthermore, in contrast with Monte-Carlo in 3D rendering, compositing can have deterministic traversal of all paths (unfortunately preventing the beautiful crispy antialiasing). Shuffling (hashing) all possible bends and then progressively processing them will appear pseudo-random, but converges to the ground truth after at most #pixels^M paths with M being the length of the longest chain through the node-tree. No additional work is done compared to the traditional compositor. Also, a node that performs one of the impossible operations should trigger a ‘full render’ up until that node (which has deterministic time-complexity).

Yet another advantage is that when node N (having multiple connected outputs) has completed some reverse paths, with N being the length of the longest chain through the node-tree up until that node (N <= M), the intermediate result can already be buffered.

  • This also works for the forward paths at node K from the camera (the sensor).
  • When a user interrupts the compositor, it can resume from those buffers.
  • When a user modifies a node somewhere in the middle, the buffers at node N and K are still valid and compositing can continue thereafter (clearing only the buffers between nodes N and K and the ones between each non-linear node and N and K).
  • Buffering is not needed for nodes that have a single connected output.

Level-of-detail:

  1. Mipmapping the buffers (a strategy akin to the aforementioned cone tracing) could yield a significant performance gain for blur-like nodes with a high radius, in return for some loss of precision; a node-tree analyzer could decide to insert additional buffers at the inputs to those blur-like nodes. Remember that mipmap addressing reduces to bit-shifting the address when the pixels are ordered along a Hilbert or Lebesgue space-filling curve (see the sketch after this list).
  2. One can favor SIMD and GPU-like processors by working in (say) 4x4 pixel groups for more coherent memory access; again the space-filling curves help with addressing here. Applying pixel grouping up until, but not including, the output image prevents blockiness in the viewport. As a further micro-optimization, we could compute only a single (average) bend per pixel group, which translates to mipmapping on the ray bending.
  3. MIS might help to balance between methods 1 and 2.
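For illustration, this is what the bit-shifting trick looks like with a Lebesgue (Morton/Z-order) curve: interleave the x/y bits to get the address, and shifting the address right by two bits gives the covering texel one mip level up (a sketch, not part of the proposal):

```cpp
#include <cstdint>

/* Spread the bits of a 16-bit value so they occupy the even bit positions. */
static uint32_t spread_bits(uint32_t v)
{
  v &= 0x0000ffff;
  v = (v | (v << 8)) & 0x00ff00ff;
  v = (v | (v << 4)) & 0x0f0f0f0f;
  v = (v | (v << 2)) & 0x33333333;
  v = (v | (v << 1)) & 0x55555555;
  return v;
}

/* Morton (Z-order / Lebesgue) address: the bits of x and y interleaved. */
static uint32_t morton_address(uint32_t x, uint32_t y)
{
  return spread_bits(x) | (spread_bits(y) << 1);
}

/* With pixels stored in Morton order, the index (within the next level) of
 * the covering texel one mip level up is the address shifted right by two. */
static uint32_t parent_mip_address(uint32_t address)
{
  return address >> 2;
}
```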

@Ivo van der Lans (Ivo) please note this proposal is about the compositor, not Cycles.