This is an initial pass at formalizing a proposal to improve the performance and user experience of the Compositor.
Three aspects are meant to be addressed:
- Ease of use
- Performance
- Memory consumption
Ease of use
Currently there are many settings which need to be set and tweaked to get the best performance: tile size, OpenCL, buffer use and so on. The settings also have implicit dependencies under the hood: for example, OpenCL needs a big tile size, but a big tile size might make some nodes measurably slower, and it also "breaks" the initial intention of the compositor design, which was to show tiles appearing as quickly as possible.
In the OpenCL case it is also not clear when it is actually engaged: it fails silently and falls back to the CPU, giving the false impression of GPU-accelerated compute.
Performance
The performance of the compositor is not up to date. This is partially due to its scheduler design and partially due to the technical implementation, which makes a virtual call per pixel (and that ruins all sorts of coherency).
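To make the per-pixel virtual call cost concrete, here is a minimal sketch contrasting the two styles. It is not the compositor's code: all class and function names (`PixelOperation`, `BufferOperation`, `render_per_pixel` and so on) are hypothetical, images are flat lists of floats, and the `dispatches` counter exists only to count dynamically dispatched calls.

```python
from abc import ABC, abstractmethod

# Counts dynamically dispatched calls made by each style (illustration only).
dispatches = {"pixel": 0, "buffer": 0}


class PixelOperation(ABC):
    """Per-pixel style: every pixel of every node costs one virtual call."""

    @abstractmethod
    def read_pixel(self, x, y):
        ...


class ConstantPixel(PixelOperation):
    def __init__(self, value):
        self.value = value

    def read_pixel(self, x, y):
        dispatches["pixel"] += 1
        return self.value


class ScalePixel(PixelOperation):
    def __init__(self, source, factor):
        self.source = source
        self.factor = factor

    def read_pixel(self, x, y):
        dispatches["pixel"] += 1
        # The input is pulled through another virtual call, pixel by pixel.
        return self.source.read_pixel(x, y) * self.factor


def render_per_pixel(operation, width, height):
    return [operation.read_pixel(x, y) for y in range(height) for x in range(width)]


class BufferOperation(ABC):
    """Whole-buffer style: one virtual call per node, then a tight loop."""

    @abstractmethod
    def execute(self, width, height):
        ...


class ConstantBuffer(BufferOperation):
    def __init__(self, value):
        self.value = value

    def execute(self, width, height):
        dispatches["buffer"] += 1
        return [self.value] * (width * height)


class ScaleBuffer(BufferOperation):
    def __init__(self, source, factor):
        self.source = source
        self.factor = factor

    def execute(self, width, height):
        dispatches["buffer"] += 1
        source_buffer = self.source.execute(width, height)
        return [value * self.factor for value in source_buffer]


width, height = 4, 3
per_pixel = render_per_pixel(ScalePixel(ConstantPixel(0.5), 2.0), width, height)
whole_buffer = ScaleBuffer(ConstantBuffer(0.5), 2.0).execute(width, height)
print(per_pixel == whole_buffer, dispatches)
```

Both styles produce the same image, but for a 4x3 image and two nodes the per-pixel style makes 24 dispatched calls while the whole-buffer style makes 2. The inner loop of the whole-buffer style is also the kind of loop that can later be vectorized.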
Memory consumption
This is completely out of the artist's control: some operations require memory buffers before or after the node, and those are created automatically. This makes it hard to predict how much memory a node setup requires, and how much extra memory is needed when increasing the final resolution.
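As a rough illustration of how buffer memory grows with resolution, the sketch below estimates the size of a single full-frame buffer. It assumes 4 channels of 32-bit floats per pixel, which is an assumption made for this example and not a statement about the compositor's actual buffer layout; `buffer_bytes` is a hypothetical helper.

```python
def buffer_bytes(width, height, channels=4, bytes_per_channel=4):
    """Size of one full-frame buffer (assumed layout: RGBA, 32-bit float)."""
    return width * height * channels * bytes_per_channel


MIB = 1024 * 1024

full_hd = buffer_bytes(1920, 1080)
uhd_8k = buffer_bytes(7680, 4320)

print(f"Full HD buffer: {full_hd / MIB:.2f} MiB")
print(f"8K buffer:      {uhd_8k / MIB:.2f} MiB")
print(f"Growth factor:  {uhd_8k // full_hd}x")
```

Under these assumptions every implicit buffer costs about 31.64 MiB at Full HD and 506.25 MiB at 8K, so each automatically created buffer becomes 16 times more expensive when going from one to the other.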
The proposed end goal is: deliver the final image as fast as possible.
This is different from being tile-based, where the goal was to have the first tiles appear as quickly as possible and to give gradual updates. The downside of the tile-based approach is that the overall frame time is higher than when delivering an entire frame at once. Additionally, the tile-based nature complicates the task scheduler a lot and requires keeping track of more memory at a time.
It should be possible to transform the current design into the proposed one in incremental steps:
- [Temporarily] Remove the code which unnecessarily complicates the scheduler and memory manager, namely GPU support.
- Convert all operations to operate in a relative space rather than pixel space (basically, make it possible to change the final render resolution without changing the compositor network setup).
- Make the compositor operate at a final resolution which closely matches the resolution of the current "viewer": there is no need to do full 8K compositing when the final result is viewed as a tiny backdrop on a Full HD monitor.
- Modify operations to operate on an entire frame (or on a given area).
- Modify the scheduler to do bottom-to-top scheduling, operating on the entire image.
- Modify the memory manager to allocate buffers only once they are needed and discard them as soon as possible.
- Vectorize (SIMD) all operations where possible.
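The scheduler and memory manager steps above can be sketched as follows. This is one possible reading, assuming that bottom-to-top scheduling means evaluating a node's inputs before the node itself, with each node producing a whole-frame buffer. The `evaluate` function, the graph layout and the node names are hypothetical, and images are flat lists of floats.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def evaluate(graph, operations, output):
    """Evaluate whole-frame operations, freeing buffers after their last use.

    graph:      node name -> list of input node names
    operations: node name -> callable taking the input buffers
    Returns the output buffer and the peak number of live buffers.
    """
    # Count how many times each node's buffer will be consumed.
    remaining_users = {node: 0 for node in graph}
    for inputs in graph.values():
        for input_node in inputs:
            remaining_users[input_node] += 1

    buffers = {}
    peak_live = 0
    # static_order() yields every node after all of its inputs.
    for node in TopologicalSorter(graph).static_order():
        inputs = graph[node]
        # The buffer is allocated only now, when the node is executed.
        buffers[node] = operations[node](*(buffers[i] for i in inputs))
        peak_live = max(peak_live, len(buffers))
        # Discard every input whose last consumer has just run.
        for input_node in inputs:
            remaining_users[input_node] -= 1
            if remaining_users[input_node] == 0 and input_node != output:
                del buffers[input_node]
    return buffers[output], peak_live


graph = {
    "image": [],
    "brighten": ["image"],
    "invert": ["brighten"],
    "mix": ["image", "invert"],
}
operations = {
    "image": lambda: [0.0, 0.25, 0.5, 1.0],
    "brighten": lambda src: [v * 2.0 for v in src],
    "invert": lambda src: [1.0 - v for v in src],
    "mix": lambda a, b: [(x + y) / 2.0 for x, y in zip(a, b)],
}

result, peak_live = evaluate(graph, operations, "mix")
print(result, peak_live)
```

In this four-node setup at most three buffers are alive at once, because the "brighten" buffer is discarded as soon as "invert" has consumed it. The peak is also easy to predict up front from the graph, which is harder when tiles from many nodes are in flight.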
Look into GPU support with the following requirements:
- Minimize memory throughput, which implies the following point.
- Have all operations implemented on the GPU, which again implies the following point.
- Share implementation between CPU and GPU as much as possible.
The steps can be gradual and formulated well enough to happen as part of the code quality days in T73586.
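As an illustration of the relative-space and viewer-resolution steps listed earlier, here is a minimal sketch. Both helpers are hypothetical. Expressing sizes as a fraction of the larger image dimension is an assumption chosen for the example, as is the rule of never upscaling beyond the render resolution.

```python
def relative_to_pixels(relative_size, width, height):
    """Convert a size given as a fraction of the larger image dimension
    (an assumed convention) into pixels at the given resolution."""
    return relative_size * max(width, height)


def fit_to_viewer(render_size, viewer_size):
    """Pick a compositing resolution that keeps the render aspect ratio,
    fits inside the viewer and never exceeds the render resolution."""
    render_width, render_height = render_size
    viewer_width, viewer_height = viewer_size
    scale = min(viewer_width / render_width, viewer_height / render_height, 1.0)
    return (
        max(1, round(render_width * scale)),
        max(1, round(render_height * scale)),
    )


render_size = (7680, 4320)   # final 8K render
viewer_size = (1920, 1080)   # backdrop on a Full HD monitor
blur_radius = 1 / 64         # node setting stored in relative space

composite_size = fit_to_viewer(render_size, viewer_size)
print(composite_size)
print(relative_to_pixels(blur_radius, *render_size))
print(relative_to_pixels(blur_radius, *composite_size))
```

The same relative blur radius resolves to 120 pixels at 8K and 30 pixels at the viewer resolution, so the node setup does not need to change when the resolution does.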