
VSE 2.0: Performance, Cache System
Confirmed, Normal · Public · DESIGN

Description

Video Sequencer Cache

NOTE: This is the first pass of the design. It will be worked on a bit more after discussion within the module, and the presentation and diagrams will become clearer.

This document describes the design and implementation of the caching system for the VSE 2.0 project (T78986).
There is some overlap with the performance topics listed in VSE 2.0: Performance (T78992).

User level design

At the user level, the cache system should follow a zero-configuration principle: video playback and tweaking should be as fast as possible without the user spending time fine-tuning per-project settings.

The only settings the user should interact with are in the User Preferences:

  • In-memory cache size limit
  • Disk cache size limit
  • Disk cache folder

These settings are set once in the User Preferences and are used by all sequencer projects. The default values are based on the minimal hardware requirements.

Code design

Levels of caching

For the best editing and playback performance, multiple levels of cache are needed.
The most important ones are:

  • Cache of strip input (color-managed using the general Blender color space rules)

    This makes it possible to move an image strip in time and adjust strip settings such as transform quickly and without lag, by avoiding the need to re-read the file on every modification. Let's call this cache level STRIP_INPUT.
  • Cache of the final sequencer result.

    This allows real-time playback once the sequencer result has been cached. Let's call this cache level SEQUENCER_FINAL.
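
To make the naming more concrete, here is a minimal sketch of the lookup flow. The VSE_CACHE_* enum and the vse_cache_get/vse_cache_put/vse_read_strip_input/vse_render_strip_stack helpers are hypothetical and only illustrate the idea; ImBuf and SeqRenderData are existing Blender types:

  /* Hypothetical sketch of the two cache levels; not the actual seqcache.c API. */
  typedef enum VSECacheLevel {
    VSE_CACHE_STRIP_INPUT,     /* Color-managed image as read from disk, possibly prescaled. */
    VSE_CACHE_SEQUENCER_FINAL, /* Fully composited result for a scene frame. */
  } VSECacheLevel;

  /* Simplified flow: check SEQUENCER_FINAL first, fall back to STRIP_INPUT plus a stack render. */
  static ImBuf *vse_frame_get(const SeqRenderData *context, Sequence *seq, float cfra)
  {
    ImBuf *ibuf = vse_cache_get(context, seq, cfra, VSE_CACHE_SEQUENCER_FINAL);
    if (ibuf != NULL) {
      return ibuf;
    }
    ibuf = vse_cache_get(context, seq, cfra, VSE_CACHE_STRIP_INPUT);
    if (ibuf == NULL) {
      ibuf = vse_read_strip_input(context, seq, cfra); /* Read, color-manage, prescale. */
      vse_cache_put(context, seq, cfra, VSE_CACHE_STRIP_INPUT, ibuf);
    }
    ibuf = vse_render_strip_stack(context, seq, cfra, ibuf); /* Apply effects/transforms. */
    vse_cache_put(context, seq, cfra, VSE_CACHE_SEQUENCER_FINAL, ibuf);
    return ibuf;
  }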

The simplified flow of the image from disk to the artist is presented in the following diagram:


NOTE

Need to think about whether having a strip output cache is helpful. If the stack rendering is fast, having extra levels of cache will have a negative effect, since fewer final frames will fit into memory.


Cache referencing

Cache levels are to utilize reference counting as much as possible. For example, with a single Image Strip that has no modifications set up, the final sequencer frame in the SEQUENCER_FINAL cache is to reference the image from the STRIP_INPUT cache. This minimizes the memory footprint and improves playback performance for storyboarding-type tasks performed in the sequencer.

The following example visualizes cache frame referencing in this scenario:

  • The sequencer has a single Image Strip using HappyFroggo.png as input. The strip has a length of 4.

In Blender terms, the cache contains a single copy of the ImBuf created for HappyFroggo.png. All sequencer cache entries reference this ImBuf for the lowest possible memory footprint.
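
As an illustration of the intended sharing (ImBuf reference counting via IMB_refImBuf()/IMB_freeImBuf() already exists in imbuf; the surrounding function is made up for this example):

  #include "IMB_imbuf.h"
  #include "IMB_imbuf_types.h"

  /* Illustration only: four SEQUENCER_FINAL entries sharing one STRIP_INPUT ImBuf. */
  static void cache_sharing_example(ImBuf *happy_froggo /* the single decoded image */)
  {
    ImBuf *final_entries[4];
    for (int frame = 0; frame < 4; frame++) {
      /* The strip has no modifications, so the "final" frame is the input itself:
       * only the reference count goes up, no pixel data is duplicated. */
      IMB_refImBuf(happy_froggo);
      final_entries[frame] = happy_froggo;
    }
    /* Dropping a cache entry only decrements the reference count; the pixels are
     * freed once the last user (STRIP_INPUT or SEQUENCER_FINAL) releases them. */
    for (int frame = 0; frame < 4; frame++) {
      IMB_freeImBuf(final_entries[frame]);
    }
  }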

Cache resolution

In a typical video editing scenario, an artist views the sequencer result in a rather small area of the screen layout.

This behavior can be exploited in the following way: the sequencer processing and caching can happen at a lower resolution. This is something the current proxy design solves, but in a fully manual manner.

There is a possibility to make the proxy behavior more automatic, by downscaling an image after it is read from disk, but before it is put into the STRIP_INPUT cache. The default behavior could be something like:

  • Use closest power-of-two downscaling (similar to mipmaps)
  • The target resolution is 50% of the window resolution, but no more than 1080p.
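
For illustration, the factor selection could look roughly like this (a sketch of the defaults listed above; the function and its exact policy are not a final design):

  /* Sketch: how many halvings to apply to the source so the result still covers
   * roughly 50% of the preview window, capped at 1080p. */
  static int prescale_num_halvings(int src_width, int src_height,
                                   int window_width, int window_height)
  {
    int target_w = window_width / 2;
    int target_h = window_height / 2;
    /* Never target more than 1080p. */
    if (target_w > 1920) target_w = 1920;
    if (target_h > 1080) target_h = 1080;

    int halvings = 0;
    /* Keep halving while the next power-of-two step still covers the target. */
    while ((src_width >> (halvings + 1)) >= target_w &&
           (src_height >> (halvings + 1)) >= target_h) {
      halvings++;
    }
    return halvings; /* Effective scale factor is 1 / (1 << halvings), mipmap style. */
  }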

To support workflows where an artist needs to inspect the final result close up, there will be a display option Best Quality (but defaulting to Best Performance). This could fit into the existing Proxy Display Size menu.

In the future, more automated input resolution selection could be implemented. For example, it is possible to automatically switch to the Best Quality mode when a zoom-in is detected.

Image scaling with a power-of-two scale factor can be implemented very efficiently using threading and vectorization.
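
For example, a single halving step is an exact 2x2 average, which needs no fractional weights and maps naturally onto per-row threading and SIMD. An untested serial sketch for a float RGBA buffer:

  /* Sketch: one power-of-two downscale step on a float RGBA buffer.
   * Each output pixel is the plain average of a 2x2 block, so the inner loop
   * vectorizes well and rows can be distributed over threads independently. */
  static void downscale_half_rgba(const float *src, int src_w, int src_h, float *dst)
  {
    const int dst_w = src_w / 2;
    const int dst_h = src_h / 2;
    for (int y = 0; y < dst_h; y++) { /* Candidate for a parallel loop over rows. */
      for (int x = 0; x < dst_w; x++) {
        const float *row0 = src + ((2 * y) * src_w + 2 * x) * 4;
        const float *row1 = src + ((2 * y + 1) * src_w + 2 * x) * 4;
        float *out = dst + (y * dst_w + x) * 4;
        for (int c = 0; c < 4; c++) {
          out[c] = 0.25f * (row0[c] + row0[4 + c] + row1[c] + row1[4 + c]);
        }
      }
    }
  }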

On the performance side, for image sequences such a scale-down is an extra computation, which will only pay off if effects/transforms are used.

For movie files, this step will actually make things faster, because a color space conversion is required anyway (happening in sws_scale), which is not threadable. The scale-down will be done together with the color space conversion, which is expected to give better performance compared to the current state of sequencer playback.
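
Roughly, the idea is to request the smaller output size in the same swscale call that already has to do the YUV-to-RGB conversion, instead of converting at full resolution first. A hedged sketch (src_w, src_h, halvings, frame, dst_data and dst_linesize are assumed to be set up by the caller; error handling omitted):

  #include <libswscale/swscale.h>
  #include <libavutil/pixfmt.h>

  /* Sketch: combine the mandatory color space conversion with the scale-down
   * in a single sws_scale() pass, instead of converting at full size first. */
  struct SwsContext *ctx = sws_getContext(src_w, src_h, AV_PIX_FMT_YUV420P,
                                          src_w >> halvings, src_h >> halvings,
                                          AV_PIX_FMT_RGBA,
                                          SWS_FAST_BILINEAR, NULL, NULL, NULL);
  sws_scale(ctx, (const uint8_t *const *)frame->data, frame->linesize,
            0, src_h, dst_data, dst_linesize);
  sws_freeContext(ctx);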

Event Timeline

Sergey Sharybin (sergey) changed the task status from Needs Triage to Confirmed. Mon, Aug 31, 2:47 PM
Sergey Sharybin (sergey) triaged this task as High priority.
Sergey Sharybin (sergey) lowered the priority of this task from High to Normal.
Sergey Sharybin (sergey) created this task.
Sergey Sharybin (sergey) moved this task from Backlog to Blender 2.91 on the VFX & Video board.

User level design

I agree. Though I think tools like prefetching and baking should exist and be controlled by the user. I guess that is out of scope here anyway.

Code design - Levels of caching

First of all we must have a clear definition of what STRIP_INPUT is. I understand it as the image as read, possibly prescaled.

For the best editing and playback performance, multiple levels of cache are needed.

This highly depends on the rendering implementation. For example, now we cache only final images, because processing is very slow if you use a lot of effects. You can use the cache to store rendered images, and you can fit the most images when you store only final images. The downside is that the cache is invalidated by any modification.

Storing raw or STRIP_INPUT images doesn't really make too much sense in the current state, if you can hold only 10 s or less of the original footage with a not-so-crazy timeline. It may save you a few seconds of rendering, but on the other hand you can't store as many rendered frames in RAM.

Need to think about whether having a strip output cache is helpful. If the stack rendering is fast, having extra levels of cache will have a negative effect, since fewer final frames will fit into memory.

This is actually an important thing to consider. If we could significantly improve processing speed, we could better focus on "optimizing" IO operations with the cache.
Personally I would rather work on processing performance and then have only a STRIP_INPUT-type cache. In some cases, like processing on the GPU, you can't really have the other types.

Code design - Cache referencing

Cache levels are to utilize reference counting as much as possible. For example, with a single Image Strip that has no modifications set up, the final sequencer frame in the SEQUENCER_FINAL cache is to reference the image from the STRIP_INPUT cache. This minimizes the memory footprint and improves playback performance for storyboarding-type tasks performed in the sequencer.

This already happens in the current cache design.

The following example visualizes cache frame referencing in this scenario:

  • The sequencer has a single Image Strip using HappyFroggo.png as input. The strip has a length of 4.

In Blender terms, the cache contains a single copy of the ImBuf created for HappyFroggo.png. All sequencer cache entries reference this ImBuf for the lowest possible memory footprint.

This change needs to be done partially in the rendering code - looking up the file and frame we are reading.
The change in the cache would be to hash images against the input file instead of the strip. I am not sure if this would require its own design, unless we just use the filepath, which would probably be sufficient.

Cache resolution

If I understand it correctly, with Best Performance, if we have media with a resolution that doesn't match any fraction of the project resolution, we prescale quickly to the closest resolution and then use this image as if it were the original?

Should we then drop the 75% preview fraction, as it is not close to a power-of-two fraction? Or keep it for the case where we are willing to build proxies at that size?

These changes are definitely possible, but they have little to do with the cache? Regardless of that, this is not a bad idea. I guess this would also require movie rendering to be handled a bit differently from other images (passing the desired resolution as an argument, at least).

The only settings the user should interact with are in the User Preferences:

Are you proposing to remove the Cache Settings panel entirely? Or is the plan only to change the default settings and behavior so that there is less need to tweak these settings?

Currently there is "Recycle Up To Cost" to prioritize keeping entries that took a long time to compute (e.g. scene strips) in the cache. That's not mentioned in this design doc, so I'm not sure if you intend to keep it?

Cache of strip input (color-managed using the general Blender color space rules)
Cache of the final sequencer result.

If I understand correctly, this corresponds to current Raw and Final caches.

Need to think about whether having a strip output cache is helpful. If the stack rendering is fast, having extra levels of cache will have a negative effect, since fewer final frames will fit into memory.

I believe the current system always caches all strip outputs at the current frame. That seems like a good thing to keep at least.

@Richard Antalik (ISS),

Though I think tools like prefetching and baking should exist and be controlled by the user. I guess that is out of scope here anyway.

Implementation is out of this design's scope indeed. But the mindset should be the same: zero configuration, best editor experience.

Not sure what you mean by baking here. You shouldn't bake anything to be able to edit videos.

Cache levels are to utilize reference counting as much as possible. For example, with a single Image Strip that has no modifications set up, the final sequencer frame in the SEQUENCER_FINAL cache is to reference the image from the STRIP_INPUT cache. This minimizes the memory footprint and improves playback performance for storyboarding-type tasks performed in the sequencer.

This already happens in the current cache design.

Does it happen in the design or the implementation? With the attached file I expect the heavy operation to be performed only once, the follow-up scrubbing and frame navigation to be realtime, and the memory footprint to stay constant.

This is not the behavior I'm getting with this file.

Storing raw or STRIP_INPUT images doesn't really make too much sense in the current state, if you can hold only 10 s or less of the original footage with a not-so-crazy timeline. It may save you a few seconds of rendering, but on the other hand you can't store as many rendered frames in RAM.

Keep in mind, in a video editor you don't only play back or render, you also perform correction operations. Those must not clog up the interface.
If the input data is not cached, I don't see how you can keep the interface responsive while tweaking settings.

This change needs to be done partially in the rendering code - looking up the file and frame we are reading.
The change in the cache would be to hash images against the input file instead of the strip. I am not sure if this would require its own design, unless we just use the filepath, which would probably be sufficient.

To me this is an implementation detail.
The cache does not exist on its own; strip rendering does not exist on its own. They work together. Also, this task is not about "changes are only done in seqcache.c and nowhere else".

If I understand it correctly, with Best Performance, if we have media with a resolution that doesn't match any fraction of the project resolution, we prescale quickly to the closest resolution and then use this image as if it were the original?

The behavior is similar to mipmaps.

@Brecht Van Lommel (brecht),

Are you proposing to remove the Cache Settings panel entirely? Or is the plan only to change the default settings and behavior so that there is less need to tweak these settings?

Remove entirely.

I do not see any editor going in to really fine-tune these settings for a specific project. You would need to come up with a really strong argument to move me from option-extermination mode ;)

Currently there is "Recycle Up To Cost" to prioritize keeping entries that took a long time to compute (e.g. scene strips) in the cache. That's not mentioned in this design doc, so I'm not sure if you intend to keep it?

This option slipped under my radar. I find it a counter-intuitive option, and I don't know why it exists. Remove it.

If I understand correctly, this corresponds to current Raw and Final caches.

Indeed.
But currently we have rather too many cache levels. And their interaction seems to be broken.

I believe the current system always caches all strip outputs at the current frame. That seems like a good thing to keep at least.

Is there a design doc explaining the exact behavior of the current system?
When does this caching of all outputs happen? During playback? After playback has stopped? Is it limited to outputs, or does it also cover inputs?


From reading these two replies, it seems like all the required building blocks are either implemented or were intended in the original cache design. Meaning this step in the project should be simple and straightforward, right? :)

I believe the current system always caches all strip outputs at the current frame. That seems like a good thing to keep at least.

Is there a design doc explaining the exact behavior of the current system?
When does this caching of all outputs happen? During playback? After playback has stopped? Is it limited to outputs, or does it also cover inputs?

There are some notes about this in seqcache.c:

 * All images created during rendering are added to cache, even if the cache is already full.
 * This is because:
 *  - one image may be needed multiple times during rendering.
 *  - keeping the last rendered frame allows us for faster re-render when user edits strip in stack
 *  - we can decide if we keep frame only when it's completely rendered. Otherwise we risk having
 *    "holes" in the cache, which can be annoying
 * If the cache is full all entries for pending frame will have is_temp_cache set.

...

  bool is_temp_cache; /* this cache entry will be freed before rendering next frame */

I believe it's for all cache levels. And the freeing happens right before another frame is rendered; it's not related to playback specifically.

Is it correct that putting images into the non-final cache only happens on a sequencer "re-render" (when the user changes a strip property)?

I think at this point it's better to let @Richard Antalik (ISS) have a look at the file I've attached before. If it is possible to move the strip under the playhead without any latency or lag, and have a single image stored in the cache, that would address a lot of points from this design.

Is it correct that putting images into the non-final cache only happens on a sequencer "re-render" (when the user changes a strip property)?

I don't think so, there is no distinction between render and re-render here as far as I know.

I think at this point it's better to let @Richard Antalik (ISS) have a look at the file I've attached before. If it is possible to move the strip under the playhead without any latency or lag, and have a single image stored in the cache, that would address a lot of points from this design.

The cache key is the sequence strip + scene frame + cache type. To make single images and dragging strips work, that scene frame would need to be replaced with a local frame within the strip, and that will probably require a bunch of changes to the invalidation, clearing and drawing of the cache.
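
For illustration, the change would roughly amount to storing a strip-local frame index in the key instead of the scene frame (the struct below is a simplified stand-in, not the real SeqCacheKey definition):

  /* Simplified stand-in for the cache key; not the real SeqCacheKey definition. */
  typedef struct ExampleCacheKey {
    struct Sequence *seq;
    int local_frame; /* Frame index within the strip content instead of the scene
                      * frame, so dragging the strip in time keeps the key stable. */
    int cache_type;  /* STRIP_INPUT, SEQUENCER_FINAL, ... */
  } ExampleCacheKey;

  /* Hypothetical mapping from scene frame to strip-local frame. */
  static int example_local_frame(const struct Sequence *seq, int scene_frame)
  {
    return scene_frame - (int)seq->start;
  }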

Are you proposing to remove the Cache Settings panel entirely? Or is the plan only to change the default settings and behavior so that there is less need to tweak these settings?

Remove entirely. I do not see any editor going in to really fine-tune these settings for a specific project. You would need to come up with a really strong argument to move me from option-extermination mode ;)

One thing I would argue for is the ability to disallow proxy data from being cached. As proxies are already optimized for fast decoding, they really shouldn't need to be cached (especially when primarily doing temporal editing).
But my concern is that they will overuse resources that would benefit other types of strips (strips that can't be proxied, or those generated in Blender).

Cache levels are to utilize reference counting as much as possible. For example, with a single Image Strip that has no modifications set up, the final sequencer frame in the SEQUENCER_FINAL cache is to reference the image from the STRIP_INPUT cache. This minimizes the memory footprint and improves playback performance for storyboarding-type tasks performed in the sequencer.

This already happens in the current cache design.

Does it happen in the design or the implementation? With the attached file I expect the heavy operation to be performed only once, the follow-up scrubbing and frame navigation to be realtime, and the memory footprint to stay constant.

This is not the behavior I'm getting with this file.

No, because the image is not without modification: it is scaled to the preview resolution. So this would happen if we cached images after scaling to the preview resolution.
That's why I was asking for a more concrete definition of STRIP_INPUT, but I guess that would be an implementation detail as long as it "just works"? By design this can be the last operation that the user has no control over. So the "mipmap" in the case of this design.

Storing raw or STRIP_INPUT images doesn't really make too much sense in the current state, if you can hold only 10 s or less of the original footage with a not-so-crazy timeline. It may save you a few seconds of rendering, but on the other hand you can't store as many rendered frames in RAM.

Keep in mind, in a video editor you don't only play back or render, you also perform correction operations. Those must not clog up the interface.
If the input data is not cached, I don't see how you can keep the interface responsive while tweaking settings.

As Brecht explained, currently we use a "temp cache" for the 1 currently displayed frame, where we store each possible cache level for the fastest possible tweaking. The way it works is that the image you change, and everything above it, is invalidated. Everything that is needed to render the image you are changing is cached. When you change frame, the images are discarded and the cache is filled with new images.

This is currently hacked into the overall cache because it is convenient, but it could be its own cache, or it could not be used at all. It benefits you when you change images close to the final output, but also when you build up a stack of effects.

See this example - playback speed is quite bad, but moving the whole image with the last transform strip is swift.

From reading these two replies, it seems like all the required building blocks are either implemented or were intended in the original cache design. Meaning this step in the project should be simple and straightforward, right? :)

I would say yes, but I have some doubts about that power-of-two prescaling. I would probably need to see it in action and then evaluate the results.
I always edit videos with a uniform resolution, so I am not the best case for testing this. I will have to do artificial tests.


One thing I would argue for is the ability to disallow proxy data from being cached. As proxies are already optimized for fast decoding, they really shouldn't need to be cached (especially when primarily doing temporal editing).
But my concern is that they will overuse resources that would benefit other types of strips (strips that can't be proxied, or those generated in Blender).

This was done as part of a bugfix (rB537292498324) and I am not sure if it should be changed. Also a bit off-topic :)

One thing I would argue for is the ability to disallow proxy data from being cached. As proxies are already optimized for fast decoding, they really shouldn't need to be cached (especially when primarily doing temporal editing).
But my concern is that they will overuse resources that would benefit other types of strips (strips that can't be proxied, or those generated in Blender).

This was done as part of a bugfix (rB537292498324) and I am not sure if it should be changed. Also a bit off-topic :)

I see that it doesn't cache for Raw, but aren't the (strip) Cache Preprocessed Images & Cache Final Image options also supposed to not cache proxy data? Because they do (you can see this with the sample file from T80060).

All I'm saying is there should be the ability (either hard-coded or a preference) to not have proxies pointlessly cached; I'm not sure how that's off-topic for a task about the cache system...

The cache key is the sequence strip + scene frame + cache type. To make single images and dragging strips work, that scene frame would need to be replaced with a local frame within the strip, and that will probably require a bunch of changes to the invalidation, clearing and drawing of the cache.

I see. Indeed the cache key in my proposal would behave the way you've described it.
Is this something we agree is reasonable to do?

That's why I was asking for a more concrete definition of STRIP_INPUT, but I guess that would be an implementation detail as long as it "just works"? By design this can be the last operation that the user has no control over. So the "mipmap" in the case of this design.

I think what you've described here is a good definition. At least, I cannot currently think of a case where this definition "breaks".

As Brecht explained, currently we use a "temp cache" for the 1 currently displayed frame, where we store each possible cache level for the fastest possible tweaking.

This is great.

What I'm trying to understand is what exactly happens in the following scenario:

  • You tweak strip settings
  • You play a few frames forward (so that all "intermediate" cache is invalidated)
  • You tweak the setting again

At which point will the input needed for the tweak be loaded?
Not sure if it matters too much for this specific design task, but I kind of want to understand the existing behavior better :)

I would say yes, but I have some doubts about that power-of-two prescaling.

The power of two is for scaling performance.
It doesn't need any trickery for averaging weights, it can be vectorized, and so on.

The cache key is the sequence strip + scene frame + cache type. To make single images and dragging strips work, that scene frame would need to be replaced with a local frame within the strip, and that will probably require a bunch of changes to the invalidation, clearing and drawing of the cache.

I see. Indeed the cache key in my proposal would behave the way you've described it.
Is this something we agree is reasonable to do?

Yes. I would add that in the case of static image strips (image, color, text), the local frame should always point to frame 1 for the STRIP_INPUT type.
I think we already translate cfra to a local frame anyway, in seq_cache_cfra_to_frame_index.
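
A sketch of that mapping under these assumptions (written from scratch here to illustrate the special case, so it does not mirror the actual seq_cache_cfra_to_frame_index code):

  #include <stdbool.h>

  /* Illustrative cfra -> local frame index mapping. Static strips (single image,
   * color, text) always map to frame 1, so every scene frame covered by the strip
   * can share one STRIP_INPUT cache entry. */
  static int example_cfra_to_frame_index(const struct Sequence *seq, int cfra, bool is_static)
  {
    if (is_static) {
      return 1;
    }
    return cfra - (int)seq->start;
  }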

That's why I was asking for a more concrete definition of STRIP_INPUT, but I guess that would be an implementation detail as long as it "just works"? By design this can be the last operation that the user has no control over. So the "mipmap" in the case of this design.

I think what you've described here is a good definition. At least, I cannot currently think of a case where this definition "breaks".

As Brecht explained, currently we use a "temp cache" for the 1 currently displayed frame, where we store each possible cache level for the fastest possible tweaking.

This is great.

What I'm trying to understand is what exactly happens in the following scenario:

  • You tweak strip settings
  • You play a few frames forward (so that all "intermediate" cache is invalidated)
  • You tweak the setting again

At which point will the input needed for the tweak be loaded?
Not sure if it matters too much for this specific design task, but I kind of want to understand the existing behavior better :)

During rendering (BKE_sequencer_give_ibuf):

  1. The "intermediate" cache is freed (any frame that is not the current one): BKE_sequencer_cache_free_temp_cache
  2. An image from every possible stage is put into the cache. All will be stored (some images temporarily, some long-term).

When a strip is changed:

  1. The final image and the changed strip output are invalidated, so we need to render the strip stack to display the changed image: seq_render_strip_stack
  2. The strip stack is traversed backwards; if an image is not in the cache, it is rendered (at this point most images are still there).
  3. We start rendering in the "forward direction", starting where the first image is missing (the changed strip).

That is pretty much the whole cycle.
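
In rough pseudocode, that cycle looks something like this (all helper names are placeholders, not the actual seq_render_strip_stack internals):

  /* Placeholder pseudocode for the backwards-lookup / forwards-render cycle. */
  static ImBuf *example_render_strip_stack(ListBase *strips, int cfra)
  {
    const int top = top_strip_index(strips);
    /* 1. Traverse the stack backwards (top to bottom) until a cached image is found.
     *    Everything above the changed strip was invalidated; below it usually was not. */
    int last_cached = top;
    ImBuf *ibuf = NULL;
    while (last_cached >= 0 && (ibuf = cache_lookup(strips, last_cached, cfra)) == NULL) {
      last_cached--;
    }
    /* 2. Render forwards again, starting where the first image is missing,
     *    feeding each strip the (cached or freshly rendered) result below it. */
    for (int i = last_cached + 1; i <= top; i++) {
      ibuf = render_one_strip(strips, i, cfra, ibuf);
      cache_put(strips, i, cfra, ibuf); /* Some entries temporary, some long-term. */
    }
    return ibuf;
  }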

This feature requires the image to be cached at all used stages; that's why we also have SEQ_CACHE_STORE_PREPROCESSED and SEQ_CACHE_STORE_COMPOSITE, as the stack works with composited images while effects expect a preprocessed image that hasn't been composited.

I would say yes, but I have some doubts about that power-of-two prescaling.

The power of two is for scaling performance.
It doesn't need any trickery for averaging weights, it can be vectorized, and so on.

Yes, I can just imagine situations where it would be very useful, and also situations where it could be detrimental. The question is rather how to distinguish these situations, and where to draw the line (at runtime) between doing the prescaling and not, so that it works fairly well.

But I definitely want to look at this method.

Could caching also be considered for the playback functions needed to navigate to a specific frame using shortcut keys? Normally (industry-standard) when working with video you would use J (reverse), K (stop), L (forward) to increase and decrease playback speed and direction (more key presses for more/less), meaning that prefetch caching should also cache in reverse, and in steps for fast forward/reverse.

When the prefetch cache reaches the end of the range, it should automatically continue at the beginning of the range.

The Prefetch Cache is quite resource-hungry, and most VSE operations will slow down while it's on (e.g. "dragging" values), so a stricter pause-prefetch-caching regime could be implemented.

Also, a way to cache previews for the blade tool and trim tools could be designed into the cache system.

@Sergey Sharybin (sergey) my plan of action would be:

  1. Implement fast prescaling for image strips and movie strips.
  2. Make sure that static image strips reference only one frame. This won't apply to generator effects (color, text) until we have an animation cache for them; F-Curve lookup is slow.
  3. Keep the current cache levels for the current "temp cache" optimization, but use only STRIP_INPUT and/or SEQUENCER_FINAL.
    1. I will try to find a good solution for checking whether it makes sense to use the SEQUENCER_FINAL type to conserve memory usage. STRIP_INPUT would be prioritized.
  4. Remove the cache settings panels, and remove the "Use Disk Cache" checkbox and the compression enum.
    1. I would still like to have control over caching from the panel though, because I use it for debugging performance and caching issues.

This order is chosen to ensure that the riskier changes get in first, so they can be tested properly.

@Peter Fog (tintwotin) I would suggest reporting this as a bug. It doesn't really fit into this design document.