VSE 2.0: Performance, Cache System #80278
Reference: blender/blender#80278
Video Sequencer Cache
NOTE: This is the first pass of the design. It will be worked a bit more after discussion within the module, and presentation and diagrams will become more clear.
This document describes the design and implementation of the caching system for the VSE 2.0 project (#78986).
There is some overlap with the performance topics listed in VSE 2.0: Performance (#78992).
User level design
At the user level, the cache system should follow the zero-configuration principle: video playback and tweaking should be as fast as possible without the user spending time on fine-tuning per-project settings.
The only settings the user should interact with are in the User Preferences:
These settings are set once in the User Preferences and are used by all sequencer projects. The default values are based on the minimal hardware requirements (https://www.blender.org/download/requirements/).
Code design
Levels of caching
For best editing and playback performance multiple levels of cache are needed.
The most important ones are:
The simplified flow of the image from disk to the artist is presented in the following diagram:
NOTE
Need to think about whether having a strip output cache is helpful. If stack rendering is fast, having extra levels of cache will have a negative effect, because fewer final frames will fit into memory.
Cache referencing
Cache levels are to utilize reference counting as much as possible. For example, with a single Image Strip that has no modifications set up, the final sequencer frame in the SEQUENCER_FINAL cache is to reference the image from the STRIP_INPUT cache. This minimizes the memory footprint and improves playback performance for storyboarding-type tasks performed in the sequencer.
The following example visualizes cache frame referencing in the following scenario:
HappyFroggo.png as an input. The strip has length of 4.
In Blender terms, the cache contains a single copy of the ImBuf created for HappyFroggo.png. All the sequencer cache entries reference this ImBuf for the lowest possible memory footprint.
Cache resolution
In a typical video editing scenario an artist views the sequencer result in a rather small area of the screen layout:
This behavior can be exploited in the following way: sequencer processing and caching can happen at a lower resolution. This is what the current proxy design solves, but it does so in a fully manual manner.
There is a possibility to make proxy behavior more automatic by downscaling an image after it is read from disk, but before it is put into the STRIP_INPUT cache. Default behavior could be something like:
In order to support workflows where an artist needs to inspect the final result in close-up, there will be a display option Best Quality (but defaulting to Best Performance). This could fit into the existing Proxy Display Size menu.
In the future, more automated input resolution selection could be implemented. For example, it is possible to automatically switch to the Best Quality mode when zoom-in is detected.
Image scaling with a power-of-two scale factor can be implemented very efficiently using threading and vectorization.
On the performance side, for image sequences such a downscale is an extra computation, which will only pay off if effects or transformations are used.
For movie files, this step will actually make things faster, because color space conversion is required anyway (it happens in sws_scale) and is not threadable. The downscale will be done together with the color space conversion, which is expected to give better performance than the current state of sequencer playback.
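The power-of-two downscale mentioned above can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows with even dimensions, and the function names `halve` and `downscale_pow2` are hypothetical; this is not the sequencer's code, only a demonstration of why the operation is cheap: every output pixel is a plain average of a 2x2 block, with no per-pixel filter weights.

```python
def halve(image):
    """Downscale a 2D image by a factor of 2 using a 2x2 box average.

    Assumption of this sketch: `image` is a list of rows of numbers and
    both dimensions are even.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("this sketch only handles even dimensions")
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def downscale_pow2(image, levels):
    """Apply `levels` halvings, i.e. scale by 1 / 2**levels."""
    for _ in range(levels):
        image = halve(image)
    return image


if __name__ == "__main__":
    source = [
        [0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15],
    ]
    print(downscale_pow2(source, 1))
```

Each output row depends only on two input rows, so rows can be processed independently by separate threads, and the inner average maps naturally onto vector instructions.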
Task progress
Cache referencing
d837923a56
c74086376f
User level design
Levels of caching
f448ff2afe
445ebcaa30
Cache resolution
D9414: VSE: Render in size nearest to preview image
Update/followup above patch to work on images
Changed status from 'Needs Triage' to: 'Confirmed'
Added subscribers: @Sergey, @iss, @fsiddi
User level design
I agree. Though tools like prefetching and baking I think should exist and be controlled by user. I guess that is out of scope here anyway.
Code design - Levels of caching
First of all, we must have a clear definition of what STRIP_INPUT is. I understand that STRIP_INPUT is the image as read, possibly prescaled.
This highly depends on the rendering implementation. For example, we currently cache only final images, because processing is very slow if you use a lot of effects. You can use the cache to store rendered images, and you can store the maximum number of images when you store only final images. The downside is that the cache is invalidated with any modification.
Storing raw or STRIP_INPUT images doesn't make much sense in the current state if you can only hold 10 s or less of original footage with a not-so-crazy timeline. It may save you a few seconds of rendering, but on the other hand you can't store as many rendered frames in RAM.
This is actually an important thing to consider. If we could significantly improve processing speed, we could better focus on "optimizing" IO operations with the cache.
Personally, I would rather work on processing performance and then have only a STRIP_INPUT type cache. In some cases, like processing on the GPU, you can't really have other types.
Code design - Cache referencing
This already happens in current cache design.
This change needs to be done partially in rendering code - lookup file and frame we are reading.
The change in the cache would be to hash images against the input file instead of the strip. I am not sure whether this would require its own design, unless we just use the file path, which would probably be sufficient.
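The idea of keying input images by file path rather than by strip can be sketched as follows. This is a hypothetical illustration, not the seqcache.c implementation: the class `InputCache` and its `loader` callback are made up for the example, and a real cache would also need eviction and invalidation when the file changes on disk.

```python
class InputCache:
    """Cache of loaded images keyed by file path (hypothetical sketch).

    Every strip that reads the same file receives a reference to the
    same buffer object, so the image is loaded and stored only once.
    """

    def __init__(self, loader):
        self._loader = loader   # callable: filepath -> image buffer
        self._buffers = {}      # filepath -> shared image buffer
        self.loads = 0          # number of actual disk reads

    def get(self, filepath):
        if filepath not in self._buffers:
            self._buffers[filepath] = self._loader(filepath)
            self.loads += 1
        return self._buffers[filepath]


if __name__ == "__main__":
    # Stand-in loader; a real one would decode the file.
    cache = InputCache(loader=lambda path: {"path": path, "pixels": [0] * 16})
    first = cache.get("HappyFroggo.png")
    second = cache.get("HappyFroggo.png")
    print(first is second, cache.loads)
```

In Python the shared reference is kept alive by the language's own reference counting; in C the same sharing would need explicit reference management on the buffer.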
Cache resolution
If I understand correctly, with Best Performance, if we have media with a resolution that doesn't match any fraction of the project resolution, we quickly prescale to the closest resolution and then use this image as if it were the original?
Should we then drop the 75% preview fraction, as it is not close to a power-of-two fraction? Or keep it for the case where we are willing to build proxies at that size?
These changes are definitely possible, but they have little to do with the cache? Regardless, this is not a bad idea. I guess this would also require movie rendering to be handled a bit differently from other images (passing the desired resolution as an argument, at least).
Added subscriber: @brecht
Are you proposing to remove the Cache Settings panel entirely? Or is the plan only to change the default settings and behavior so that there is less need to tweak these settings?
Currently there is "Recycle Up To Cost" to prioritize keeping entries that took a long time to compute (e.g. scene strips) in the cache. That's not mentioned in this design doc, so I'm not sure if you intend to keep it?
If I understand correctly, this corresponds to current Raw and Final caches.
I believe the current system always caches all strip outputs at the current frame. That seems like a good thing to keep at least.
@iss,
Implementation is out of this design scope indeed. But the mind set should be the same: zero configuration, best editor experience.
Not sure what you mean by baking here. You shouldn't bake anything to be able to edit videos.
Does it happen in the design or implementation? With the attached file I expect the heavy operation to be only performed once, and the followup scrubbing and frame navigation should be realtime, and the memory footprint is to stay constant.
vse_cache_reference.zip
This is not the behavior I'm getting with this file.
Keep in mind, in video editor you don't only playback or render, but also perform correction operations. Those must not be clogging the interface communication.
If the input data is not cached, I don't see how you can keep responsive interface while tweaking settings.
To me this is an implementation detail.
Cache does not exist on its own, and strip rendering does not exist on its own. They work together. Also, this task is not about "changes are only done in seqcache.c and nowhere else".
The behavior is similar to mipmaps.
@brecht,
Remove entirely.
I do not see any editor going and really fine-tuning settings for a specific project. You would need to come up with a really strong argument to move me out of the option extermination mode ;)
This option slipped under my radar. I find it a counter-intuitive option, and I don't know why it exists. Remove it.
Indeed.
But currently we have rather too many cache levels, and their interaction seems to be broken.
Is there a design doc explaining exact behavior of the current system?
When does this caching of all outputs happen? During playback? After playback has stopped? Is it limited to outputs, or does it also cover inputs?
From reading these two replies, it seems like all the required building blocks are either implemented or were intended in the original cache design. Meaning, this step in the project should be simple and straightforward, right? :)
There are some notes about this in seqcache.c:
I believe it's for all cache levels. And freeing happens right before another frame is rendered; it's not related to playback specifically.
Is it correct that putting images to the non-final-cache only happens on sequencer "re-render" (when user changes strip property)?
I think at this point it's better to let @iss have a look at the file I've attached before. If it is possible to move the strip under the playhead without any latency or lag, and to have a single image stored in the cache, that would address a lot of points from this design.
I don't think so, there is no distinction between render and re-render here as far as I know.
The cache key is the sequence strip + scene frame + cache type. To make single images and dragging strips work that scene frame would need to be replaced with a local frame within the strip, and that will probably require a bunch of changes to invalidation, clearing and drawing of the cache.
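The key change described above can be sketched as follows. This is a hypothetical illustration rather than the actual seqcache.c key: the `Strip` dataclass, the `cache_key` function, and the choice of local index 0 for static strips are assumptions made for the example.

```python
from dataclasses import dataclass

STRIP_INPUT = "STRIP_INPUT"
SEQUENCER_FINAL = "SEQUENCER_FINAL"


@dataclass(frozen=True)
class Strip:
    name: str
    start_frame: int         # first scene frame covered by the strip
    length: int
    is_static: bool = False  # e.g. a single image, color, or text strip


def cache_key(strip, scene_frame, cache_type):
    """Build a key from the strip-local frame instead of the scene frame.

    Assumption of this sketch: static strips always map to local frame 0,
    so every frame of the strip shares one cache entry.
    """
    local_frame = 0 if strip.is_static else scene_frame - strip.start_frame
    return (strip.name, local_frame, cache_type)


if __name__ == "__main__":
    froggo = Strip("HappyFroggo.png", start_frame=10, length=4, is_static=True)
    keys = {cache_key(froggo, frame, STRIP_INPUT) for frame in range(10, 14)}
    print(len(keys))  # all four frames share one key
```

With such a key, dragging a strip along the timeline changes `start_frame` but leaves the keys of its frames untouched, so existing entries stay valid. Invalidation, clearing, and drawing of the cache would still have to translate between scene frames and local frames.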
Added subscriber: @ChristopherAnderssarian
One thing I would argue for is the ability to prevent proxy data from being cached. As proxies are already optimised for fast decoding, they really shouldn't need to be cached (especially when primarily doing temporal editing).
But my concern is that they will over use resources that would benefit other types of strips (strips that: can't be proxied or those generated in Blender).
No, because the image is not without modification. It is scaled to preview resolution. So this would happen if we cached images after scaling to preview resolution.
That's why I was asking for a more concrete definition of STRIP_INPUT, but I guess that would be an implementation detail as long as it would "just work"? By design this can be the last operation that the user has no control over. So "mipmap" in the case of this design.
As Brecht explained, currently we use a "temp cache" for the one currently displayed frame, where we store each possible cache level for the fastest possible tweaking. The way it works is that the image you change and everything above it are invalidated. Everything that is needed to render the image you are changing is cached. When you change frame, the images are discarded and the cache is filled with new images.
This is currently hacked into the overall cache because it is convenient, but it could be its own cache or it could be not used at all. It benefits you when you change images close to the final output, but also when you build up a stack of effects.
See this example - playback speed is quite bad, but moving whole image with last transform strip is swift.
temp_cache.blend
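The temp cache behavior described in this comment (the changed image and everything above it are invalidated, everything below stays cached) can be sketched as follows. The class `TempCache` and its methods are hypothetical and only model a linear stack for a single frame; this is not the code in seqcache.c.

```python
class TempCache:
    """Per-frame cache of every level of a linear strip stack (sketch).

    `stack` is a list of callables ordered bottom to top. Each callable
    takes the result of the level below (None for the bottom level) and
    returns its own output.
    """

    def __init__(self, stack):
        self.stack = list(stack)
        self.outputs = {}      # stack index -> cached output
        self.render_count = 0  # how many levels were actually rendered

    def invalidate_from(self, index):
        """Drop the changed level and everything stacked above it."""
        for level in [i for i in self.outputs if i >= index]:
            del self.outputs[level]

    def render(self):
        """Render bottom to top, reusing every level that is still cached."""
        result = None
        for level, operation in enumerate(self.stack):
            if level not in self.outputs:
                self.outputs[level] = operation(result)
                self.render_count += 1
            result = self.outputs[level]
        return result


if __name__ == "__main__":
    cache = TempCache([lambda _: 1, lambda x: x + 10, lambda x: x * 2])
    cache.render()
    cache.stack[2] = lambda x: x * 3  # tweak only the top strip
    cache.invalidate_from(2)
    cache.render()
    print(cache.render_count)
```

Tweaking the top level re-renders a single level, which matches the observation that moving the image with the last transform strip stays fast even when full playback is slow.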
I would say yes, but I have some doubts about that power of 2 prescaling. I would probably need to see it in action and then evaluate results.
I always edit videos with uniform resolution so I am not best case for testing this. I will have to do artificial tests.
This was done as part of a bugfix (5372924983) and I am not sure if it should be changed. Also a bit off-topic :)
I see that it doesn't cache for Raw, but are (strip) Cache pre-processed images & Cache Final Image not supposed to cache proxy data too? Because they do. (You can see this with the sample file from #80060.)
All I'm saying is there should be the ability (either hard-coded or as a preference) to not have proxies pointlessly cached. Not sure how that's off-topic for a task about the cache system...
I see. Indeed the cache key in my proposal would behave the way you've described it.
Is this something we agree that is reasonable to do?
I think what you've described here is a good definition. At least, I can not currently think of a case when this definition "breaks".
This is great.
What I'm trying to understand is what exactly happens in the following scenario:
At which point the input needed for the tweak will be loaded?
Not sure if it matters too much for this specific design task, but I kind of want to understand existing behavior better :)
The power of two is for scaling performance.
It doesn't need any trickery for averaging weights, can be vectorized, and so on.
Yes. I would add that in the case of static image strips (image, color, text), the local frame should always point to frame 1 for the STRIP_INPUT type.
I think we already translate cfra to a local frame anyway in seq_cache_cfra_to_frame_index.
During rendering (BKE_sequencer_give_ibuf):
BKE_sequencer_cache_free_temp_cache
When strip is changed:
seq_render_strip_stack
We start rendering in the "forward direction", starting where the first image is missing (the changed strip).
That is pretty much the whole cycle.
This feature requires the image to be cached at all used states. That's why we also have SEQ_CACHE_STORE_PREPROCESSED and SEQ_CACHE_STORE_COMPOSITE: the stack works with composited images, and effects expect preprocessed images that haven't been composited.
Yes, I can just imagine situations where it would be very useful and also situations where it could be detrimental. The question is rather how to distinguish these situations, and where to draw the line (at runtime) between prescaling and not prescaling, so that it works fairly well.
But I definitely want to look at this method.
Added subscriber: @tintwotin
Could it be considered to implement caching for the playback functions needed for navigating to a specific frame using shortcut keys? Normally (industry standard), when working with video you would use J (reverse), K (stop), L (forward) to increase and decrease playback speed (more key presses for more/less) and to change direction, meaning that prefetch caching should also cache in reverse, and in steps for fast forward/reverse.
When the prefetch cache reaches the end of the range, it should automatically continue at the beginning of the range.
The Prefetch Cache is quite resource hungry, and most VSE operations slow down while it's on (e.g. "dragging" values), so a stricter pause-prefetch-caching regime could be implemented.
Also, a way to cache for preview when using the blade tool and trim tools could be built into the cache system.
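The direction-aware, wrapping prefetch requested in this comment can be illustrated with a small sketch. The function `prefetch_order` and its parameters are hypothetical; it only computes the order in which frames would be requested and says nothing about how the sequencer's prefetch job is scheduled.

```python
def prefetch_order(playhead, start, end, direction=1, step=1, count=8):
    """Return the next `count` frames to prefetch (hypothetical helper).

    Frames lie in the inclusive range [start, end]. `direction` is +1 for
    forward (L) or -1 for reverse (J) playback, `step` grows with repeated
    key presses, and the order wraps around at either end of the range.
    """
    length = end - start + 1
    offset = playhead - start
    return [
        start + (offset + direction * step * i) % length
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    # Forward playback near the end of a 1..10 range wraps to the start.
    print(prefetch_order(8, 1, 10, direction=1, step=1, count=4))
    # Double-speed reverse playback near the start wraps to the end.
    print(prefetch_order(2, 1, 10, direction=-1, step=2, count=3))
```

Python's `%` always returns a non-negative result for a positive modulus, which is what makes the reverse case wrap correctly; a C version would need to handle negative remainders explicitly.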
@Sergey my plan of action would be:
STRIP_INPUT and/or SEQUENCER_FINAL
SEQUENCER_FINAL type to conserve memory usage. STRIP_INPUT would be prioritized
This order is chosen to ensure that the riskier changes get in first, so they can be tested properly.
@tintwotin I would suggest to report this as a bug. It doesn't really fit into this design document.
Added subscriber: @clayhands
Added subscriber: @AndreaMonzini
Hello, I agree with Peter about the JKL forward/reverse cache.
For now I enable "Prefetch Frames" in the area where I want to cache, and then I disable "Prefetch Frames" to be able to scrub smoothly (forward/reverse) around the area of interest.
A solution could be to add an option to dynamically "center" the cache around the playhead:
Added subscriber: @Pipeliner
Added subscriber: @erjiang-3
This issue was referenced by
445ebcaa30
Added subscriber: @wknowleskellett