Page MenuHome

VSE cache with frame prefetching (WIP)
Needs ReviewPublic

Authored by Richard Antalik (ISS) on Mon, Nov 12, 12:03 PM.
Tokens
"Love" token, awarded by tintwotin."Like" token, awarded by PGTART."Love" token, awarded by snuq."Love" token, awarded by Scaredyfish."Love" token, awarded by pablovazquez.

Details

Reviewers
Brecht Van Lommel (brecht)
Group Reviewers
Video Sequencer
Summary

This patch implements frame prefetching as discussed in https://devtalk.blender.org/t/sequencer-cache-improvement-proposal/2151

Code is unfinished and ugly, but sufficient to present functionality. Sorry for unrelated changes - will fix that.

Operation:
Prefetching is triggered by rendering a frame.

First, prefetching job is initialized. Essentially SeqRenderData is copied as it will be freed eventually in main thread, and mainly a new Scene is created as a copy of original. This copy of Scene is used for rendering and animdata evaluation.

Number of frames to be prefetched is calculated by simulated run of prefetch.

Prefetch thread will render required amount of frames and store them in cache.

Other changes:

  • cache is now member of Scene struct and no longer global.
  • Cache limiter is "bypassed"
  • Removal of old prefetch code.
  • Limit multithreaded effects to keep 1 thread free, when called from prefetch thread. This is because using all possible threads lead to framerate drops.
  • Added DNA struct PrefetchSource, some operators and RNA API

Bugs / problems:

  • There is still thread collision in blf lib. So please test this not using text strip. Will look at it next time. Don't know that lib, will probably just add some public mutex(multithreaded proxies can use that also)...
  • Need to check order of processing of raw data. Did not see color transformation in preprocessing fn. Then merge all preprocessing. Maybe do this in separate patch.
  • PrefetchSource list is not restored on reopening the file. Don't know if we want to always start fresh, or restore them.
  • Some mem blocks are not freed because I am lazy. But in some cases it seems that cache items are not freed properly. They seems to be allocated without any identifying string and I haven't found, what that is yet.

TODO:

  • fix bugs
  • review scene cpy
  • support depsgraph - should be separate patch, as a port to 2.8?

Diff Detail

Event Timeline

Would love to test it, if you can share a build?

Here is win32 build:
https://drive.google.com/file/d/1JDzIHHNA9qWvoXvVIShf9e-7l8bm-re5/view?usp=sharing

If there is any problem with running build, try this one:
https://drive.google.com/file/d/1KSHAs5XftnC722P_AG2UGCgqhKFfi5Da/view?usp=sharing

I changed some stuff in MSVC. Don't know what the problem is, but I don't know what I am doing anyway...

Wow! On the official 2.79b I get 5 fps.
On your patch I get a clean 25 fps in the cached areas.

That is super impressive!!! And an excellent POC!

The file processed is a 1.8 GB file with a duration of 1.33 minutes:
Format : ProRes
Format version : Version 0
Format profile : 422 HQ
Duration : 1 min 33 s
Bit rate mode : Variable
Bit rate : 159 Mb/s
Width : 1 920 pixels
Height : 1 080 pixels
Frame rate : 25.000 FPS

Looking forward to more cache-strategies being implemented.

Here is win32 build:
https://drive.google.com/file/d/1JDzIHHNA9qWvoXvVIShf9e-7l8bm-re5/view?usp=sharing

If there is any problem with running build, try this one:
https://drive.google.com/file/d/1KSHAs5XftnC722P_AG2UGCgqhKFfi5Da/view?usp=sharing

I changed some stuff in MSVC. Don't know what the problem is, but I don't know what I am doing anyway...

Hey Richard! Awesome stuff you're working on. I have just granted you uploader rights on GraphicAll.org (missed it before, sorry), so you can share your builds there as well!

Brecht Van Lommel (brecht) requested changes to this revision.Wed, Nov 21, 8:24 PM

I think the approach is generally fine. Would be nice to see a more cleaned up patch to review in more detail.

Regarding the BLF crash, do you every launch multiple prefetch threads at the same time? The way the BLF API works the same font can only be used by one thread at a time. We have separate UI and "render" fonts to avoid that problem, but if multiple threads are using that render font there will be a problem.

source/blender/blenkernel/BKE_library.h
81

I would rename this to LIB_ID_COPY_SEQUENCER_PREFETCH, no need for obscure abbreviations.

source/blender/blenkernel/intern/scene.c
259

Scene copying is a bit problematic, but there is no way around it probably. We should avoid copying not only the node tree, but also things like view layers and rigid body world which relates to scene objects. For performance if nothing else.

I don't quite understand why copying the compositing node tree would cause problems, it is not a separate datablock but embedded into the scene, so I would expect it can be safely copied. But it seems unnecessary anyway.

source/blender/blenkernel/intern/sequencer.c
3927

If you're using clock() you should be using CLOCKS_PER_SEC to interpret it as seconds.

4079

What exactly is the purpose of this main copy?

I would not expect it to be needed for the evaluation of animation on a single scene datablock. If it's for rendering scene strips, I don't think a main with just the scene will work.

This revision now requires changes to proceed.Wed, Nov 21, 8:24 PM
Richard Antalik (ISS) marked 2 inline comments as done.Thu, Nov 22, 11:47 AM

I think the approach is generally fine. Would be nice to see a more cleaned up patch to review in more detail.

Thanks for feedback. I will provide hopefully finished patch, I guess this weekend.

Regarding the BLF crash, do you every launch multiple prefetch threads at the same time? The way the BLF API works the same font can only be used by one thread at a time. We have separate UI and "render" fonts to avoid that problem, but if multiple threads are using that render font there will be a problem.

There is only one prefetch thread running and it crashes only if I am moving stuff in UI.
It is trying to read some glyph cache.
Sorry for "exact" terminology, I am at work now...

Also after I fixed mem allocation (no unfreed blocks on exit), I tried to check undo functionality, it was not working, but I think, I will be able to deal with that.

source/blender/blenkernel/intern/scene.c
259

We can do this, but I was thinking, that maybe user is using drivers driven by objects to animate some properties of VSE strips. I don't really know how animation system works in blender so I rather copy everything.

Also scene copy is triggered by invalidating cache in VSE (triggered by any edit). So unless you have big scene with strips(you shouldn't) this shouldn't be a problem.

One thing about scene copy (more like throwing crazy ideas).

One way to go would be to use library's remap code to iterate over all datablocks used by a scene. This will copy a bit more than it should be, but will allow crazy scenarios like object driving sequencer property.

More proper way would be to use dependency graph and utilize its copy-on-write routines. In order to do this we'd need to implement building from sequencer. Shouldn't be too difficult, but maybe is more efficient if is done by someone who's already familiar with the code there.

For the initial implementation is probably enough to copy just a scene, without its viewlayers or objects, just to keep memory usage low.

It also seems the preprocess cache is removed. This is not very great, since this makes tweaking (like color grading) way slower. Ideally we need to keep preprocessed cache with a lower priority than a final cache.

source/blender/blenkernel/intern/scene.c
259

I don't think we need to support objects driving VSE strips, this probably never worked reliably.

If we did, we'd really need to build a depsgraph for prefetching and including all the required datablocks in there. Just copying the scene is not enough then. But I would consider that beyond the scope of this patch, no need to make it more complicated to begin with.

Still work in progress...
Scene copying was not modified yet.
Also haven't look at undo bug.

I was implementing caching sources for edited sequence on demand. This was a little bit more complex then I thought.
This should work now just for 1 edited sequence and only for seq render output before blending.
I wanted this prefetching to be completely autonomous, but deciding, when exactly user wants to stop doing edits is impossible. So there is button that user has to press to release cached frames,
Next feature should be support for keeping multiple edits on all levels(add preprocess + blended sequences), and UI to release them individually / all at once.

I wanted to use some delay, after user makes edit, so prefetch is not started immediately after changing something. Not sure if I can use event system for this.

Also I am using wm notifier to refresh UI, when prefetch thread is running - not sure if this is done well

UI panel to control prefetch was added

To enable / freeze prefetching
slider for minimum free frames for prefetching before cost is recalculated
enable / disable memory viewer

render cache thingy is not supported - idea is to cache all ibufs that were output during rendering for current frame. Problem is, that when sequence stack is already prefetched, sources are not rendered. So this is not quite possible
Cache everything will force prefetch to cache all ibufs that were output during rendering.
2 modes are possible - either prefetch, or cache past frames.
release edits operator button - described above

One thing about scene copy (more like throwing crazy ideas).

One way to go would be to use library's remap code to iterate over all datablocks used by a scene. This will copy a bit more than it should be, but will allow crazy scenarios like object driving sequencer property.

More proper way would be to use dependency graph and utilize its copy-on-write routines. In order to do this we'd need to implement building from sequencer. Shouldn't be too difficult, but maybe is more efficient if is done by someone who's already familiar with the code there.

For the initial implementation is probably enough to copy just a scene, without its viewlayers or objects, just to keep memory usage low.

I want to make this work on 2.8 so I guess I will have to do this using depsgraph. I have to review my libraries, because when I tried to build 2.8 branch it failed.
In 2.79 I guess I will have to use argument to enable depsgraph.
Is there some example on how to use depsgraph with fallback?

I am not very good with too much abstraction - I lost about a day debugging ghash bugs, until I realized, that iterating while another thread is freeing items is not a good idea :)

It also seems the preprocess cache is removed. This is not very great, since this makes tweaking (like color grading) way slower. Ideally we need to keep preprocessed cache with a lower priority than a final cache.

If you mean preprocess cache, that was called preprocess cache - it wasn't working
https://devtalk.blender.org/t/sequencer-cache-improvement-proposal/2151/10?u=iss

I don't want to remove it completely, but it has to be rewritten.

Hopefully there won't be any stupid bugs.
To be honest, I am surprised, that you don't yell at me for code quality :)

Yeah and C++ style comments are basically my todos and code commented by them is likely to be changed.

Richard Antalik (ISS) updated this revision to Diff 12828.EditedSat, Dec 8, 8:10 PM
Richard Antalik (ISS) edited the summary of this revision. (Show Details)

Status:
Functionality should be quite complete, only problem is preprocess cache, since preprocess ibufs may vary in size. But if you enable caching preprocess for some strips, prefetch may have problems (may try to fetch more, then cache capacity - wasted effort). It should be easy to detect such event and stop prefetching, which may be a way to go...

Code is still ugly and not optimized. Lots of iterations over cached items in decision process of what to do with them.
Even in this state however this patch is useful.

PrefetchSource:
This struct is created, when a change to strip is made.
It contains info about strip, affected area and what resource was required to re-render changed strip
For each required resource a PrefetchSource is created.
PrefetchSource can be created / deleted by user.
More API can be exposed, so it's effect can be automated by animation system, but dunno if it would be helpful.

Controls:
prefetch_enable: enable/disable prefetch
memview_enable: enable/disable cache content viewer
rendercache_enable: (enable/disable) keep all rendered resources cached until frame is completely rendered
cache_everything: (enable/disable) keep all rendered resources cached
disable_auto_add_sources: (enable/disable) do not create PrefetchSource when editing strips
ignore_sources: (enable/disable) do not cache resources defined by PrefetchSources

prefetch_store_ratio: slider - ratio of minimum frames used to prefetch vs frame capacity
when prefetch thread hits limit of minimum available frames, minimum cost of frames to be considered expensive is recalculated, so there will be at least as much frames free as set by this slider.
0 = never recalculate value
0,1 = recalculate value, when less then 1/10 of total frames are cheap
1 = always recalculate value so no frames are expensive

prefetch_offset: slider - by this ratio prefetch playhead will be offset from its target
0 = prefetching effectively disabled
0,5 = 1/2 frames cached before playhead 1/2 after playhead. Cheap frames that is.
1 = full prefetch

Remove all sources: OP - ...well remove all PrefetchSources
Cache [type] for selected strips: OP - add PrefetchSource manually

On the bottom there is list of individual sources with remove operator.

Issues:

Creation of source is triggered by cache invalidation. But maybe we don't want to create source, when just moving stuff around.
Also API is quite simplistic. I am not user at this point, but last thing I want is to annoy users by things like this. It is extremely useful, but can be annoying as well.
I can see user manually configuring cache, when doing task, that requires a lot of repetetive edits or "that last 30 frames are choppy, so I can free this resource at this point".
As I wrote this, I added disable_auto_add_sources and ignore_sources. That should do it :)

Missing data visualization for preprocess cache in memview. It would be cool if we could read frame cost / other details by hovering over with mouse cursor.

Seems, that sound scene is duplicated on each edit. Also I found this patch highly unstable, when sound strips are present. pleate test without sound strips.

Lots of small issues, where scene cpy is not updated or cache not cleared - will fix as I do some editing.

Use-cases:

Preview whole timeline:
Enable prefetch,
Enable rendercache,
enable ignore sources / delete all sources,
set prefetch_store_ratio to 1
set prefetch_offset to 0

Preview short part of timeline:
Enable prefetch,
Enable rendercache,
enable ignore sources / delete all sources,
set prefetch_store_ratio to 0,3 - 0,5 - depends on RAM and preview length
set prefetch_offset to 0 - 0,5 - depends on RAM and preview length

Preview short part of with edits:
Enable prefetch,
Enable rendercache,
set prefetch_store_ratio to 0,3 - 0,5 - depends on RAM and preview length
set prefetch_offset to 0 - 0,5 - depends on RAM and preview length

Edits on slooow parts:
disable prefetch,
Enable rendercache

I would say, that most settings can be pretty much universal, so I would keep Enable prefetch and ignore sources button somewhere by hand.

New win32 build (read open issues):
https://drive.google.com/open?id=1JDzIHHNA9qWvoXvVIShf9e-7l8bm-re5

I read, that I am not able to build x64 version with free MSVC. Sorry.

Also, subscribe to PewDiePie

source/blender/blenkernel/intern/sequencer.c
3739–3800

Removed this, because code is redundant and confusing

4065–4089

I modified this, but it was a mistake...

source/blender/editors/space_sequencer/sequencer_draw.c
1175–1182

is this OK?