Page MenuHome

VSE cache with frame prefetching (WIP)
Needs ReviewPublic

Authored by Richard Antalik (ISS) on Nov 12 2018, 12:03 PM.
Tokens
"Love" token, awarded by amonpaike."Love" token, awarded by n-pigeon."Love" token, awarded by tintwotin."Like" token, awarded by PGTART."Love" token, awarded by snuq."Love" token, awarded by Scaredyfish."Love" token, awarded by pablovazquez.

Details

Reviewers
Brecht Van Lommel (brecht)
Group Reviewers
Video Sequencer
Summary

This patch:

  • refactors sequencer cache and allows configuration of cache from UI
  • implements memory visualization tool for viewing cache content
  • implements prefetching for sequencer
  • refactors draw_image_seq - separate patch: D4315: Refactor drawing preview

These items should be (quite) functionally independent, so the patch can be split.

Sequencer cache

Overview:
Each Scene owns 1 sequencer cache of MovieCache type which is used by sequencer belonging to this scene.

This cache should be viewed as "3D" space for stored images:

  • each frame of timeline can be cached
  • frame is composed of number of strips, that can be cached individually
  • each strip has number of rendering stages of which 3 can be cached

Cached stages

  • per strip
    • SEQ_CACHE_RAW raw image: image as provided by input codec
    • SEQ_CACHE_PREPROCESSED pre-processed: after scaling and color transformations, ...
    • SEQ_CACHE_COMPOSITE composite: combined with another strip(s)
  • sequencer strip stack output
    • SEQ_CACHE_FINAL_OUT this is a copy of last(topmost) strip composite image, so if both cached, memory usage is about the same as if only 1 of them is cached
    • SEQ_CACHE_FINAL_COLOR_MANAGED color corrected strip stack output: this is the image, that is displayed in preview window

User can enable/disable storage of any of these types, to allocate resources efficiently.

Image cost
Each stored image(SeqcacheKey->cost) has a cost assigned. Cost is calculated as ratio of time spent on rendering to maximum possible time to keep up with chosen frame rate. The higher the cost, the harder it is to render image.
Initially frames with cost less then 1 are considered to be cheap. This value is recalculated each time the cache is full or when prefetch process is started.

Cache size limiting
To limit cache size, a custom method is used because of rather complex cache arrangement. "Stock" cache limiter is currently used only for setting used / free memory in UI. (TODO - definitely not ideal)

When cache is full, most distant cheap frame from playhead is removed. With prefetching enabled most distant cheap frame is removed, with frames behind playhead taking precedence.
Function BKE_sequencer_cache_recycle_item is defining behavior of removing "old" frames

Temporary cache
Sometimes it is required to use 1 image multiple times to render single frame. To prevent unnecessary re-rendering, all stages of rendering are stored in cache until the frame is rendered completely. Images created in stages with disabled cache storage has SeqCacheKey property free_after_render set and are then freed after a frame is rendered.

Dependency
To ensure, that stages chosen by user are always cached, a dependency system is used. To achieve this, when removing an image from cache we also have to remove all images that were created using this image.

Currently, this process is simplified a bit by removing only final images of particular frame. When removing final image(has SeqCacheKey->is_base_frame set) we follow SeqCacheKey->prev or SeqCacheKey->next links to remove all images used to create final image
To avoid iterating content of the whole cache, cache keys are linked by pointers in both ways

In case of speed strip, one image can be used to render multiple frames. This is not adressed yet.

MovieCache modifications

    • moved to header:
    • Typedefs MovieCache, MovieCacheKey, MovieCacheItem, to avoid redefinitions
    • IMB_moviecache_get_mem_total, IMB_moviecache_get_mem_used, IMB_get_size_in_memory
  • added to API
    • void IMB_moviecache_limiter_enable(MovieCache *cache)
    • void IMB_moviecache_limiter_disable(MovieCache *cache)
    • added MovieCache members:
    • MEM_CacheLimiterC *limiter - can be set to null to avoid limiter usage
    • float expensive_min_val - for evaluating value of stored images
    • MovieCacheKey *last_key, void *last_userkey - can be used to build dependencies between stored images
    • size_t memory_used - To track memory usage of cache (limiter has to update this value)
    • ThreadMutex cache_mutex - To protect ghash iterator / user limiting application. Maybe user_mutex would be a better name?

Issues
SEQ_CACHE_FINAL_COLOR_MANAGED is not implemented yet. First draw_image_seq must be refactored
I should also check how GLSL color transformation works - I am not sure if I can read transformed image back. I guess it is possible to read back generated texture?

Plans for future work
Setting cached stages per strip:

  • may be actually included in this patch - not too much code

Image dumping:

  • Add MovieCache->dumper
    • This will be callback, that will accept void *userkey, ImBuf *ibuf (anything else?)
  • Sequencer should set callback similar to function seq_proxy_build_frame only it will get SeqCacheKey and ImBuf
  • Save image to <project-path>/<scene-name>/<seq-name>/<cache-type[Number]>/cfra.ext
    • formats to use should be (at least to start with) JPEG, PNG, EXR
    • path MUST be properly escaped (ideally scene/strip names should be limited to be filename compliant)

Cache invalidation:

  • Proper system must be developed, that will determine, what images must be deleted, by provided action and strips affected
  • With Image dumping, We will need a means to manage validity of external files.
    • Should be fast and shouldn't use too much memory. Saving checksums of images is probably most effective solution.
    • Does not have to be 100% correct depending on implementation? This may not be the best idea, as it would mean, that there would have to be a manual full check routine at least for range, which would complicate code significantly...
    • checksum table file must be pernament. Either as external file(slow?), or included in .blend itself as a data structure(may result in larger .blend file). Tables must be resizeable, appendable, randomly accessible.
    • if file is to be read to calculate checksum, can we save time, by reading every n-th byte?

Example:
10min of footage in timeline at 60FPS, SEQ_CACHE_COMPOSITE and SEQ_CACHE_FINAL_OUT are dumped
36000 frames total, ~70KB of data with CRC8 checksum

Memory visualization tool

This is a tool for visualizing cache content inspired by Movie Clip editor.
When enabled, grey stripe is drawn in channel 0. Cached images are represented by 1 frame wide line either on strip to which they belong to or in channel 0 for types SEQ_CACHE_FINAL_OUT and SEQ_CACHE_FINAL_COLOR_MANAGED.
In strip itself types SEQ_CACHE_RAW, SEQ_CACHE_PREPROCESSED and SEQ_CACHE_COMPOSITE are drawn in order from bottom to top.

Represented image color tint can vary from blue(cheap) to red(expensive) according to calculated cost with cost of 1 being a crossing point from cheap to expensive.
Color scheme is not the best one, cost calculation is not protected from overflowing + method may be flawed

This tool can be enabled or disabled by setting memview_enable RNA property.
Function draw_cache_memview is used to draw this tool

Issues

  • cost calculation is not protected from overflowing + method may be flawed. Will have to review this
  • Is ugly...

Sequencer Prefetching

Overview:
Prefetching is background process, that has a goal to fill the cache with frames, that are (most likely) to be displayed. Each Scene owns 1 PrefetchJob struct, that contain neccessary data to run prefetching job.
To run 2 or more rendering threads at once, one copy of scene per thread needs to be created. This is to allow threaded animdata evaluation and animating strip properties without affecting playback in the main thread.
Only data required by sequencer are copied and scene is copied outside of bmain struct, so user can not access it.

Currently, only 1 prefetch job can run at a moment.
Prefetch thread is running until stop bit is set. When there is no work to be done, pthread_cond_wait is used to suspend prefetch thread.

To start prefetching:

  • Conditions:
    • Enable prefetching
    • Some Cache storage must be enabled
    • Prefetch thread must not be running.
    • User must not scrub timeline
    • "expensive" footage must not be played
  • Rendering a frame preview (move playhead) will try to start prefetching

Range evaluation
Memory usage is analyzed during run time to determine when to pause prefetching.

Invalidation of scene copy
When User edits or affects a strip's output, strip's cache is invalidated. This process was appended by prefetch scene invalidation. Old scene copy is freed and replaced by new one. If scene strips are to be supported, This should be optimized by more selective invalidation or shallow copy + lazy loading in style of depsgraph. Note that cache invalidation in sequencer is not complete yet and some edits will go unnoticed.

Controls
User has control over prefetching by following controls:

  • prefetch_enable (bool) - enables/disables prefetching
  • prefetch_store_ratio (float 0 to 1) - ratio of new prefetched frames to expensive frames already in cache
  • prefetch_offset (float 0 to 1) - after number of frames to be prefetched is calculated, it is multiplied by this number, so cached frames are trailing behind playhead

While prefetch process is running and memview is enabled, WM is "forced" to redraw sequencer by BKE_sequencer_prefetch_wm_notify, so user can see progress in real time.

Other changes

  • old prefetch code removed.
  • Limit multithreaded effects to keep 1 thread free, when called from prefetch thread. This is because using all possible threads lead to framerate drops and unresponsive UI.

Plans for future work

  • performance of most strips is quite constant so we can spare resources on movies with proxies and focus on more distant parts of timeline, that needs actual prefetching.
  • strip worker as a replacement of proxy for scenes and other strips
    • depends on cache dumping images to disk and proper / better cache invalidation(including management of dumped files)

Issues

  • "Thread collision" in blf lib while rendering text strips
  • Scene copying unfinished

Default values

Prior to this patch only (equivalent to) SEQ_CACHE_PREPROCESSED and SEQ_CACHE_COMPOSITE were cached.
Caching only SEQ_CACHE_FINAL_OUT or SEQ_CACHE_FINAL_COLOR_MANAGED will result in *consistent* length of cached content. With "Temporary cache" providing means to quite responsive editing of strips(as long as editing last in chain).

Memview is quite nice tool, provides essential info to user, so it should be enabled. Problem may be, that it's ugly.
Cost visualization should be disabled by default.
prefetch_store_ratio is questionable. About 0.5 is probably fine.
prefetch_offset should be 0

Biggest question is if prefetching should be enabled by default. I would agree to this, but rather not in first release and only after setting default cache size limit to about 512MB.
With footage optimized for fast loading you can fill whole RAM in just a few seconds. There are a lot of users with 4GB of RAM and less.
If I accidentaly left 4GB limit as is set now, and started prefetching, my system wasn't able to recover from such event.
Even if I realized, what I just did, there was no time to react, and stop prefetching

Diff Detail

Event Timeline

Would love to test it, if you can share a build?

Here is win32 build:
https://drive.google.com/file/d/1JDzIHHNA9qWvoXvVIShf9e-7l8bm-re5/view?usp=sharing

If there is any problem with running build, try this one:
https://drive.google.com/file/d/1KSHAs5XftnC722P_AG2UGCgqhKFfi5Da/view?usp=sharing

I changed some stuff in MSVC. Don't know what the problem is, but I don't know what I am doing anyway...

Wow! On the official 2.79b I get 5 fps.
On your patch I get a clean 25 fps in the cached areas.

That is super impressive!!! And an excellent POC!

The file processed is a 1.8 GB file with a duration of 1.33 minutes:
Format : ProRes
Format version : Version 0
Format profile : 422 HQ
Duration : 1 min 33 s
Bit rate mode : Variable
Bit rate : 159 Mb/s
Width : 1 920 pixels
Height : 1 080 pixels
Frame rate : 25.000 FPS

Looking forward to more cache-strategies being implemented.

Here is win32 build:
https://drive.google.com/file/d/1JDzIHHNA9qWvoXvVIShf9e-7l8bm-re5/view?usp=sharing
If there is any problem with running build, try this one:
https://drive.google.com/file/d/1KSHAs5XftnC722P_AG2UGCgqhKFfi5Da/view?usp=sharing
I changed some stuff in MSVC. Don't know what the problem is, but I don't know what I am doing anyway...

Hey Richard! Awesome stuff you're working on. I have just granted you uploader rights on GraphicAll.org (missed it before, sorry), so you can share your builds there as well!

Brecht Van Lommel (brecht) requested changes to this revision.Nov 21 2018, 8:24 PM

I think the approach is generally fine. Would be nice to see a more cleaned up patch to review in more detail.

Regarding the BLF crash, do you every launch multiple prefetch threads at the same time? The way the BLF API works the same font can only be used by one thread at a time. We have separate UI and "render" fonts to avoid that problem, but if multiple threads are using that render font there will be a problem.

source/blender/blenkernel/BKE_library.h
81

I would rename this to LIB_ID_COPY_SEQUENCER_PREFETCH, no need for obscure abbreviations.

source/blender/blenkernel/intern/scene.c
259

Scene copying is a bit problematic, but there is no way around it probably. We should avoid copying not only the node tree, but also things like view layers and rigid body world which relates to scene objects. For performance if nothing else.

I don't quite understand why copying the compositing node tree would cause problems, it is not a separate datablock but embedded into the scene, so I would expect it can be safely copied. But it seems unnecessary anyway.

source/blender/blenkernel/intern/sequencer.c
3927

If you're using clock() you should be using CLOCKS_PER_SEC to interpret it as seconds.

4079

What exactly is the purpose of this main copy?

I would not expect it to be needed for the evaluation of animation on a single scene datablock. If it's for rendering scene strips, I don't think a main with just the scene will work.

This revision now requires changes to proceed.Nov 21 2018, 8:24 PM
Richard Antalik (ISS) marked 2 inline comments as done.Nov 22 2018, 11:47 AM

I think the approach is generally fine. Would be nice to see a more cleaned up patch to review in more detail.

Thanks for feedback. I will provide hopefully finished patch, I guess this weekend.

Regarding the BLF crash, do you every launch multiple prefetch threads at the same time? The way the BLF API works the same font can only be used by one thread at a time. We have separate UI and "render" fonts to avoid that problem, but if multiple threads are using that render font there will be a problem.

There is only one prefetch thread running and it crashes only if I am moving stuff in UI.
It is trying to read some glyph cache.
Sorry for "exact" terminology, I am at work now...

Also after I fixed mem allocation (no unfreed blocks on exit), I tried to check undo functionality, it was not working, but I think, I will be able to deal with that.

source/blender/blenkernel/intern/scene.c
259

We can do this, but I was thinking, that maybe user is using drivers driven by objects to animate some properties of VSE strips. I don't really know how animation system works in blender so I rather copy everything.

Also scene copy is triggered by invalidating cache in VSE (triggered by any edit). So unless you have big scene with strips(you shouldn't) this shouldn't be a problem.

One thing about scene copy (more like throwing crazy ideas).

One way to go would be to use library's remap code to iterate over all datablocks used by a scene. This will copy a bit more than it should be, but will allow crazy scenarios like object driving sequencer property.

More proper way would be to use dependency graph and utilize its copy-on-write routines. In order to do this we'd need to implement building from sequencer. Shouldn't be too difficult, but maybe is more efficient if is done by someone who's already familiar with the code there.

For the initial implementation is probably enough to copy just a scene, without its viewlayers or objects, just to keep memory usage low.

It also seems the preprocess cache is removed. This is not very great, since this makes tweaking (like color grading) way slower. Ideally we need to keep preprocessed cache with a lower priority than a final cache.

source/blender/blenkernel/intern/scene.c
259

I don't think we need to support objects driving VSE strips, this probably never worked reliably.

If we did, we'd really need to build a depsgraph for prefetching and including all the required datablocks in there. Just copying the scene is not enough then. But I would consider that beyond the scope of this patch, no need to make it more complicated to begin with.

Still work in progress...
Scene copying was not modified yet.
Also haven't look at undo bug.

I was implementing caching sources for edited sequence on demand. This was a little bit more complex then I thought.
This should work now just for 1 edited sequence and only for seq render output before blending.
I wanted this prefetching to be completely autonomous, but deciding, when exactly user wants to stop doing edits is impossible. So there is button that user has to press to release cached frames,
Next feature should be support for keeping multiple edits on all levels(add preprocess + blended sequences), and UI to release them individually / all at once.

I wanted to use some delay, after user makes edit, so prefetch is not started immediately after changing something. Not sure if I can use event system for this.

Also I am using wm notifier to refresh UI, when prefetch thread is running - not sure if this is done well

UI panel to control prefetch was added

To enable / freeze prefetching
slider for minimum free frames for prefetching before cost is recalculated
enable / disable memory viewer

render cache thingy is not supported - idea is to cache all ibufs that were output during rendering for current frame. Problem is, that when sequence stack is already prefetched, sources are not rendered. So this is not quite possible
Cache everything will force prefetch to cache all ibufs that were output during rendering.
2 modes are possible - either prefetch, or cache past frames.
release edits operator button - described above

One thing about scene copy (more like throwing crazy ideas).
One way to go would be to use library's remap code to iterate over all datablocks used by a scene. This will copy a bit more than it should be, but will allow crazy scenarios like object driving sequencer property.
More proper way would be to use dependency graph and utilize its copy-on-write routines. In order to do this we'd need to implement building from sequencer. Shouldn't be too difficult, but maybe is more efficient if is done by someone who's already familiar with the code there.
For the initial implementation is probably enough to copy just a scene, without its viewlayers or objects, just to keep memory usage low.

I want to make this work on 2.8 so I guess I will have to do this using depsgraph. I have to review my libraries, because when I tried to build 2.8 branch it failed.
In 2.79 I guess I will have to use argument to enable depsgraph.
Is there some example on how to use depsgraph with fallback?

I am not very good with too much abstraction - I lost about a day debugging ghash bugs, until I realized, that iterating while another thread is freeing items is not a good idea :)

It also seems the preprocess cache is removed. This is not very great, since this makes tweaking (like color grading) way slower. Ideally we need to keep preprocessed cache with a lower priority than a final cache.

If you mean preprocess cache, that was called preprocess cache - it wasn't working
https://devtalk.blender.org/t/sequencer-cache-improvement-proposal/2151/10?u=iss

I don't want to remove it completely, but it has to be rewritten.

Hopefully there won't be any stupid bugs.
To be honest, I am surprised, that you don't yell at me for code quality :)

Yeah and C++ style comments are basically my todos and code commented by them is likely to be changed.

Richard Antalik (ISS) edited the summary of this revision. (Show Details)EditedDec 8 2018, 8:10 PM
Richard Antalik (ISS) updated this revision to Diff 12828.

Status:
Functionality should be quite complete, only problem is preprocess cache, since preprocess ibufs may vary in size. But if you enable caching preprocess for some strips, prefetch may have problems (may try to fetch more, then cache capacity - wasted effort). It should be easy to detect such event and stop prefetching, which may be a way to go...

Code is still ugly and not optimized. Lots of iterations over cached items in decision process of what to do with them.
Even in this state however this patch is useful.

PrefetchSource:
This struct is created, when a change to strip is made.
It contains info about strip, affected area and what resource was required to re-render changed strip
For each required resource a PrefetchSource is created.
PrefetchSource can be created / deleted by user.
More API can be exposed, so it's effect can be automated by animation system, but dunno if it would be helpful.

Controls:
prefetch_enable: enable/disable prefetch
memview_enable: enable/disable cache content viewer
rendercache_enable: (enable/disable) keep all rendered resources cached until frame is completely rendered
cache_everything: (enable/disable) keep all rendered resources cached
disable_auto_add_sources: (enable/disable) do not create PrefetchSource when editing strips
ignore_sources: (enable/disable) do not cache resources defined by PrefetchSources

prefetch_store_ratio: slider - ratio of minimum frames used to prefetch vs frame capacity
when prefetch thread hits limit of minimum available frames, minimum cost of frames to be considered expensive is recalculated, so there will be at least as much frames free as set by this slider.
0 = never recalculate value
0,1 = recalculate value, when less then 1/10 of total frames are cheap
1 = always recalculate value so no frames are expensive

prefetch_offset: slider - by this ratio prefetch playhead will be offset from its target
0 = prefetching effectively disabled
0,5 = 1/2 frames cached before playhead 1/2 after playhead. Cheap frames that is.
1 = full prefetch

Remove all sources: OP - ...well remove all PrefetchSources
Cache [type] for selected strips: OP - add PrefetchSource manually

On the bottom there is list of individual sources with remove operator.

Issues:

Creation of source is triggered by cache invalidation. But maybe we don't want to create source, when just moving stuff around.
Also API is quite simplistic. I am not user at this point, but last thing I want is to annoy users by things like this. It is extremely useful, but can be annoying as well.
I can see user manually configuring cache, when doing task, that requires a lot of repetetive edits or "that last 30 frames are choppy, so I can free this resource at this point".
As I wrote this, I added disable_auto_add_sources and ignore_sources. That should do it :)

Missing data visualization for preprocess cache in memview. It would be cool if we could read frame cost / other details by hovering over with mouse cursor.

Seems, that sound scene is duplicated on each edit. Also I found this patch highly unstable, when sound strips are present. pleate test without sound strips.

Lots of small issues, where scene cpy is not updated or cache not cleared - will fix as I do some editing.

Use-cases:

Preview whole timeline:
Enable prefetch,
Enable rendercache,
enable ignore sources / delete all sources,
set prefetch_store_ratio to 1
set prefetch_offset to 0

Preview short part of timeline:
Enable prefetch,
Enable rendercache,
enable ignore sources / delete all sources,
set prefetch_store_ratio to 0,3 - 0,5 - depends on RAM and preview length
set prefetch_offset to 0 - 0,5 - depends on RAM and preview length

Preview short part of with edits:
Enable prefetch,
Enable rendercache,
set prefetch_store_ratio to 0,3 - 0,5 - depends on RAM and preview length
set prefetch_offset to 0 - 0,5 - depends on RAM and preview length

Edits on slooow parts:
disable prefetch,
Enable rendercache

I would say, that most settings can be pretty much universal, so I would keep Enable prefetch and ignore sources button somewhere by hand.

New win32 build (read open issues):
https://drive.google.com/open?id=1JDzIHHNA9qWvoXvVIShf9e-7l8bm-re5

I read, that I am not able to build x64 version with free MSVC. Sorry.

Also, subscribe to PewDiePie

source/blender/blenkernel/intern/sequencer.c
3739–3800

Removed this, because code is redundant and confusing

4065–4089

I modified this, but it was a mistake...

source/blender/editors/space_sequencer/sequencer_draw.c
1175–1182

is this OK?

I 'm really interested by this patch I DL the release but I don't how to install it ?
Did you put the ms_**** file direct in main program ?

Richard Antalik (ISS) edited the summary of this revision. (Show Details)Jan 3 2019, 12:26 PM
Richard Antalik (ISS) updated this revision to Diff 13094.

Some more refactoring / cleaning could be done. Especially prefetch range finding and cache base seq/type determination. Also cache recycle fn should be split..

I decided to 'remove "Sources" for cache as this patch is big enough and support wasn't optimal.

To have at least some control over cache, type selector was added. This lead to need for cache frame linking. Removal of last(base) linked item will cause entire chain to be deleted

Idea is to prevent freeing items, that have following frame still in cache. So all items will be re-rendered and re-cached as they should.
If freeing wasn't linked, scrubbing timeline, then playing it would result in incosistent caching(holes)

Now instead of re-implementing "Sources" much nicer solution seems to be adding cache type selection per sequence, that can override or restrict "global" settings.

Cache limiter limit enforcing is bypassed. Other functions are not.
This can be set for each cache individually (I should probably make it look if callback is defined to enable/disable this feature instead of using flag).
I think that modifying limiter to allow free process as in this patch in is not very useful.

Richard Antalik (ISS) marked 4 inline comments as done.Jan 31 2019, 9:39 PM
Richard Antalik (ISS) updated this revision to Diff 13456.

Updated diff.

In progress of refactoring draw_image_seq. I want to cache strips after color transformation, and I simply couldn't work with that function as it was written.

Scopes are broken, otherwise there should be no problem

Richard Antalik (ISS) edited the summary of this revision. (Show Details)Feb 4 2019, 5:59 PM
Richard Antalik (ISS) set the repository for this revision to rB Blender.

Updated summary to short-ish documentation.

I wanted to send finished draw_image_seq refactor, but forgot diff at another PC

I read your updated doc, and here are a few notes :

    • I think we can keep a single thread for prefetching : rendering a single frame should already be multithreaded in many cases (renders, effects, etc...). Would doing this remove the need for error-prone scene copying, or is it inevitable ?
  • I have many issues with a simple color strip : it initially apprears black, then when I enable prefetching it turns red, but there is no prefetching... When i scrub it flickers from red to black, etc...
  • I would love to see support for scene strips, disk cache, and per-strip selection (pretty much everything you mention in "TODOs" !)
  • I would love for there to be a "Prefetch this whole strip" option, to enable pre-composition of scene or meta strips. It would have to cache to disk in this case, and prefetch even frames where the strip is not visible (when it is under another strip for instance).

Finish refactoring draw_image_seq

This will be posted as separate patch after I clean comments and do review

I read your updated doc, and here are a few notes :

  • I think we can keep a single thread for prefetching : rendering a single frame should already be multithreaded in many cases (renders, effects, etc...). Would doing this remove the need for error-prone scene copying, or is it inevitable ?

Problem is, that you have awful UI lag, when rendering stuff that takes a lot of time per frame.
I am actually thinking about making preview itself threaded.

No lag(~50ms+) is acceptable in UI

  • I have many issues with a simple color strip : it initially apprears black, then when I enable prefetching it turns red, but there is no prefetching... When i scrub it flickers from red to black, etc...

As I said, invalidation is currently in very bad state.
This would happen even in 2.7:

add color strip
scrub
change color
play back

Currently we use "refresh sequencer" button to clear cache (will reload prefetching scene also)
In 2.7 cache is not limited, so once you fill the cache, it stays untouched untill you "refresh" it.

  • I would love to see support for scene strips, disk cache, and per-strip selection (pretty much everything you mention in "TODOs" !)

Not in this patch. We have to make this bug free. This makes pretty good foundation, that is unlikely to be heavily modified to allow working on these things.

As for rendering scene strips - turns out, that I am stupid...
I will update docs, to be more clear what scene copy is actually used for, because problems I listed are non problems :) I forgot about them and my memory was damaged. Must have missed few refresh cycles...

  • I would love for there to be a "Prefetch this whole strip" option, to enable pre-composition of scene or meta strips. It would have to cache to disk in this case, and prefetch even frames where the strip is not visible (when it is under another strip for instance).

Well it would be one of many possible modifications of this basic prefetcher.
Some of these variations can run in parallel with basic prefetching, some can be one-time run by user(pre-composition), some may wait for user inactivity...

Speaking about dumping/pre-composition:
It can be worked on independently to this patch. I was thinking about quite simple scheme:

  • Add MovieCache->dumper or better name, I can't think about any better now
    • This will be callback, that will accept void *userkey, ImBuf *ibuf (anything else?)
  • Sequencer should set callback similar to function seq_proxy_build_frame only it will get SeqCacheKey and ImBuf
  • Save image to <project-path>/<scene-name>/<seq-name>/<cache-type[Number]>/cfra.ext
    • formats to use should be (at least to start with) JPEG, PNG, EXR
    • path MUST be properly escaped (ideally scene/strip names should be limited to be filename compliant)

then just do in moviecache.c

if (BLI_ghash_reinsert(cache->hash, key, item, moviecache_keyfree, moviecache_valfree)) {
    if (cache->dumper){
        cache->dumper(userkey, ibuf);
    }
    ...
}

and reverse to get image

Thing, we don't know how to do yet is maintaining validity of those images.
I will be thinking about it and will send some ideas. Considering, that validity tracking was never implemented in scene proxies anyway, we can start even without this...

Richard Antalik (ISS) edited the summary of this revision. (Show Details)Feb 5 2019, 12:29 PM
Richard Antalik (ISS) edited the summary of this revision. (Show Details)Feb 7 2019, 9:01 PM
Richard Antalik (ISS) edited the summary of this revision. (Show Details)Feb 18 2019, 2:50 AM
Richard Antalik (ISS) updated this revision to Diff 13716.

MovieCache struct removed members:

  • bool use_limiter_to_free - when false, limiter wont be used to free stored images (this one is probably a bad idea)
  • bool insert_allowed - when false, cache will not accept new images (even re-entry)

MovieCache struct added members:

  • ThreadMutex cache_mutex - To protect ghash iterator / user limiting application. Maybe user_mutex would be a better name?

added to moviecache API:

  • void IMB_moviecache_limiter_enable(MovieCache *cache)
  • void IMB_moviecache_limiter_disable(MovieCache *cache)

IMB_moviecache_limiter_enable is called in IMB_moviecache_create, so by default each cache, that is created will use "builtin" limiter. Use of limiter is not enforced.

Prefetch operation changes:

  • Removed simulation of prefetch run.
    • memory usage is analyzed during run time to determine when to stop prefetching
    • prefetch_store_ratio and prefetch_offset has no effect currently
    • prefetch thread is now running until stop bit is set. When there is no work to be done, pthread_cond_wait is used to suspend prefetch thread
  • No memory leaks, yay!
  • No compile errors apart from few traps for erratic function

Cache TODO;

  • Resolve problem of invalid userkeys, when iterating cache. Problem is, that I don't know, how to ghash...
  • Review cache linking as that introduced some code that may be optimized.
  • It would be great to localize "context swapping" in cache when rendering scene copy, so code will be universal. I guess call by pointer is a way to go.

Prefetch TODO:

  • Restore functionality of prefetch_store_ratio(will be managed by cache) and prefetch_offset
  • I guess we can finally do proper scene copy now. Good idea would be to use original seqbase, so we can get rid of code that looks up sequences in original seqbase when using prefetching