
Multi-threaded multi-frame rendering for VSE export
Changes Planned · Public

Authored by yesuu (yesuu) on Dec 22 2018, 6:28 PM.

Details

Reviewers
Brecht Van Lommel (brecht)
Group Reviewers
Video Sequencer
Summary
  • Add new threading API.
  • Parallel rendering of multiple frames.

VSE export speed is doubled on a four-core computer.

Panics and bugs:

  • Freeing memory causes a panic.
  • The break handling in the original logic is complex, and G.break can cause concurrency bugs.

todo:

  • Let the new threading API wait on threads in batches.
  • Rewrite disk I/O to be asynchronous.

Why do I want a new threading API?
It provides batch thread management in a tree structure and a uniform cancel syntax.
With the help of the BLI_thread_update API, I can write lock-free programs.

Diff Detail

Repository
rB Blender

Event Timeline

yesuu (yesuu) created this revision. Dec 22 2018, 6:28 PM
Brecht Van Lommel (brecht) requested changes to this revision. Thu, Dec 27, 1:46 PM

Why not use the existing thread pool API? It may not be lock-free, but that seems unlikely to be a bottleneck in this case? We want to avoid having too many different threading APIs.

I expect deeper changes are needed to make this really thread safe and reliable. There are likely hidden globals in the rendering pipeline, sequence rendering itself may not be thread safe, and shallow copying Render likely fails in some ways. This all needs to be carefully checked.

Memory usage may be problematic on a computer with many cores, either running out of memory or thrashing the sequencer cache. When rendering with Cycles / EEVEE instead of the sequencer, it should not do this kind of threading.

This revision now requires changes to proceed. Thu, Dec 27, 1:46 PM

Thank you for your comment, @Brecht Van Lommel (brecht).

The new threading API is designed to simplify how some code is written, not to be revolutionary. It will integrate the capabilities of the pool, with the ultimate goal of replacing the thread pool API. In fact, I also wrote a version that uses the pool API!

The new threading API manages threads as a tree. A tree is the equivalent of several associated pools, and commands can be passed down to all subpools. The only command that currently exists is cancel. If there is a serial part in the parallel code, my solution is to pass the data to the parent thread through BLI_thread_update and have the parent thread execute that serial part. The advantage is that this provides not only atomicity but also an ordering guarantee. I think thread nesting anywhere in Blender will not exceed 5 levels.
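
To make the "serial part runs on the parent" idea concrete, here is a minimal sketch using plain pthreads and C11 atomics. It is not code from the patch and does not reproduce the signature of BLI_thread_update, which this thread does not show. All names (FrameSlot, frame_slot_render, render_and_encode_in_order) are hypothetical, and the per-slot `ready` flag only stands in for the update notification. Child threads compute; the parent consumes the results strictly in frame order, which is where the ordering guarantee comes from.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define FRAME_COUNT 8

/* One slot per frame. The child fills `result`, then publishes it via `ready`. */
typedef struct FrameSlot {
  int frame;
  int result;
  atomic_int ready;
} FrameSlot;

/* Child thread: computation only, it never touches the shared output. */
static void *frame_slot_render(void *arg)
{
  FrameSlot *slot = arg;
  slot->result = slot->frame * slot->frame; /* stand-in for rendering a frame */
  atomic_store_explicit(&slot->ready, 1, memory_order_release);
  return NULL;
}

/* Parent thread: starts one child per frame, then runs the serial step
 * (appending to `out`) itself, in frame order. Returns the number of
 * frames written. `out` must have room for FRAME_COUNT values. */
int render_and_encode_in_order(int *out)
{
  FrameSlot slots[FRAME_COUNT];
  pthread_t threads[FRAME_COUNT];
  int created[FRAME_COUNT];
  int written = 0;

  assert(out != NULL);

  for (int i = 0; i < FRAME_COUNT; i++) {
    slots[i].frame = i;
    slots[i].result = 0;
    atomic_init(&slots[i].ready, 0);
    created[i] = (pthread_create(&threads[i], NULL, frame_slot_render, &slots[i]) == 0);
    if (!created[i]) {
      frame_slot_render(&slots[i]); /* fall back to rendering on the parent */
    }
  }

  /* Serial part: only the parent writes to `out`, so no lock is needed,
   * and frames are appended in order even if children finish out of order. */
  for (int i = 0; i < FRAME_COUNT; i++) {
    while (!atomic_load_explicit(&slots[i].ready, memory_order_acquire)) {
      sched_yield();
    }
    out[written++] = slots[i].result;
  }

  for (int i = 0; i < FRAME_COUNT; i++) {
    if (created[i]) {
      pthread_join(threads[i], NULL);
    }
  }
  return written;
}
```

The parent busy-waits with sched_yield() to keep the sketch short. A real implementation would more likely block on a condition variable or a similar wake-up mechanism.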

With the existing pool API I can also implement tree-like thread management; I just need to pass the operation to the child threads manually. The new threading API and the pool API are not abstractions at the same level. Besides window events and quantization algorithms, there are several other pieces of multi-threaded code in the project. I can't take care of all of them myself, and I will need contributions from others. But I also hope to introduce some code specifications, such as:

  • Cancel can only be executed by the parent thread, never by a child thread. Example: after the existing rendering logic is changed to multi-threading, each child thread could use G.break to cancel rendering. The better way is for the child thread to return an error. The parent thread detects the error and cancels the rendering, which also cancels all child threads that are still in progress. Once the parent thread detects that all child threads are canceled, it reclaims the resources.
  • Put serial code on the parent thread as needed, which requires something like BLI_thread_update. Use a lock if necessary.
  • The parent thread is responsible for complex thread scheduling code, and the child threads are responsible for the computation. This keeps computational logic from intermingling with parallel scheduling logic.
  • Use a tree for batch thread management (BLI_thread_with). This is syntactic sugar, meant to encourage people to follow the other rules.
  • Unified cancel operation (BLI_thread_cancel), unified progress notification (BLI_thread_update), and a unified check for whether a thread has finished (BLI_thread_done).
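
The first rule above can be sketched as follows, again with plain pthreads rather than the proposed BLI_thread_* calls, whose signatures are not shown in this thread. Every name here (FrameJob, frame_job_run, render_frames_cancelable, the `fail_frame` parameter) is hypothetical and exists only for illustration. A child never cancels anything: it only reports a status. The parent owns the cancel flag, sets it when it sees an error, waits for all children to stop, and only then frees the resources.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

enum { JOB_RUNNING, JOB_OK, JOB_ERROR, JOB_CANCELED };

typedef struct FrameJob {
  int frame;
  int fail_frame;     /* hypothetical test hook: the frame that fails, -1 for none */
  atomic_int *cancel; /* written only by the parent, read by the children */
  atomic_int status;  /* written only by the owning child */
} FrameJob;

/* Child thread: does its work in steps and only reports what happened. */
static void *frame_job_run(void *arg)
{
  FrameJob *job = arg;
  int status = JOB_OK;
  for (int step = 0; step < 1000; step++) {
    if (atomic_load(job->cancel)) {
      status = JOB_CANCELED;
      break;
    }
    if (job->frame == job->fail_frame) {
      status = JOB_ERROR; /* return an error instead of canceling globally */
      break;
    }
    sched_yield(); /* stand-in for one unit of rendering work */
  }
  atomic_store(&job->status, status);
  return NULL;
}

/* Parent thread: returns 0 if every frame succeeded, -1 otherwise. */
int render_frames_cancelable(int frame_count, int fail_frame)
{
  atomic_int cancel;
  int started = 0;
  int failed = 0;

  assert(frame_count > 0);
  atomic_init(&cancel, 0);

  FrameJob *jobs = calloc((size_t)frame_count, sizeof(*jobs));
  pthread_t *threads = calloc((size_t)frame_count, sizeof(*threads));
  if (jobs == NULL || threads == NULL) {
    free(jobs);
    free(threads);
    return -1;
  }

  for (int i = 0; i < frame_count; i++) {
    jobs[i].frame = i;
    jobs[i].fail_frame = fail_frame;
    jobs[i].cancel = &cancel;
    atomic_init(&jobs[i].status, JOB_RUNNING);
    if (pthread_create(&threads[i], NULL, frame_job_run, &jobs[i]) != 0) {
      failed = 1;
      atomic_store(&cancel, 1);
      break;
    }
    started++;
  }

  /* Scheduling logic lives here: detect errors, cancel, wait for all children. */
  for (;;) {
    int running = 0;
    for (int i = 0; i < started; i++) {
      int status = atomic_load(&jobs[i].status);
      if (status == JOB_RUNNING) {
        running++;
      }
      else if (status == JOB_ERROR && !failed) {
        failed = 1;
        atomic_store(&cancel, 1); /* only the parent cancels */
      }
    }
    if (running == 0) {
      break;
    }
    sched_yield();
  }

  for (int i = 0; i < started; i++) {
    pthread_join(threads[i], NULL);
  }
  /* All children have stopped, so it is safe to reclaim the resources. */
  free(jobs);
  free(threads);
  return failed ? -1 : 0;
}
```

For example, render_frames_cancelable(4, 2) reports failure because frame 2 returns an error, while render_frames_cancelable(4, -1) succeeds. How many of the other children end up canceled rather than finished depends on scheduling.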

If you implement the above specification with the existing pool API, the code is not very concise; the new API is meant to simplify implementing the specification. One deeper reason is that I want to avoid resident threads and minimize thread lifetimes. This requires parameters to be prepared when the thread is created and the result to be processed after the thread ends. Avoid reading values from a queue for processing and writing results to a queue, because doing so makes the thread resident. This deeper reason is only a matter of style and does not suit every scenario, so the new threading API does not support only this pattern.
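
A small sketch of the short-lived thread style described here, under the assumption that plain pthreads are used. The names (FrameTask, frame_task_run, render_range_sum) are hypothetical and not taken from the patch. Each thread receives its parameters when it is created, writes its result into its own task struct, and exits. The parent processes the results only after joining, so no queue and no resident worker is involved.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_TASKS 16

typedef struct FrameTask {
  int frame;   /* input, prepared before the thread starts */
  int scale;   /* input, prepared before the thread starts */
  long result; /* output, read by the parent after the thread ends */
} FrameTask;

/* Child thread: one computation, then it exits. */
static void *frame_task_run(void *arg)
{
  FrameTask *task = arg;
  task->result = (long)task->frame * task->scale; /* stand-in for rendering */
  return NULL;
}

/* Runs `count` tasks for frames start..start+count-1 and returns the sum of
 * their results. `count` must be between 0 and MAX_TASKS. */
long render_range_sum(int start, int count, int scale)
{
  FrameTask tasks[MAX_TASKS];
  pthread_t threads[MAX_TASKS];
  int created[MAX_TASKS];
  long sum = 0;

  assert(count >= 0 && count <= MAX_TASKS);

  for (int i = 0; i < count; i++) {
    tasks[i].frame = start + i;
    tasks[i].scale = scale;
    tasks[i].result = 0;
    created[i] = (pthread_create(&threads[i], NULL, frame_task_run, &tasks[i]) == 0);
    if (!created[i]) {
      frame_task_run(&tasks[i]); /* fall back to running on the parent */
    }
  }

  /* Results are processed by the parent once each thread has ended. */
  for (int i = 0; i < count; i++) {
    if (created[i]) {
      pthread_join(threads[i], NULL);
    }
    sum += tasks[i].result;
  }
  return sum;
}
```

The trade-off is the cost of creating a thread per task, which is why this style suits coarse work such as whole frames better than many tiny tasks.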

Coming back to the problem at hand: multi-threaded multi-frame rendering has a very large scope to consider.

Shallow copying Render is the first step. After the shallow copy, I will check that all the code is safe and fix the unsafe details. The alternative is to refactor all dependencies first and add thread-safety comments to the functions that are depended on. The latter is a true deep optimization, and I cannot achieve it very quickly. This patch is a work in progress (WIP), and I will update it again.

I hope you can confirm that the direction of my refactoring is right.

yesuu (yesuu) planned changes to this revision. Fri, Dec 28, 6:04 PM

The problem with adding a second way to do things is that you often end up with two systems, and threading code is already hard enough to understand as it is. I don't think we would accept another thread pool API unless it fully replaces the existing one, and I don't really see a justification for us investing the time to work on that.

Have you looked at the task pool system we have in BLI_task.h? It is designed to handle this kind of thing. Nested task pools are possible there, and tasks can be pushed while threads are running.

A more unified way to handle canceling could be good, but I don't think it requires this deep a change.

> I don't think we would accept another thread pool API unless it fully replaces the existing one, and I don't really see a justification for us investing the time to work on that.

OK, this patch will continue to be implemented with the existing API. I will not raise this issue again until the new API is sufficient to completely replace the existing one.

> Have you looked at the task pool system we have in BLI_task.h? It is designed to handle this kind of thing. Nested task pools are possible there, and tasks can be pushed while threads are running.

I noticed its existence, but did not study it in depth. I will look at it later.

> A more unified way to handle canceling could be good, but I don't think it requires this deep a change.

As I learn more about the existing system, I should be able to draw a precise conclusion about whether a deep reform is needed.

We can talk about this later. In the meantime, I will continue to review the thread safety of all the rendering code.