
Making ffmpeg use the maximum available threads, rather than just 1 (the default)
ClosedPublic

Authored by Mal Duffin (mal_cando) on Tue, Dec 4, 8:52 PM.

Diff Detail

Repository
rB Blender

Event Timeline

In the tests so far, this has had an approximately 20% speedup.
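For context, the whole change is essentially a one-field edit on the codec context before it is opened. A minimal sketch, assuming the usual libavcodec setup (variable names are illustrative, not the actual Blender patch):

```c
/* Sketch only: enable libavcodec's built-in threading.
 * thread_count = 0 asks ffmpeg to auto-detect the number of CPU cores;
 * previously the context was left at a single thread. */
AVCodecContext *c = avcodec_alloc_context3(codec);
c->thread_count = 0; /* 0 = auto-detect; the old behaviour was 1 */
if (avcodec_open2(c, codec, NULL) < 0) {
    /* handle the error */
}
```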

This correlates with the conversation in this link...

https://codereview.chromium.org/148423005/

  • Do we stand by that "2 threads is better than 1" policy decision made oh so long ago? @scherkus: do you have enough perf test infra set up now to report whether this would be a bad move or a good move? (if it's known to not regress metrics we care about, dropping a thread per <video> tag would be awesome!)

We do! Running locally:

./out/Release/media_perftests --gtest_filter=*MP4* --gtest_repeat=10 | awk '/RESULT/ { t += $4; cnt += 1; } END { printf("Average over %d runs: %f\n", cnt, t / cnt); }'

SLICE
Average over 10 runs: 3.313302

FRAME
Average over 10 runs: 3.943826

So we're getting ~19% bump. I believe the test file in question is *not* slice-based.
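The awk averaging idiom used above can be sanity-checked on synthetic input; the RESULT lines below are made up, with field 4 holding the per-run time as in the media_perftests output:

```shell
# Two fake RESULT lines averaged with the same awk program as above:
printf 'RESULT run time 3.0 ms\nRESULT run time 4.0 ms\n' | awk \
  '/RESULT/ { t += $4; cnt += 1; } END { printf("Average over %d runs: %f\n", cnt, t / cnt); }'
# → Average over 2 runs: 3.500000
```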

FF_THREAD_FRAME is slightly faster than FF_THREAD_SLICE
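Those two models are selected through the codec context's thread_type field; a hedged sketch of the comparison (context setup omitted, ctx is illustrative):

```c
/* Sketch: FF_THREAD_FRAME runs whole frames on separate worker threads
 * (at the cost of a few frames of output latency); FF_THREAD_SLICE splits
 * one frame into slices, which only helps on slice-based content. */
ctx->thread_count = 0;              /* auto-detect core count */
ctx->thread_type = FF_THREAD_FRAME; /* measured slightly faster than FF_THREAD_SLICE here */
```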

@Mal Duffin (mal_cando), thanks for the patch!
Are these properties set dynamically? There may be some cases where we would want to keep just one thread, for example if we utilize multiple threads for the actual rendering, as in D3934 or D3597.

In D3934 I was experiencing UI slowdowns because image processing in the background was utilizing all available threads, though those were quite resource-hungry tasks.

I am currently working on VSE performance issues, and the ffmpeg implementation is one of them, though not critical.
There is some weird behavior: delays when accessing a stream for the first time or after a pause, a much slower play rate when scrubbing backwards, and streams with P-frames that should play smoothly when not seeking but don't.

Just dumping this here in case anybody feels like tackling those issues :)

Are you testing this on a file where the cost of ffmpeg writing is dominant? A 20% speedup seems low if so.

Maybe all this is doing is making the ffmpeg encoding run in its own thread separate from the Blender main thread, which saves a little bit of time. But it's not actually encoding multiple frames in parallel, and something else is needed to do that?

extern/audaspace/plugins/ffmpeg/FFMPEGReader.cpp
185–186 (On Diff #12761)

This is changing code in the external audaspace library, so looping in @Joerg Mueller (nexyon).

I would be careful with putting this in the audio reading code. Maybe it helps, but it could also slow things down if threading overhead is too much for realtime playback. We shouldn't make this kind of change without careful testing in any case.

I guess this change needs more evaluation. If the change targets video encoding, I don't see on what basis the changes in audaspace are necessary. Does audio en-/decoding actually benefit from the changes? It also seems a bit random to have those changes for playing audio files and mixing down audio, but not for decoding videos (within the movie clip editor or VSE)?

Maybe all this is doing is making the ffmpeg encoding run in its own thread separate from the Blender main thread, which saves a little bit of time. But it's not actually encoding multiple frames in parallel, and something else is needed to do that?

I have a feeling this is all that is happening with the change: the ffmpeg library is threading its internal reading/writing of a single frame to get a small speedup.

if we utilize multiple threads for actual rendering, for example

In the case above, only the final frame write(s) should utilise the additional threading.

Regarding the audio conversion using more than one thread, I just made the changes in the places that were called when rendering out a video sequence at my end. Note that the code in FFMPEGWriter.cpp wasn't called in my test, I put it in there for completeness.

I chose to use 0 as the thread_count, rather than BLI_system_thread_count(), to avoid unnecessary additional includes.
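The trade-off being described, as a sketch (BLI_system_thread_count() is Blender's own core-count helper; either line should yield the same thread count in practice):

```c
/* Option taken: let libavcodec detect the core count itself. */
c->thread_count = 0;

/* Alternative considered, at the cost of an extra include:
 * #include "BLI_threads.h"
 * c->thread_count = BLI_system_thread_count();
 */
```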

It might be worth pushing the change temporarily to get some overall speed-up results from various VSE users, and revert if there are any issues.

Getting more VSE users to easily get access to the change and record before/after timings would be very useful (there seems to currently be a good push with users, and some active developers able to put some time into making the changes).

It might be worth pushing the change temporarily to get some overall speed-up results from various VSE users, and revert if there are any issues.

Getting more VSE users to easily get access to the change and record before/after timings would be very useful (there seems to currently be a good push with users, and some active developers able to put some time into making the changes).

Agree. If this change is as harmless as it sounds (basically changing hardcoded 1 thread to automatic), I'd say it's worth a test. There are plenty of VSE enthusiasts willing to do heavy testing but compiling can be cumbersome.

User testing is not going to teach us much. It is clear that we need to use the ffmpeg API differently to get a significant multithreading speedup. That requires a developer to look at the ffmpeg docs and code.

We could commit this with thread count set to 2 and the audaspace changes left out, for the purpose of getting the small speedup. We should not create so many threads if they are not going to be used.

yesuu (yesuu) added a subscriber: yesuu (yesuu). Edited Sat, Dec 8, 12:59 AM

It is clear that we need to use the ffmpeg API differently to get a significant multithreading speedup. That requires a developer to look at the ffmpeg docs and code.

I think this is already encoding multiple frames in parallel. I don't know how to explain it well; I have looked at the ffmpeg code, and it is hard to follow.

After multithreading is turned on, avcodec_encode_video2 becomes non-blocking, and it only blocks to return a result every few calls. With multithreading enabled, the avcodec_encode_video2 call's execution time is about 2% of what it was before.
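A sketch of the delayed-output pattern being described, using the old avcodec_encode_video2 API (ctx/frame setup omitted; names illustrative). With frame threading enabled, the call hands the frame to a worker and returns quickly; got_packet stays 0 for the first few calls while the worker threads fill their pipeline, and each packet then corresponds to a frame submitted several calls earlier:

```c
AVPacket pkt;
int got_packet = 0;

av_init_packet(&pkt);
pkt.data = NULL; /* let the encoder allocate the packet */
pkt.size = 0;

/* Non-blocking in the threaded case: the frame is queued for a worker. */
int ret = avcodec_encode_video2(ctx, &pkt, frame, &got_packet);
if (ret >= 0 && got_packet) {
    /* Write pkt; it may belong to a frame submitted a few calls ago. */
    av_packet_unref(&pkt);
}
/* After the last frame, flush by passing frame = NULL until got_packet == 0. */
```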

We could commit this with thread count set to 2 and the audaspace changes left out, for the purpose of getting the small speedup.

I'm happy with that, I've made the requested changes.

I think this is already encoding multiple frames in parallel. I don't know how to explain it well; I have looked at the ffmpeg code, and it is hard to follow.

After multithreading is turned on, avcodec_encode_video2 becomes non-blocking, and it only blocks to return a result every few calls. With multithreading enabled, the avcodec_encode_video2 call's execution time is about 2% of what it was before.

I think you're right; the function doesn't always return a result, which allows ffmpeg to use multiple threads in the background. I ran a quick profile and noticed that, for example, color management is already slowing things down enough that ffmpeg doesn't need to use more than two threads.

So we can actually commit with thread_count = 0;, in the hope that later on Blender gets optimized to actually take advantage of more threads.

This revision is now accepted and ready to land. Tue, Dec 11, 7:42 PM

So we can actually commit with thread_count = 0;, in the hope that later on Blender gets optimized to actually take advantage of more threads.

Sounds good, I've made the change.

This revision was automatically updated to reflect the committed changes.