Short description of the problem
When using video files in movie strips in Blender's Video Sequence Editor, the videos are sometimes impossible to sync because the timing of the video streams warps unpredictably. This breaks audio/video sync while editing and in final renders, and makes editing movies impossible.
Introduction and my personal history with this issue
Since I started using Blender 2.5 I have experienced severe problems with syncing video to audio in Blender's Video Sequence Editor.
Syncing multi-camera footage together was even harder. I'm not sure, but I feel like the problem did not exist before Blender 2.5.
I was very frustrated with this and took a break of a few years from using Blender for video editing.
I published this video right before I gave up:
You can hear me speak while my lips don't move, and vice versa. But it's not just the webcam part: all of the video is totally out of sync with the audio. The raw footage I captured with RecordMyDesktop played fine on its own, but after editing and rendering it, the audio and video never align in the screencast part. What's strange is that a part captured with a Canon 550D DSLR is fine.
After a few years I finally gave Blender another try. I'm currently using Blender again to edit videos, and I managed to edit and render a 75-minute video without experiencing this problem. Here's that video:
You can see a static A/V desync of a few frames, but that's totally correctable. It's not what I'm talking about here.
Two days ago I recorded footage with my webcam (the same way I recorded footage for the video above) and with a camcorder.
I recorded a visible and audible hand clap to help with syncing. I tried to sync these movie strips together by their audio first. This seemed to fail: the footage didn't seem to align with the sound. So I tried by frames: I found and locked onto the exact same moment when my hands meet. After rendering I don't see any place where the two cameras are in sync. The result is this video:
The camcorder (on the left) seems to be in sync with the audio at first, then drifts away, but the webcam (right) is totally off right from the start. Notice how the webcam footage sometimes stops while the camcorder footage keeps rolling. If you jump to the last seconds of this video, you can see how far off the webcam footage is at the end: it is still in the middle of the performance, while the camcorder and audio have already finished.
The strange thing is that it's not the camcorder footage that fails most, it's the webcam. The camcorder was the new thing. I have successfully edited webcam footage captured exactly the same way in Blender before. An example with a ton of editing where the sync is solid:
Here's a screenshot of the session:
The issue appears unpredictably, probably when different input video formats are used together. It only affects movie strips, never image sequences.
I know it's on when I sync Movie Strips at one point, scrub to a different point, and they don't match there.
I also know the issue is on when I make some cuts, only to realize after rendering that my editing decisions were completely distorted and the cuts are not where I wanted them to be.
I know it's on when the original, aligned audio no longer matches the video 2 minutes into the footage.
Another sign of this problem happening is that 2 sources that started in sync don't end in sync - one is still "in the middle" while the other is over already.
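The difference between a correctable static offset and the warping described here can be expressed with two sync checks at different points of the timeline. The following is only a minimal sketch: the function names and the one-frame tolerance are my own assumptions, and it treats the drift as linear, which nothing in this report establishes.

```python
def drift_per_minute(first_check, second_check):
    """Estimate drift in seconds per minute of timeline.

    Each check is a (timeline_seconds, offset_seconds) pair, where the
    offset is how far a strip lags behind the reference at that point.
    """
    (t1, offset1), (t2, offset2) = first_check, second_check
    if t2 == t1:
        raise ValueError("the two checks must be taken at different times")
    return (offset2 - offset1) / (t2 - t1) * 60.0


def classify_desync(first_check, second_check, tolerance=0.04):
    """Return 'static offset' or 'drift'.

    The default tolerance of 0.04 s is one frame at 25 FPS (an assumption).
    """
    change = abs(second_check[1] - first_check[1])
    return "static offset" if change <= tolerance else "drift"


# Illustrative numbers only: in sync at the start, 20 s behind after 4 minutes.
print(classify_desync((0.0, 0.0), (240.0, 20.0)))
print(drift_per_minute((0.0, 0.0), (240.0, 20.0)))
```

A constant offset at both checks can be fixed by sliding the strip; an offset that grows between the checks cannot.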
But didn't you just forget to..?
I generate 25% and sometimes 50% JPEG proxies for all the footage, and most of the time I edit using those.
I enable Freerun timecode for all the footage also.
I have A/V Sync enabled at all times.
Testing the problem
So, to finally find out what is happening, I captured 4-source footage of a 25-FPS timecode counter going from 00:00:00:00 to 00:04:00:01.
The sources are:
- Screencast - Open Broadcaster Software recording my screen and an overlay webcam, with audio (it's just silence, but it helps check for the correct framerate)
- Camcorder - recording the laptop screen with audio
- Camera - recording the laptop screen with audio
- Phone - recording the laptop screen with audio
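Since every source shows the same 25-FPS timecode counter, the desync of each source can be read off as a number of frames by comparing the timecode visible in the footage with the timecode expected at that point. A minimal sketch, assuming a non-drop-frame HH:MM:SS:FF counter; the helper names are hypothetical and not part of Blender:

```python
def timecode_to_frames(timecode, fps=25):
    """Convert a non-drop-frame HH:MM:SS:FF timecode to a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError(f"frame field {frames} is out of range for {fps} FPS")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frames_to_timecode(frame_count, fps=25):
    """Convert a frame count back to HH:MM:SS:FF."""
    seconds, frames = divmod(frame_count, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def desync_in_frames(seen_in_footage, expected, fps=25):
    """Negative result: the footage lags behind the expected timecode."""
    return timecode_to_frames(seen_in_footage, fps) - timecode_to_frames(expected, fps)


# The full counter run, 00:00:00:00 to 00:04:00:01, expressed in frames.
print(timecode_to_frames("00:04:00:01"))
```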
You can download all the video files here and do your own testing:
Here's ffprobe output for all the files (omitting the redundant heading information):
Input #0, matroska,webm, from 'Screencast.mkv':
  Metadata:
    ENCODER         : Lavf56.40.101
  Duration: 00:04:26.83, start: 0.000000, bitrate: 11868 kb/s
    Stream #0:0: Video: h264 (Constrained Baseline), yuv420p, 1920x1080, SAR 1:1 DAR 16:9, 30 fps, 30 tbr, 1k tbn, 60 tbc (default)
    Metadata:
      DURATION        : 00:04:26.833000000
    Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp (default)
    Metadata:
      title           : simple_aac_recording
      DURATION        : 00:04:26.795000000
Input #0, mpeg, from 'Camcorder.MPG':
  Duration: 00:04:27.84, start: 0.112556, bitrate: 9530 kb/s
    Stream #0:0[0x1e0]: Video: mpeg2video (Main), yuv420p(tv), 720x576 [SAR 64:45 DAR 16:9], max. 9100 kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x80]: Audio: ac3, 48000 Hz, stereo, fltp, 256 kb/s
[mjpeg @ 0x7256a0] Changeing bps to 8
Input #0, avi, from 'Camera.AVI':
  Metadata:
    encoder         :
    maker           : NIKON
    model           : P80
    creation_time   : 2008-01-01 00:00:00
  Duration: 00:04:25.50, start: 0.000000, bitrate: 8455 kb/s
    Stream #0:0: Video: mjpeg (MJPG / 0x47504A4D), yuvj422p(pc, bt470bg/unknown/unknown), 640x480, 8388 kb/s, 30 fps, 30 tbr, 30 tbn, 30 tbc
    Stream #0:1: Audio: pcm_u8 ( / 0x0001), 8000 Hz, 1 channels, u8, 64 kb/s
(I remuxed this video with ffmpeg to remove the GPS location tag: ffmpeg -i input.mp4 -metadata location="" -metadata location-eng="" -acodec copy -vcodec copy output.mp4)
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'Phone.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf56.40.101
  Duration: 00:04:27.26, start: 0.000000, bitrate: 15173 kb/s
    Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x1080, 15056 kb/s, 26.57 fps, 30 tbr, 90k tbn, 180k tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 124 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
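One thing that stands out in the ffprobe output is the pair of rates on each video stream: "fps" is the average frame rate and "tbr" is ffmpeg's guess at the base frame rate. For three sources they agree (30/30, 25/25, 30/30), while Phone.mp4 reports 26.57 fps against 30 tbr, which usually points to a variable frame rate. Whether that is what confuses Blender is only my assumption. A minimal sketch that pulls both numbers out of the video stream lines above; the helper names and the 1% tolerance are hypothetical:

```python
import re

# Matches the "<avg> fps, <base> tbr" part of an ffprobe video stream line.
STREAM_RATE = re.compile(r"([\d.]+) fps, ([\d.]+) tbr")

# Video stream lines copied (shortened at the front) from the ffprobe output.
VIDEO_STREAMS = {
    "Screencast": "Video: h264, yuv420p, 1920x1080, 30 fps, 30 tbr, 1k tbn, 60 tbc",
    "Camcorder": "Video: mpeg2video, 720x576, 25 fps, 25 tbr, 90k tbn, 50 tbc",
    "Camera": "Video: mjpeg, 640x480, 8388 kb/s, 30 fps, 30 tbr, 30 tbn, 30 tbc",
    "Phone": "Video: h264, 1920x1080, 15056 kb/s, 26.57 fps, 30 tbr, 90k tbn, 180k tbc",
}


def frame_rates(stream_line):
    """Return (average_fps, tbr) as floats, or None if the line has no rates."""
    match = STREAM_RATE.search(stream_line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def looks_variable(stream_line, tolerance=0.01):
    """True if average fps and tbr differ by more than the tolerance (a hint, not proof)."""
    rates = frame_rates(stream_line)
    if rates is None:
        return False
    average_fps, tbr = rates
    return abs(average_fps - tbr) > tolerance * tbr


def flag_sources(streams):
    """Names of the sources whose video stream looks like variable frame rate."""
    return [name for name, line in streams.items() if looks_variable(line)]


print(flag_sources(VIDEO_STREAMS))
```

Run on the four lines above, this flags only the phone footage, which is also the source that desyncs the most in this test.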
The issue initially expressed itself strongly.
I have synced the 4-camera footage, having all 4 sources match on timecode 00:00:00:10 (on frame 1 the laptop screen was obscured by my arm when I pressed the spacebar to start the timecode).
I tested with VLC and all video files contain the full footage from 00:00:00:00 to 00:04:00:01. However, the phone footage never gets that far in Blender: it ends earlier. I wasn't able to correct this by importing the footage multiple times, even though VLC shows me footage that Blender never gets to. Strange.
Here's the resulting video, edited and rendered with Blender 2.78:
Generally it looks like the phone is lagging behind by about 20 seconds, while the rest stay within 8 frames of desync relative to Blender's timecode:
I hope we can sort this out finally.