ppc64el, Debian: video sequence editor (VSE) preview corrupt
Needs Developer to Reproduce, Normal, Public

Description

System Information
Operating system: Debian buster
Graphics card: Radeon RX 580

Blender Version
Broken: the packages from Debian buster and buster-backports, and a manual build of the Debian unstable package 2.83.5+dfsg-2
Worked: N/A

Short description of error
Blender appears to work. In the preview window of the video sequence editor, video clips have vertical lines through the picture. Images (e.g. from JPEG files) are displayed correctly; only video clips are broken. I saw other bugs, such as T19794, about endian problems. I'm testing on POWER9, a little endian PPC64 environment.

Exact steps for others to reproduce the error

  1. Start blender from the command line with the test Blend file as an argument
  2. Look at the preview pane: it is corrupted, vertical bars through the picture

Event Timeline

Sounds like a video driver problem.

Note that I'm not sure any active developers have access to this hardware, even so.

This report is missing:

  • Exact steps to redo the error.
  • A blend file (unless there are simple steps to redo this from the default startup).

Since this is an error others may have trouble redoing:

  • Does this happen for all video?
  • Does this happen for image sequences?
  • Can you test different codecs/formats to narrow down where the problem might be?

Yes, I'm happy to test it and create a sample blend file demonstrating the problem

I notice it happens to the videos in the sequence but the images are rendered properly

I'm a developer. I usually work on other C++ projects, for example reSIProcate, but if you can tell me where to focus, I don't mind looking at the code and applying any changes.

The workstation I'm using is one of the Talos II systems; it is open hardware.

I can try a different GPU if there are known problems with the RX 580. I prefer to use only the open source amdgpu driver. I was thinking about getting one of the AMD Big Navi cards in November but it is not clear if they will work immediately on Linux systems.

I tried building 2.90.0 as a Debian package; it has the same bug.

I attach a screenshot showing how the video looks in the preview

Notice the PNG at the bottom with the text is displayed correctly but the video is rendered with vertical bars

Richard Antalik (ISS) changed the task status from Needs Triage to Needs Information from User. Sep 18 2020, 4:36 PM

This doesn't look like a GPU issue, since the whole preview region is pushed to the GPU as one texture.
Also, I am not sure this would be an endianness issue, since x86 is also little endian.

Does it look like this if you add a movie strip and don't adjust any strip properties? Can you try to load the movie into other editors, like the movie clip editor or the image editor?

As an endianness issue, I was concerned that some part of the code may see that it is running on ppc and wrongly assume all ppc == big endian. I've seen similar problems in other code.

ppc can be either little or big endian and some developers assume one implies the other.

Regarding endianness it seems we rely on __BIG_ENDIAN__ and __LITTLE_ENDIAN__ symbols to be defined correctly, but they are not defined in GCC. I don't build on Linux so I don't know if this has to be defined explicitly somewhere in make files. @Ray molenkamp (LazyDodo) Are you familiar with this?

I can see that little endian is detected at the configure step and the gcc compiler is called with -DLITTLE_ENDIAN on the command line.

However, maybe there could be one accidental block of code like this that nobody checked before:

#ifdef __ppc...
// wrongly assume big endian

Here is a sample blend file that reproduces the problem

There is a single video strip

This was created in Blender 2.90.0 on Debian ppc64el

Another problem I sometimes see when porting code is that the size of data types is not correct for every architecture. For example, gcc on that architecture uses 64 bits for a type but the developer has guessed it is 32 bits and hardcoded that somewhere. I feel that in this case, that is less likely than an endianness issue but I can't be completely sure.
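
As an illustration of the type size concern, here is a small sketch with hypothetical helper names, not code from Blender. It reports the width of `long`, which differs between platforms, next to the width of `int32_t`, which does not, and checks whether a `long` value would survive being stored in 32 bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Width of `long` in bits on the current platform (platform dependent). */
static int bits_of_long(void)
{
  return (int)(sizeof(long) * CHAR_BIT);
}

/* Width of `int32_t` in bits; the type is exactly 32 bits wherever it exists. */
static int bits_of_int32(void)
{
  return (int)(sizeof(int32_t) * CHAR_BIT);
}

/* Returns 1 if `value` can be stored in an int32_t without truncation. */
static int long_fits_in_int32(long value)
{
  assert(bits_of_long() >= bits_of_int32());
  return value >= INT32_MIN && value <= INT32_MAX;
}
```

Code that hardcodes 32 where it means `sizeof(long) * CHAR_BIT` works only on platforms where the two happen to agree.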

The endianness symbols are controlled through the main CMakeLists.txt in this section and should be defined when the compiler is invoked.

Using find, I found three places in the source tree where the logic is not right for ppc64el:

btSerializer.h

SDL_endian.h is good for Linux but perhaps not for BSD and others

PacketMath.h has an example of assuming the size of a type on ppc64el

Here is the command I used. Other searches, for endian and byte_order, might also be useful:

find blender/ -type f -exec egrep -C10 -Hi '(_ppc|_power)' {} \;|less

Are any of those things listed above relevant to the preview panel in the video editor?

None of those are used in the VSE. For all I know, ffmpeg is delivering the frame to us like this. A developer will have to look at this; throwing random things at the wall going "could it be this?" doesn't seem like a good use of anyone's time.

Can you check if other software using ffmpeg plays this properly? (video editors, players)

Note that this report is still missing:

  • Test files to redo the error.
  • Details about which codecs work and which fail. Please try different video formats to see if this is related to a specific codec.

I provided a test file. Please see the earlier comment "Here is a sample blend file that reproduces the problem"; it is a link to the files on Gitlab.

I ran ffplay from the command line with each of the video files used in the Blend file.

ffplay can display them without any problems.

I tried using an updated ffmpeg 4.3.1 backport; that didn't make any difference.

I missed the link to the test file; I have uploaded it here.

Looking at the preview window with different zoom settings changes the way it is corrupted. Notice that the gaps between the bars are zoomed as well.

I tried some videos with other codecs. The cat video attached is H.264; the others I tried are VP9 and WMV2. All have the same problem, so it is not specific to the H.264 codec.

I continued testing today and I found that the same problem appears in the render output. I tried each render engine (Eevee, Cycles, Workbench) and it is always the same.

I tried using the Compositor as well and the video node in the compositor has the same problem.

Richard Antalik (ISS) changed the task status from Needs Information from User to Needs Developer to Reproduce. Mon, Sep 28, 5:35 AM

I am really not sure what the problem here could be.

All the info here would point to one switch, which I would assume works correctly. To check, you can apply the following patch, load a video in the VSE and check the console output:

diff --git a/source/blender/imbuf/intern/anim_movie.c b/source/blender/imbuf/intern/anim_movie.c
index f5ae602946e..a05f98330bd 100644
--- a/source/blender/imbuf/intern/anim_movie.c
+++ b/source/blender/imbuf/intern/anim_movie.c
@@ -801,6 +801,7 @@ static void ffmpeg_postprocess(struct anim *anim)
   }

   if (ENDIAN_ORDER == B_ENDIAN) {
+    printf("Detected big endian machine\n");
     int *dstStride = anim->pFrameRGB->linesize;
     uint8_t **dst = anim->pFrameRGB->data;
     const int dstStride2[4] = {dstStride[0], 0, 0, 0};
@@ -847,6 +848,7 @@ static void ffmpeg_postprocess(struct anim *anim)
     }
   }
   else {
+    printf("Detected little endian machine\n");
     int *dstStride = anim->pFrameRGB->linesize;
     uint8_t **dst = anim->pFrameRGB->data;
     const int dstStride2[4] = {-dstStride[0], 0, 0, 0};

I can't see any other code directly manipulating pixels there.
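
For readers unfamiliar with the negated stride visible in the little endian branch of the patch context: pointing the destination at the last row and passing a negative stride is a common way to have a writer fill a buffer bottom-up, which flips the image vertically. The sketch below shows the idea with plain memcpy and no ffmpeg dependency. The function names are hypothetical, and this illustrates the general technique rather than what sws_scale does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `height` rows of `row_bytes` bytes each from `src` to `dst`.
 * Strides are signed: with a negative `dst_stride`, `dst` must point at
 * the last row of the destination and rows are written bottom-up. */
static void copy_rows(const uint8_t *src, ptrdiff_t src_stride,
                      uint8_t *dst, ptrdiff_t dst_stride,
                      size_t row_bytes, int height)
{
  assert(src != NULL && dst != NULL && height >= 0);
  for (int y = 0; y < height; y++) {
    memcpy(dst + (ptrdiff_t)y * dst_stride,
           src + (ptrdiff_t)y * src_stride,
           row_bytes);
  }
}

/* Flip an image vertically by starting at the last destination row and
 * stepping backwards through memory with a negated stride. */
static void flip_vertical(const uint8_t *src, uint8_t *dst,
                          ptrdiff_t stride, size_t row_bytes, int height)
{
  if (height <= 0) {
    return;
  }
  uint8_t *last_row = dst + (ptrdiff_t)(height - 1) * stride;
  copy_rows(src, stride, last_row, -stride, row_bytes, height);
}
```

Any code that treats the stride as unsigned, or that assumes rows are laid out in increasing address order, would misbehave on a path like this.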

I had already checked that code with gdb breakpoints. It always takes the second code path.

853	    uint8_t *dst2[4] = {dst[0] + (anim->y - 1) * dstStride[0], 0, 0, 0};
(gdb) 
855	    sws_scale(anim->img_convert_ctx,
(gdb) 
864	  if (need_aligned_ffmpeg_buffer(anim)) {
(gdb) 
874	  if (filter_y) {
(gdb) 
ffmpeg_fetchibuf (tc=<optimized out>, position=<optimized out>, anim=<optimized out>) at ./source/blender/imbuf/intern/anim_movie.c:1235
1235	  anim->last_pts = anim->next_pts;
(gdb)

I have debug symbol packages installed but some values are optimized out. I can recompile without optimization if necessary.

Could it be related to alignment or sizeof(some type) on this platform?

When I zoom in a lot, so that the pixels become quite big, I can see that the ratio of vertical strips of image to vertical strips of alpha is 1:15, so there is one vertical line of image pixels in every 16 pixels.

I'm not sure if this is relevant, but there could be something around the call to sws_scale, for example:
https://code.mythtv.org/trac/ticket/12888

The hardcoded 32 bit alignment also appears to be something to avoid. I found commits like this in other projects:
https://github.com/FFmpeg/FFmpeg/commit/f30a41a6086eb8c10f66090739a2a4f8491c3c7a

The 32 in the call to av_frame_get_buffer, and in a few other places, is a magic number. The number 31 also appears in need_aligned_ffmpeg_buffer. I'm not sure how prevalent this is throughout Blender but it may be a good idea to:

a) make a commit replacing all the magic numbers with a macro

b) make another commit replacing the macro with 0 in cases where alignment can be guessed by av_frame_get_buffer
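
A rough sketch of what step (a) could look like. The macro and function names here are hypothetical and not taken from the Blender source. The point is that 31 is simply 32 - 1, the usual bit mask for testing a power-of-two alignment, so both numbers can be derived from one named constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name for the alignment that is currently written as a bare 32. */
#define FFMPEG_BUFFER_ALIGNMENT 32

/* True if `value` is a multiple of `alignment`.
 * `alignment` must be a power of two, so that `alignment - 1` is a bit mask
 * (31 for an alignment of 32). */
static bool is_aligned(int value, int alignment)
{
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (value & (alignment - 1)) == 0;
}

/* Round a non-negative `value` up to the next multiple of `alignment`. */
static int align_up(int value, int alignment)
{
  assert(value >= 0);
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}
```

With the constant in one place, step (b) becomes a matter of changing or removing a single definition rather than hunting for every 32 and 31.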

I guess changing the hardcoded alignment to automatic would be good practice, but I don't think this is the cause of the issue, because the buffer is actually 32 bit aligned. With the example video, av_frame_get_buffer(anim->pFrameRGB, 32) should not be executed anyway.

Since images work OK and other editors have the same issue, the bug must come from the anim_movie.c file, and there isn't much going on there: basically sws_scale fills the buffer and it is passed to the editor. You say that ffplay works correctly, however. Is Blender actually built with the ffmpeg library you tested?

If you have the patience, I could add more debug prints there (mainly to check the result of av_frame_alloc) and "force" sws_scale to pass the image unchanged to see what happens, but this is mostly guesswork on my part.

I'm happy to tweak things

I do quite a lot of C++ in other projects, for example:
https://github.com/resiprocate/resiprocate/graphs/contributors

but I'm not familiar with the Blender code base.

If you want me to query values at gdb breakpoints, test changes that you put on a branch, or run extra unit tests that you create, I'm happy to collaborate with you until we get this fixed.

I checked with ldd and I can see that Blender links against the ffmpeg dynamic libs on my system.

I tried first with the ffmpeg 4.1.6 packages from Debian stable (buster), then I built a backport package of ffmpeg 4.3.1 and recompiled against that.

https://packages.qa.debian.org/f/ffmpeg.html