
Workbench animation renders are extremely slow (2.79: <25m, 2.80: 6-8 hours)
Open, Confirmed, High, Public

Description

System Information
Operating system: Linux-4.18.0-25-generic-x86_64-with-debian-buster-sid 64 Bits
Graphics card: GeForce GTX 1060 6GB/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 390.116

Blender Version
Broken: version: 2.81 (sub 12), branch: master, commit date: 2019-09-22 20:38, hash: rB52bdf522afcd
Worked: (optional)

Short description of error

Workbench animation renders are inexplicably slow.

Exact steps for others to reproduce the error

1. Open the attached file in Blender 2.79 and run OpenGL Render Animation;
Note that the total rendering duration is around 6.32 seconds.

2. Now do the same, but open the file in Blender 2.81 (View -> Viewport Render Animation);
The rendering time is now 22.80 seconds.

Details

Type
Bug

Event Timeline

I can confirm.
For a clean comparison, I set 8 samples for both the render and the viewport, and also turned off Simplify:

Performance problems are known and are being investigated case by case (https://developer.blender.org/project/profile/103/).
Although test scenes are useful, they need to be simple and cover only one problem at a time.

The report has not made it clear whether the problem is with animation, viewport or rendering.

Try to describe in a few steps how to reproduce the problem (showing for example where we should look).

After trying the test scene the user provided, the problem is clear. In 2.79, when you render the same scene at 4K with OpenGL, it is close to real time. But when you render with 2.80 and Workbench, which is supposed to be the replacement for the OpenGL render, at the same resolution it goes really slow. The editor also doesn't respond to user interaction at the same speed as during a 2.79 render.

But I think it is a problem with the render system itself. You can take a scene with no objects, only a camera, do a Workbench render at 4K, and the time to produce each frame will be 0.27 seconds (on my system). The same scene in 2.79 needs only a few milliseconds.

Changing the format (video or image; 8-bit, 16-bit, OpenEXR, PNG, ...) does not change the times, so the problem must be in the render system. It can't be a problem with the editor, because the problem is the same if you render without any Image Editor on the screen or in the viewport.

Edit: The problem is correlated with resolution, because the time to render scales linearly with resolution. So maybe Workbench is extremely slow compared with the old OpenGL render? It can't be only that, though, because Workbench in the viewport is faster than in the render animation.

Thanks Germano -- I hope you have some time to take a closer look at this -- IMO this is a significant regression from 2.79. After looking further, I think it has less to do with the render engine choice and configuration and more to do with between-frame time, specifically the writing of the output files. The example project I provided in the original report is already quite simple: a few Suzannes rotating. But here is an even simpler example project:

That project has no objects at all. Project is 60fps. 250 frames. Playback in a ~3840x2160 viewport in "rendered" mode is perfect (obviously). Render settings: no anti-aliasing, no PNG compression, rendering to RAM disk.

Results (these are actual per-frame times, not the reported rendering time which doesn't include between-frame time):

regular blender:

1920x1080: 0.26s/frame (16X slower than viewport)
3840x2160: 1.26s/frame (76X slower than viewport)
7680x4320: 4.92s/frame (296X slower than 4K viewport)

rendering with no GUI via the CLI (--render-anim -b):

1920x1080: 0.24s/frame (render .08, save time .17)
3840x2160: 0.97s/frame (render .32, save time .65)
7680x4320: 3.78s/frame (render 1.24, save time 2.54)
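
(For reference, the no-GUI runs used an invocation along these lines; the RAM-disk path and file name are just illustrative:

blender -b seconddemo.blend -o /mnt/ramdisk/frame_#### -a

-b loads the file in background mode, -o sets the output path, and -a / --render-anim renders the animation.)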

If I change to the Eevee renderer (1 sample, no denoising), the same kind of pattern happens.

So it looks like the biggest issue is the very slow saving of the files to the RAM disk, or at least what the CLI output is reporting as the time for "Saving", whatever that precisely refers to. (Times to SSD are 1.4X slower even than that.) The actual render time itself is pretty slow, too, >=20X slower than playback in an equivalent viewport for the 4k CLI render.

Steps to reproduce:

1 - start Blender with blank configs
2 - open demo project (seconddemo.blend)
3 - observe that the project plays smoothly (of course) at 60 fps
4 - observe that rendering the 4.17s animation takes much longer (66 seconds on my machine)
5 - consider that the analogous project in 2.79 did not have this problem

My latest project is a 17-minute, very simple 4K motion graphics project that plays rendered in real time without even making the computer warm: some text, a few images on planes, some lines, etc. Rendering PNGs to an SSD took 15.5 hours. This per-frame overhead probably isn't a problem if your renders take 4 minutes per frame, but if you are rendering simple content (i.e. motion graphics) it's very painful.

And again, this wasn't a problem in 2.79. 2.79 maybe wasn't as fast as it could have been, and I know the rendering engine was different, but it rendered the same kinds of projects at <.1s per frame instead of 1.26s per frame.

I note that changing Properties -> Render Properties -> Color Management -> View Transform from "Filmic" to either "Standard" or "Raw" results in the CLI-reported "Saving" time dropping from .65 to .42 seconds per frame at 4K. A significant speedup. (Nowhere near fast enough, but still: interesting.)

I don't know if other color space related processing might have something to do with these problems, but it might be part of an explanation for why the OpenGL rendering was so much faster?
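
To make that concrete: my understanding is that every saved frame gets pulled through a CPU-side OpenColorIO display transform before it is written. A rough sketch of what that looks like with the OCIO 1.x API (the "sRGB"/"Filmic" display and view names match Blender's bundled config; the function itself is just illustrative, not Blender's actual code):

#include <OpenColorIO/OpenColorIO.h>
namespace OCIO = OCIO_NAMESPACE;

/* Apply the configured view transform to one RGBA float frame on the CPU. */
void apply_view_transform(float *pixels, long width, long height)
{
  OCIO::ConstConfigRcPtr config = OCIO::GetCurrentConfig();

  OCIO::DisplayTransformRcPtr dt = OCIO::DisplayTransform::Create();
  dt->setInputColorSpaceName(OCIO::ROLE_SCENE_LINEAR);
  dt->setDisplay("sRGB");
  dt->setView("Filmic"); /* "Standard"/"Raw" are cheaper transforms */

  OCIO::ConstProcessorRcPtr processor = config->getProcessor(dt);

  /* Every pixel of every frame goes through this before saving, which
   * presumably shows up in the CLI-reported "Saving" time. */
  OCIO::PackedImageDesc img(pixels, width, height, 4);
  processor->apply(img);
}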

Germano Cavalcante (mano-wii) lowered the priority of this task from Needs Triage by Developer to Confirmed, High.
Germano Cavalcante (mano-wii) updated the task description. (Show Details)

OK, I can confirm this as a regression.
@Casey Connor (clepsydrae), I hope you don't mind, but I edited the report description.
@Jeroen Bakker (jbakker), this involves workbench so it's your area as well.

Thanks Germano! I will note that I don't know if it's necessarily a Workbench issue (see the 3:23pm comment: it seems to be the same thing going on with Eevee, partly related to the color space transform, etc.)

This is not a 2.81 target as far as I know, whatever fix this needs probably requires bigger changes that would be unsafe to make there.

@Casey Connor (clepsydrae)

Have you tried the option in the viewport: View > Viewport Render Animation?

@Alberto Velázquez (dcvertice) -- thanks for the reminder of that -- that is faster than regular rendering (roughly .14s/frame for the demo project in this comment), but still significantly slower than the viewport (.017s/frame)

To anyone finding this: there were a couple questions about this during the BCON developer Q&A -- it sounds like the need to copy data from the GPU back to the CPU (new in 2.80 I guess?) for the sake of compositing and color management is part of the explanation. At any rate, it's nice to know that the developers are aware of it and so hopefully some progress can be made.

Just to clarify in more depth :-)

  • The copy back to the CPU is slower due to the flooding of the Draw API and the larger amount of data in the drawn texture (RGB8 vs RGBA16F). On my machine this process takes 0.04s per frame. Blender 2.79 used a triple buffer, which improved the download times and didn't flood the Draw API. We could investigate PBOs to do async downloads (a rough sketch follows at the end of this comment).
  • Blender 2.79 always rendered in display space; Blender 2.80 renders in scene reference space. We are using OpenColorIO 1.1.0, which has more precision on the CPU than on the GPU, hence for final rendering we use the CPU for color management. OpenColorIO 2.0, which has been in development for some years now, supports higher quality on the GPU. We could add some options here, for example to use the GPU color transforms when doing viewport rendering.
  • One thing I haven't investigated yet, as it doesn't seem to have that much impact, is that the draw caches are cleared when starting a viewport render. These caches then need to be rebuilt for every frame.

When drawing in the viewport we don't need to transfer back to the CPU, and we use the lower-quality GPU transfer functions.
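
For reference, a rough sketch of what the PBO-based async download could look like, assuming an RGBA16F framebuffer (8 bytes per pixel) and plain OpenGL; this is illustrative, not actual Blender code:

#include <GL/glew.h> /* any GL loader would do; GLEW is just an assumption */
#include <cstdint>
#include <cstring>

struct AsyncReadback {
  GLuint pbo = 0;
  GLsync fence = nullptr;
};

/* Kick off the download: glReadPixels targets the bound pack buffer and
 * returns immediately instead of stalling until the pixels arrive. */
void readback_begin(AsyncReadback &rb, int width, int height)
{
  const GLsizeiptr size = GLsizeiptr(width) * height * 8; /* RGBA16F */
  if (rb.pbo == 0) {
    glGenBuffers(1, &rb.pbo);
  }
  glBindBuffer(GL_PIXEL_PACK_BUFFER, rb.pbo);
  glBufferData(GL_PIXEL_PACK_BUFFER, size, nullptr, GL_STREAM_READ);
  /* With a pack buffer bound, the last argument is an offset, not a pointer. */
  glReadPixels(0, 0, width, height, GL_RGBA, GL_HALF_FLOAT, nullptr);
  rb.fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
  glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

/* Ideally a frame later: wait on the fence, then map and copy on the CPU,
 * so the GPU kept drawing while the previous frame downloaded. */
void readback_end(AsyncReadback &rb, int width, int height, void *dst)
{
  const GLsizeiptr size = GLsizeiptr(width) * height * 8;
  glClientWaitSync(rb.fence, GL_SYNC_FLUSH_COMMANDS_BIT, UINT64_MAX);
  glDeleteSync(rb.fence);
  glBindBuffer(GL_PIXEL_PACK_BUFFER, rb.pbo);
  void *src = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, size, GL_MAP_READ_BIT);
  if (src) {
    std::memcpy(dst, src, size_t(size));
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
  }
  glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}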

Thank you, Jeroen!

Should the bug report title be changed to remove the "Workbench" association? Are these issues present regardless of the rendering engine?

It would be great if there were a small collection of checkboxes that could enable/disable/change certain aspects of the rendering process to optimize for minimal per-frame overhead, to aid those of us doing renders with very short "actual" render times (e.g. 2D motion graphics; I suppose people doing Grease Pencil work might appreciate this too, though I have no experience there). E.g. if I'm doing no compositing and no color management, using the Workbench renderer, and only need simple anti-aliasing, maybe some shortcuts can be taken.

Anyway, I know we're in good hands :-). Thanks for any time you all have to look at it.

@Jeroen Bakker (jbakker) I remember a while ago making a dirty hack to speed up OCIO processing a tiny bit. Our implementation tends to favor doing the correction per pixel rather than batching the whole image if an alpha channel is present, leading to significant call overhead. While that is appropriate if the alpha channel is actually used, batching would have been quicker when the alpha is solid.

diff --git a/intern/opencolorio/ocio_impl.cc b/intern/opencolorio/ocio_impl.cc
index b838f0e979f..34ad1c635a8 100644
--- a/intern/opencolorio/ocio_impl.cc
+++ b/intern/opencolorio/ocio_impl.cc
@@ -659,6 +659,24 @@ void OCIOImpl::processorApply(OCIO_ConstProcessorRcPtr *processor, OCIO_PackedIm
   }
 }
 
+bool solidAlpha(OCIO_PackedImageDesc *img_)
+{
+  PackedImageDesc *img = (PackedImageDesc *)img_;
+  float *pixels = img->getData();
+
+  int width = img->getWidth();
+  int height = img->getHeight();
+
+  for (int y = 0; y < height; y++) {
+    for (int x = 0; x < width; x++) {
+      float *pixel = pixels + 4 * (y * width + x);
+      if (!(pixel[3] == 1.0f || pixel[3] == 0.0f))
+        return false;
+    }
+  }
+  return true;
+}
+
 void OCIOImpl::processorApply_predivide(OCIO_ConstProcessorRcPtr *processor,
                                         OCIO_PackedImageDesc *img_)
 {
@@ -666,7 +684,7 @@ void OCIOImpl::processorApply_predivide(OCIO_ConstProcessorRcPtr *processor,
     PackedImageDesc *img = (PackedImageDesc *)img_;
     int channels = img->getNumChannels();
 
-    if (channels == 4) {
+    if (channels == 4 && !solidAlpha(img_)) {
       float *pixels = img->getData();
 
       int width = img->getWidth();

It felt like too much of a dirty hack and the savings were too minimal (though real), so I never submitted it, but it does seem like some low-hanging fruit we could look at:

save time (seconds)   1920x1080     3840x2160     7680x4320
Before                0.57          1.11          3.15
After                 0.52 (-8%)    0.90 (-18%)   2.36 (-25%)

Wouldn't it be better to allow RGB8 to make renders faster? For the kind of render the user is asking about, quality is not important; only a preview is needed.