
Very Slow Workbench/Eevee Performance
Open, Confirmed, High, Public

Description

System Information
Operating system: Windows-10-10.0.18362 64 Bits
Graphics card: AMD Radeon VII ATI Technologies Inc. 4.5.13558 Core Profile Context 26.20.11015.5009

Blender Version
Broken: version: 2.81 (sub 12), branch: master, commit date: 2019-10-01 19:51, hash: rB3b23685c7dc1
Worked: 2.8

Short description of error
Scenes with many objects have much worse performance (fps) than in Blender 2.8.
My attached example scene is an import from SketchUp and has around 1500 mesh objects and 250k faces. With the regular 2.8 version I get around 110 fps in Workbench and 33 fps in Eevee. With the newest 2.81 alpha build I get 53 fps in Workbench and only 9 fps in Eevee.

Exact steps for others to reproduce the error
Open the attached file and press play in Workbench and Eevee mode. Make sure that overlays are on so the fps is visible, then compare the fps with the regular 2.8 version.

Event Timeline

I can confirm the difference between 2.80 and latest master (measured with Fraps because Blender's counter was way too jumpy).

mode        2.80      master
solid       120 fps   88 fps
eevee       73 fps    57 fps
workbench   118 fps   87 fps
wireframe   99 fps    63 fps
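To put the numbers in the table above into relative terms, here is a minimal sketch that converts each 2.80/master pair into a percentage of frame rate lost and the extra time per frame. The fps values are copied from the table; the function names are made up for illustration.

```python
# fps figures copied from the table above: mode -> (2.80, master)
FPS = {
    "solid": (120, 88),
    "eevee": (73, 57),
    "workbench": (118, 87),
    "wireframe": (99, 63),
}


def slowdown_percent(old_fps, new_fps):
    """Share of the old frame rate that was lost, in percent."""
    return (old_fps - new_fps) / old_fps * 100.0


def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def report(table):
    """Return one formatted line per viewport mode."""
    lines = []
    for mode, (old, new) in table.items():
        extra_ms = frame_time_ms(new) - frame_time_ms(old)
        lines.append(
            f"{mode:10s} {slowdown_percent(old, new):5.1f}% fewer fps, "
            f"+{extra_ms:.2f} ms per frame"
        )
    return lines


for line in report(FPS):
    print(line)
```

Frame time is often the more useful figure when comparing commits, since a fixed per-frame overhead shows up as a constant number of milliseconds regardless of how fast the scene was to begin with.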
Brecht Van Lommel (brecht) lowered the priority of this task from Needs Triage by Developer to Confirmed, High. Thu, Oct 3, 1:49 AM

One thing that may kill multiple birds with one stone is ATAA (adaptive temporal antialiasing).
The current TAA has artifacts and is slow.

Once whatever is slowing the render down is taken care of, this may be another step toward making it faster.
It scales much better than FXAA and does not have the drawbacks of TAA.

https://thisistian.github.io/post/adaptive-temporal-antialiasing-without-rtx/

Ok so there is some overhead of the MultiDrawIndirect system (more noticeable on certain drivers) that can easily be avoided to fix this particular issue.

But while investigating this I found that current performance is not what was initially reported when committing rBce34a6b0d727bbde6ae373afa8ec6c42bc8980ce. This means something has slowed down the rendering after this commit. Going to bisect.

LazyDodo (LazyDodo) reopened this task as Open. Sun, Oct 6, 2:41 AM

Retested this ticket: no measurable improvement with the fix committed by @Clément Foucault (fclem).

mode        2.80      oct 2     oct5 4707f1982ddb
solid       120 fps   88 fps    90 fps
eevee       73 fps    57 fps    55 fps
workbench   118 fps   87 fps    88 fps
wireframe   99 fps    63 fps    62 fps

Spent a few hours bisecting and tracked it down to rB1693a5efe91.

@Sergey Sharybin (sergey) mind taking a peek here and see if anything can be done?
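For readers unfamiliar with bisecting, here is a minimal sketch of the search it performs. It assumes a list of commits ordered oldest to newest, where the first is known good and the last is known bad, plus a hypothetical `is_bad` check (in this ticket that would mean building the commit and comparing its fps against the 2.80 numbers). The commit names below are placeholders, and in practice `git bisect` drives this loop for you.

```python
def first_bad_commit(commits, is_bad):
    """Binary search for the first commit where is_bad() returns True.

    Assumes commits[0] is good, commits[-1] is bad, and that every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is always good, hi is always bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tested


# Placeholder history: 64 commits, the regression lands in commit "c41".
history = [f"c{i}" for i in range(64)]
culprit, builds = first_bad_commit(history, lambda c: int(c[1:]) >= 41)
print(culprit, "found after", builds, "builds")
```

Each step halves the remaining range, so the number of builds grows with the logarithm of the number of commits. The "few hours" is mostly build time and measuring, not the number of steps.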

For me the fix from Clément is already a big improvement, with almost identical framerates compared to the stable 2.8.
It also depends on the scene: in one case with 20000 scattered plant instances, the 2.81 alpha from Oct 5 has nearly double the framerate in Eevee. In another test with 40000 scattered (with Jacques Lucke's scatter objects) subdivided Suzannes, 2.8 is a bit faster than 2.81.

mode        2.80      oct 2     oct5 4707f1982ddb   D6002
solid       120 fps   88 fps    90 fps              92 fps
eevee       73 fps    57 fps    55 fps              57 fps
workbench   118 fps   87 fps    88 fps              94 fps
wireframe   99 fps    63 fps    62 fps              64 fps

D6002 is without a doubt an improvement, but still nowhere near the 2.80 numbers. Not sure how much time we should put into this though; @Brecht Van Lommel (brecht) / @Sergey Sharybin (sergey) can probably better decide there. I have to admit I was mostly curious about why it slowed down. Now that I know, I'm perfectly happy with 'this is as good as it is gonna get without breaking other stuff'.

@LazyDodo (LazyDodo), this is a barely measurable difference. I've seen a better improvement in my test, but we might have different bottlenecks.

How do you measure? I've got the environment variable __GL_SYNC_TO_VBLANK=0 set, then I open the fps-performance.blend file, switch to rendered viewport and hit playback. For me the patch gave about a 10%-20% improvement.
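As an illustration of that setup, here is a minimal sketch that starts Blender with `__GL_SYNC_TO_VBLANK` set to `0` for that one process only. The executable path is a placeholder, the helper names are made up, and the variable is read by the NVIDIA driver on Linux, so other drivers and platforms need their own way of disabling vsync.

```python
import os
import subprocess


def benchmark_env(base_env=None):
    """Copy of the environment with vsync disabled for the NVIDIA Linux driver."""
    env = dict(os.environ if base_env is None else base_env)
    env["__GL_SYNC_TO_VBLANK"] = "0"
    return env


def launch_blender(blender_path, blend_file):
    """Start Blender on blend_file without changing the parent environment."""
    return subprocess.Popen([blender_path, blend_file], env=benchmark_env())


# Example (paths are placeholders, adjust to the local build):
# launch_blender("/path/to/blender", "fps-performance.blend")
```

Setting the variable per process keeps the desktop's normal vsync behaviour untouched, so only the benchmark run is affected.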

Might also be that it's not the only source of slowdown. What if you revert my changes in depsgraph by applying P1129?

Not sure how much time we should put into this though.

Depends on where the root of the slowdown is. If this is something that affects any scene, even one with real animation (my initial patch was solving a speed regression happening in a fully static scene), we should solve that.

YAFU (YAFU) added a subscriber: YAFU (YAFU). Edited Sun, Oct 6, 10:04 PM

@Sergey Sharybin (sergey) wrote:

__GL_SYNC_TO_VBLANK=0

Hi. How does it influence Blender? Is it advisable to have it enabled or disabled in the nvidia settings?

I have Sync to VBlank activated in nvidia settings (Linux - GTX 960), and with default scene and custom frame rate=1000 I get a maximum of 60 fps in text overlay indication in viewport in Blender 2.79 and 2.8x. It is weird for me that it is almost fixed at 60 fps at 2.79 and 2.8x (red text anyway).
If I disable Sync to VBlank in nvidia settings and open Blender, I can get values of up to 450 fps in workbench (no more 60 fps limitation).
So if all this depends on GPU driver settings, then possibly we users are measuring fps erroneously in some cases.

EDIT:
Ok, I read that apparently Sync to VBlank is related to the frequency of the monitor, that's why 60 fps. I'm still not sure which is the best configuration in this regard for blender.

YAFU (YAFU) added a comment. Edited Sun, Oct 6, 10:21 PM

With Sync to VBlank off in nvidia settings and fps-performance.blend scene (but custom frame rate=1000) I get:

*Workbench
Master: ~140 fps
Master Sergey patch: ~170 fps
2.80 official release: ~190fps

*Eevee Rendered View
Master: ~70 fps
Master Sergey patch: ~70 fps
2.80 official release: ~62 fps

@YAFU (YAFU) for users Sync to VBlank is fine. There is no point redrawing the viewport more often than the monitor can display. Just for benchmarking it's something to be aware of.

Ok, thanks for the explanation Brecht.

LazyDodo (LazyDodo) added a comment. Edited Sun, Oct 6, 11:46 PM

My measurements were with vsync off, taken with Fraps with a Blender release build on Windows. I picked the highest number that stayed on screen long enough for me to actually read it.

mode        2.80      oct 2     oct5 4707f1982ddb   D6002     P1129
solid       120 fps   88 fps    90 fps              92 fps    102 fps
eevee       73 fps    57 fps    55 fps              57 fps    60 fps
workbench   118 fps   87 fps    88 fps              94 fps    103 fps
wireframe   99 fps    63 fps    62 fps              64 fps    68 fps

With P1129 applied, it does seem to suggest there may be a second commit involved causing the slowdown.

Started bisecting with a lite build rather than a full build, hoping it would cut down the time required, and the numbers didn't seem to line up at all. Here's something interesting:

Both builds are on the same commit, a clean 4707f1982ddb (no diffs or patches).

mode        full      lite
solid       88 fps    111 fps
eevee       59 fps    66 fps
workbench   93 fps    109 fps
wireframe   62 fps    66 fps

Might be due to missing subdivision surface modifier in the lite build.

With P1129 applied, it does seem to suggest there may be a second commit involved causing the slowdown.

Indeed. However, i don't quite understand yet what is the difference between D6002 and P1129 for playback. Nothing obviously wrong is striking me from reading the code and stepping in a debugger. Second pair of eyes? Anyone? :)

@LazyDodo (LazyDodo), mind doing one more test? :) Apply P1130 on top of D6002 and see the numbers.
For me it showed extra speedup and profiler looks nicer as well.

(Profiler screenshots: master, D6002, D6002 + P1130)

From the dependency graph point of view, I think D6002 is definitely something we should have in (after review by extra eyes).
P1130 gives nice results here, but is more platform- and file-specific. But it kind of makes sense NOT to use threading to reset a single field of every element in an array.

Can't speak for other possible sources of slowdown though.
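The remark about not threading a single-field reset can be illustrated with a small sketch. This is not Blender's depsgraph code and not the contents of P1130 (neither is shown in this thread); `Node`, `visited` and the chunk size are made up. Python's GIL also exaggerates the effect, so treat it only as an analogy: when the work per element is one assignment, handing out chunks to a thread pool tends to cost more than the plain loop saves.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Stand-in for an array element with one flag to reset."""
    __slots__ = ("visited",)

    def __init__(self):
        self.visited = True


def reset_serial(nodes):
    """Plain loop: one assignment per element, no scheduling cost."""
    for node in nodes:
        node.visited = False


def reset_threaded(nodes, workers=4, chunk=1024):
    """Same work split into chunks and handed to a thread pool."""
    def reset_chunk(start):
        for node in nodes[start:start + chunk]:
            node.visited = False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces all chunks to finish before returning.
        list(pool.map(reset_chunk, range(0, len(nodes), chunk)))


def seconds_taken(reset, nodes):
    """Wall-clock time of one reset pass."""
    start = time.perf_counter()
    reset(nodes)
    return time.perf_counter() - start


nodes = [Node() for _ in range(50_000)]
print("serial:  ", seconds_taken(reset_serial, nodes))
print("threaded:", seconds_taken(reset_threaded, nodes))
```

The actual timings depend on the machine and are not predicted here, which fits the observation that the gain is platform- and file-specific.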

I've committed tweaks to the early output. So in theory, after that the depsgraph performance should be the same as in 2.80.

If performance is still not the same for some setups, there is another source of slowdown.

P.S. On top of the committed speedup I've created another patch, D6017, to bring performance even higher from the depsgraph/threading point of view, but that is a separate story.