
OpenGL/Nvidia Performance on Windows
Open, Confirmed, Low, Public


While profiling the OpenXR branch, where even the default cube only runs at a disappointing 45 fps, I stumbled upon drw_engine_init running every frame and doing some rather expensive operations each time, which is 'not great'.

We recently removed one of those operations from the GP init, but there are still plenty of others left.

A quick profile shows the following flame graph:

Both GPU_Draw_List_Init and GPU_Buf_Alloc seem to trigger behavior in the Nvidia driver that causes it to spin off a thread and wait for it to exit. That is not something you'd want to do every frame (or multiple times a frame; it's hard to tell from a flame graph).
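One generic way to keep an expensive allocation off the per-frame path is to recycle buffers instead of allocating and freeing them every frame. The sketch below is only an illustration of that idea, not what Blender does: `expensive_driver_alloc` is a hypothetical stand-in for the driver call that shows up in the profile, and the pool, its fixed buffer size and capacity are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>

#define BUF_SIZE 256      /* hypothetical fixed buffer size */
#define POOL_CAPACITY 16  /* hypothetical maximum number of cached buffers */

/* Hypothetical stand-in for the expensive driver allocation seen in the
 * profile. It only counts how often it is called. */
static int expensive_alloc_calls = 0;

static void *expensive_driver_alloc(size_t size)
{
  expensive_alloc_calls++;
  return malloc(size);
}

/* Free list of released buffers that are handed out again later. */
typedef struct BufPool {
  void *free_bufs[POOL_CAPACITY];
  int free_count;
} BufPool;

static void *pool_acquire(BufPool *pool)
{
  if (pool->free_count > 0) {
    return pool->free_bufs[--pool->free_count]; /* reuse: no driver call */
  }
  return expensive_driver_alloc(BUF_SIZE);
}

static void pool_release(BufPool *pool, void *buf)
{
  if (pool->free_count < POOL_CAPACITY) {
    pool->free_bufs[pool->free_count++] = buf; /* keep it for the next frame */
  }
  else {
    free(buf);
  }
}

/* Simulates `frames` frames that each acquire and release `per_frame` buffers. */
static void simulate_frames(BufPool *pool, int frames, int per_frame)
{
  void *bufs[POOL_CAPACITY];
  assert(per_frame <= POOL_CAPACITY);
  for (int f = 0; f < frames; f++) {
    for (int i = 0; i < per_frame; i++) {
      bufs[i] = pool_acquire(pool);
    }
    for (int i = 0; i < per_frame; i++) {
      pool_release(pool, bufs[i]);
    }
  }
}
```

With four buffers per frame, only the first frame reaches the expensive allocator. Every later frame is served from the free list.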

Discussed on chat, and you requested a low-priority task for this for further investigation.

CC: @Clément Foucault (fclem), @Brecht Van Lommel (brecht)


Differential Revisions
D6087: Workbench: Performance

Event Timeline

LazyDodo (LazyDodo) lowered the priority of this task from Needs Triage by Developer to Confirmed, Low. Sep 22 2019, 9:19 PM
LazyDodo (LazyDodo) created this task.

The Workbench creates UBOs for the AO samples and for the world data.

Samples UBO

I don't think it is related to the TAA/AO samples. These are guarded, but they might not be cached when multiple viewports with different settings are active. If that is the case, the second view will invalidate the cache created by the first view and vice versa (see lines 548-567).

World UBO

The world UBO is generated on every draw call and is not guarded. It is 304 bytes. This data is best cached per viewport rather than globally; otherwise there is only a slight chance of a cache hit between viewports.
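A per-viewport world UBO cache could keep the last uploaded copy next to the buffer and compare against it before uploading. This is a minimal sketch under assumptions. `WorldData` is a hypothetical 304-byte struct (76 floats) that matches only the size mentioned above, not Blender's real layout. `ViewportWorldUBO` and `world_ubo_sync` are illustrative names. The GL calls named in the comments (`glBufferData` to create, `glBufferSubData` to update) are counted rather than executed, so the sketch runs without a GL context.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical world data: 76 floats = 304 bytes. */
typedef struct WorldData {
  float values[76];
} WorldData;

/* One of these per viewport: the UBO plus the last data uploaded to it. */
typedef struct ViewportWorldUBO {
  int created;
  WorldData last_uploaded;
  int create_count; /* storage (re)specifications, e.g. glBufferData */
  int update_count; /* in-place updates, e.g. glBufferSubData */
} ViewportWorldUBO;

static void world_ubo_sync(ViewportWorldUBO *vubo, const WorldData *data)
{
  assert(vubo != NULL && data != NULL);
  if (!vubo->created) {
    vubo->created = 1;
    vubo->create_count++; /* first use: allocate storage once */
  }
  else if (memcmp(&vubo->last_uploaded, data, sizeof(*data)) != 0) {
    vubo->update_count++; /* bytes changed: update the existing buffer */
  }
  else {
    return; /* unchanged: skip the upload entirely */
  }
  memcpy(&vubo->last_uploaded, data, sizeof(*data));
}
```

With this approach, redrawing a viewport whose world settings did not change costs only a 304-byte `memcmp`. Because each viewport owns its copy, two viewports with different settings no longer evict each other.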

Draw list

The draw list can be huge (mesh data), so that is not something I want to mess with.


I would expect we can cache the World UBO per viewport.

I guess this is hitting a driver slow path.
Draw list: We should maybe use an SSBO + persistent mapping instead of creating/freeing potentially lots of UBOs.
Workbench: All UBOs are rather small. I agree they should be updated instead of being re-specified every time, if it does not add too much complexity.
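The SSBO + persistent mapping idea amounts to creating and mapping one large buffer once, for example with `glBufferStorage` and `GL_MAP_PERSISTENT_BIT`, and then handing out aligned sub-ranges every frame. The sketch below covers only the CPU-side offset bookkeeping. `RingBuffer`, `ring_alloc` and the example sizes are hypothetical. GPU synchronization (fences before reusing a region) is deliberately left out, so this is not a complete implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Bookkeeping for one large, persistently mapped buffer that is
 * sub-allocated instead of creating/freeing many small buffers. */
typedef struct RingBuffer {
  size_t capacity;  /* total size of the mapped buffer in bytes */
  size_t head;      /* next free byte */
  size_t alignment; /* e.g. the queried GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT */
} RingBuffer;

static size_t align_up(size_t value, size_t alignment)
{
  return (value + alignment - 1) / alignment * alignment;
}

/* Returns the byte offset of a `size`-byte range inside the buffer, wrapping
 * to the start when the end is reached. Returns (size_t)-1 if `size` can
 * never fit. */
static size_t ring_alloc(RingBuffer *ring, size_t size)
{
  assert(ring != NULL && ring->alignment > 0);
  if (size > ring->capacity) {
    return (size_t)-1;
  }
  size_t offset = align_up(ring->head, ring->alignment);
  if (offset + size > ring->capacity) {
    /* Wrap around. Real code must first wait on a fence guarding this
     * region so the GPU is done reading it. */
    offset = 0;
  }
  ring->head = offset + size;
  return offset;
}
```

Each returned offset would then be bound as a range of the single buffer, for example with `glBindBufferRange`, so the driver never sees per-frame buffer creation.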