
DRW: Refactor to support draw call batching
Needs Review
Public

Authored by Clément Foucault (fclem) on Jun 3 2019, 1:24 PM.

Details

Summary

This refactor improves the CPU and memory efficiency of the draw structures and lowers the
driver overhead of issuing many draw calls.

  • Model matrices are now stored in large UBOs that each contain 1024 matrices.
  • Object infos are stored the same way.
  • Matrices are indexed by gl_BaseInstanceARB or a fallback uniform.
  • All of these resources are addressed through a single 32-bit identifier (DRWResourceHandle).
  • DRWUniform and DRWCall are allocated in chunks to improve cache coherence and memory usage.
  • DRWUniform now supports up to vec4_copy.
  • Draw calls are allocated in chunks of 128 calls.
  • Draw calls are sorted by GPUBatch inside a chunk to improve the batching process.
  • Draw calls are batched together if their resource IDs are consecutive.
  • Draw calls are now batched into command lists to leverage Multi Draw Indirect, which boosts performance significantly.
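
As a rough illustration of the handle and batching ideas listed above, here is a minimal C sketch. It is not the code from this patch: the bit layout of the handle (low 10 bits for the slot inside a 1024-entry UBO, remaining bits for the UBO index) and the names `SketchHandle`, `SketchDrawCall`, `SketchDrawCommand` and `sketch_merge_calls` are assumptions made for the example. It only shows how calls that share a batch and have consecutive resource IDs could collapse into one instanced command whose base instance selects the first matrix.

```c
#include <assert.h>
#include <stdint.h>

/* From the summary: each UBO holds 1024 matrices. */
#define SKETCH_CHUNK_LEN 1024u
/* Assumption: log2(1024) low bits store the slot inside the UBO. */
#define SKETCH_ID_BITS 10u

/* Hypothetical stand-in for DRWResourceHandle: one 32-bit value. */
typedef uint32_t SketchHandle;

static SketchHandle sketch_handle_make(uint32_t chunk, uint32_t id)
{
  assert(id < SKETCH_CHUNK_LEN);
  return (chunk << SKETCH_ID_BITS) | id;
}

static uint32_t sketch_handle_chunk(SketchHandle h)
{
  return h >> SKETCH_ID_BITS;
}

static uint32_t sketch_handle_id(SketchHandle h)
{
  return h & (SKETCH_CHUNK_LEN - 1u);
}

/* One recorded draw call: which batch to draw and which resource to use. */
typedef struct SketchDrawCall {
  int batch; /* Hypothetical batch identifier. */
  SketchHandle handle;
} SketchDrawCall;

/* One merged command, shaped like an entry of an indirect draw list. */
typedef struct SketchDrawCommand {
  int batch;
  uint32_t chunk;          /* Which UBO to bind. */
  uint32_t base_instance;  /* Slot of the first resource in that UBO. */
  uint32_t instance_count; /* Number of consecutive resources. */
} SketchDrawCommand;

/* Merge runs of calls that use the same batch and the same UBO and whose
 * resource IDs are consecutive. `cmds` must have room for `call_len` entries.
 * Returns the number of commands written. */
static int sketch_merge_calls(const SketchDrawCall *calls, int call_len, SketchDrawCommand *cmds)
{
  int cmd_len = 0;
  for (int i = 0; i < call_len; i++) {
    const uint32_t chunk = sketch_handle_chunk(calls[i].handle);
    const uint32_t id = sketch_handle_id(calls[i].handle);
    if (cmd_len > 0) {
      SketchDrawCommand *last = &cmds[cmd_len - 1];
      if (last->batch == calls[i].batch && last->chunk == chunk &&
          last->base_instance + last->instance_count == id) {
        /* Extends the current run: draw one more instance. */
        last->instance_count++;
        continue;
      }
    }
    /* Start a new command for this call. */
    cmds[cmd_len++] = (SketchDrawCommand){calls[i].batch, chunk, id, 1u};
  }
  return cmd_len;
}
```

In this sketch, six calls using batch 0 with IDs 0, 1, 2 and 5, followed by batch 1 with IDs 6 and 7, would produce three commands instead of six draw calls. On the shader side, the base instance of each command is what a value such as gl_BaseInstanceARB would expose for indexing into the matrix UBO.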

This has a large impact on CPU usage when using lots of instances. Although the biggest
bottleneck in these situations is the depsgraph iteration, the driver overhead of issuing
thousands of draw calls is still high.

This only improves situations where the CPU is the bottleneck: small geometry, lots of
instances.

The next step is to sort the draw calls inside a DRWCallChunk to improve the batching process
when the instancing order is fairly random.
Done
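
The sorting step could look roughly like the following C sketch. It assumes a chunk is a flat array of up to 128 calls (the chunk size given in the summary) and that each call carries a batch identifier and a 32-bit resource handle. The names `SketchSortCall`, `sketch_call_compare` and `sketch_chunk_sort` are hypothetical, and the patch may use a different key or sorting routine. Sorting by batch first and by handle second places calls with the same batch and consecutive handles next to each other, so a later merging pass can combine them.

```c
#include <stdint.h>
#include <stdlib.h>

/* From the summary: draw calls are allocated in chunks of 128. */
#define SKETCH_CALLS_PER_CHUNK 128

/* Hypothetical draw call record used only for this example. */
typedef struct SketchSortCall {
  uint32_t batch_id; /* Hypothetical identifier of the GPUBatch to draw. */
  uint32_t handle;   /* 32-bit resource handle. */
} SketchSortCall;

/* Order by batch first, then by resource handle. */
static int sketch_call_compare(const void *a, const void *b)
{
  const SketchSortCall *ca = (const SketchSortCall *)a;
  const SketchSortCall *cb = (const SketchSortCall *)b;
  if (ca->batch_id != cb->batch_id) {
    return (ca->batch_id < cb->batch_id) ? -1 : 1;
  }
  if (ca->handle != cb->handle) {
    return (ca->handle < cb->handle) ? -1 : 1;
  }
  return 0;
}

/* Sort the `used` calls stored at the start of one chunk. */
static void sketch_chunk_sort(SketchSortCall *calls, int used)
{
  if (used > SKETCH_CALLS_PER_CHUNK) {
    used = SKETCH_CALLS_PER_CHUNK; /* A chunk never holds more than this. */
  }
  qsort(calls, (size_t)used, sizeof(*calls), sketch_call_compare);
}
```

Because the sort is local to a chunk, its cost stays bounded by the chunk size, at the price of missing merges between calls that land in different chunks.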

Test scenes:

Diff Detail

Repository
rB Blender
Branch
tmp-drw-callbatching (branched from master)
Build Status
Buildable 3900
Build 3900: arc lint + arc unit

Event Timeline

  • Object Mode: Outlines: Rewrite id pass generation
  • DRW: Add builtin uniform to get full DRWResourceHandle from shader
  • Object Mode: Add back lightprobe selection outlines
  • DRW: Use int instead of uint for DRWCall
  • DRW: Remove common_view_lib uniform default values
  • Edit Curve: Fix curve normals
  • Cleanup: GPUBatch: rename arguments
  • DRW: Make workaround for drivers with broken gl_InstanceID
  • Workbench: Use resource_id instead of own index
  • Workbench: Remove object_id and optimize material hash generation
  • DRW: Add draw call sorting
  • GPU: Add API to use multidrawindirect using GPUbatch
  • DRW: Use new GPUDrawList to speedup instancing
  • Workbench: Simplify / Speedup Material Hash