This refactor improves the CPU and memory efficiency of the draw structures and lowers
the driver overhead of issuing many draw calls.
- Model matrices are now stored in large UBOs that each hold 1024 matrices.
- Object Infos are stored the same way.
- Matrices are indexed by gl_BaseInstanceARB or a fallback uniform.
- All of these resources are referenced by a single 32-bit identifier (DRWResourceHandle).
- DRWUniform and DRWCall are allocated in chunks to improve cache coherence and memory usage.
- DRWUniform now supports up to vec4_copy.
- Draw calls are allocated in chunks of 128 calls.
- Draw calls are sorted by GPUBatch within a chunk to improve the batching process.
- Draw calls are batched together if their resource IDs are consecutive.
- Draw calls are batched into command lists to leverage Multi Draw Indirect, which significantly boosts performance.
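
The single 32-bit handle that indexes into 1024-matrix UBOs can be illustrated with a minimal C sketch. The bit layout here is an assumption chosen because 1024 = 2^10: the low 10 bits select the slot inside one UBO and the upper 22 bits select which UBO (chunk) holds it. It is not necessarily the real DRWResourceHandle layout, and ResourceHandle, handle_make, handle_chunk, handle_index and handle_from_linear are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout (assumption, not necessarily Blender's actual one):
 * low 10 bits  -> slot of the matrix inside a 1024-entry UBO,
 * high 22 bits -> index of the UBO (chunk) that holds it. */
#define RESOURCE_CHUNK_LEN 1024u
#define RESOURCE_INDEX_BITS 10u
#define RESOURCE_INDEX_MASK (RESOURCE_CHUNK_LEN - 1u)
#define RESOURCE_CHUNK_MAX (1u << (32u - RESOURCE_INDEX_BITS))

typedef uint32_t ResourceHandle; /* hypothetical stand-in for DRWResourceHandle */

static ResourceHandle handle_make(uint32_t chunk, uint32_t index)
{
  assert(chunk < RESOURCE_CHUNK_MAX); /* must fit in the upper 22 bits */
  assert(index < RESOURCE_CHUNK_LEN); /* must fit in the lower 10 bits */
  return (chunk << RESOURCE_INDEX_BITS) | index;
}

/* Which UBO the resource lives in. */
static uint32_t handle_chunk(ResourceHandle h)
{
  return h >> RESOURCE_INDEX_BITS;
}

/* Slot inside that UBO. */
static uint32_t handle_index(ResourceHandle h)
{
  return h & RESOURCE_INDEX_MASK;
}

/* Resources are packed linearly: the n-th one goes to UBO n / 1024, slot n % 1024. */
static ResourceHandle handle_from_linear(uint32_t n)
{
  return handle_make(n / RESOURCE_CHUNK_LEN, n % RESOURCE_CHUNK_LEN);
}
```

For example, linear resource 2050 maps to UBO 2, slot 2 (2050 = 2 * 1024 + 2). In this sketch the slot is the value a shader would receive through gl_BaseInstanceARB or the fallback uniform, while the chunk decides which UBO gets bound.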

This has a large impact on CPU usage when using many instances. Even though the biggest
bottleneck in these situations is the depsgraph iteration, the driver overhead of issuing
thousands of draw calls is still high.

This only improves situations where the CPU is the bottleneck: small geometry, lots of
instances.

The next planned step was to sort the draw calls inside a DRWCallChunk to improve the
batching process when the instancing order is fairly random. This is now done.
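
Sorting a chunk and merging consecutive resource IDs can be sketched in C as follows. This is a minimal illustration under stated assumptions: DrawCall and MergedDraw are hypothetical stand-ins for DRWCall, the GPUBatch pointer is replaced by an integer, and using the resource ID as a secondary sort key is an assumption of this sketch rather than a description of what the draw manager actually does. After sorting, runs of calls that share a batch and have consecutive resource IDs collapse into one instanced draw.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified draw call: only the fields needed for batching. */
typedef struct DrawCall {
  uintptr_t batch;      /* stand-in for a GPUBatch pointer */
  uint32_t resource_id; /* stand-in for the resource handle */
} DrawCall;

/* One merged draw: `count` instances of `batch`, starting at `first_id`. */
typedef struct MergedDraw {
  uintptr_t batch;
  uint32_t first_id;
  uint32_t count;
} MergedDraw;

/* Order by batch first, then by resource id (secondary key is an assumption). */
static int draw_call_cmp(const void *a, const void *b)
{
  const DrawCall *x = a, *y = b;
  if (x->batch != y->batch) {
    return (x->batch < y->batch) ? -1 : 1;
  }
  if (x->resource_id != y->resource_id) {
    return (x->resource_id < y->resource_id) ? -1 : 1;
  }
  return 0;
}

/* Sorts one chunk in place, then merges runs with the same batch and
 * consecutive resource ids. `out` must have room for `len` entries.
 * Returns the number of merged draws written. */
static size_t sort_and_merge_chunk(DrawCall *calls, size_t len, MergedDraw *out)
{
  assert(len <= 128); /* a chunk holds at most 128 draw calls */
  qsort(calls, len, sizeof(DrawCall), draw_call_cmp);
  size_t n = 0;
  for (size_t i = 0; i < len; i++) {
    MergedDraw *last = (n > 0) ? &out[n - 1] : NULL;
    if (last != NULL && last->batch == calls[i].batch &&
        last->first_id + last->count == calls[i].resource_id) {
      last->count++; /* extend the current instanced draw */
    }
    else {
      out[n++] = (MergedDraw){calls[i].batch, calls[i].resource_id, 1};
    }
  }
  return n;
}
```

Sorting first is what makes the merge effective: with a random instancing order, calls that share a batch are rarely adjacent, so the consecutive-ID check would almost never succeed. For instance, calls {(B,5), (A,1), (B,6), (A,2), (A,4)} become three draws: (A, 1, 2 instances), (A, 4, 1 instance) and (B, 5, 2 instances).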