
Crappy Persistent mapping based gawain

Authored by Antony Riakiotakis (psy-fi) on May 13 2017, 2:28 AM.



This is far from production-ready code. It only tests whether persistent mapping solves a particular bottleneck on NVIDIA GPUs.

It seems to do the trick and pushes the bottleneck elsewhere.

Diff Detail

rB Blender

Event Timeline

Good to see you @Antony Riakiotakis (psy-fi)! Thanks for digging into this & verifying that it does help.


A 400 MB buffer... did you mean to leave this in?


Not all supported systems implement ARB_buffer_storage, so we'll have to check at runtime.
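A minimal sketch of what such a runtime check could look like. The helper names (`extension_listed`, `choose_upload_path`, `UploadPath`) are hypothetical and not part of gawain. The sketch assumes the extension names have already been collected into an array; in a 3.3 core context that would typically be done with `glGetStringi(GL_EXTENSIONS, i)` for each index below `GL_NUM_EXTENSIONS`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `name` exactly matches one entry of the
 * extension list. The list is assumed to have been filled beforehand,
 * e.g. from glGetStringi(GL_EXTENSIONS, i) in a core profile context. */
bool extension_listed(const char *const *extensions, size_t count, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (extensions[i] != NULL && strcmp(extensions[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical upload strategies, for illustration only. */
typedef enum {
    UPLOAD_MAP_ORPHAN, /* fallback: map/invalidate the buffer per draw */
    UPLOAD_PERSISTENT  /* persistent mapping via ARB_buffer_storage */
} UploadPath;

/* Pick the persistent path only when the driver advertises the extension. */
UploadPath choose_upload_path(const char *const *extensions, size_t count)
{
    return extension_listed(extensions, count, "GL_ARB_buffer_storage")
               ? UPLOAD_PERSISTENT
               : UPLOAD_MAP_ORPHAN;
}
```

The exact string comparison avoids false positives from prefix matches, which a plain substring search over a single extension string would allow.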


FYI, APPLE_LEGACY will be going away very soon since we just switched all platforms to 3.3 core profile.


I was worried that invalidating a buffer with static storage would introduce an implicit GPU-CPU fence, so I wanted to see how this would work if the buffer never had to be mapped again. I can try to see how it works with a size limit too, but my guess is that it will stall on invalidation.

Generally the trend in modern code (Vulkan too) is to use double buffering to avoid such stalls, but keeping track of memory size is a problem. One size does not fit all GPUs. I think the "easy" solution here is to just keep an array of buffer objects per frame and, as soon as one is filled, allocate a new one. Think of it as a std::deque of buffer objects or so :). You can easily double or triple buffer these arrays (and/or recycle them) by tagging a "buffer swap" in gawain on every actual SwapBuffers call.
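As a rough illustration of the per-frame array idea, here is a CPU-only sketch. All names (`StreamPool`, `pool_alloc`, `pool_swap`, `pool_free`) and the constants are assumptions made for this example. Plain `malloc` memory stands in for persistently mapped buffer objects, and a real implementation would also need to wait on a GPU fence before recycling a frame's chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FRAMES_IN_FLIGHT 3 /* assumed: triple buffering */
#define CHUNK_SIZE 4096    /* assumed capacity of one buffer object, in bytes */
#define MAX_CHUNKS 64      /* assumed upper bound on chunks per frame */

typedef struct {
    unsigned char *data; /* stand-in for a mapped buffer object */
    size_t used;         /* bytes handed out from this chunk */
} Chunk;

typedef struct {
    Chunk chunks[MAX_CHUNKS];
    size_t chunk_count; /* chunks allocated so far, kept for recycling */
    size_t current;     /* index of the chunk currently being filled */
} FrameChain;

typedef struct {
    FrameChain frames[FRAMES_IN_FLIGHT];
    size_t frame; /* index of the frame being recorded */
} StreamPool;

/* Hand out `size` bytes from the current frame's chain. When the current
 * chunk is full, move to the next one, allocating it only if no recycled
 * chunk exists. Returns NULL when the chain cannot grow. */
void *pool_alloc(StreamPool *pool, size_t size)
{
    assert(size <= CHUNK_SIZE);
    FrameChain *fc = &pool->frames[pool->frame];
    size_t idx = fc->current;
    if (fc->chunk_count > 0 && fc->chunks[idx].used + size > CHUNK_SIZE) {
        idx++;
    }
    if (idx == fc->chunk_count) {
        if (idx == MAX_CHUNKS) {
            return NULL;
        }
        unsigned char *data = malloc(CHUNK_SIZE);
        if (data == NULL) {
            return NULL;
        }
        fc->chunks[idx].data = data;
        fc->chunks[idx].used = 0;
        fc->chunk_count++;
    }
    fc->current = idx;
    Chunk *c = &fc->chunks[idx];
    void *p = c->data + c->used;
    c->used += size;
    return p;
}

/* Call once per SwapBuffers: advance to the next frame's chain and recycle
 * its chunks. With real buffer objects, a fence wait for that frame would
 * go here before the chunks are reused. */
void pool_swap(StreamPool *pool)
{
    pool->frame = (pool->frame + 1) % FRAMES_IN_FLIGHT;
    FrameChain *fc = &pool->frames[pool->frame];
    for (size_t i = 0; i < fc->chunk_count; i++) {
        fc->chunks[i].used = 0;
    }
    fc->current = 0;
}

/* Release every chunk of every frame. */
void pool_free(StreamPool *pool)
{
    for (size_t f = 0; f < FRAMES_IN_FLIGHT; f++) {
        for (size_t i = 0; i < pool->frames[f].chunk_count; i++) {
            free(pool->frames[f].chunks[i].data);
        }
        pool->frames[f].chunk_count = 0;
        pool->frames[f].current = 0;
    }
}
```

A zero-initialized `StreamPool` is ready to use. Because each frame keeps its own chain and chunks are only reset when that frame comes around again, memory use grows to fit the workload instead of relying on one fixed buffer size.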

There's also a hard solution: keep all data on the CPU and, at some point, allocate a buffer large enough to draw with, copy once, and draw. But that would need extensive refactoring all over Blender again.


Not production code. This is just to show what I did and how it helped.


I figured, that would be really good for code clarity.

Updating to the latest driver fixed the delay here. This can surely be optimized further, but it's not a blocker any more and it Works On My Machine (tm) now, so it will be up to you guys to improve this if you want.