Paste P608

Gawain VAO manager

Authored by Clément Foucault (fclem) on Feb 10 2018, 4:00 PM.
/**
* Gawain VAO management is not ideal.
*
* Currently, to draw a batch with 2 different materials, all bindings must be
* reset before each draw call. That is not how VAOs are supposed to be used:
* a VAO is supposed to be constant after initialization and simply bound prior
* to the draw call, so that the driver can optimize.
*
* Germano (mano-wii) ran some tests: even replacing all VAOs with a single VAO
* gives a performance improvement over the current solution.
*
* So, to alleviate this problem, we need one VAO per "drawing state" (a term of
* my own). A drawing state is defined by the VBOs, their attributes, and the
* ShaderInterface. Since we also want to support instancing, that needs to be
* taken into consideration: instancing is done by providing a batch in which
* all vertices are treated as instances.
*
* Since VAOs are local to the context they are created in, we need to keep them
* separated per context and perform deletions in the right context. The current
* system does not take that into account either, and assumes main thread = main
* and only GL context.
*
* So I came up with this solution. We use a hash table that indexes into a wide
* array of VAO identifiers. Each VAO identifier [VaoInfo] contains the full
* "drawing state" in order to resolve hash collisions.
*
* Since we can have 3 (maybe 4) identification keys with potentially very
* different usage frequencies, we cannot (IMO) use a multi-dimensional hash
* without wasting a lot of space and cache. So I suggest combining the keys
* into a single key and using a 1D hash table.
*
* This brings us to the problem of VAO ownership. Since a VAO depends on the
* "drawing state", any modification makes it obsolete. While we could keep
* orphaned VAOs around for reuse, it is better to let the driver delete them.
* So we need to delete a VAO when one of its keys gets deleted (VBO formats and
* ShaderInterfaces are immutable). Unfortunately, the hashing system prevents
* us from directly finding the VAOs linked to a single key, which is why we
* need to linearly search the whole VAO identifier array. To speed this up, we
* keep this array tightly packed (filling the holes created by deleted VAOs)
* and in one memory block (using realloc).
*
* A Shader or a Batch can be freed by another thread and in another context, so
* freeing a VAO must be delayed until the VAO's owning context is bound.
* Shaders and batches can also be shared across contexts, so take care of this.
*
* Since adding and removing VAOs has more overhead in this system, we need to
* strive to reuse them. This means NOT generating new VBOs each frame
* (otherwise lots of VAOs can potentially be created). The vertex buffer API
* needs to support VBO updating to minimize the need to recreate batches.
* This can be done in another commit.
*
* Furthermore, since VAOs will be static, we need a way to support matrix
* attributes in the vertex format so that we don't use GWN_batch_draw_stupid_instanced.
*
* This will also fix the multi-context issues we are having.
*
* In addition, we can still couple this solution with the separate DRW
* offscreen GL context to avoid major VAO duplication, since all viewports'
* VAOs will belong to the DRW GL context.
*
* Two final notes:
* - The total overhead of this might be a few MB of RAM. I cannot assure that
* frame times will be lower, but I think CPU usage will drop. It largely
* depends on whether the cost of the hashing function + VAO id retrieval is
* lower than the cost of binding and resetting all attributes over and over.
* EDIT: It WILL be faster: currently we do a hash lookup for each attribute;
* this patch does only one lookup per draw call.
* - The parts of the code that will need some more work are mainly in the draw
* manager itself. Maybe drawcache will have some really small overhead from
* recreating the batches & VBOs for animated models (which could be avoided if
* Gwn_VertBuf allowed re-uploading data).
*
* DISCLAIMER: this code is a proof of concept: naming and style are not final.
**/
#define POOL_CHUNK_SIZE 1024

struct VaoInfo {
	/* Needs to be a doubly linked list. :( */
	unsigned next, prev; /* Offsets in the pool. */
	/* Identification keys (the "drawing state"). */
	struct Batch *batch;
	struct Batch *instancer_batch;
	struct ShaderInterface *interface;
	/* Could add OpenSubdiv here? */
	GLuint vao_id;
};

struct Context {
	struct Context *next, *prev;
	void *context_handle; /* To identify the context. */
	bool is_active; /* Debug: to see if 2 threads try to activate the same ctx. */
	unsigned hash[1000]; /* Big enough for standard scenes. Contains offsets into the VaoInfo array. */
	std::vector<GLuint> pruned_vaos;
	unsigned pool_size;
	unsigned pool_cursor; /* Offset of the next empty spot. */
	struct VaoInfo vaos[0]; /* Variable size array. */
};

static Context *ctx_first; /* Linked list of all contexts. */

/* Each thread can have an active context. */
per_thread static Context *act_ctx;
/* Same thing exists for Batches. */
void remove_shader_interface(ShaderInterface *interface)
{
	/* Can be called from different threads. */
	mutex.lock();
	for (Context *ctx = ctx_first; ctx; ctx = ctx->next) {
		/* Perform a linear search. */
		VaoInfo *vao = ctx->vaos;
		for (unsigned i = 0; i < ctx->pool_size; ++i, ++vao) {
			if (vao->interface == interface) {
				/* Defer deletion until the owning context is bound. */
				ctx->pruned_vaos.push_back(vao->vao_id);
				/* Fix the linked list. */
				if (vao->prev)
					ctx->vaos[vao->prev].next = vao->next;
				if (vao->next)
					ctx->vaos[vao->next].prev = vao->prev;
				vao->next = 0;
				vao->prev = 0;
				/* Tag the entry as empty. */
				vao->vao_id = 0;
				/* Reset the cursor to the first empty entry. */
				if (i < ctx->pool_cursor)
					ctx->pool_cursor = i;
			}
		}
	}
	mutex.unlock();
}
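The comment above notes that the same mechanism is needed for Batches. A sketch of what that counterpart might look like, under stated assumptions: the types below are minimal stand-ins for the paste's structs (a `std::vector` pool instead of the realloc'd array, a single context, no mutex), and the function name `remove_batch` is mine. Unlike the ShaderInterface case, a Batch can be referenced either as the drawn batch or as the instancing source, so both fields must be checked. Like `remove_shader_interface` in the paste, this still omits fixing the hash-table head entry when the removed VaoInfo starts a collision chain.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in types; the real ones live in the paste above.
struct Batch { int dummy; };
struct ShaderInterface { int dummy; };

struct VaoInfo {
	unsigned next, prev;              // offsets into the pool; 0 = end of chain
	Batch *batch;
	Batch *instancer_batch;
	ShaderInterface *shader_interface;
	unsigned vao_id;                  // 0 = empty slot
};

struct Context {
	std::vector<VaoInfo> vaos;          // stand-in for the realloc'd pool
	std::vector<unsigned> pruned_vaos;  // deleted later, in the owning context
	unsigned pool_cursor;
};

static void remove_batch(Context *ctx, Batch *batch)
{
	for (unsigned i = 0; i < ctx->vaos.size(); ++i) {
		VaoInfo *vao = &ctx->vaos[i];
		// A Batch may be the drawn batch OR the instancing source.
		if (vao->vao_id && (vao->batch == batch || vao->instancer_batch == batch)) {
			// Defer the actual glDeleteVertexArrays until this context is bound.
			ctx->pruned_vaos.push_back(vao->vao_id);
			// Unlink from the collision chain.
			if (vao->prev) ctx->vaos[vao->prev].next = vao->next;
			if (vao->next) ctx->vaos[vao->next].prev = vao->prev;
			vao->next = vao->prev = 0;
			vao->vao_id = 0;  // tag the slot as empty
			if (i < ctx->pool_cursor) ctx->pool_cursor = i;  // reuse the hole first
		}
	}
}
```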
/* Called before unbinding a context. */
void delete_pruned_vaos(void)
{
	mutex.lock(); /* Ensure no other thread adds new pruned vaos during this process. */
	/* Can only delete vaos owned by this context. */
	if (!act_ctx->pruned_vaos.empty()) {
		glDeleteVertexArrays(act_ctx->pruned_vaos.size(), act_ctx->pruned_vaos.data());
		act_ctx->pruned_vaos.clear();
	}
	mutex.unlock();
}
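`get_vao()` below relies on `compute_hash_id()`, which the paste leaves undefined. A minimal sketch of one way to combine the three identification keys (pointer values) into a single 1D table index, as the header comment suggests; the mixing constants and the function body are mine, not part of the design:

```c
#include <assert.h>
#include <stdint.h>

#define HASH_TABLE_SIZE 1000

/* Hypothetical sketch: mix the three key addresses with multiplicative
 * constants, then reduce into the fixed-size table. */
static unsigned compute_hash_id(const void *batch,
                                const void *instancer_batch,
                                const void *shader_interface)
{
	uintptr_t h = (uintptr_t)batch;
	h ^= (uintptr_t)instancer_batch * (uintptr_t)0x9E3779B97F4A7C15u;
	h ^= (uintptr_t)shader_interface * (uintptr_t)0xC2B2AE3D27D4EB4Fu;
	h ^= h >> 17; /* fold high bits down before reduction */
	return (unsigned)(h % HASH_TABLE_SIZE);
}
```

Since `hash[hash_id] == 0` marks an empty slot, the real version would also have to make sure a VaoInfo is never stored at pool offset 0 (or reserve that slot), otherwise a valid entry is indistinguishable from "empty".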
GLuint get_vao(Batch *batch, Batch *instancer_batch, ShaderInterface *interface)
{
	unsigned *target; /* Where to insert the new VaoInfo offset. Might be a bit convoluted... */
	unsigned hash_id = compute_hash_id(batch, instancer_batch, interface);
	if (act_ctx->hash[hash_id] == 0) {
		/* Will add the new vao directly in the hash table. */
		target = &act_ctx->hash[hash_id];
	}
	else {
		/* Will add the new vao at the end of the collision chain. */
		unsigned next = act_ctx->hash[hash_id];
		/* Check further in case of collision. */
		do {
			VaoInfo *vao = &act_ctx->vaos[next];
			if (vao->batch == batch &&
			    vao->instancer_batch == instancer_batch &&
			    vao->interface == interface)
			{
				return vao->vao_id;
			}
			target = &vao->next;
			next = vao->next;
		} while (next);
	}
	*target = new_vao(batch, instancer_batch, interface);
	return act_ctx->vaos[*target].vao_id;
}
unsigned new_vao(Batch *batch, Batch *instancer_batch, ShaderInterface *interface)
{
	/* The cursor already points to a free VaoInfo. */
	unsigned ofs = act_ctx->pool_cursor;
	VaoInfo *vao = &act_ctx->vaos[ofs];
	vao->batch = batch;
	vao->instancer_batch = instancer_batch;
	vao->interface = interface;
	glGenVertexArrays(1, &vao->vao_id);
	setup_vao(vao); /* Similar to Batch_update_program_bindings(). */
	/* Find the next empty VaoInfo. */
	while (act_ctx->vaos[act_ctx->pool_cursor].vao_id) {
		act_ctx->pool_cursor++;
#ifdef TRUST_NO_ONE
		assert(act_ctx->pool_cursor <= act_ctx->pool_size);
#endif
		if (act_ctx->pool_cursor == act_ctx->pool_size)
			break;
	}
	if (act_ctx->pool_cursor == act_ctx->pool_size) {
		/* Resize the context pool. Note that realloc may move the whole Context. */
		act_ctx->pool_size += POOL_CHUNK_SIZE;
		act_ctx = (Context *)realloc(act_ctx, sizeof(Context) + sizeof(VaoInfo) * act_ctx->pool_size);
		/* Init the new chunk. */
		memset(&act_ctx->vaos[act_ctx->pool_cursor], 0, sizeof(VaoInfo) * POOL_CHUNK_SIZE);
		/* Update the linked list since the pointer may have changed. */
		if (act_ctx->prev)
			act_ctx->prev->next = act_ctx;
		else
			ctx_first = act_ctx;
		if (act_ctx->next)
			act_ctx->next->prev = act_ctx;
	}
	return ofs;
}
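The header comment mentions supporting matrix attributes in the vertex format so that static VAOs work with instancing. Since a GL attribute location holds at most 4 floats, a mat4 has to be bound as 4 consecutive vec4 "columns". A sketch of the location/offset arithmetic such a binding would rely on; the GL calls are shown only in comments so the sketch stays self-contained, and the type and function names are mine:

```c
#include <assert.h>
#include <stddef.h>

/* For each column i of a mat4 attribute, a real implementation would do
 * roughly:
 *   glEnableVertexAttribArray(loc + i);
 *   glVertexAttribPointer(loc + i, 4, GL_FLOAT, GL_FALSE, stride,
 *                         (const void *)(base_offset + i * 4 * sizeof(float)));
 *   glVertexAttribDivisor(loc + i, 1);  // advance once per instance
 * Below is only the arithmetic those calls use. */

typedef struct MatColumn {
	unsigned location; /* attribute location of column i: loc + i */
	size_t offset;     /* byte offset of column i within the attribute */
} MatColumn;

static MatColumn mat4_column(unsigned base_loc, size_t base_offset, unsigned i)
{
	MatColumn c;
	c.location = base_loc + i;
	c.offset = base_offset + (size_t)i * 4 * sizeof(float);
	return c;
}
```

With this in the vertex format, the VAO can stay static and `GWN_batch_draw_stupid_instanced` becomes unnecessary, as the comment suggests.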