When extracting mesh data to GPU buffers, each element was passed to a
callback that handles a single element at a time. This reduces the
possibilities for compilers to optimize.
This change makes the handlers process a range of elements in a single
invocation.
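
A minimal sketch of the difference, not the actual extraction code: the
names below (Vert, extract_pos_elem, extract_pos_range and the two
drivers) are hypothetical and only illustrate moving the loop from the
caller into the handler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/* Hypothetical element type, for illustration only. */
struct Vert {
  float co[3];
};

/* Before: the handler sees one element per invocation. */
using ElemFn = void (*)(const Vert &vert, std::size_t index, float *out);

void extract_pos_elem(const Vert &vert, std::size_t index, float *out)
{
  out[index * 3 + 0] = vert.co[0];
  out[index * 3 + 1] = vert.co[1];
  out[index * 3 + 2] = vert.co[2];
}

void extract_per_element(const std::vector<Vert> &verts, ElemFn fn, float *out)
{
  for (std::size_t i = 0; i < verts.size(); i++) {
    /* One indirect call per element; the loop body is opaque to the compiler. */
    fn(verts[i], i, out);
  }
}

/* After: the handler receives a half-open range [start, end) and owns the loop. */
using RangeFn = void (*)(const Vert *verts, std::size_t start, std::size_t end, float *out);

void extract_pos_range(const Vert *verts, std::size_t start, std::size_t end, float *out)
{
  assert(start <= end);
  for (std::size_t i = start; i < end; i++) {
    out[i * 3 + 0] = verts[i].co[0];
    out[i * 3 + 1] = verts[i].co[1];
    out[i * 3 + 2] = verts[i].co[2];
  }
}

void extract_per_range(const std::vector<Vert> &verts, RangeFn fn, float *out)
{
  /* A single indirect call for the whole range. */
  fn(verts.data(), 0, verts.size(), out);
}
```

Both drivers fill the output buffer identically. With the range variant
the loop sits inside the handler, so the compiler can see the whole loop
body and may unroll or vectorize it.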
I wasn't able to measure any speed improvement on Linux. Perhaps there
will be one on Windows.