Blender's performance when working with huge meshes can be improved.
Here are some ideas and the reasoning behind them.
- Selection: measure how much work is wasted when rebuilding the selection. Could we rearrange the VBOs to reduce this overhead?
- Profile common tasks with meshes of different sizes, and add performance test cases for those tasks (a timing sketch follows below).
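A minimal sketch of what such a test case could look like, using a plain C++ timing loop; `extract_buffers_for_grid` is a hypothetical placeholder that only simulates the extraction work so the example compiles on its own:

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

/* Hypothetical stand-in for the real VBO/IBO extraction workload. */
static void extract_buffers_for_grid(int verts_count)
{
  std::vector<float> positions(size_t(verts_count) * 3, 0.0f);
  for (float &v : positions) {
    v += 1.0f;
  }
}

int main()
{
  /* Cover several orders of magnitude so cache effects become visible. */
  for (int verts_count : {10'000, 100'000, 1'000'000, 10'000'000}) {
    const auto start = std::chrono::steady_clock::now();
    extract_buffers_for_grid(verts_count);
    const auto end = std::chrono::steady_clock::now();
    const double ms = std::chrono::duration<double, std::milli>(end - start).count();
    printf("%9d verts: %.3f ms\n", verts_count, ms);
  }
  return 0;
}
```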
- Move the display normals to the draw module:
  - After re-evaluating the modifier stack, the display normals are updated (depsgraph update). The drawing code could make better decisions if this happened as part of the draw module.
  - To calculate the display normals, a reverse lookup structure is built, but it isn't kept around. Keeping it would improve performance when the geometry doesn't change between recalculations (see the sketch after this list).
  - Other buffers could also reuse this data (the adjacency IBO, for example).
  - Does Cycles use the display normals? If not, we could eliminate them from DNA/RNA.
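A minimal sketch of how the reverse lookup could be kept in the draw cache and reused, assuming it is a vertex-to-face map and that topology changes can be detected; all names here are hypothetical, not Blender's actual API:

```cpp
#include <cstdint>
#include <vector>

struct VertToFaceMap {
  /* Offsets into `faces`, one range per vertex (CSR-style layout). */
  std::vector<uint32_t> offsets;
  std::vector<uint32_t> faces;
};

struct MeshDrawCache {
  VertToFaceMap vert_to_face;
  uint64_t topology_hash = 0; /* Changes whenever the topology changes. */
  bool map_valid = false;
};

/* Hypothetical builder, provided elsewhere. */
VertToFaceMap build_vert_to_face_map();

const VertToFaceMap &ensure_vert_to_face_map(MeshDrawCache &cache,
                                             uint64_t current_topology_hash)
{
  if (!cache.map_valid || cache.topology_hash != current_topology_hash) {
    /* Rebuild only when topology changed; a pure deformation keeps the map. */
    cache.vert_to_face = build_vert_to_face_map();
    cache.topology_hash = current_topology_hash;
    cache.map_valid = true;
  }
  /* Other extractors (e.g. the adjacency IBO) can read the same map. */
  return cache.vert_to_face;
}
```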
- Use streaming-friendly data structures in MeshRenderData (do not look up polys inside a loop). Reduce cache misses by storing data in flat arrays and only allowing sequential access (see the sketch below).
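A minimal sketch contrasting the two access patterns, using hypothetical simplified types rather than Blender's actual Mesh/MeshRenderData structures:

```cpp
#include <cstddef>
#include <vector>

struct Poly {
  int loopstart;
  int totloop;
  float normal[3];
};

/* Random-access pattern: every loop looks up its poly; `loop_to_poly`
 * jumps around in the poly array and can miss cache. */
void fill_normals_lookup(const std::vector<Poly> &polys,
                         const std::vector<int> &loop_to_poly,
                         std::vector<float> &r_vbo)
{
  for (size_t l = 0; l < loop_to_poly.size(); l++) {
    const Poly &p = polys[loop_to_poly[l]];
    r_vbo.insert(r_vbo.end(), p.normal, p.normal + 3);
  }
}

/* Streaming pattern: iterate polys once, in order, and write their loops
 * sequentially; both reads and writes stay linear in memory. */
void fill_normals_sequential(const std::vector<Poly> &polys,
                             std::vector<float> &r_vbo)
{
  for (const Poly &p : polys) {
    for (int i = 0; i < p.totloop; i++) {
      r_vbo.insert(r_vbo.end(), p.normal, p.normal + 3);
    }
  }
}
```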
- Normals are precalculated, but this uses additional memory that can hurt performance (L2 cache pressure). Check whether calculating them in the inner loop is faster (see the sketch below).
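A minimal sketch of computing the normal in the inner loop rather than reading a precomputed array, assuming a simple triangle list with flat position data; whether this actually wins has to be measured:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

static void tri_normal(const float *a, const float *b, const float *c, float r_n[3])
{
  const float e1[3] = {b[0] - a[0], b[1] - a[1], b[2] - a[2]};
  const float e2[3] = {c[0] - a[0], c[1] - a[1], c[2] - a[2]};
  r_n[0] = e1[1] * e2[2] - e1[2] * e2[1];
  r_n[1] = e1[2] * e2[0] - e1[0] * e2[2];
  r_n[2] = e1[0] * e2[1] - e1[1] * e2[0];
  const float len = std::sqrt(r_n[0] * r_n[0] + r_n[1] * r_n[1] + r_n[2] * r_n[2]);
  if (len > 0.0f) {
    r_n[0] /= len;
    r_n[1] /= len;
    r_n[2] /= len;
  }
}

/* Normals are produced while streaming over the triangles, so no extra
 * normal array competes with the position data for L2 cache. */
void extract_tri_normals(const std::vector<float> &positions, /* 3 floats per vert */
                         const std::vector<uint32_t> &tris,   /* 3 indices per tri */
                         std::vector<float> &r_vbo)
{
  for (size_t t = 0; t < tris.size(); t += 3) {
    float n[3];
    tri_normal(&positions[tris[t] * 3],
               &positions[tris[t + 1] * 3],
               &positions[tris[t + 2] * 3],
               n);
    r_vbo.insert(r_vbo.end(), n, n + 3);
  }
}
```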
- Split edit mode/object mode cache: currently the edit mode cache and the object mode cache reuse the same memory location, and when constructing the VBOs/IBOs the logic branches off. Splitting them could improve code quality as well as the inner loops (less branching between Mesh and BMesh evaluation); expect only a tiny speedup.
- Migrate to C++ and reduce branching by using template functions and classes (see the sketch below).
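A minimal sketch of how a template could hoist the Mesh/BMesh decision out of the inner loop; the source types and accessors are hypothetical simplifications:

```cpp
#include <vector>

struct MeshSource {
  std::vector<float> positions; /* 3 floats per vertex */
  const float *vert_position(int v) const { return &positions[v * 3]; }
  int verts_num() const { return int(positions.size() / 3); }
};

struct BMeshSource {
  struct BMVert { float co[3]; };
  std::vector<BMVert> verts;
  const float *vert_position(int v) const { return verts[v].co; }
  int verts_num() const { return int(verts.size()); }
};

/* One inner loop, instantiated twice by the compiler; no edit-mode branch
 * per vertex. */
template<typename SourceT>
void extract_positions(const SourceT &source, std::vector<float> &r_vbo)
{
  const int verts_num = source.verts_num();
  r_vbo.reserve(r_vbo.size() + size_t(verts_num) * 3);
  for (int v = 0; v < verts_num; v++) {
    const float *co = source.vert_position(v);
    r_vbo.insert(r_vbo.end(), co, co + 3);
  }
}

void build_position_vbo(const MeshSource *mesh, const BMeshSource *bm,
                        std::vector<float> &r_vbo)
{
  /* The Mesh vs. BMesh decision happens once, outside the loop. */
  if (bm != nullptr) {
    extract_positions(*bm, r_vbo);
  }
  else {
    extract_positions(*mesh, r_vbo);
  }
}
```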
- Can we use compute shaders to convert MeshRenderData? That way we wouldn't need to upload all the converted data from RAM to the GPU.
  - Needs research into the data transfer before and after such a change (a rough OpenGL sketch follows below).
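A rough sketch of the data flow in plain OpenGL 4.3 (via GLEW), not Blender's GPU abstraction: only the raw positions are uploaded, and a compute shader writes the final vertex buffer on the GPU. A real conversion would also pack normals, UVs, etc.

```cpp
#include <GL/glew.h>

static const char *convert_src = R"GLSL(
#version 430
layout(local_size_x = 64) in;
layout(std430, binding = 0) readonly buffer RawPositions { float src[]; };
layout(std430, binding = 1) writeonly buffer VertexBuffer { float dst[]; };
uniform uint verts_num;

void main() {
  uint v = gl_GlobalInvocationID.x;
  if (v >= verts_num) return;
  /* Example conversion: copy the position into the final vertex format. */
  dst[v * 3 + 0] = src[v * 3 + 0];
  dst[v * 3 + 1] = src[v * 3 + 1];
  dst[v * 3 + 2] = src[v * 3 + 2];
}
)GLSL";

/* `program` is assumed to be compiled and linked from `convert_src`;
 * `raw_ssbo` holds the raw positions, `vbo` is the buffer used for drawing. */
void convert_on_gpu(GLuint program, GLuint raw_ssbo, GLuint vbo, GLuint verts_num)
{
  glUseProgram(program);
  glUniform1ui(glGetUniformLocation(program, "verts_num"), verts_num);
  glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, raw_ssbo);
  glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, vbo);
  glDispatchCompute((verts_num + 63) / 64, 1, 1);
  /* Make the writes visible to vertex fetch before drawing. */
  glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);
}
```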
- The hair IBO is actually a simple formula; there is no need to build it on the CPU (see the sketch below).
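A minimal sketch of that formula, assuming each strand is drawn as a line strip of `point_len` consecutive points separated by a primitive-restart index. Because the output is a pure function of the strand layout, the same formula could be evaluated on the GPU (for example in a compute shader) instead of filling the IBO on the CPU:

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t RESTART_INDEX = 0xFFFFFFFFu;

std::vector<uint32_t> build_hair_ibo(uint32_t strands_num, uint32_t point_len)
{
  std::vector<uint32_t> ibo;
  ibo.reserve(size_t(strands_num) * (point_len + 1));
  for (uint32_t s = 0; s < strands_num; s++) {
    for (uint32_t p = 0; p < point_len; p++) {
      /* The whole buffer is just `strand * point_len + point`. */
      ibo.push_back(s * point_len + p);
    }
    ibo.push_back(RESTART_INDEX);
  }
  return ibo;
}
```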