
GPU: Mesh Drawing Performance
Needs Triage, Normal · Public · To Do

Authored By
Jeroen Bakker (jbakker)
Mon, Apr 26, 4:48 PM

Description

Blender performance when working with huge meshes can be improved.
Here are some ideas and reasoning.

Research Topics

  • Selection: how much work is wasted when rebuilding selection data? Could we rearrange the VBOs to reduce the overhead?
  • Profile with meshes of different sizes and common tasks. Add performance test cases for the common tasks.

Technical tasks

  • Move the display normals to the draw module:
    • The display normals are currently updated after the modifier stack is re-evaluated (depsgraph update). The drawing code could make better decisions if this were done as part of the draw module.
    • To calculate the display normals, a reverse lookup structure is built. This structure isn't kept around. Performance could be improved by reusing it when the geometry doesn't change between recalculations.
    • Other buffers can also use this data (the adjacency IBO, for example).
    • Does Cycles use the display normals? If not, we could remove them from DNA/RNA.
  • Use data structures optimized for data streaming in MeshRenderData (do not look up polys inside a loop). Reduce cache misses by storing data in arrays and only allowing sequential access.
  • Normals are precalculated, but this uses additional memory, which can reduce performance (L2 cache pressure). Check whether calculating them in the inner loop is faster.
  • Split edit mode/object mode cache: currently the edit mode cache and the object mode cache reuse the same memory location, and the logic branches off when constructing the VBO/IBO. Splitting them should improve code quality as well as the inner loops. Expected tiny speedup from less branching between Mesh and BMesh evaluation.
  • Migrate to C++ and reduce branching by using template functions and classes.
  • Can we use compute shaders to convert the MeshRenderData? This way we wouldn't need to upload all the data from RAM to the GPU.
    • Need to research the data transfer before and after such a change.
    • The hair IBO is actually a simple formula. No need to compute it on the CPU.
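
To illustrate the "template functions" item above, here is a minimal sketch of how the edit mode/object mode decision could be made once per extraction instead of once per element. All names (ObjectModeSource, EditModeSource, extract_positions) are hypothetical stand-ins invented for this example and do not correspond to the actual MeshRenderData or BMesh API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float3 {
  float x, y, z;
};

// Hypothetical stand-in for object-mode data: positions in one flat array.
struct ObjectModeSource {
  const std::vector<Float3> &positions;

  std::size_t size() const { return positions.size(); }
  bool is_hidden(std::size_t /*index*/) const { return false; }
  Float3 position(std::size_t index) const
  {
    assert(index < positions.size());
    return positions[index];
  }
};

// Hypothetical stand-in for edit-mode data: each vertex carries a hide flag.
struct EditVert {
  Float3 co;
  bool hidden;
};

struct EditModeSource {
  const std::vector<EditVert> &verts;

  std::size_t size() const { return verts.size(); }
  bool is_hidden(std::size_t index) const { return verts[index].hidden; }
  Float3 position(std::size_t index) const
  {
    assert(index < verts.size());
    return verts[index].co;
  }
};

// The source type is a template parameter, so the compiler generates one
// specialized loop per mode. There is no "is this edit mode?" test inside
// the loop body.
template<typename Source>
std::vector<Float3> extract_positions(const Source &source)
{
  std::vector<Float3> result;
  result.reserve(source.size());
  for (std::size_t i = 0; i < source.size(); i++) {
    if (!source.is_hidden(i)) {
      result.push_back(source.position(i));
    }
  }
  return result;
}
```

The caller picks the mode once, for example extract_positions(ObjectModeSource{positions}); for the object-mode source the is_hidden() check is a constant that the compiler can remove entirely.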

Event Timeline

Jeroen Bakker (jbakker) changed the subtype of this task from "Report" to "To Do". Mon, Apr 26, 4:53 PM
Tjurig added a subscriber: Tjurig. Mon, Apr 26, 6:22 PM

From my own tests, uploading data to the GPU is currently the main bottleneck when transforming geometry, see: T88021.


Split edit mode/object mode cache

This seems worth doing early on; it should make the code more maintainable.


  • Partial updates might be worth exploring:
    • Only update modified data-types: initially this could be limited to deforming vertices, changing selection, and UVs.
    • Only update modified data: for example, only send the positions of vertices that are being transformed. This could increase code complexity significantly, so it may not be worth doing early on.
  • The data layout could be optimized. I recall @Clément Foucault (fclem) mentioning that we could, for example, avoid uploading vertex coordinates multiple times.
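
As a rough sketch of the "only update modified data" idea above: find the smallest contiguous range that covers all modified vertices and copy only that range. The GPU buffer is simulated with a plain std::vector so the example runs anywhere; in a real backend the copy would be a sub-range buffer update. All names are hypothetical, and a single contiguous range is a simplification (unmodified vertices between two modified ones are still re-sent).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float3 {
  float x, y, z;
};

struct Range {
  std::size_t start;
  std::size_t count;
};

// Smallest contiguous range covering every modified vertex.
// Returns a range with count == 0 when nothing was modified.
Range find_dirty_range(const std::vector<bool> &modified)
{
  const std::size_t none = modified.size();
  std::size_t first = none;
  std::size_t last = 0;
  for (std::size_t i = 0; i < modified.size(); i++) {
    if (modified[i]) {
      if (first == none) {
        first = i;
      }
      last = i;
    }
  }
  if (first == none) {
    return Range{0, 0};
  }
  return Range{first, last - first + 1};
}

// Copies only the dirty range into the (simulated) GPU-side copy and
// returns the number of bytes that had to be transferred.
std::size_t upload_dirty_positions(const std::vector<Float3> &cpu_positions,
                                   const std::vector<bool> &modified,
                                   std::vector<Float3> &gpu_copy)
{
  assert(cpu_positions.size() == modified.size());
  assert(cpu_positions.size() == gpu_copy.size());
  const Range range = find_dirty_range(modified);
  for (std::size_t i = range.start; i < range.start + range.count; i++) {
    gpu_copy[i] = cpu_positions[i];
  }
  return range.count * sizeof(Float3);
}
```

When a single vertex in a large mesh is moved, this transfers sizeof(Float3) bytes instead of the whole position array. The cost is the extra bookkeeping of the modified flags, which is the code-complexity concern mentioned above.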

Hello, I don't know if this is the right place for my comment, but Blender is massively slow when trying to select an object in the viewport.
The more objects there are in the scene and the more polygons they have, the slower it gets. Our typical scenes have >10000 objects and >20 million polygons. That is not extreme.
We use a very old 3D program that has not been updated since 2012 to select and shade all the objects in our scenes, because it is not possible in Blender. In Blender you wait 5-10 seconds after clicking on an object in the viewport until it gets selected.
In other packages selection is instant. That program can handle more than 100 million polygons with instant selection, with ease.
I hope this behavior can be fixed now that you are working on improving mesh drawing performance!
Thanks


Transforming verts.

When entering edit mode and transforming a single vert, the following batches are recalculated:

ibo.tris
ibo.points
vbo.pos_nor
vbo.lnor
vbo.edit_data

ibo.tris: sorts the triangles by material by looping twice, once to count and once to assign. Hidden faces are not added. The implementation is single threaded.
ibo.points: a single loop. Hidden verts are not added. The implementation is single threaded.
vbo.edit_data: updates flags (vert, edge, crease and weight).
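
For readers unfamiliar with the two-loop approach described for ibo.tris, the following is a minimal counting-sort sketch: the first pass counts visible triangles per material, the second pass writes each triangle into its material's slot. The Tri and TrisIndexBuffer types and the build_tris_ibo name are hypothetical and only illustrate the technique; this is not the actual extraction code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical triangle record; not Blender's actual data layout.
struct Tri {
  std::uint32_t verts[3];
  int material;
  bool hidden;
};

struct TrisIndexBuffer {
  std::vector<std::uint32_t> indices;       // 3 indices per visible triangle
  std::vector<std::size_t> material_start;  // first triangle slot per material
};

TrisIndexBuffer build_tris_ibo(const std::vector<Tri> &tris, int material_count)
{
  assert(material_count > 0);
  const std::size_t mat_len = static_cast<std::size_t>(material_count);

  // Pass 1: count the visible triangles of each material.
  std::vector<std::size_t> counts(mat_len);
  for (const Tri &tri : tris) {
    assert(tri.material >= 0 && tri.material < material_count);
    if (!tri.hidden) {
      counts[static_cast<std::size_t>(tri.material)]++;
    }
  }

  // Exclusive prefix sum: where each material's triangles start.
  TrisIndexBuffer result;
  result.material_start.resize(mat_len);
  std::size_t visible_total = 0;
  for (std::size_t m = 0; m < mat_len; m++) {
    result.material_start[m] = visible_total;
    visible_total += counts[m];
  }
  result.indices.resize(visible_total * 3);

  // Pass 2: assign each visible triangle to the next free slot of its material.
  std::vector<std::size_t> cursor = result.material_start;
  for (const Tri &tri : tris) {
    if (tri.hidden) {
      continue;
    }
    const std::size_t slot = cursor[static_cast<std::size_t>(tri.material)]++;
    for (std::size_t corner = 0; corner < 3; corner++) {
      result.indices[slot * 3 + corner] = tri.verts[corner];
    }
  }
  return result;
}
```

Both passes walk every triangle, so the cost scales with the full triangle count even when only one vertex moved. The result depends only on material assignment and hide flags, not on vertex positions.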

Buffers whose contents do not change are:

  • ibo.tris
  • ibo.points
  • vbo.edit_data

Assuming a high poly count, num_tris * 3 * sizeof(int) + num_vert * sizeof(int) + num_vert * sizeof(int) bytes are recalculated and resent. vbo.edit_data uses threading.
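
To put a number on the formula above, here is a small worked example. It assumes a 4-byte int (represented as std::uint32_t) and reads the three terms as ibo.tris, ibo.points and vbo.edit_data in that order, which is an interpretation of the comment rather than something it states. The mesh size in the usage note is purely illustrative.

```cpp
#include <cstdint>

// Bytes recalculated and resent according to the formula above,
// assuming sizeof(int) == 4 (modelled here with std::uint32_t).
constexpr std::uint64_t resent_bytes(std::uint64_t num_tris, std::uint64_t num_vert)
{
  return num_tris * 3 * sizeof(std::uint32_t)  // first term (presumably ibo.tris)
         + num_vert * sizeof(std::uint32_t)    // second term (presumably ibo.points)
         + num_vert * sizeof(std::uint32_t);   // third term (presumably vbo.edit_data)
}
```

For a hypothetical mesh with 20,000,000 triangles and 10,000,000 vertices, resent_bytes(20000000, 10000000) gives 240,000,000 + 40,000,000 + 40,000,000 = 320,000,000 bytes, roughly 320 MB, per single-vertex transform.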

Callgraph when run single threaded