BMesh Refactor #91329

Open
opened 2021-09-11 05:58:28 +02:00 by Joseph Eagar · 27 comments
Member

BMesh has served Blender well for many years, but it as a few problems: the memory overhead is high, and the performance isn't always great compared to pure Mesh arrays.

The Problem

BMesh is a BREP (a variant of the classic half-edge data structure). The problem with BREPs is they do not always work well with CPU caches, especially on 64-bit platforms where all of the helper pointers can overwhelm the cache. When Geoffry Bantle designed BMesh he tried to improve cache coherency by writing the mempool library, but this did nothing to solve the problem of huge numbers of 8-byte helper pointers overwhelming the cache altogether (at the time Blender was still 32-bits).

I recently had a bunch of users on Blenderartists run a series of benchmarks for a simple vertex smooth operation. To summarize the results (which will be discussed in more depth later) we can get a pretty decent performance boost (~42%) if we replace all helper pointers in BMesh with 4-byte integer indices.

This makes sense. BMesh is memory bound (profiling consistently shows this), so replacing 8-byte pointers with 4-byte ints will reduce the load on the CPU cache and DRAM by quite a bit.

Short Intro To BMesh

BMesh has the following structure:

image.png

  1. Vertices, which store one pointer to an edge.
  2. Edges, which store six pointers to vertices and one to loops (two for the edges plus the four used for each vertex's disk cycle, see blue lines).
  3. Loops, which have 1 pointers each pointing at a vertex, edge, and a face, along with four more pointers to other loops. Like edges loops participate in two linked lists: the face they belong to, and also the radial cycle (see red line) of loops surrounding an edge.

The Benchmark

To investigate where we should take BMesh in the future I wrote a series of benchmarks and
had users on BlenderArtists run them . There are four benchmarks in total, each of which run a simple vertex smoothing operation:

  1. BMesh with random element order.
  2. BMesh with vertex-cluster-sorted element order. Note that #3 and #4 uses the same order.
  3. A version of BMesh with integer indices.
  4. A so-called 'data-oriented' structure.

The data-oriented structure is inspired by recent developments in the games industry, and looks like this:


typedef struct MeshTest {
  //vertex arrays
  float (*v_co)[3];
  float (*v_no)[3];
  int *v_e;
  int *v_index;
  int *v_flag;

  //edge arrays
  int *e_v1;
  int *e_v2;
  int *e_v1_next;
  int *e_v2_next;
  int *e_l;
  int *e_flag;
  int *e_index;

  //loop arrays
  int *l_v;
  int *l_e;
  int *l_f;
  int *l_next;
  int *l_prev;
  int *l_radial_next;
  int *l_radial_prev;

  //face arrays
  int *f_l;
  int *f_index;
  int *f_flag;

  int totvert, totedge, totloop, totface;
  MemArena *arena;
} MeshTest;

Results

As expected, the biggest performance gains came from the pure data-oriented approach. However a pretty hefty performance gain also came from simply replacing pointers with integer indices.

Here are the raw results:

random cluster-ordered bm % improvement struct of arrays % improvement bm with indices % improvement RAM
1.22 1.04 14.42 0.73 67 0.94 29 0
1.49 1.46 2.35 1.1 36 1.17 27 0
1.29 1.13 14 0.75 71.54 0.89 45.08 0
1.58 1.4 12.3 1.09 44.7 1.11 42.42 16
1.53 1.36 12.77 1.08 41.6 1.07 42.91 0
1.56 1.39 12.47 1.09 42.65 1.1 42.15 16
1.22 1.06 15.05 0.75 63.85 0.82 49.67 32
2.32 1.99 16.54 1.87 23.8 1.5 54.89 16
0.86 0.76 12.14 0.28 202.44 0.57 51.13 32
0.87 0.76 14.0 0.54 61.47 0.72 20.16 32
Random bm average: 1.39 variance: 0.40 median: 1.51
Cluster BM average: 1.23 variange: 0.35 median: 1.38
Data Oriented average: 0.93 variange: 0.41 median: 1.08
BM With Indiced average: 0.99 variange: 0.25 median: 1.08

Long term plans

Given these results, I think we should move BMesh to use integer offsets instead of pointers. This would be a far less disruptive change then the data-oriented approach. Note that I don't think we should rebuild the functionality of BMesh on top of Mesh as custom attribute layers, as that would be too disruptive to the existing codebase.

Needed Modifications

The structural modifications would be fairly simple (the actual code work might be another matter); basically, instead of using mempools for storing elements we would use simple arrays with freed element lists.

A hypothetical element array structure might look like this:

typedef struct ElementArray {
  void *array;

  int element_size;
  int length;
  int total_size;

  BLI_bitmap usedmap;

  SomeKindOfList freelist;
} ElementArray;

Of course this would make accessing the structure more complicated:

BMLoop *l = bm->loops[e->l_i];

do {
} while ((l = bm->loops[l->radial_next_i]) != bm->loops[e->l_i]);

. . .but I suppose it's not too bad.

C++

It might be a good idea to use C++ for this. We could write ElementList and the topological
iterators in C++ and wrap them in C APIs. The core element structures will have to remain
in C though.

BMesh has served Blender well for many years, but it as a few problems: the memory overhead is high, and the performance isn't always great compared to pure Mesh arrays. # The Problem BMesh is a BREP (a variant of the classic half-edge data structure). The problem with BREPs is they do not always work well with CPU caches, especially on 64-bit platforms where all of the helper pointers can overwhelm the cache. When Geoffry Bantle designed BMesh he tried to improve cache *coherency* by writing the mempool library, but this did nothing to solve the problem of huge numbers of 8-byte helper pointers overwhelming the cache altogether (at the time Blender was still 32-bits). I recently had a bunch of users on Blenderartists run a series of benchmarks for a simple vertex smooth operation. To summarize the results (which will be discussed in more depth later) we can get a pretty decent performance boost (~42%) if we replace all helper pointers in BMesh with 4-byte integer indices. This makes sense. BMesh is memory bound (profiling consistently shows this), so replacing 8-byte pointers with 4-byte ints will reduce the load on the CPU cache and DRAM by quite a bit. # Short Intro To BMesh BMesh has the following structure: ![image.png](https://archive.blender.org/developer/F10396519/image.png) 1. Vertices, which store one pointer to an edge. 2. Edges, which store six pointers to vertices and one to loops (two for the edges plus the four used for each vertex's disk cycle, see blue lines). 3. Loops, which have 1 pointers each pointing at a vertex, edge, and a face, along with four more pointers to other loops. Like edges loops participate in two linked lists: the face they belong to, and also the radial cycle (see red line) of loops surrounding an edge. # The Benchmark To investigate where we should take BMesh in the future I wrote a series of benchmarks and [had users on BlenderArtists run them ](https://blenderartists.org/t/sculpt-dyntopo-test-build-updated/1323378/22). There are four benchmarks in total, each of which run a simple vertex smoothing operation: 1. BMesh with random element order. 2. BMesh with vertex-cluster-sorted element order. Note that #3 and #4 uses the same order. 3. A version of BMesh with integer indices. 4. A so-called 'data-oriented' structure. The data-oriented structure is inspired by recent developments in the games industry, and looks like this: ``` typedef struct MeshTest { //vertex arrays float (*v_co)[3]; float (*v_no)[3]; int *v_e; int *v_index; int *v_flag; //edge arrays int *e_v1; int *e_v2; int *e_v1_next; int *e_v2_next; int *e_l; int *e_flag; int *e_index; //loop arrays int *l_v; int *l_e; int *l_f; int *l_next; int *l_prev; int *l_radial_next; int *l_radial_prev; //face arrays int *f_l; int *f_index; int *f_flag; int totvert, totedge, totloop, totface; MemArena *arena; } MeshTest; ``` ## Results As expected, the biggest performance gains came from the pure data-oriented approach. However a pretty hefty performance gain also came from simply replacing pointers with integer indices. Here are the raw results: | random | cluster-ordered bm | % improvement| struct of arrays | % improvement| bm with indices | % improvement| RAM| |--------|--------|--------|--------|--------|--------|--------|--------| |1.22|1.04|14.42|0.73|67|0.94|29|0| |1.49|1.46|2.35|1.1|36|1.17|27|0| |1.29|1.13|14|0.75|71.54|0.89|45.08|0| |1.58|1.4|12.3|1.09|44.7|1.11|42.42|16| |1.53|1.36|12.77|1.08|41.6|1.07|42.91|0| |1.56|1.39|12.47|1.09|42.65|1.1|42.15|16| |1.22|1.06|15.05|0.75|63.85|0.82|49.67|32| |2.32|1.99|16.54|1.87|23.8|1.5|54.89|16| |0.86|0.76|12.14|0.28|202.44|0.57|51.13|32| |0.87|0.76|14.0|0.54|61.47|0.72|20.16|32| |**Random bm** |average: 1.39 |variance: 0.40 |median: 1.51| | -- | -- | -- | -- | |**Cluster BM** | average: 1.23 |variange: 0.35 |median: 1.38| |**Data Oriented** | average: 0.93| variange: 0.41 |median: 1.08| |**BM With Indiced**| average: 0.99| variange: 0.25| median: 1.08| # Long term plans Given these results, I think we should move BMesh to use integer offsets instead of pointers. This would be a far less disruptive change then the data-oriented approach. Note that I *don't* think we should rebuild the functionality of BMesh on top of Mesh as custom attribute layers, as that would be too disruptive to the existing codebase. ## Needed Modifications The structural modifications would be fairly simple (the actual code work might be another matter); basically, instead of using mempools for storing elements we would use simple arrays with freed element lists. A hypothetical element array structure might look like this: ``` typedef struct ElementArray { void *array; int element_size; int length; int total_size; BLI_bitmap usedmap; SomeKindOfList freelist; } ElementArray; ``` Of course this would make accessing the structure more complicated: ``` BMLoop *l = bm->loops[e->l_i]; do { } while ((l = bm->loops[l->radial_next_i]) != bm->loops[e->l_i]); ``` . . .but I suppose it's not too bad. # C++ It might be a good idea to use C++ for this. We could write ElementList and the topological iterators in C++ and wrap them in C APIs. The core element structures will have to remain in C though.
Author
Member

Added subscribers: @JosephEagar, @PabloDobarro

Added subscribers: @JosephEagar, @PabloDobarro
Joseph Eagar changed title from BREP Design to BMesh Refactor 2021-09-11 05:58:52 +02:00
Author
Member

Added subscribers: @ideasman42, @howardt, @mont29, @Sergey

Added subscribers: @ideasman42, @howardt, @mont29, @Sergey
Member

Added subscriber: @kursadk

Added subscriber: @kursadk
Contributor

Added subscriber: @scurest

Added subscriber: @scurest

Added subscriber: @TheRedWaxPolice

Added subscriber: @TheRedWaxPolice

Added subscriber: @GeorgiaPacific

Added subscriber: @GeorgiaPacific

Added subscriber: @bent

Added subscriber: @bent

Added subscriber: @cdog

Added subscriber: @cdog

This is surely an interesting topic, but I'm missing some aspects of a design: what is the exact data structure you propose to use and how you see the current codebase to converge to it?
What are the downsides? Will operations like bevel/subdivide suffer from worse performance (due to i.e. need to maintain ordering in arrays)? Is there a memory usage footprint penalty?

Is the data-oriented type is another name for structure-of-arrays type of layout?

I'm also not sure how to read the table. Maybe you can explain better what columns and rows mean, and how did you do the performance test.

This is surely an interesting topic, but I'm missing some aspects of a design: what is the exact data structure you propose to use and how you see the current codebase to converge to it? What are the downsides? Will operations like bevel/subdivide suffer from worse performance (due to i.e. need to maintain ordering in arrays)? Is there a memory usage footprint penalty? Is the data-oriented type is another name for structure-of-arrays type of layout? I'm also not sure how to read the table. Maybe you can explain better what columns and rows mean, and how did you do the performance test.

Added subscriber: @Roggii-4

Added subscriber: @Roggii-4

Added subscriber: @SteffenD

Added subscriber: @SteffenD
Author
Member

[replied to the wrong post.]

[replied to the wrong post.]
Author
Member

In #91329#1219180, @Sergey wrote:
This is surely an interesting topic, but I'm missing some aspects of a design: what is the exact data structure you propose to use and how you see the current codebase to converge to it?
What are the downsides? Will operations like bevel/subdivide suffer from worse performance (due to i.e. need to maintain ordering in arrays)? Is there a memory usage footprint penalty?

Is the data-oriented type is another name for structure-of-arrays type of layout?

I'm also not sure how to read the table. Maybe you can explain better what columns and rows mean, and how did you do the performance test.

There's no need to maintain orderings. The idea is to have an array where you allocate and free elements;
basically a flat version of mempool. We could in fact use mempool with a lookup table for the chunks,
though that would add a layer of indirection.

I'm not proposing we do this anytime soon. For the actual refactoring I'm thinking of writing a clang-tidy module to automate as much of it as possible.

> In #91329#1219180, @Sergey wrote: > This is surely an interesting topic, but I'm missing some aspects of a design: what is the exact data structure you propose to use and how you see the current codebase to converge to it? > What are the downsides? Will operations like bevel/subdivide suffer from worse performance (due to i.e. need to maintain ordering in arrays)? Is there a memory usage footprint penalty? > > Is the data-oriented type is another name for structure-of-arrays type of layout? > > I'm also not sure how to read the table. Maybe you can explain better what columns and rows mean, and how did you do the performance test. There's no need to maintain orderings. The idea is to have an array where you allocate and free elements; basically a flat version of mempool. We could in fact *use* mempool with a lookup table for the chunks, though that would add a layer of indirection. I'm not proposing we do this anytime soon. For the actual refactoring I'm thinking of writing a clang-tidy module to automate as much of it as possible.
Member

Added subscriber: @Harley

Added subscriber: @Harley
Member

The part that confuses me is that you summarize the tests by saying that "As expected, the biggest performance gains came from the pure data-oriented approach. However a pretty hefty performance gain also came from simply replacing pointers with integer indices."

But your raw data seems to indicate that "struct of arrays" gave you an average increase of 65.5% where "bm with indices" gives you 74.62%. Unless I am misunderstanding something or transposed your numbers wrong. That table could really use a summary at the bottom

The part that confuses me is that you summarize the tests by saying that "As expected, the biggest performance gains came from the pure data-oriented approach. However a pretty hefty performance gain also came from simply replacing pointers with integer indices." But your raw data seems to indicate that "struct of arrays" gave you an average increase of 65.5% where "bm with indices" gives you 74.62%. Unless I am misunderstanding something or transposed your numbers wrong. That table could really use a summary at the bottom

Added subscriber: @mattli911

Added subscriber: @mattli911

I'm not proposing we do this anytime soon.

What is the proposal then? What's the purpose of this design task?

> I'm not proposing we do this anytime soon. What is the proposal then? What's the purpose of this design task?
Member

Added subscriber: @JulienKaspar

Added subscriber: @JulienKaspar
Author
Member

I've updated the table and summarized it, here's the python script I originally used to wrangle the data.

stats.py

I've updated the table and summarized it, here's the python script I originally used to wrangle the data. [stats.py](https://archive.blender.org/developer/F10510681/stats.py)
Author
Member

In #91329#1222149, @Sergey wrote:

I'm not proposing we do this anytime soon.

What is the proposal then? What's the purpose of this design task?

To make long-term plans and goals?

> In #91329#1222149, @Sergey wrote: >> I'm not proposing we do this anytime soon. > > What is the proposal then? What's the purpose of this design task? To make long-term plans and goals?
Member

Added subscriber: @HooglyBoogly

Added subscriber: @HooglyBoogly
Member

It seems to me that maintaining topology maps on top of the existing Mesh structure is the best way to achieve the data-oriented approach.

  • For saving and loading files, no conversion is necessary, the topology maps can just be discarded
  • Lots of existing code uses Mesh
  • The structs like MVert can be relatively simply split up to store normals separately, I started on that here: #91186
  • These maps could be cached and calculated lazily

So while the index based refactor of BMesh also seems to make sense, something else to pursue might be using Mesh in more places, like many modifiers already do.

It seems to me that maintaining topology maps on top of the existing `Mesh` structure is the best way to achieve the data-oriented approach. - For saving and loading files, no conversion is necessary, the topology maps can just be discarded - Lots of existing code uses `Mesh` - The structs like `MVert` can be *relatively* simply split up to store normals separately, I started on that here: #91186 - These maps could be cached and calculated lazily So while the index based refactor of BMesh also seems to make sense, something else to pursue might be using `Mesh` in more places, like many modifiers already do.
Author
Member

In #91329#1222494, @HooglyBoogly wrote:
It seems to me that maintaining topology maps on top of the existing Mesh structure is the best way to achieve the data-oriented approach.

  • For saving and loading files, no conversion is necessary, the topology maps can just be discarded
  • Lots of existing code uses Mesh
  • The structs like MVert can be relatively simply split up to store normals separately, I started on that here: #91186
  • These maps could be cached and calculated lazily

So while the index based refactor of BMesh also seems to make sense, something else to pursue might be using Mesh in more places, like many modifiers already do.

The basic intuition you are expressing has been obvious since the very beginning. Geoffry Bantle actually tried to make BMesh as Mesh-like as he possibly could, and I suspect he would have used integer indices if 64-bit processors had been more widespread back then. To be honest if Mesh with topology maps was good enough then BMesh would not exist.

> In #91329#1222494, @HooglyBoogly wrote: > It seems to me that maintaining topology maps on top of the existing `Mesh` structure is the best way to achieve the data-oriented approach. > - For saving and loading files, no conversion is necessary, the topology maps can just be discarded > - Lots of existing code uses `Mesh` > - The structs like `MVert` can be *relatively* simply split up to store normals separately, I started on that here: #91186 > - These maps could be cached and calculated lazily > > So while the index based refactor of BMesh also seems to make sense, something else to pursue might be using `Mesh` in more places, like many modifiers already do. The basic intuition you are expressing has been obvious since the very beginning. Geoffry Bantle actually tried to make BMesh as Mesh-like as he possibly could, and I suspect he would have used integer indices if 64-bit processors had been more widespread back then. To be honest if Mesh with topology maps was good enough then BMesh would not exist.

Added subscriber: @fanny

Added subscriber: @fanny

Added subscriber: @SamGreen

Added subscriber: @SamGreen

Added subscriber: @AlexeyAdamitsky

Added subscriber: @AlexeyAdamitsky

Added subscriber: @hzuika

Added subscriber: @hzuika
Philipp Oeser removed the
Interest
Modeling
label 2023-02-09 15:28:17 +01:00
Bastien Montagne added this to the Core project 2023-02-09 15:42:54 +01:00
Bastien Montagne removed this from the Core project 2023-02-09 18:20:37 +01:00
Sign in to join this conversation.
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
18 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#91329
No description provided.