BMesh Refactor #91329

New Issue

Joseph Eagar · 2021-09-11T05:58:28+02:00

Joseph Eagar commented

2021-09-11 05:58:28 +02:00

BMesh has served Blender well for many years, but it as a few problems: the memory overhead is high, and the performance isn't always great compared to pure Mesh arrays.

The Problem

BMesh is a BREP (a variant of the classic half-edge data structure). The problem with BREPs is they do not always work well with CPU caches, especially on 64-bit platforms where all of the helper pointers can overwhelm the cache. When Geoffry Bantle designed BMesh he tried to improve cache coherency by writing the mempool library, but this did nothing to solve the problem of huge numbers of 8-byte helper pointers overwhelming the cache altogether (at the time Blender was still 32-bits).

I recently had a bunch of users on Blenderartists run a series of benchmarks for a simple vertex smooth operation. To summarize the results (which will be discussed in more depth later) we can get a pretty decent performance boost (~42%) if we replace all helper pointers in BMesh with 4-byte integer indices.

This makes sense. BMesh is memory bound (profiling consistently shows this), so replacing 8-byte pointers with 4-byte ints will reduce the load on the CPU cache and DRAM by quite a bit.

Short Intro To BMesh

BMesh has the following structure:

Vertices, which store one pointer to an edge.
Edges, which store six pointers to vertices and one to loops (two for the edges plus the four used for each vertex's disk cycle, see blue lines).
Loops, which have 1 pointers each pointing at a vertex, edge, and a face, along with four more pointers to other loops. Like edges loops participate in two linked lists: the face they belong to, and also the radial cycle (see red line) of loops surrounding an edge.

The Benchmark

To investigate where we should take BMesh in the future I wrote a series of benchmarks and
had users on BlenderArtists run them . There are four benchmarks in total, each of which run a simple vertex smoothing operation:

BMesh with random element order.
BMesh with vertex-cluster-sorted element order. Note that #3 and #4 uses the same order.
A version of BMesh with integer indices.
A so-called 'data-oriented' structure.

The data-oriented structure is inspired by recent developments in the games industry, and looks like this:


typedef struct MeshTest {
  //vertex arrays
  float (*v_co)[3];
  float (*v_no)[3];
  int *v_e;
  int *v_index;
  int *v_flag;

  //edge arrays
  int *e_v1;
  int *e_v2;
  int *e_v1_next;
  int *e_v2_next;
  int *e_l;
  int *e_flag;
  int *e_index;

  //loop arrays
  int *l_v;
  int *l_e;
  int *l_f;
  int *l_next;
  int *l_prev;
  int *l_radial_next;
  int *l_radial_prev;

  //face arrays
  int *f_l;
  int *f_index;
  int *f_flag;

  int totvert, totedge, totloop, totface;
  MemArena *arena;
} MeshTest;

Results

As expected, the biggest performance gains came from the pure data-oriented approach. However a pretty hefty performance gain also came from simply replacing pointers with integer indices.

Here are the raw results:

random	cluster-ordered bm	% improvement	struct of arrays	% improvement	bm with indices	% improvement	RAM
1.22	1.04	14.42	0.73	67	0.94	29	0
1.49	1.46	2.35	1.1	36	1.17	27	0
1.29	1.13	14	0.75	71.54	0.89	45.08	0
1.58	1.4	12.3	1.09	44.7	1.11	42.42	16
1.53	1.36	12.77	1.08	41.6	1.07	42.91	0
1.56	1.39	12.47	1.09	42.65	1.1	42.15	16
1.22	1.06	15.05	0.75	63.85	0.82	49.67	32
2.32	1.99	16.54	1.87	23.8	1.5	54.89	16
0.86	0.76	12.14	0.28	202.44	0.57	51.13	32
0.87	0.76	14.0	0.54	61.47	0.72	20.16	32

Random bm	average: 1.39	variance: 0.40	median: 1.51
Cluster BM	average: 1.23	variange: 0.35	median: 1.38
Data Oriented	average: 0.93	variange: 0.41	median: 1.08
BM With Indiced	average: 0.99	variange: 0.25	median: 1.08

Long term plans

Given these results, I think we should move BMesh to use integer offsets instead of pointers. This would be a far less disruptive change then the data-oriented approach. Note that I don't think we should rebuild the functionality of BMesh on top of Mesh as custom attribute layers, as that would be too disruptive to the existing codebase.

Needed Modifications

The structural modifications would be fairly simple (the actual code work might be another matter); basically, instead of using mempools for storing elements we would use simple arrays with freed element lists.

A hypothetical element array structure might look like this:

typedef struct ElementArray {
  void *array;

  int element_size;
  int length;
  int total_size;

  BLI_bitmap usedmap;

  SomeKindOfList freelist;
} ElementArray;

Of course this would make accessing the structure more complicated:

BMLoop *l = bm->loops[e->l_i];

do {
} while ((l = bm->loops[l->radial_next_i]) != bm->loops[e->l_i]);

. . .but I suppose it's not too bad.

C++

It might be a good idea to use C++ for this. We could write ElementList and the topological
iterators in C++ and wrap them in C APIs. The core element structures will have to remain
in C though.

BMesh has served Blender well for many years, but it as a few problems: the memory overhead is high, and the performance isn't always great compared to pure Mesh arrays. # The Problem BMesh is a BREP (a variant of the classic half-edge data structure). The problem with BREPs is they do not always work well with CPU caches, especially on 64-bit platforms where all of the helper pointers can overwhelm the cache. When Geoffry Bantle designed BMesh he tried to improve cache *coherency* by writing the mempool library, but this did nothing to solve the problem of huge numbers of 8-byte helper pointers overwhelming the cache altogether (at the time Blender was still 32-bits). I recently had a bunch of users on Blenderartists run a series of benchmarks for a simple vertex smooth operation. To summarize the results (which will be discussed in more depth later) we can get a pretty decent performance boost (~42%) if we replace all helper pointers in BMesh with 4-byte integer indices. This makes sense. BMesh is memory bound (profiling consistently shows this), so replacing 8-byte pointers with 4-byte ints will reduce the load on the CPU cache and DRAM by quite a bit. # Short Intro To BMesh BMesh has the following structure: ![image.png](https://archive.blender.org/developer/F10396519/image.png) 1. Vertices, which store one pointer to an edge. 2. Edges, which store six pointers to vertices and one to loops (two for the edges plus the four used for each vertex's disk cycle, see blue lines). 3. Loops, which have 1 pointers each pointing at a vertex, edge, and a face, along with four more pointers to other loops. Like edges loops participate in two linked lists: the face they belong to, and also the radial cycle (see red line) of loops surrounding an edge. # The Benchmark To investigate where we should take BMesh in the future I wrote a series of benchmarks and [had users on BlenderArtists run them ](https://blenderartists.org/t/sculpt-dyntopo-test-build-updated/1323378/22). There are four benchmarks in total, each of which run a simple vertex smoothing operation: 1. BMesh with random element order. 2. BMesh with vertex-cluster-sorted element order. Note that #3 and #4 uses the same order. 3. A version of BMesh with integer indices. 4. A so-called 'data-oriented' structure. The data-oriented structure is inspired by recent developments in the games industry, and looks like this: ``` typedef struct MeshTest { //vertex arrays float (*v_co)[3]; float (*v_no)[3]; int *v_e; int *v_index; int *v_flag; //edge arrays int *e_v1; int *e_v2; int *e_v1_next; int *e_v2_next; int *e_l; int *e_flag; int *e_index; //loop arrays int *l_v; int *l_e; int *l_f; int *l_next; int *l_prev; int *l_radial_next; int *l_radial_prev; //face arrays int *f_l; int *f_index; int *f_flag; int totvert, totedge, totloop, totface; MemArena *arena; } MeshTest; ``` ## Results As expected, the biggest performance gains came from the pure data-oriented approach. However a pretty hefty performance gain also came from simply replacing pointers with integer indices. Here are the raw results: | random | cluster-ordered bm | % improvement| struct of arrays | % improvement| bm with indices | % improvement| RAM| |--------|--------|--------|--------|--------|--------|--------|--------| |1.22|1.04|14.42|0.73|67|0.94|29|0| |1.49|1.46|2.35|1.1|36|1.17|27|0| |1.29|1.13|14|0.75|71.54|0.89|45.08|0| |1.58|1.4|12.3|1.09|44.7|1.11|42.42|16| |1.53|1.36|12.77|1.08|41.6|1.07|42.91|0| |1.56|1.39|12.47|1.09|42.65|1.1|42.15|16| |1.22|1.06|15.05|0.75|63.85|0.82|49.67|32| |2.32|1.99|16.54|1.87|23.8|1.5|54.89|16| |0.86|0.76|12.14|0.28|202.44|0.57|51.13|32| |0.87|0.76|14.0|0.54|61.47|0.72|20.16|32| |**Random bm** |average: 1.39 |variance: 0.40 |median: 1.51| | -- | -- | -- | -- | |**Cluster BM** | average: 1.23 |variange: 0.35 |median: 1.38| |**Data Oriented** | average: 0.93| variange: 0.41 |median: 1.08| |**BM With Indiced**| average: 0.99| variange: 0.25| median: 1.08| # Long term plans Given these results, I think we should move BMesh to use integer offsets instead of pointers. This would be a far less disruptive change then the data-oriented approach. Note that I *don't* think we should rebuild the functionality of BMesh on top of Mesh as custom attribute layers, as that would be too disruptive to the existing codebase. ## Needed Modifications The structural modifications would be fairly simple (the actual code work might be another matter); basically, instead of using mempools for storing elements we would use simple arrays with freed element lists. A hypothetical element array structure might look like this: ``` typedef struct ElementArray { void *array; int element_size; int length; int total_size; BLI_bitmap usedmap; SomeKindOfList freelist; } ElementArray; ``` Of course this would make accessing the structure more complicated: ``` BMLoop *l = bm->loops[e->l_i]; do { } while ((l = bm->loops[l->radial_next_i]) != bm->loops[e->l_i]); ``` . . .but I suppose it's not too bad. # C++ It might be a good idea to use C++ for this. We could write ElementList and the topological iterators in C++ and wrap them in C APIs. The core element structures will have to remain in C though.

Joseph Eagar commented

2021-09-11 05:58:28 +02:00

Added subscribers: @JosephEagar, @PabloDobarro

Joseph Eagar changed title from ~~BREP Design~~ to BMesh Refactor

2021-09-11 05:58:52 +02:00

Joseph Eagar commented

2021-09-11 06:22:27 +02:00

Added subscribers: @ideasman42, @howardt, @mont29, @Sergey

kursad k commented

2021-09-11 08:16:02 +02:00

Added subscriber: @kursadk

Scurest commented

2021-09-11 13:49:13 +02:00

Added subscriber: @scurest

TheRedWaxPolice commented

2021-09-11 23:03:37 +02:00

Added subscriber: @TheRedWaxPolice

Paul Larson commented

2021-09-12 02:58:18 +02:00

Added subscriber: @GeorgiaPacific

Bent Hillerkus commented

2021-09-12 03:25:24 +02:00

Added subscriber: @bent

Filip Sauer commented

2021-09-12 23:22:57 +02:00

Added subscriber: @cdog

Sergey Sharybin commented

2021-09-13 10:08:34 +02:00

This is surely an interesting topic, but I'm missing some aspects of a design: what is the exact data structure you propose to use and how you see the current codebase to converge to it?
What are the downsides? Will operations like bevel/subdivide suffer from worse performance (due to i.e. need to maintain ordering in arrays)? Is there a memory usage footprint penalty?

Is the data-oriented type is another name for structure-of-arrays type of layout?

I'm also not sure how to read the table. Maybe you can explain better what columns and rows mean, and how did you do the performance test.

This is surely an interesting topic, but I'm missing some aspects of a design: what is the exact data structure you propose to use and how you see the current codebase to converge to it? What are the downsides? Will operations like bevel/subdivide suffer from worse performance (due to i.e. need to maintain ordering in arrays)? Is there a memory usage footprint penalty? Is the data-oriented type is another name for structure-of-arrays type of layout? I'm also not sure how to read the table. Maybe you can explain better what columns and rows mean, and how did you do the performance test.

Nana Beniako commented

2021-09-13 11:42:15 +02:00

Added subscriber: @Roggii-4

Steffen Dünner commented

2021-09-14 07:04:32 +02:00

Added subscriber: @SteffenD

Joseph Eagar commented

2021-09-17 23:46:00 +02:00

[replied to the wrong post.]

Joseph Eagar commented

2021-09-17 23:56:14 +02:00

In #91329#1219180, @Sergey wrote:
This is surely an interesting topic, but I'm missing some aspects of a design: what is the exact data structure you propose to use and how you see the current codebase to converge to it?
What are the downsides? Will operations like bevel/subdivide suffer from worse performance (due to i.e. need to maintain ordering in arrays)? Is there a memory usage footprint penalty?

Is the data-oriented type is another name for structure-of-arrays type of layout?

I'm also not sure how to read the table. Maybe you can explain better what columns and rows mean, and how did you do the performance test.

There's no need to maintain orderings. The idea is to have an array where you allocate and free elements;
basically a flat version of mempool. We could in fact use mempool with a lookup table for the chunks,
though that would add a layer of indirection.

I'm not proposing we do this anytime soon. For the actual refactoring I'm thinking of writing a clang-tidy module to automate as much of it as possible.

> In #91329#1219180, @Sergey wrote: > This is surely an interesting topic, but I'm missing some aspects of a design: what is the exact data structure you propose to use and how you see the current codebase to converge to it? > What are the downsides? Will operations like bevel/subdivide suffer from worse performance (due to i.e. need to maintain ordering in arrays)? Is there a memory usage footprint penalty? > > Is the data-oriented type is another name for structure-of-arrays type of layout? > > I'm also not sure how to read the table. Maybe you can explain better what columns and rows mean, and how did you do the performance test. There's no need to maintain orderings. The idea is to have an array where you allocate and free elements; basically a flat version of mempool. We could in fact *use* mempool with a lookup table for the chunks, though that would add a layer of indirection. I'm not proposing we do this anytime soon. For the actual refactoring I'm thinking of writing a clang-tidy module to automate as much of it as possible.

Harley Acheson commented

2021-09-18 00:41:04 +02:00

Added subscriber: @Harley

Harley Acheson commented

2021-09-18 00:41:04 +02:00

The part that confuses me is that you summarize the tests by saying that "As expected, the biggest performance gains came from the pure data-oriented approach. However a pretty hefty performance gain also came from simply replacing pointers with integer indices."

But your raw data seems to indicate that "struct of arrays" gave you an average increase of 65.5% where "bm with indices" gives you 74.62%. Unless I am misunderstanding something or transposed your numbers wrong. That table could really use a summary at the bottom

The part that confuses me is that you summarize the tests by saying that "As expected, the biggest performance gains came from the pure data-oriented approach. However a pretty hefty performance gain also came from simply replacing pointers with integer indices." But your raw data seems to indicate that "struct of arrays" gave you an average increase of 65.5% where "bm with indices" gives you 74.62%. Unless I am misunderstanding something or transposed your numbers wrong. That table could really use a summary at the bottom

Matt commented

2021-09-18 23:27:32 +02:00

Added subscriber: @mattli911

Sergey Sharybin commented

2021-09-20 09:18:28 +02:00

I'm not proposing we do this anytime soon.

What is the proposal then? What's the purpose of this design task?

> I'm not proposing we do this anytime soon. What is the proposal then? What's the purpose of this design task?

Julien Kaspar commented

2021-09-20 10:18:14 +02:00

Added subscriber: @JulienKaspar

Joseph Eagar commented

2021-09-20 11:27:11 +02:00

I've updated the table and summarized it, here's the python script I originally used to wrangle the data.

stats.py

I've updated the table and summarized it, here's the python script I originally used to wrangle the data. [stats.py](https://archive.blender.org/developer/F10510681/stats.py)

Joseph Eagar commented

2021-09-20 11:28:34 +02:00

In #91329#1222149, @Sergey wrote:

I'm not proposing we do this anytime soon.

What is the proposal then? What's the purpose of this design task?

To make long-term plans and goals?

> In #91329#1222149, @Sergey wrote: >> I'm not proposing we do this anytime soon. > > What is the proposal then? What's the purpose of this design task? To make long-term plans and goals?

Hans Goudey commented

2021-09-20 18:45:25 +02:00

Added subscriber: @HooglyBoogly

Hans Goudey commented

2021-09-20 18:45:25 +02:00

It seems to me that maintaining topology maps on top of the existing Mesh structure is the best way to achieve the data-oriented approach.

For saving and loading files, no conversion is necessary, the topology maps can just be discarded
Lots of existing code uses Mesh
The structs like MVert can be relatively simply split up to store normals separately, I started on that here: #91186
These maps could be cached and calculated lazily

So while the index based refactor of BMesh also seems to make sense, something else to pursue might be using Mesh in more places, like many modifiers already do.

It seems to me that maintaining topology maps on top of the existing `Mesh` structure is the best way to achieve the data-oriented approach. - For saving and loading files, no conversion is necessary, the topology maps can just be discarded - Lots of existing code uses `Mesh` - The structs like `MVert` can be *relatively* simply split up to store normals separately, I started on that here: #91186 - These maps could be cached and calculated lazily So while the index based refactor of BMesh also seems to make sense, something else to pursue might be using `Mesh` in more places, like many modifiers already do.

Joseph Eagar commented

2021-09-27 22:58:07 +02:00

In #91329#1222494, @HooglyBoogly wrote:
It seems to me that maintaining topology maps on top of the existing Mesh structure is the best way to achieve the data-oriented approach.

For saving and loading files, no conversion is necessary, the topology maps can just be discarded

Lots of existing code uses Mesh

The structs like MVert can be relatively simply split up to store normals separately, I started on that here: #91186

These maps could be cached and calculated lazily

So while the index based refactor of BMesh also seems to make sense, something else to pursue might be using Mesh in more places, like many modifiers already do.

The basic intuition you are expressing has been obvious since the very beginning. Geoffry Bantle actually tried to make BMesh as Mesh-like as he possibly could, and I suspect he would have used integer indices if 64-bit processors had been more widespread back then. To be honest if Mesh with topology maps was good enough then BMesh would not exist.

> In #91329#1222494, @HooglyBoogly wrote: > It seems to me that maintaining topology maps on top of the existing `Mesh` structure is the best way to achieve the data-oriented approach. > - For saving and loading files, no conversion is necessary, the topology maps can just be discarded > - Lots of existing code uses `Mesh` > - The structs like `MVert` can be *relatively* simply split up to store normals separately, I started on that here: #91186 > - These maps could be cached and calculated lazily > > So while the index based refactor of BMesh also seems to make sense, something else to pursue might be using `Mesh` in more places, like many modifiers already do. The basic intuition you are expressing has been obvious since the very beginning. Geoffry Bantle actually tried to make BMesh as Mesh-like as he possibly could, and I suspect he would have used integer indices if 64-bit processors had been more widespread back then. To be honest if Mesh with topology maps was good enough then BMesh would not exist.

Erik Sanabria commented

2021-11-11 20:10:04 +01:00

Added subscriber: @fanny

Sam Green commented

2021-11-20 07:00:06 +01:00

Added subscriber: @SamGreen

Alexey Adamitsky commented

2021-12-01 07:21:38 +01:00

Added subscriber: @AlexeyAdamitsky

Yuki Hashimoto commented

2022-03-21 14:07:14 +01:00

Added subscriber: @hzuika

Philipp Oeser removed the

Download

What's New

Blender Studio

Manual

Developers Blog

Documentation

Benchmark

Blender Conference

Development Fund

One-time Donations

BMesh Refactor #91329

The Problem

Short Intro To BMesh

The Benchmark

Results

Long term plans

Needed Modifications

C++