
Metaball optimization
Closed, Resolved, Public

Description

Hi.

I've been experimenting with the metaball code. I managed to improve polygonization times with many metaballs from slow as hell to just barely slow, without sacrificing too much accuracy. Example of my results: 40k metaballs at resolution 0.05 (this is relative, so an image is attached) in ~12 s on my Phenom II 955. Measured with a stopwatch, so don't quote me on this.


Changes to make this possible include:

  • Replaced the octree with a binary BVH built top-down, dividing along the spatial median (middle of the longest axis); see the sketch after this list.
  • Normals of the surface calculated using angle-weighted accumulation.
  • Finding the first points of the surface no longer "wastes" density function evaluations: every result is cached. Also, it finds at most 6 starting points.
  • Converge procedure: better surface approximation through multiple linear approximations.
  • Using MemArena instead of new_pgn_element.
  • Proper calculation of metaelem bounding box.
  • Removal of the mball_count procedure: mballs are now counted on the fly.
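
For illustration, a minimal standalone sketch of the BVH build from the first bullet (spatial-median split along the longest axis); the MBall/BVHNode names are made up and this is not the patch code:

```c
#include <stdlib.h>

/* Hypothetical minimal metaball record: just a position and radius. */
typedef struct MBall {
    float co[3];
    float rad;
} MBall;

typedef struct BVHNode {
    float min[3], max[3];        /* bounding box of all balls below this node */
    struct BVHNode *child[2];    /* NULL for leaves */
    MBall **balls;               /* leaf payload */
    int totball;
} BVHNode;

static void merge_bb(float min[3], float max[3], const MBall *ball)
{
    for (int i = 0; i < 3; i++) {
        if (ball->co[i] - ball->rad < min[i]) min[i] = ball->co[i] - ball->rad;
        if (ball->co[i] + ball->rad > max[i]) max[i] = ball->co[i] + ball->rad;
    }
}

/* Build a binary BVH top-down: split at the spatial median,
 * i.e. the middle of the longest axis of the node's bounding box. */
static BVHNode *bvh_build(MBall **balls, int totball)
{
    BVHNode *node = calloc(1, sizeof(BVHNode));
    node->min[0] = node->min[1] = node->min[2] = 1e30f;
    node->max[0] = node->max[1] = node->max[2] = -1e30f;
    for (int i = 0; i < totball; i++) merge_bb(node->min, node->max, balls[i]);

    if (totball <= 4) {  /* small enough: make a leaf */
        node->balls = balls;
        node->totball = totball;
        return node;
    }

    /* pick the longest axis and split at its middle */
    int axis = 0;
    for (int i = 1; i < 3; i++)
        if (node->max[i] - node->min[i] > node->max[axis] - node->min[axis]) axis = i;
    float mid = 0.5f * (node->min[axis] + node->max[axis]);

    /* partition: balls whose center lies below the median go left */
    int left = 0;
    for (int i = 0; i < totball; i++) {
        if (balls[i]->co[axis] < mid) {
            MBall *tmp = balls[left]; balls[left] = balls[i]; balls[i] = tmp;
            left++;
        }
    }
    if (left == 0 || left == totball) left = totball / 2;  /* degenerate split: fall back to halving */

    node->child[0] = bvh_build(balls, left);
    node->child[1] = bvh_build(balls + left, totball - left);
    return node;
}
```

A median split like this is cheaper to build than a SAH-style BVH, which matters here because the tree is rebuilt every time the metaballs move.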

Wow, that is a lot of changes... Btw, sorry for all the technical talk. I started working on this for myself, just to see how fast I could polygonize those balls - Blender was my sandbox - and it turned into this. I actually tried a lot of different acceleration structures. I am looking for feedback (does the surface look good? what is your performance?). Is there a chance to get my code (or a part of it, after some adjustments) into Blender? Note that metaballs are still somewhat broken; I'd be happy to work on them further, but that involves deeper changes than this.

Here is the patch:

PS: I moved the global functions in mball.c to the bottom of the file since I didn't change them (this is the only file changed), which is why the diff looks terrible.
PPS: I'm totally new to making diffs and contributing to an open source project (to any project, to be exact). If I f***ed something up, sorry. I don't even know if I'm posting in the right section.

Details

Type
Patch

Event Timeline

Krzysztof Rećko (chrisr) raised the priority of this task to Needs Triage by Developer.
Krzysztof Rećko (chrisr) updated the task description. (Show Details)
Krzysztof Rećko (chrisr) set Type to Patch.

Hi, tested and looks very promising!

Including technical details is good.

First pass review...

  • It is indeed a lot faster.
  • The metaballs look slightly different (vertex/edge placement).
  • The normals look like they are based on the tessellated faces (instead of on the meta-surface data); the result closely matches the normals you get after converting to a mesh (which could even be argued to be correct), however the geometry looks less smooth. (See the field-gradient sketch below.)
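
"Normals from the meta-surface data" here means evaluating the gradient of the density field at each vertex instead of averaging face normals; a minimal sketch of that idea, assuming a generic density callback (the DensityFn type and names are made up):

```c
#include <math.h>

/* Hypothetical scalar field: sum of metaball contributions at a point. */
typedef float (*DensityFn)(float x, float y, float z, void *user_data);

/* Surface normal from the field itself, via central differences of the
 * density: n = -normalize(grad f).  This is independent of how the surface
 * was tessellated, so it stays smooth even where thin/small polygons occur. */
static void normal_from_field(DensityFn f, void *ud,
                              const float co[3], float delta, float r_no[3])
{
    r_no[0] = f(co[0] - delta, co[1], co[2], ud) - f(co[0] + delta, co[1], co[2], ud);
    r_no[1] = f(co[0], co[1] - delta, co[2], ud) - f(co[0], co[1] + delta, co[2], ud);
    r_no[2] = f(co[0], co[1], co[2] - delta, ud) - f(co[0], co[1], co[2] + delta, ud);

    float len = sqrtf(r_no[0] * r_no[0] + r_no[1] * r_no[1] + r_no[2] * r_no[2]);
    if (len > 0.0f) {
        r_no[0] /= len;
        r_no[1] /= len;
        r_no[2] /= len;
    }
}
```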

Notes on the patch

  • Please don't re-arrange functions in patches which modify existing code; it makes it much harder for us to tell what you modified versus what you just moved.
  • There are a few declarations after statements (corrected, see below; a small example follows this list).
  • Generally sensible changes AFAICS - the move to MemArena is good.
  • Otherwise the patch seems good, follows our code style etc.
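
For reference, the "declarations after statements" note is just about C90-style block layout, roughly like this made-up example:

```c
void do_something(int value);  /* stand-in helper, for illustration only */

/* declaration after a statement (what the review flags): */
void foo_flagged(int n)
{
    do_something(n);
    int i = n * 2;   /* declared mid-block */
    do_something(i);
}

/* declarations kept at the top of the block (preferred style): */
void foo_fixed(int n)
{
    int i;

    do_something(n);
    i = n * 2;
    do_something(i);
}
```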

Updated the patch so it's easier for us to see what changed; no functional changes - please use this patch for any further development.


Probably best if we move the tessellation code into mball_tessellate.c as a separate step.

Cool! Sorry for reordering stuff. As for the normals - I've actually made a quick comparison, have a look:


This is some "benchmark" fluid sim I made for timing the algorithm - 5000 particles towards the end of simulation. "Avg. time" is the average time in milliseconds of 250 frames, only bvh build + tessellation, measured using PIL_check_seconds_timer() on my machine. As can be seen, there definetly is a speed increase (of around 30%) when using angle-weighted accumulation. However, as you said, the surface looks less smooth - this is evident especially on the ball in the middle-lower picture, on which there are some strange features. I think the cause of this is that sometimes during tesselation very thin/small polygons are made, which mess things up. Ways I could think of to make it right:

  • stick with one of the ways of calculating normals, or think of an even better one (area-weighted? non-weighted?)
  • make a "fast normal" switch in the metaball panel
  • run the resulting surface through something like an "edge collapse" modifier, which would get rid of thin polygons and maybe make the surface look a little bit better

Maybe I'll try checking out the third option (in a few days) to see what the speed impact would be. Definitely looking forward to hearing from you!
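
For reference, angle-weighted accumulation over the tessellated triangles looks roughly like the following standalone sketch (hypothetical array layout, not the patch code):

```c
#include <math.h>
#include <string.h>

static void sub_v3(float r[3], const float a[3], const float b[3])
{
    r[0] = a[0] - b[0]; r[1] = a[1] - b[1]; r[2] = a[2] - b[2];
}

static void cross_v3(float r[3], const float a[3], const float b[3])
{
    r[0] = a[1] * b[2] - a[2] * b[1];
    r[1] = a[2] * b[0] - a[0] * b[2];
    r[2] = a[0] * b[1] - a[1] * b[0];
}

static float angle_v3(const float a[3], const float b[3])
{
    float la = sqrtf(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);
    float lb = sqrtf(b[0] * b[0] + b[1] * b[1] + b[2] * b[2]);
    if (la == 0.0f || lb == 0.0f) return 0.0f;
    float c = (a[0] * b[0] + a[1] * b[1] + a[2] * b[2]) / (la * lb);
    if (c > 1.0f) c = 1.0f;
    if (c < -1.0f) c = -1.0f;
    return acosf(c);
}

/* Angle-weighted accumulation: every triangle adds its unit face normal to
 * each of its three vertices, weighted by the corner angle at that vertex,
 * and the accumulated vectors are normalized at the end. */
static void accum_vertex_normals(const float (*verts)[3], int totvert,
                                 const int (*tris)[3], int tottri,
                                 float (*r_nors)[3])
{
    memset(r_nors, 0, sizeof(float[3]) * (size_t)totvert);

    for (int t = 0; t < tottri; t++) {
        const float *v0 = verts[tris[t][0]];
        const float *v1 = verts[tris[t][1]];
        const float *v2 = verts[tris[t][2]];
        float d01[3], d02[3], d10[3], d12[3], d20[3], d21[3], fno[3];

        sub_v3(d01, v1, v0); sub_v3(d02, v2, v0);
        sub_v3(d10, v0, v1); sub_v3(d12, v2, v1);
        sub_v3(d20, v0, v2); sub_v3(d21, v1, v2);

        /* unit face normal */
        cross_v3(fno, d01, d02);
        float flen = sqrtf(fno[0] * fno[0] + fno[1] * fno[1] + fno[2] * fno[2]);
        if (flen == 0.0f) continue;  /* skip degenerate (zero-area) triangles */
        fno[0] /= flen; fno[1] /= flen; fno[2] /= flen;

        /* corner angles at v0, v1 and v2 */
        float w[3] = {angle_v3(d01, d02), angle_v3(d10, d12), angle_v3(d20, d21)};

        for (int c = 0; c < 3; c++) {
            r_nors[tris[t][c]][0] += w[c] * fno[0];
            r_nors[tris[t][c]][1] += w[c] * fno[1];
            r_nors[tris[t][c]][2] += w[c] * fno[2];
        }
    }

    for (int v = 0; v < totvert; v++) {
        float len = sqrtf(r_nors[v][0] * r_nors[v][0] +
                          r_nors[v][1] * r_nors[v][1] +
                          r_nors[v][2] * r_nors[v][2]);
        if (len > 0.0f) {
            r_nors[v][0] /= len; r_nors[v][1] /= len; r_nors[v][2] /= len;
        }
    }
}
```

The thin/slivery triangles mentioned above are exactly where this scheme suffers: their corner angles and face normals are unstable, which is why the accumulated normals look less smooth than gradient-based ones.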

I think there are 2 settings for metaballs that are often underestimated: Threshold and Stiffness.
With methods to lower stiffness and make the threshold more adaptive to neighbouring metaballs, the result should look less spherical.

@Krzysztof Rećko (chrisr), for the initial patch, it may be best to use (slower) surface normals.

Or include both methods but comment one out, then add some way to have faster normals as a second patch.

Make it an ifdef or a bool argument.
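
Roughly what the ifdef variant could look like; MB_ACCUM_NORMAL matches the define used in the later patch, while the helper names and the use of the PROCESS struct here are only illustrative:

```c
struct PROCESS;  /* tessellation state, as in mball.c */

/* Hypothetical stand-ins for the two normal implementations. */
void accum_vertex_normals_from_faces(struct PROCESS *process);
void gradient_vertex_normals(struct PROCESS *process);

/* #define MB_ACCUM_NORMAL */  /* define for the faster accumulated normals */

static void mball_calc_normals(struct PROCESS *process)
{
#ifdef MB_ACCUM_NORMAL
    accum_vertex_normals_from_faces(process);  /* fast: from tessellated faces */
#else
    gradient_vertex_normals(process);          /* accurate: from the field gradient */
#endif
}
```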


Note that we don't really have an active metaball maintainer at the moment, so when reviewing changes it helps if we can do it in smaller steps, to at least check on a basic level that things remain functional. Also, if anything breaks, it makes it easier to git bisect - to track down which change broke it.

But the current patch looks quite good and I think we can likely apply it after the 2.74 release.

Campbell Barton (campbellbarton) lowered the priority of this task from Needs Triage by Developer to Normal.

OK. Here is the patch with normals from meta:

To use faster normals, define MB_ACCUM_NORMAL.
One other change I've made: I hard-capped the threshold to be greater than or equal to 0.001f (only during polygonization, so you can still set a 0 threshold in the metaball panel). The algorithm was getting too slow at very small thresholds because of the way converge() works now. Currently I can't find a way around it - when I change how converge() works, metaballs look bad at lower thresholds, which is caused by the nature of the metaball equation. To be honest, the old algorithm doesn't handle such small thresholds very well either. There should be a better way to do converge().
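
The cap is essentially just a clamp applied to a local copy of the threshold before polygonization starts; a sketch, not the exact patch line:

```c
/* Clamp only the value used during polygonization; the metaball datablock
 * keeps whatever threshold the user set in the panel (even 0). */
static float polygonize_threshold(float panel_thresh)
{
    return (panel_thresh < 0.001f) ? 0.001f : panel_thresh;
}
```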

Hey, I made some improvements to converge(). Basically I went back to the blend of the bisection method and linear interpolation from the original algorithm. Metaballs should now look good at every threshold and stiffness. Removed the threshold cap. Still, this would need some tweaks to be optimal, but it's very hard to test every combination of threshold, stiffness and size.
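
For reference, a bisection + linear-interpolation blend is essentially a false-position step with a bisection fallback; a standalone sketch with a hypothetical density callback, not the patch code:

```c
/* Find the point on the segment [p1, p2] where the density crosses the
 * threshold, i.e. where f(p) - thresh == 0.  d1 and d2 are f(p1) - thresh
 * and f(p2) - thresh and must have opposite signs (the segment brackets
 * the surface). */
static void converge(const float p1[3], const float p2[3], float d1, float d2,
                     float (*densfunc)(const float p[3], void *ud), void *ud,
                     float thresh, int max_iter, float r_p[3])
{
    float a[3] = {p1[0], p1[1], p1[2]}, da = d1;
    float b[3] = {p2[0], p2[1], p2[2]}, db = d2;

    for (int i = 0; i < max_iter; i++) {
        /* linear interpolation (false position) guess ... */
        float t = (da != db) ? da / (da - db) : 0.5f;
        /* ... but fall back to plain bisection if the guess collapses
         * toward one endpoint, which happens near flat parts of the field */
        if (t < 0.01f || t > 0.99f) {
            t = 0.5f;
        }
        for (int j = 0; j < 3; j++) {
            r_p[j] = a[j] + t * (b[j] - a[j]);
        }

        float d = densfunc(r_p, ud) - thresh;
        if (d == 0.0f) {
            return;
        }
        /* keep the sub-interval that still brackets the crossing */
        if ((d < 0.0f) == (da < 0.0f)) {
            a[0] = r_p[0]; a[1] = r_p[1]; a[2] = r_p[2]; da = d;
        }
        else {
            b[0] = r_p[0]; b[1] = r_p[1]; b[2] = r_p[2]; db = d;
        }
    }
}
```

Pure linear interpolation can stall on the strongly curved metaball field at low thresholds, while pure bisection converges slowly; keeping the bracket and mixing the two is the usual compromise.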

@Krzysztof Rećko (chrisr), since we're currently only accepting fixes into master (until after 2.74 is released), I've made a branch, temp-mball-refactor, and moved this code into mball_tessellation.c.

rB7a60a311b33507e048fb47b62e4ba24b51ad883d
rB2dd48492d55c268213374472973de900685ca57a

Would you be able to update the patch to apply on this branch? (Since it's a temp branch we can push the patch there at any time.)
If not, I'll take a look later on.

Yes. This should work fine:

No functional changes made.

Eh, of course I screwed up the last version (bug in makecubetable, I didn't copy everything properly). Here is the fixed diff:


Hi,
good to see somebody working on this part of Blender again :-).

I have only a few notes about your patch and the implementation of metaballs:

  • I would be careful about finding only 6 cubes in find_first_points(). There were 26 search directions for finding the first cube at the surface, because with fewer directions metaballs just disappear in some positions. If you search for old mball bugs, you will find some test files.
  • Interactive editing of metaballs could be improved by caching the polygonized surface, like subdivision surfaces do.
  • There is a lot of room for improving the quality of the polygonized surface. Look e.g. at the following paper and the software implementing the theory:

Hey Jiri!

  • The idea behind find_first_points() was to evaluate the function only at "lattice points" so the results can be cached in the hash table (see the sketch after this list). Function evaluation (BVH traversal and the actual densfunc()) takes ~95% of the polygonization time, so any gain made in this area contributes a lot to speed. But, to be honest, I just coded it this way temporarily, out of laziness :-) I could make a version with 26 directions and check how it behaves. If all values are cached, the speed shouldn't be much worse, and surface accuracy is the number 1 priority after all.
  • I don't exactly know what you mean by caching the surface - updating only the part of the surface that was influenced by a moving metaball, or something else? This could be a serious improvement, but not when all metaballs are moving (e.g. a fluid simulation). Could you explain it more?
  • Sorry, but I don't have the energy to look at these papers right now (very tiring day at uni, a lot of science done :-)). They look very interesting at first glance. Will check them out in detail later.
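
A minimal sketch of the lattice-point caching idea from the first bullet: key the cache on integer lattice coordinates, so the first-point search and the cube processing share every density evaluation (standalone C, made-up names, not the patch code):

```c
#include <stdlib.h>

/* One cached lattice corner: integer lattice coordinates and the density
 * value evaluated there.  Keying on (i, j, k) instead of float positions
 * means every evaluation can be reused exactly. */
typedef struct Corner {
    int i, j, k;
    float value;
    struct Corner *next;   /* chaining for hash collisions */
} Corner;

#define HASH_SIZE 4096
#define HASH(i, j, k) \
    ((((unsigned)(i) * 73856093u) ^ ((unsigned)(j) * 19349663u) ^ \
      ((unsigned)(k) * 83492791u)) & (HASH_SIZE - 1))

typedef struct CornerCache {
    Corner *buckets[HASH_SIZE];
    float size;                                     /* lattice spacing ("resolution") */
    float (*densfunc)(const float p[3], void *ud);  /* expensive field evaluation */
    void *user_data;
} CornerCache;

/* Return the density at lattice point (i, j, k), evaluating the (expensive)
 * density function only the first time this corner is asked for. */
static float corner_value(CornerCache *cache, int i, int j, int k)
{
    unsigned h = HASH(i, j, k);
    for (Corner *c = cache->buckets[h]; c; c = c->next) {
        if (c->i == i && c->j == j && c->k == k) {
            return c->value;   /* cache hit: no BVH traversal, no densfunc() */
        }
    }

    float p[3] = {i * cache->size, j * cache->size, k * cache->size};
    Corner *c = malloc(sizeof(Corner));
    c->i = i; c->j = j; c->k = k;
    c->value = cache->densfunc(p, cache->user_data);
    c->next = cache->buckets[h];
    cache->buckets[h] = c;
    return c->value;
}
```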

To add to this, lately I've been trying to make the polygonization multi-threaded using OpenMP. The first trials are promising... but not without problems. I have some time tomorrow, so I will try to set up a GitHub repo with my experiments :-)

Hi Krzysztof,

  • If I remember correctly, Bloomenthal's original code used 100 random directions in find_first_point(). So 26 directions is IMHO not so conservative :-).
  • Yeah, I meant updating only the part of the surface that was influenced by the moving metaball(s). Of course, it is not useful when all metaballs are moving (e.g. instances of a particle system).
  • Multi-threading could improve polygonization, and your BVH is better suited for this purpose than the old octree.

OK, a quick update on what I've been doing the last couple of days:

  • Set up a repo at https://github.com/yeti19/blender.git with a mball-multithread branch:
    • Multi-threading is done by dividing the whole space into "chunks". Each chunk gets its own center and corner hash tables, but the edge table is common to all chunks - this introduces some locking overhead, but from the profiling I've done it's 3-7% at most, and it saves us "stitching" the mesh at the end (a rough sketch follows after this list).
    • Memory usage optimization: introduce a new structure, MLSmall, which contains only the data needed for computing density and the BVH build. The MetaElem structure, with all its "surroundings", is ~300 bytes (a big BoundBox struct, two 4x4 matrices, deprecated values), while MLSmall is ~100 bytes. Also, all metas are allocated from their own memarena, which keeps them close together in memory. All this should improve cache performance; however, I didn't do exact measurements, and the added code complexity may not be worth it. This is only an experimental branch, so I include almost all of my ideas here.
    • BKE_mball_basis_find change: don't use BKE_iter_base_next to find the basis mball, but a simple scene/base iteration. This function gets called often (~once per mball) and used to create all duplis in the scene every time. Duplis cannot be the basis mball, so I'm 99% sure this doesn't break anything.
    • Searching for first points in 26 directions, but using a binary-search-type algorithm - this could be significantly faster for big mballs/small resolutions, but I can imagine a configuration of mballs in which it doesn't work properly, so I'm 50/50 on this. Also, I've made a mball-26directions branch on my repo (single-threaded, linear search - it should be possible to apply it on the temp-mball-refactor branch in the main repo), so this could be applied without any other experimental features :-)
    • Pre-allocating memory for vertices and indices before polygonization (as @Campbell Barton (campbellbarton) suggested), based on the count from the last frame stored in a static variable - this is only temporary; ultimately I will extract this information from the displist.
  • Read the paper @Jiri Hnidek (jiri) suggested, already coded some tests, and it looks promising! Definitely worth implementing. Mesh quality is better than using Blender's Laplacian smoothing/decimation and it follows the isosurface nicely. I'm not 100% sure it will work well with multi-threading, however (there might be problems on the boundaries of chunks), but to be fair I didn't think about it much, so I might be wrong (and everything will work nicely).
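
A rough sketch of the chunking idea from the first sub-bullet above: each thread polygonizes its own chunk with private corner/center tables, and only vertices created on shared edges go through one mutex-protected edge table, so chunk boundaries need no stitching. Plain pthreads are used here for brevity (the branch would use Blender's own threading API), and all names are made up:

```c
#include <pthread.h>

/* Per-chunk state: every thread owns its own corner/center hash tables,
 * so lookups in those need no locking at all. */
typedef struct Chunk {
    float min[3], max[3];          /* region of space this thread polygonizes */
    struct CornerCache *corners;   /* private cache of lattice-point densities */
    struct CenterTable *centers;   /* private "already visited cube" table */
} Chunk;

/* The edge table maps a lattice edge to the vertex created on it.  It is
 * shared by all chunks so neighbouring chunks reuse the same vertices,
 * which avoids stitching the mesh afterwards, at the cost of one lock. */
typedef struct SharedEdgeTable {
    pthread_mutex_t mutex;
    struct EdgeVertHash *verts;
} SharedEdgeTable;

typedef struct ChunkJob {
    Chunk *chunk;
    SharedEdgeTable *edge_table;
} ChunkJob;

/* Hypothetical helpers; stand-ins for the real tessellation code. */
int  edge_table_vert(struct EdgeVertHash *verts,
                     int i1, int j1, int k1, int i2, int j2, int k2);
void polygonize_chunk(Chunk *chunk, SharedEdgeTable *edge_table);

/* Looking up / creating a vertex on a shared edge is the only locked path;
 * profiling in the branch put this overhead at roughly 3-7%. */
static int shared_edge_vert(SharedEdgeTable *et,
                            int i1, int j1, int k1, int i2, int j2, int k2)
{
    pthread_mutex_lock(&et->mutex);
    int v = edge_table_vert(et->verts, i1, j1, k1, i2, j2, k2);
    pthread_mutex_unlock(&et->mutex);
    return v;
}

static void *chunk_thread(void *arg)
{
    ChunkJob *job = arg;
    polygonize_chunk(job->chunk, job->edge_table);
    return NULL;
}

/* One thread per chunk; the caller divides the metaball bounding box into
 * roughly (number of cores) chunks beforehand.  totchunk is assumed <= 64. */
static void polygonize_threaded(Chunk *chunks, int totchunk, SharedEdgeTable *et)
{
    pthread_t threads[64];
    ChunkJob jobs[64];

    for (int i = 0; i < totchunk; i++) {
        jobs[i].chunk = &chunks[i];
        jobs[i].edge_table = et;
        pthread_create(&threads[i], NULL, chunk_thread, &jobs[i]);
    }
    for (int i = 0; i < totchunk; i++) {
        pthread_join(threads[i], NULL);
    }
}
```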

Overall result of all the optimizations in this experimental branch: ~2x speedup on my 4-core processor, so there may still be some work to do (but the profiler says it's nearly optimal). Honestly, it is far from finished; things I will try to do next:

  • Exchanging active cubes between chunks when going from one chunk to the next. Right now this is not implemented, so the surface can be "cut off" in mid-air. It's a major bug, but I already know how to fix it.
  • Better division of space into an appropriate number of chunks (for a 2-core processor ~4 chunks will suffice, but for 64 cores we might need more). Also make the threading optional, so on a 1-core processor it just polygonizes without dividing.
  • Tweaking the multithreading here and there. Test other options for the edge table.
  • Better implement the ideas from the paper Jiri suggested (right now it is so buggy that I won't even push this branch to the public).
  • Will tackle caching the surface as well; it doesn't seem very hard to do.

You can check out my branch and give me your thoughts, but keep in mind that it is as experimental as it can be. Also, the commit messages are a mess (for multi-threading I first used OpenMP, which was not handling it well, and then switched to Blender's threads) - which is why I write it all here :-) - and the code style may be a bit off from the original, but that will improve in the next iterations. I also have the changes in separate branches on my computer, but they haven't been tested on their own. Will try to separate them for easier testing.

As suggested by chrisr, I tried to test the mball-multithread branch.

It simply does not build on my PC.

Here is the log.
http://www.pasteall.org/57622

Committed to master! Closing.