
Fix T59848: precisely represent the dependencies of Armature modifier.
Closed, Public

Authored by Alexander Gavrilov (angavrilov) on Apr 20 2019, 2:11 PM.

Details

Summary

When the modifier uses vertex groups, the set of the bones it actually
needs is precisely defined by the set of the group names. If envelopes
are enabled, this refinement is not available, because any bone can
potentially be used.

This can be used in the dependency graph construction to allow objects
deformed by a part of the armature to be used in constraints on other
bones, e.g. for placing cartoon-style face elements on top of the body
mesh via Shrinkwrap constraints.

Since the list of vertex group names is now used as an input by
the dependency graph, adding/removing/renaming groups should now
be triggering a graph rebuild.
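
As a rough illustration of the idea in the summary, the set of needed bones can be found by matching bone names against vertex group names. This is a minimal standalone sketch, not the code from the patch: `sketch_collect_needed_bones` and its parameters are hypothetical names, and plain string arrays stand in for Blender's bone and vertex group lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Blender function): marks needed[i] = true for
 * every bone whose name equals the name of some vertex group, and returns
 * the number of bones marked. Bones without a matching group are marked
 * false, since the modifier cannot use them when only groups are enabled. */
size_t sketch_collect_needed_bones(const char *const *bone_names,
                                   size_t n_bones,
                                   const char *const *group_names,
                                   size_t n_groups,
                                   bool *needed)
{
  assert(bone_names != NULL || n_bones == 0);
  assert(group_names != NULL || n_groups == 0);
  assert(needed != NULL || n_bones == 0);

  size_t count = 0;
  for (size_t i = 0; i < n_bones; i++) {
    needed[i] = false;
    for (size_t j = 0; j < n_groups; j++) {
      if (strcmp(bone_names[i], group_names[j]) == 0) {
        needed[i] = true;
        count++;
        break; /* One matching group is enough. */
      }
    }
  }
  return count;
}
```

Because the result depends only on the group names, renaming, adding or removing a group changes the set, which is why such edits would need to trigger a graph rebuild.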

Diff Detail

Repository
rB Blender

Event Timeline

The main concern I have here is the evaluation time. I am currently talking to Alexander on IRC about running a benchmark with Spring scenes.

source/blender/modifiers/intern/MOD_armature.c
98

ctx->object cannot be NULL.

100–101

Should this have a corresponding else with a relation to the entire pose?

I'm fine with this if performance is good.

source/blender/modifiers/intern/MOD_armature.c
100–101

Well, I might be mistaken, but afaict if neither envelopes nor groups are enabled or available, the modifier will simply do nothing, so it has no dependencies. The envelope case requires a link to the whole pose because any deform bone can be used (even if groups are enabled in some cases), and BONE_NO_DEFORM can be animated.
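
The three cases described in this reply can be written down as a small decision function. This is only a sketch of the reasoning under the assumptions stated in the comment; the enum, the function name and the boolean parameters are hypothetical and do not correspond to actual Blender identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of what the modifier depends on. */
typedef enum SketchPoseDependency {
  SKETCH_DEP_NONE,       /* Modifier does nothing, so no pose dependency. */
  SKETCH_DEP_WHOLE_POSE, /* Any deform bone may be used. */
  SKETCH_DEP_PER_BONE,   /* Only bones matching vertex group names. */
} SketchPoseDependency;

SketchPoseDependency sketch_pose_dependency(bool use_envelopes,
                                            bool use_vertex_groups,
                                            int num_vertex_groups)
{
  assert(num_vertex_groups >= 0);

  /* Envelopes can pick up any deform bone, even when groups are enabled. */
  if (use_envelopes) {
    return SKETCH_DEP_WHOLE_POSE;
  }
  /* Groups only: the group names define exactly which bones matter. */
  if (use_vertex_groups && num_vertex_groups > 0) {
    return SKETCH_DEP_PER_BONE;
  }
  /* Neither enabled nor available: nothing to depend on. */
  return SKETCH_DEP_NONE;
}
```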

source/blender/modifiers/intern/MOD_armature.c
100–101

Worth adding as a comment maybe?

So on 02_055_A from Spring this seems to reduce fps from around 12 to 11.3, so maybe it is worth making this an option in the modifier? This is really needed only on certain specially used meshes in certain kinds of rigs.

Ideally we avoid such settings in modifiers, it's pretty obscure.

Do we understand why it has such a big performance impact? Is it possible to automatically detect cases where these relations are not needed?

In principle you can do automatic graph simplification after building, inserting intermediate nodes to reduce the number of edges. So that both the armature and the pose can depend on this intermediate node, and the edges from every bone only exist once. I'm not sure if there is a simple and efficient algorithm for that. It might help performance in other cases, though I don't know practical examples immediately.
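
To make the edge-count argument concrete: if several consumers each depend on the same set of bones, direct links cost one edge per (bone, consumer) pair, while a shared intermediate node costs one edge per bone plus one per consumer. The helpers below are hypothetical and only do this arithmetic; they say nothing about how such nodes would actually be found in the graph.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Edges needed when every consumer links to every bone directly. */
size_t sketch_edges_direct(size_t n_bones, size_t n_consumers)
{
  /* Guard against overflow of the product. */
  assert(n_consumers == 0 || n_bones <= SIZE_MAX / n_consumers);
  return n_bones * n_consumers;
}

/* Edges needed when all bones feed one intermediate node and every
 * consumer links to that node instead. */
size_t sketch_edges_via_intermediate(size_t n_bones, size_t n_consumers)
{
  assert(n_bones <= SIZE_MAX - n_consumers);
  return n_bones + n_consumers;
}
```

For example, with 50 bones and 2 consumers this is 100 direct edges versus 52 with the intermediate node.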

source/blender/modifiers/intern/MOD_armature.c
105

You could check if the property is animated or has a driver, and in that case always add the relation. We do this kind of thing for animated object visibility and will also need it to fix the animated bbone parameter crash.

> Do we understand why it has such a big performance impact?

It replaces one dependency on the 'all bones done' node with a separate link for every 'necessary' bone. I tried profiling but didn't see any obvious differences in the top list of functions. Maybe it's a cache miss cost due to pointer access or something.

> Is it possible to automatically detect cases where these relations are not needed?

They are not needed if no bones in the armature have e.g. constraints targeting this mesh, or something depending on it. I.e. it is dependency cycle prevention, and can't be detected without knowing the rest of the graph.

source/blender/modifiers/intern/MOD_armature.c
105

Can you do that easily?

> It replaces one dependency to 'all bones done' node with one separate link for every 'necessary' bone. I tried profiling but didn't see any obvious differences in the top list of functions. Maybe it's cache miss cost due to pointer access or something.

I would expect some threading thing, such as locks or atomic variables. It could also be that the thread scheduling happens to change in such a way that performance is worse on certain machines, even if it's not true in general.

> They are not needed if no bones in the armature have e.g. constraints targeting this mesh, or something depending on it. I.e. it is dependency cycle prevention, and can't be detected without knowing the rest of the graph.

Yeah, it would need to be some algorithm that looks at the overall graph. It's fairly easy to do a transitive reduction to remove unnecessary edges, but this would be worse for parallelization. That can be avoided by adding intermediate nodes, but it may be impractical or too much work just to fix this specific bug.

source/blender/modifiers/intern/MOD_armature.c
105

There is no really simple and efficient way to check it currently. See visibility_animated_check_cb for how it's done now.

However, I imagine that in the Spring rig the number of bones that happen to have a matching vertex group but are not used for deform is quite small and would not have much of a performance impact.

Transitive reduction would not help in a general case. It is possible to have bones in a deform group which don't belong to a single parent chain.

There are a few usual suspects for the performance impact of changes like this.

deg_graph_flush_updates().
One possible case here is that, due to more "forking", more pushes happen to the queue instead of an immediate switch to the "to" operation.
It could also be just the overhead of loops, like in flush_schedule_children().

If the traversal queue becomes bigger, this also means more re-allocations, meaning slower flushing. For this specific change I wouldn't expect this to happen, but it is worth double-checking the maximum queue size.

deg_evaluate_on_refresh().
There is some overhead in the routines there as well. The biggest one, which also affects threading, is an atomic operation to update flags and decrease the number of pending parents in schedule_node(). Atomics do not come for free, and can ruin your day :(

The more "fan-out" relations might also change scheduling order.
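
The pending-parents bookkeeping mentioned above can be sketched with C11 atomics. This is a simplified illustration, not the actual schedule_node() code: `SketchNode` and both functions are hypothetical, and the real depsgraph uses its own atomic helpers and flags. The point is that every extra incoming relation adds one more atomic decrement on a counter shared between threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical operation node: only the counter relevant to scheduling. */
typedef struct SketchNode {
  atomic_int num_pending_parents;
} SketchNode;

void sketch_node_init(SketchNode *node, int num_parents)
{
  assert(num_parents >= 0);
  atomic_init(&node->num_pending_parents, num_parents);
}

/* Called once for each parent that finishes evaluating. Returns true for
 * exactly one caller: the one that brings the counter down to zero, which
 * is then responsible for scheduling the node. */
bool sketch_parent_finished(SketchNode *node)
{
  int previous = atomic_fetch_sub(&node->num_pending_parents, 1);
  assert(previous > 0);
  return previous == 1;
}
```

With one 'all bones done' parent the mesh node sees a single decrement; with per-bone relations it sees one decrement per needed bone, possibly from different threads.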

It is worth testing whether there is a performance impact when running with --debug-depsgraph-no-threads, just to see if it's scheduling/threading related, or due to more overhead in general.

> It is worth testing whether there is a performance impact when running with --debug-depsgraph-no-threads, just to see if it's scheduling/threading related, or due to more overhead in general.

With that the difference seems much smaller: the change is from around 6.2 to 6.1.

So, I tried an experiment by changing the patch to add both the new individual bone relations, and the whole pose link - this basically adds all of the overhead without actually changing the scheduling.

This seems to return the performance to the original values (any possible difference is comparable to measurement noise), suggesting the problem is indeed the scheduling, or related issues such as cache contention due to starting mesh processing before the armature evaluation is completely finished.

I did more profiling using the free version of Intel VTune, and it seems to me this may be simply an instance of just a few long running tasks having to be executed at the end of processing when there are no more short tasks left. If circumstances are unlucky, there may not be enough active tasks to fill all 4 cores for a substantial length of time. Without this patch, things happen to line up so that the threads are nearly full almost all of the time, while with this change some of the 'big' tasks are shifted earlier and the final stretch of evaluation is left under-occupied.
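
The effect described here, where the same total work takes longer purely because of the order in which tasks become available, can be reproduced with a toy model. The sketch below is a simple greedy scheduler with no dependencies between tasks; `sketch_makespan` is a hypothetical helper and has nothing to do with the real task scheduler. It only shows that task order alone can leave cores idle.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_WORKERS 64

/* Assigns tasks, in the given order, to whichever worker becomes free
 * first, and returns the time at which the last task finishes. */
int sketch_makespan(const int *durations, size_t n_tasks, size_t n_workers)
{
  assert(n_workers > 0 && n_workers <= SKETCH_MAX_WORKERS);

  int free_at[SKETCH_MAX_WORKERS] = {0};
  int finish = 0;

  for (size_t t = 0; t < n_tasks; t++) {
    /* Pick the worker that becomes idle earliest. */
    size_t best = 0;
    for (size_t w = 1; w < n_workers; w++) {
      if (free_at[w] < free_at[best]) {
        best = w;
      }
    }
    free_at[best] += durations[t];
    if (free_at[best] > finish) {
      finish = free_at[best];
    }
  }
  return finish;
}
```

With 4 workers, eight tasks of length 1 followed by one task of length 4 finish at time 6, because three workers sit idle while the long task runs. The same tasks with the long one first finish at time 4.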

I tried introducing BLI_task_parallel_range into the armature and lattice deform modifiers, and it seems to improve things: fps now drops from 12.1 to 11.7 or so (instead of to 11.3), which seems to support this analysis. However, the remaining big tasks seem to involve the smooth and corrective smooth modifiers, which will require some serious changes to be parallelized.

I think we can accept this performance impact for now, since it's a random scheduling thing and further optimizations are coming in D4753.

This revision is now accepted and ready to land. Apr 29 2019, 7:34 PM
This revision was automatically updated to reflect the committed changes.