Page MenuHome

Fix broken nodes copying, calling to copy-API and user counting
Needs RevisionPublic

Authored by Maxim Vasiliev (qmax) on Dec 25 2019, 7:36 PM.

Details

Summary

Call to user-defined copy method of NodeCustomGroup is broken. In addition, implementation of some related operators has pure logical bugs (confising true vs false), that happened to partially compensate each other.

The cause of broken copy is that the copyfunc_api is often called from within BKE_node_copy_ex <- BKE_node_tree_copy_data <- BKE_id_copy_ex when the data is in inconsistent state.
Initializing property node_tree within python copy updates counters of old/new trees. This makes source trees which are still in use to have 0 user counter, which may lead to NULL pointers or attempts to push counter below zero, destroying materials and crashing blender.

This patch adds new flag 'COPY_NO_API` and uses it to move the call to api to the very end of BKE_id_copy_ex after state of counters is settled.

Use cases adressed/affected/fixed/tested

  • copying a tree with ID.copy -- FIXED // D6420/test_api_trees_copy
  • duplicating node -- tested // D6420/test_op_node_duplicate
  • copypasting node -- tested // D6420/test_op_node_copypaste, D6420/test_op_node_copypaste_vanished
  • making group -- tested // D6420/test_op_group_make
  • ungrouping node group -- FIXED // D6420/test_op_group_ungroup
  • separating node from group -- tested // D6420/test_op_group_separate_copy, D6420/test_op_group_separate_move
  • duplicating group -- tested // D6420/test_op_group_duplicate
  • singleusering a group -- FIXED // T72659
  • cloning material -- FIXED // D6420/test_op_mat_clone, T72317
  • copypasting material -- PARTIALLY FIXED // D6420/test_op_mat_clone, T72579
  • singleusering a material -- FIXED

In the case of copypasting material, the call to copy is fixed, but user counters still bogus because of some other bug.

Other affected code

Basically, everything that directly or indirectly uses BKE_id_copy_ex or BKE_node_copy_ex without flag CREATE_NO_MAIN.

Diff Detail

Event Timeline

I failed to understand logic of BKE_library_foreach_ID_link and id_copy_libmanagement_cb which are used within BKE_id_copy_ex to update counters.
But the point is that it prevents updating counters for nodetrees that belong to materials, so that the updating is postponed until all material data has been copied completely.

Perhaps, this also applies to other owners of nodetrees such as scene, world, and everything. Then they probably should also be included in the call api block.

The patch should be corrected to handle other IDs that contain built-in nodetree, and might contain some NodeCustomGroups

According to api reference:

  • Scene
  • Light
  • World
  • Texture -- apparently, deprecated before introduction of the python api
  • FreestyleLineStyle -- doesn't seem to support any nodes yet, but has nodetree property

Perhaps, sufficient testing would be to try using ID::copy on corresponding instances.

Bastien Montagne (mont29) requested changes to this revision.Tue, Jan 14, 1:24 PM

[EDIT] Sorry, typed that review days ago, looks like I forgot to submit it… :/

I think that patch is going in the right direction, but does not go far enough, and hence ends up doing some bad things on the road.

ID copying code is absolutely not designed to be recursive, we do have a few exceptions hard-coded there (like the actions from animdata, and 'not reall datablocks' like root node trees or shape keys, but those have a fairly well defined context and behavior. And getting them to work was already a little nightmare.

So a generic py API callback that can do basically anything is just a no-go here, it’s a full violation of the whole design. Trying to make it work imho is doomed to fail and/or lead to spaghetti code that nobody can follow anymore.

Good thing: move py API callback out of (after the) ID copying code itself.
Bad thing: this call should be completely removed from BKE_node_copy_ex() imho. There is no way we can keep it there and not be bitten by it one way or another.

Bad thing: changing things like copy/paste to use ID refcounting. This should really not be needed.

I also think that the new LIB_ID_COPY_NO_API flag is not needed. Basically, we never want to call that callback when we are not refcounting, so just checking for LIB_ID_CREATE_NO_USER_REFCOUNT (which should always imply LIB_ID_CREATE_NO_MAIN) should be enough here?

source/blender/blenkernel/intern/library.c
628

style (spaces around assignment op) ;)

792

We require comments to be proper sentences now, and to be meaningful. E.g;:

/* Call copy_api for nodes in nodetrees after all ID management of current copy has been done, 
 * since generic recursive ID copying is not supported currently.
 * This API callback is only supported in cases where user refcounting is enabled. */
798–800

You are missing all the other nodetree users (scene, light, etc.).

source/blender/blenkernel/intern/node.c
1598

Wow… that is an horribly critical reversed logic mistake indeed! That one I’ll fix immediately in master.

Weird it could survive one whole year without getting noticed. I’d expect it to generate a lot of bugs with nodes… :/

This revision now requires changes to proceed.Tue, Jan 14, 1:24 PM

So a generic py API callback that can do basically anything is just a no-go here, it’s a full violation of the whole design. Trying to make it work imho is doomed to fail and/or lead to spaghetti code that nobody can follow anymore.
Good thing: move py API callback out of (after the) ID copying code itself.
Bad thing: this call should be completely removed from BKE_node_copy_ex() imho. There is no way we can keep it there and not be bitten by it one way or another.

This would be no more "a fix", but rather refactoring.
It will affect numerous use cases besides listed in this patch.
Such refactoring requires spreading the api calls all over the code (about 10-20 places at a rough guess: all the operators, copypasting code, singleusering code, and perhaps localmaking).
Seems like less maintainable. Although the places probably provide more sensible context.

And it needs some clarification about semantic of the copy.
My best assumption is that it is intended to be a hook called after all native copying and initializations, but before any update-hooks triggered (so that hooks operate on a new nodetree, if any).
This doesn't match current (attempt of) implementation, because currently the api is called before reinitialization of properties/inputs and relinking.

Looks like a big task on its own, or under T61655.

This comment was removed by Maxim Vasiliev (qmax).

I also think that the new LIB_ID_COPY_NO_API flag is not needed. Basically, we never want to call that callback when we are not refcounting, so just checking for LIB_ID_CREATE_NO_USER_REFCOUNT (which should always imply LIB_ID_CREATE_NO_MAIN) should be enough here?

All the BKE_*_copy_data do recursive calls to BKE_id_copy_ex without changing LIB_ID_CREATE_NO_USER_REFCOUNT -- they only clear LIB_ID_CREATE_NO_ALLOCATE for the calls.
So, the flag cannot be used to distinguish direct calls from recursive calls.