Page MenuHome

Drivers on hide_viewport and hide_render are unreliable, and throw warnings on linked-but-not-proxied objects.
Closed, ResolvedPublicBUG

Description

System Information
Operating system: Linux-5.3.0-7625-generic-x86_64-with-debian-buster-sid 64 Bits
Graphics card: GeForce RTX 2080/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 440.44

Blender Version
Broken: version: 2.83 (sub 1), branch: HEAD, commit date: 2020-01-29 14:09, hash: rB9cb7ecefceee
Worked: Maybe a month or two ago?

Short description of error
When a proxy armature's custom property drives an object's hide_viewport or hide_render property, the console spams this warning:

add_relation(Driver -> Driven Property) - Failed, but op_from (OperationKey(type: PARAMETERS, component name: '', operation code: DRIVER, 'hide_render')) was ok
add_relation(Driver -> Driven Property) - Could not find op_to (RnaPathKey(id: OBCube, prop: 'hide_render'))

When the armature is not a proxy, the issue is still present, but the above warning is not.

In the case of a simple file, this doesn't seem to cause an issue, but as the file gets more complex, the driver ends up becoming unpredictable/unreliable, and can even result in it locking a mesh as visible or invisible until the driver is simply removed.

Exact steps for others to reproduce the error

  • Download attached file
  • Link "Collection" collection to a fresh file
  • Make proxy->"Armature"
  • Pose mode->change the bone's custom properties. "prop" drives object visibility, "prop1" drives a shape key.
  • The console spams the warnings only regarding the former.

This functionality will become important for our current project fairly soon, and I don't know of any workarounds.

Event Timeline

Demeter Dzadik (Mets) renamed this task from Console spamms add_relation() warnings and causes drivers to misbehave to Object visibility drivers misbehave when driven by proxy armature.Tue, Feb 4, 7:44 PM
Demeter Dzadik (Mets) renamed this task from Object visibility drivers misbehave when driven by proxy armature to Drivers on hide_viewport and hide_render are unreliable, and throw warnings on proxies..Fri, Feb 7, 5:18 PM
Demeter Dzadik (Mets) updated the task description. (Show Details)
Germano Cavalcante (mano-wii) changed the task status from Needs Triage to Confirmed.Mon, Feb 10, 1:24 PM
Germano Cavalcante (mano-wii) changed the subtype of this task from "Report" to "Bug".

I can confirm the problem.

This does not appear to be a regression.
I tested previous versions and the same warnings appeared in blender 2.80rc1

It's been pointed out to me this may be a duplicate of T56635, but that one was closed as resolved. Maybe this should be merged into that one, and that one be re-opened? I'm not 100% sure it's the same issue, but definitely seems very similar.

Worked: Maybe a month or two ago?

So is that 2.81? 2.82? When did it work?

Sybren A. Stüvel (sybren) triaged this task as High priority.

When the armature is not a proxy, the issue is still present, but the above warning is not.

What is "the issue" exactly, if it's not the warning being shown? You describe some unreliability, but the steps to reproduce don't let me reproduce that unreliability.

Here is a fairly simplified file and video:



I didn't want to simplify it too much further since it feels like there is a correlation between scene complexity and how easy it is to reproduce the bug. (the more complex the scene, the more often the bug occurs) - even at this level of complexity, it seems that changing the values "fast"(multiple times per second) is necessary to trigger the bug.

And just in case, here's the full un-simplified character file, where reproducing the bug should be easiest:


There is an enum in the rig's custom UI that is responsible for applying presets to the outfit. It does this by simply changing the driving custom properties with an update callback function, so triggering the bug is the easiest with that enum:

As for a driver being completely stuck in one state, that only happened in a production file with a full set and several character rigs linked and proxied, so really an extremely complex scene. This was some time ago, and we didn't save it for a bug report - will try to in the future. But I'm hoping the above will be helpful enough. :x

Sybren A. Stüvel (sybren) renamed this task from Drivers on hide_viewport and hide_render are unreliable, and throw warnings on proxies. to Drivers on hide_viewport and hide_render are unreliable, and throw warnings on linked-but-not-proxied objects..Tue, Feb 18, 3:06 PM

After looking into this with @Sybren A. Stüvel (sybren), we think the random failures are most likely due to a threading issue, where two threads are evaluating the viewport and render visibility driver for the same object in parallel. These two properties are stored as bitflags on the same integer, so writing to them at the same time is not thread-safe.

The solution we are thinking of is to enure such drivers are never evaluated in parallel. Instead we can group driver evaluation, so that all drivers for a single object or bone are evaluated serially. This could be done by checking the RNA paths, so that properties are grouped together for evaluation when they are in the same RNA struct.

Performance is likely not going to be significantly affected either way, evaluation like this should be granular enough. Creating fewer depsgraph nodes and relations may reduce overhead a little even.

@Sergey Sharybin (sergey), I think we had another report about (or at least we discussed it), but I can't find it, maybe you remember.

After looking into this with @Sybren A. Stüvel (sybren), we think the random failures are most likely due to a threading issue

Further testing confirmed this.

@Bastien Montagne (mont29) had the insight that this issue will also happen when keyframe-animating such properties. He's looking into making the RNA access atomic so that all of those issues can be resolved in one go.

I'm not sure what it means to make the RNA access atomic, how would that be implemented, manually for each property? If so, there are many of them and taking that into account for every new property sounds fragile. I'm also not sure to what extent writing a char or short could conflict with an adjacent variable.

If we can solve this in a generic way by grouping animation and driver evaluation, to me that still sounds better.

Also, using atomic instructions are slow and should be avoided when possible for performance reasons. If there is no other solution it may be fine, but I really want to avoid every RNA set function using atomics.

I think in general in our API design we should aim to make it safe for multiple threads to modify different datablocks, but we can't make such guarantees in the general case for multiple threads modifying properties of a single datablock. And the animation/driver system should be compatible with the general case, where the RNA set function may be user defined and modify multiple variables.

@Brecht Van Lommel (brecht) issue with bitflags in RNA is that from RNA side they look like different properties, but they actually affect the exact same value. So idea would be that default RNA setter would set those bitflags with atomic operations yes...

Here is a very quick and dirty first attempt, does not seem to work really though... Just for sake of doc:

1diff --git a/intern/atomic/atomic_ops.h b/intern/atomic/atomic_ops.h
2index 106e19567da..82d609f8a56 100644
3--- a/intern/atomic/atomic_ops.h
4+++ b/intern/atomic/atomic_ops.h
5@@ -99,6 +99,9 @@ ATOMIC_INLINE int32_t atomic_fetch_and_add_int32(int32_t *p, int32_t x);
6 ATOMIC_INLINE int32_t atomic_fetch_and_or_int32(int32_t *p, int32_t x);
7 ATOMIC_INLINE int32_t atomic_fetch_and_and_int32(int32_t *p, int32_t x);
8
9+ATOMIC_INLINE int16_t atomic_fetch_and_or_int16(int16_t *p, int16_t x);
10+ATOMIC_INLINE int16_t atomic_fetch_and_and_int16(int16_t *p, int16_t x);
11+
12 ATOMIC_INLINE uint8_t atomic_fetch_and_or_uint8(uint8_t *p, uint8_t b);
13 ATOMIC_INLINE uint8_t atomic_fetch_and_and_uint8(uint8_t *p, uint8_t b);
14
15diff --git a/intern/atomic/intern/atomic_ops_msvc.h b/intern/atomic/intern/atomic_ops_msvc.h
16index d9655defa81..af08802cea1 100644
17--- a/intern/atomic/intern/atomic_ops_msvc.h
18+++ b/intern/atomic/intern/atomic_ops_msvc.h
19@@ -164,6 +164,30 @@ ATOMIC_INLINE int32_t atomic_fetch_and_and_int32(int32_t *p, int32_t x)
20 return InterlockedAnd((long *)p, x);
21 }
22
23+/******************************************************************************/
24+/* 16-bit operations. */
25+
26+/* Signed */
27+#pragma intrinsic(_InterlockedAnd16)
28+ATOMIC_INLINE int16_t atomic_fetch_and_and_int16(int16_t *p, int16_t b)
29+{
30+#if (LG_SIZEOF_PTR == 8 || LG_SIZEOF_INT == 8)
31+ return InterlockedAnd16((short *)p, (short)b);
32+#else
33+ return _InterlockedAnd16((short *)p, (short)b);
34+#endif
35+}
36+
37+#pragma intrinsic(_InterlockedOr16)
38+ATOMIC_INLINE int16_t atomic_fetch_and_or_int16(int16_t *p, int16_t b)
39+{
40+#if (LG_SIZEOF_PTR == 8 || LG_SIZEOF_INT == 8)
41+ return InterlockedOr16((short *)p, (short)b);
42+#else
43+ return _InterlockedOr16((short *)p, (short)b);
44+#endif
45+}
46+
47 /******************************************************************************/
48 /* 8-bit operations. */
49
50diff --git a/intern/atomic/intern/atomic_ops_unix.h b/intern/atomic/intern/atomic_ops_unix.h
51index e1126cab0c2..17f5d7f98c1 100644
52--- a/intern/atomic/intern/atomic_ops_unix.h
53+++ b/intern/atomic/intern/atomic_ops_unix.h
54@@ -317,6 +317,23 @@ ATOMIC_INLINE int32_t atomic_fetch_and_and_int32(int32_t *p, int32_t x)
55 # error "Missing implementation for 32-bit atomic operations"
56 #endif
57
58+/******************************************************************************/
59+/* 16-bit operations. */
60+#if (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2) || defined(JE_FORCE_SYNC_COMPARE_AND_SWAP_2))
61+/* Signed */
62+ATOMIC_INLINE int16_t atomic_fetch_and_and_int16(int16_t *p, int16_t b)
63+{
64+ return __sync_fetch_and_and(p, b);
65+}
66+ATOMIC_INLINE int16_t atomic_fetch_and_or_int16(int16_t *p, int16_t b)
67+{
68+ return __sync_fetch_and_or(p, b);
69+}
70+
71+#else
72+# error "Missing implementation for 16-bit atomic operations"
73+#endif
74+
75 /******************************************************************************/
76 /* 8-bit operations. */
77 #if (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1) || defined(JE_FORCE_SYNC_COMPARE_AND_SWAP_1))
78diff --git a/source/blender/makesrna/intern/makesrna.c b/source/blender/makesrna/intern/makesrna.c
79index 2f5d9ae7a50..a0e44670753 100644
80--- a/source/blender/makesrna/intern/makesrna.c
81+++ b/source/blender/makesrna/intern/makesrna.c
82@@ -974,6 +974,24 @@ static void rna_clamp_value(FILE *f, PropertyRNA *prop, int array)
83 }
84 }
85
86+static void rna_def_property_bitflag_default_set_func(FILE *f,
87+ PropertyDefRNA *dp,
88+ const char *atomic_or_name,
89+ const char *atomic_and_name)
90+{
91+ fprintf(f,
92+ " if (%svalues[i]) %s(&(data->%s), (%du << i));\n",
93+ (dp->booleannegative) ? "!" : "",
94+ atomic_or_name,
95+ dp->dnaname,
96+ dp->booleanbit);
97+ fprintf(f,
98+ " else %s(&(data->%s), ~(%du << i));\n",
99+ atomic_and_name,
100+ dp->dnaname,
101+ dp->booleanbit);
102+}
103+
104 static char *rna_def_property_set_func(
105 FILE *f, StructRNA *srna, PropertyRNA *prop, PropertyDefRNA *dp, const char *manualfunc)
106 {
107@@ -1110,12 +1128,20 @@ static char *rna_def_property_set_func(
108
109 if (dp->dnaarraylength == 1) {
110 if (prop->type == PROP_BOOLEAN && dp->booleanbit) {
111- fprintf(f,
112- " if (%svalues[i]) data->%s |= (%du << i);\n",
113- (dp->booleannegative) ? "!" : "",
114- dp->dnaname,
115- dp->booleanbit);
116- fprintf(f, " else data->%s &= ~(%du << i);\n", dp->dnaname, dp->booleanbit);
117+ if (dp->dnatype) {
118+ if (STREQ(dp->dnatype, "char")) {
119+ rna_def_property_bitflag_default_set_func(
120+ f, dp, "atomic_fetch_and_or_int8", "atomic_fetch_and_and_int8");
121+ }
122+ if (STREQ(dp->dnatype, "short")) {
123+ rna_def_property_bitflag_default_set_func(
124+ f, dp, "atomic_fetch_and_or_int16", "atomic_fetch_and_and_int16");
125+ }
126+ if (STREQ(dp->dnatype, "int")) {
127+ rna_def_property_bitflag_default_set_func(
128+ f, dp, "atomic_fetch_and_or_int32", "atomic_fetch_and_and_int32");
129+ }
130+ }
131 }
132 else {
133 fprintf(
134@@ -4302,6 +4328,8 @@ static void rna_generate(BlenderRNA *brna, FILE *f, const char *filename, const
135
136 fprintf(f, "#include \"MEM_guardedalloc.h\"\n\n");
137
138+ fprintf(f, "#include \"atomic_ops.h\"\n\n");
139+
140 fprintf(f, "#include \"DNA_ID.h\"\n");
141 fprintf(f, "#include \"DNA_scene_types.h\"\n");
142

I extended my comment above a bit, to explain why I think this is the wrong level to add thread safety.

Not sure performances is a real issue here (atomics are slower, yes, but RNA also adds a fair amount of overhead, and all in all, I think I’d expect being able to evaluate tens of drivers/animation curves in parallel, even if those involve some atomic ops, would be far more efficient than evaluating all of them in a single thread, without atomic ops at all).

short/char are not an issue I think, there are atomic primitives for those as well, and DNA is aware of the type (see my patch attempt above).

But I actually agree with @Brecht Van Lommel (brecht) for his last point - we have no control over custom set functions. So yes, I guess enforcing depsgraph eval of all animdata in a single datablock to happen in a single thread is the only generic safe way to go. But that will have a cost over performances I think, especially in e.g. production armatures, which can have thousands of drivers/anim curves on a single ID. :/

I expect it can improve performance slightly in production (but not enough to really matter). There is overhead to having more depsgraph nodes and relations. The only way you would have that many drivers on an ID in practice is bones, and we can group those together (and also have to for drivers to work between bones at all).

Please stay away from atomics in RNA. There is indeed some overhead in RNA system, but it does not affect how scalable code is with the number of threads. Once you add atomic the scalability goes to nothing.

Please also stop considering atomics a magic solution. In many places of Blender they are used wrong, killing scabaility.

There might be an open report about same issue, but I can not find quickly find it. However, similar issue was solved for drivers/animation on an array properties (such as location). It was similar type of an error where multiple threads tried to write individual array elements, which isn't supported by RNA (RNA writes an entire array). The code which deals with this can be found in DepsgraphRelationBuilder::build_animdata_drivers(). Keep this in mind.

Unfortunately, Brecht's simple idea can't be implemented as-is: we allow interleaved driver evaluation, where you can drive one custom property with another and have as many IDs involved as one which, and the can of dependencies can be as long as you wish. In practice this means you need to be careful of what you are grouping with what and in which order.

Proposed solution:

  • Force same RNA Struct properties to be evaluated in a single thread.
  • Use depsgraph relations to ensure order (and hence, single-threadness of evaluation).
  • Use Brecht's idea of single-threading same-datablock-properties as a base, but with extra trickery: force order of drivers which drive property of the same StructRNA unless there is a transitive dependency between them already.

Some implementation notes:

  • We have some base for an "animation property cache" in DepsgraphBuilder::cache_, DepsgraphBuilderCache. I would imagine that to implement the solution we would need to resolve quite a bit of RNA paths, potentially more than once. Need to benefit from the cache as much as possible.
  • Adding extra relations is technically introducing evaluation overhead, but we are not by any means perfect in the number of relations already. Unless there will be a measurable slowdown I will not worry about extra relations added for this fix.
  • Those extra relations can be addressed in the future by giving dependency graph a "cleanup" pass: remove all transitive relations, remove all duplicated relations. And, possibly, group sequential nodes into a single evaluation node.
  • Dependency graph evaluation is written in a way that if there is a linear nodes setup like A -> B -> C then evaluation happens from a "hot" thread: no re-scheduling is done, the active thread immediately jumps from node A to B to C. In practice this means that for "simple" cases those extra relations are likely to not cause any measurable overhead.
  • In order to have all transitive relations properly taken into account the make-drivers-singlethread relations must be added at the end of the relations builder, once all other relations are known. This is not a problem from what builder supports (this is already how build_copy_on_write_relations is done). Just something to keep in mind when implementing fix for this issue.