Experimental hashing lookup of IDs
Changes Planned · Public

Authored by Campbell Barton (campbellbarton) on Nov 6 2017, 9:04 AM.

Details

Summary

This patch uses a GSet for storing IDs, keyed on (name, library_pointer).

The goal is to speed up adding/duplicating many objects, avoiding full string lookups through the ID lists.
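To make the keying concrete, here is a minimal sketch of what such GSet callbacks could look like (id_gset_hash/id_gset_cmp are hypothetical names; ID, GSet, BLI_ghashutil_strhash and STREQ are existing Blender API, and the bodies are illustrative, not the patch's exact code). Following the "Remove library from hash" edit noted below, only the name feeds the hash, while the compare callback still checks the library pointer:

  #include "BLI_ghash.h"
  #include "BLI_utildefines.h"
  #include "DNA_ID.h"

  static unsigned int id_gset_hash(const void *ptr)
  {
    const ID *id = ptr;
    /* Hash the name (including its 2-char type prefix) only;
     * same-name IDs from different libraries are rare enough
     * that resolving them in the compare callback is fine. */
    return BLI_ghashutil_strhash(id->name);
  }

  static bool id_gset_cmp(const void *a_ptr, const void *b_ptr)
  {
    const ID *a = a_ptr, *b = b_ptr;
    /* GSet compare callbacks return false when keys are equal. */
    return !(STREQ(a->name, b->name) && (a->lib == b->lib));
  }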

Notes:

  • When the GSet is non-NULL, it's expected to be valid and is used for lookups (see the lookup sketch after this list).
  • When the GSet is NULL, the regular ListBase functions are used as a fallback.
  • The GSet should only be NULL in the middle of file loading (including undo), since keeping all pointers valid in that case seems unnecessarily complicated for no real gain.
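A sketch of that lookup pattern (id_find and its arguments are hypothetical names; BLI_gset_lookup returns the stored key or NULL, and ID/ListBase/BLI_strncpy/STREQ are existing Blender API). Note how the hashed path has to build a dummy ID on the stack as the probe key, which is what one of the improvements below refers to:

  /* Find an ID by full name ("OB" prefix included) and library. */
  static ID *id_find(ListBase *lb, GSet *gs, const char *name_full, Library *lib)
  {
    if (gs != NULL) {
      /* Hashed path: only the members read by the hash/compare
       * callbacks need to be valid in the stack key. */
      ID id_key;
      BLI_strncpy(id_key.name, name_full, sizeof(id_key.name));
      id_key.lib = lib;
      return BLI_gset_lookup(gs, &id_key);
    }
    /* Fallback: linear scan, as before the patch. */
    for (ID *id = lb->first; id; id = id->next) {
      if (STREQ(id->name, name_full) && (id->lib == lib)) {
        return id;
      }
    }
    return NULL;
  }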

Possible Improvements:

  • To avoid re-allocating the GSets on undo we could clear them for re-use, storing a separate used/cleared boolean so we know when a set can't be trusted.
  • We could store the number of IDs of each type, to avoid resizing buckets when creating the hashes (see the sketch after this list).
  • Currently the lookup function needs to make a dummy ID on the stack each time; it might be more efficient to pass this in, especially when there are many collisions (just a guess, but worth trying).
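The first two points map onto existing BLI calls; a minimal sketch, assuming a per-type id_count is tracked somewhere (id_count and the callback names are hypothetical):

  /* Reserve bucket space up front from a known per-type ID count. */
  GSet *gs = BLI_gset_new_ex(id_gset_hash, id_gset_cmp, __func__, id_count);

  /* On undo, empty the set for re-use rather than freeing and
   * re-allocating it (keys are the IDs themselves, nothing to free). */
  BLI_gset_clear(gs, NULL);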

While the patch is fairly complete, this is a first pass to check whether this is a good approach.
Before committing to master there are further changes to check on.

  • Use the hash for all Python lookups (bpy.data.objects[key]), not just the (key, lib) form.
  • Currently the RNA side does a single loop through the RNA collection to get the ID type; this should be handled more cleanly.
  • This doesn't include the optimizations from D2909; those can still be applied for the non-GSet case, though this isn't important for reviewing this patch.
  • BKE_libblock_rename and BKE_libblock_ensure_unique_name could be merged into a single function, although this is more a general refactoring issue.

Some benchmarks:

Using different names (no collisions)

./blender -b --python-expr "for i in range(50000 if locals().update(dict(bpy=__import__('bpy'))) is None else None): bpy.data.actions.new(hex(hash(i)))"
  • before: 23.42s
  • after: 12.55s

Using the same name (all collisions)

./blender_old -b --python-expr "for i in range((50000, locals().update(dict(bpy=__import__('bpy'))))[0]): bpy.data.actions.new('Name')"
  • before: 90.34s
  • after: 89.67s

Note that in the all-collisions case, most of the time is spent counting duplicates to generate a unique name, not in the lookups themselves.
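For context, a sketch of the suffix-probing pattern that dominates the all-collisions case (exists() is a stand-in for whichever lookup is used, hashed or not; BLI_strncpy/BLI_snprintf/MAX_ID_NAME are existing API):

  /* Each new "Name" probes Name.001, Name.002, ... until a free one
   * is found, so creating N same-named IDs stays roughly quadratic
   * no matter how fast a single lookup is. */
  char name[MAX_ID_NAME];
  BLI_strncpy(name, base, sizeof(name));
  for (int i = 1; exists(name); i++) {
    BLI_snprintf(name, sizeof(name), "%s.%03d", base, i);
  }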

Diff Detail

Repository
rB Blender
Branch
TEMP-ID-GSET
Build Status
Buildable 971
Build 971: arc lint + arc unit
Campbell Barton (campbellbarton) edited the summary of this revision.
  • Remove library from hash
Campbell Barton (campbellbarton) planned changes to this revision. (Edited Dec 6 2017, 3:32 PM)

Reviewing own code here :)

The main problem with the current name lookups is creating many objects at once (either through Python or just duplicating many objects).

Even though faster lookups on bpy.data are nice, it's not great practice to rely on them.

This patch is more involved than I'd like, and doesn't _really_ help that much with the performance of creating many objects at once.

I think a better solution might be to have a way to create many IDs at once, structured to avoid so many lookups.