Improve name conflict handling in ID management #73412

Bastien Montagne · 2020-01-26T18:21:45+01:00

Bastien Montagne commented

2020-01-26 18:21:45 +01:00

Commits d840658078, 2aab727009, 4cc8201a65, and 46607bc09d already made things significantly faster here, at least when dealing with large amounts of data-blocks.

Going further will require caching info about already used names (more precisely, used base names, and used suffix numbers).

Proposal

Use one GHash per ID type, stored in Main, with base names as keys and, as values, a structure that helps quickly find the best available number.

Using only base names as keys will seriously reduce the memory usage of the whole cache when massive numbers of IDs share the same base name. It will also avoid a lot of trial and error while searching for an available number suffix.
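To illustrate why keying on base names keeps the cache small, here is a minimal Python sketch (not Blender's C code). The helper `split_name`, the `name_cache` layout and the `register` function are hypothetical names chosen for this example, and treating a name without a numeric suffix as number 0 is a simplifying assumption.

```python
from collections import defaultdict


def split_name(name):
    """Split 'Object.012' into ('Object', 12).

    Assumption for this sketch: a name without a numeric suffix is number 0.
    """
    base, dot, suffix = name.rpartition(".")
    if dot and suffix.isascii() and suffix.isdigit():
        return base, int(suffix)
    return name, 0


# One dictionary per ID type; keys are base names, values are the used numbers.
name_cache = defaultdict(lambda: defaultdict(set))


def register(id_type, name):
    """Record that `name` is in use for the given ID type."""
    base, number = split_name(name)
    name_cache[id_type][base].add(number)


# 2001 objects sharing one base name produce a single key in the cache.
register("OB", "Object")
for i in range(1, 2001):
    register("OB", f"Object.{i:03d}")
register("OB", "Camera")
register("ME", "Object")  # Same base name, different ID type: separate map.
```

With this layout the object map holds only two keys (`Object` and `Camera`), however many numbered copies exist.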

Those hashes will have to be kept up-to-date at all times, as building them on demand would require far too much computation. This should not be too difficult though, since the code affecting ID names is small and well confined. ID deletion code must also update that cache.
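Keeping the cache current amounts to hooking the few places where an ID gets a name, changes it, or is deleted. The following Python sketch shows that idea under stated assumptions: the class `NameCache` and its methods are hypothetical, and a plain set of used numbers stands in for whatever per-base-name structure is eventually chosen.

```python
class NameCache:
    """Hypothetical cache: {id_type: {base_name: set of used numbers}}."""

    def __init__(self):
        self._used = {}

    @staticmethod
    def _split(name):
        # Assumption: a name without a numeric suffix counts as number 0.
        base, dot, suffix = name.rpartition(".")
        if dot and suffix.isascii() and suffix.isdigit():
            return base, int(suffix)
        return name, 0

    def add(self, id_type, name):
        """Call whenever an ID is created or given a name."""
        base, number = self._split(name)
        self._used.setdefault(id_type, {}).setdefault(base, set()).add(number)

    def remove(self, id_type, name):
        """Call whenever an ID is deleted."""
        base, number = self._split(name)
        numbers = self._used[id_type][base]
        numbers.remove(number)
        if not numbers:
            # Drop empty entries so unused base names cost no memory.
            del self._used[id_type][base]

    def rename(self, id_type, old_name, new_name):
        """Call whenever an ID is renamed."""
        self.remove(id_type, old_name)
        self.add(id_type, new_name)

    def is_used(self, id_type, name):
        base, number = self._split(name)
        return number in self._used.get(id_type, {}).get(base, ())


cache = NameCache()
cache.add("OB", "Cube")
cache.add("OB", "Cube.001")
cache.rename("OB", "Cube.001", "Sphere")
cache.remove("OB", "Cube")
```

Each update touches a single hash entry, which is why incremental maintenance is far cheaper than rebuilding the cache on demand.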

Regarding Numbering

The current approach is to ensure we always get the smallest available number up to a specific value (around 1K in current code), and beyond that to use the smallest value after the last one used. In other words, freed numbers above that limit are not re-used: if e.g. we have Object to Object.2000 fully used by 2k objects, delete Object.1500, then add a new one, it will get Object.2001 as name.
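The policy described above can be sketched in a few lines of Python. This is an illustration only, not the actual Blender code: `next_number` is a hypothetical function, `MAX_SEARCH = 1024` is an assumed value for the "around 1K" limit, and number 0 is assumed to stand for the unsuffixed name.

```python
MAX_SEARCH = 1024  # Assumed threshold; the text above only says "around 1K".


def next_number(used, max_search=MAX_SEARCH):
    """Pick a suffix number for a new ID whose base name is already taken.

    Below `max_search`, return the smallest free number. If that whole range
    is occupied, return one past the largest used number, so that freed
    numbers above the threshold are not re-used.
    """
    for n in range(1, max_search):
        if n not in used:
            return n
    return max(used) + 1


# Object (number 0) to Object.2000 are all in use, then Object.1500 is deleted.
used = set(range(0, 2001))
used.discard(1500)
high_gap_result = next_number(used)  # 1500 is above the threshold: not re-used.

# A gap below the threshold does get filled.
used.discard(500)
low_gap_result = next_number(used)
```

Here `high_gap_result` is 2001, matching the example in the text, while `low_gap_result` is 500.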

We can probably do that efficiently for an arbitrary amount of numbers by storing bitflags (in chunks, maybe single 32-bit integers, allocated from a mempool) to tag used numbers; that would add about one extra bit of memory per ID. Using an n-ary tree structure should then help with quickly finding an available number, by greatly reducing the number of comparisons needed to find the smallest available one.
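One possible reading of this idea is a two-level bitmap: leaf words of 32 bits tag used numbers, and summary words tag which leaves are completely full, so the search skips 32 full leaves per comparison. The Python sketch below is a hedged illustration, not the implemented design; `NumberBitmap` and its methods are hypothetical names, Python lists stand in for the mempool, and a deeper tree would repeat the same summary step at further levels.

```python
WORD_BITS = 32
FULL = (1 << WORD_BITS) - 1


def lowest_zero_bit(word):
    """Index of the lowest clear bit in a 32-bit word (word must not be FULL)."""
    free = ~word & FULL
    return (free & -free).bit_length() - 1


class NumberBitmap:
    """Two-level bitmap of used numbers for one base name (hypothetical)."""

    def __init__(self):
        self.leaves = []   # Bit b of leaves[i] set: number i * 32 + b is used.
        self.summary = []  # Bit j of summary[s] set: leaves[s * 32 + j] is full.

    def _ensure(self, leaf_index):
        while len(self.leaves) <= leaf_index:
            self.leaves.append(0)
        while len(self.summary) * WORD_BITS <= leaf_index:
            self.summary.append(0)

    def mark_used(self, number):
        leaf_index, bit = divmod(number, WORD_BITS)
        self._ensure(leaf_index)
        self.leaves[leaf_index] |= 1 << bit
        if self.leaves[leaf_index] == FULL:
            self.summary[leaf_index // WORD_BITS] |= 1 << (leaf_index % WORD_BITS)

    def mark_free(self, number):
        leaf_index, bit = divmod(number, WORD_BITS)
        if leaf_index < len(self.leaves):
            self.leaves[leaf_index] &= ~(1 << bit)
            # A leaf with a free bit can no longer be tagged as full.
            self.summary[leaf_index // WORD_BITS] &= ~(1 << (leaf_index % WORD_BITS))

    def smallest_free(self):
        for s, word in enumerate(self.summary):
            if word != FULL:
                leaf_index = s * WORD_BITS + lowest_zero_bit(word)
                if leaf_index >= len(self.leaves):
                    # Leaf not allocated yet, so all of its numbers are free.
                    return leaf_index * WORD_BITS
                return leaf_index * WORD_BITS + lowest_zero_bit(self.leaves[leaf_index])
        # Every allocated leaf is full: the next number starts a new leaf.
        return len(self.summary) * WORD_BITS * WORD_BITS


bitmap = NumberBitmap()
for n in range(2001):  # Object (0) to Object.2000.
    bitmap.mark_used(n)
bitmap.mark_free(1500)
reused = bitmap.smallest_free()
```

Unlike the current threshold-based policy, this structure re-uses the freed number: `reused` is 1500, found after inspecting two summary words and one leaf word rather than walking all 2000 entries.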

Bastien Montagne commented

2020-01-26 18:21:45 +01:00

Changed status from 'Needs Triage' to: 'Confirmed'

Bastien Montagne commented

2020-01-26 18:21:45 +01:00

Added subscriber: @mont29

blender-admin commented

2020-01-26 18:21:45 +01:00

#90052 was marked as duplicate of this issue

Dion Moult commented

2020-03-06 06:22:58 +01:00

Added subscriber: @Moult

Bastien Montagne commented

2021-07-23 10:35:53 +02:00

Added subscribers: @marty3000, @PratikPB2123, @lichtwerk, @kursadk

Chris Kohl commented

2021-12-09 22:48:34 +01:00

Added subscriber: @ckohl_art

Aras Pranckevicius commented

2022-02-18 17:59:45 +01:00

Added subscriber: @aras_p

Aras Pranckevicius self-assigned this 2022-02-18 17:59:45 +01:00

Aras Pranckevicius commented

2022-02-18 17:59:45 +01:00

I'll try to do this, :fingers_crossed:

Raimund Klink commented

2022-02-19 22:44:43 +01:00

Added subscriber: @Raimund58

Daniel Gryningstjerna commented

2022-07-06 17:23:55 +02:00

Added subscriber: @Dangry

Aras Pranckevicius commented

2022-07-23 21:40:48 +02:00

Changed status from 'Confirmed' to: 'Resolved'

Aras Pranckevicius closed this issue

2022-07-23 21:40:48 +02:00

Aras Pranckevicius commented

2022-07-23 21:40:48 +02:00

The design outlined here was implemented for Blender 3.3 in [D14162](https://archive.blender.org/developer/D14162). Possible follow-up work: address "sort IDs by name" performance, perhaps by having functions that create a whole batch of IDs at once and do the sorting in a more optimal way.
