
Ensure IDs get unique memory addresses within a given editing session.
Abandoned · Public

Authored by Bastien Montagne (mont29) on Feb 26 2020, 3:02 PM.

Details

Summary

Disclaimer: this should be considered an engineering, explanatory patch.

Rationale: for undo work, we are now manipulating IDs from potentially many different 'memory realms' (in contrast to the current undo system, where, since we always reallocate the whole main database at each undo step, we only ever have to deal with two: the old one from the memfile, and the new one re-read from that memfile).

So far, the new undo system has used ID name-based search to remap old to new pointers, but this is slower, less safe, and forces us to do a full undo barrier e.g. when renaming an ID.

The idea of this patch is instead to ensure that, within a given editing session, we never allocate a data-block at the same address twice. That way, we can consider ID pointers as valid UIDs within that scope, and use their values to handle remapping in undo steps even when an ID has had several memory addresses over the history of the editing session.

This patch does two things:

  1. Add a way to ensure a newly allocated ID gets a memory address never used before for any ID.
  2. Store the history of all memory addresses ever used by a given ID.

Note that point 1 above currently uses a fairly simple and naive solution, which is also quite inefficient (just allocating again and again until we get a 'free' address); we could imagine a much more efficient solution if we go deeper into the MEM_ allocator system itself. Not sure it is worth it though, as with the new undo system actual re-allocation of a same ID at a different address should be fairly rare (i.e. the history of used/forbidden addresses should remain quite short).
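For illustration, here is a minimal, self-contained sketch of that naive strategy, using plain malloc and a grow-only array in place of Blender's MEM_ allocator and hash-set utilities; all names are illustrative, not the patch's actual code:

/* Minimal sketch of the naive strategy: remember every address ever handed
 * out for an ID in this session, and keep allocating until we get one that
 * has never been used before. */
#include <stdio.h>
#include <stdlib.h>

/* All addresses ever used for an ID in this editing session. */
static void **used_addresses = NULL;
static size_t used_count = 0, used_capacity = 0;

static int address_was_used(const void *ptr)
{
  for (size_t i = 0; i < used_count; i++) {
    if (used_addresses[i] == ptr) {
      return 1;
    }
  }
  return 0;
}

static void remember_address(void *ptr)
{
  if (used_count == used_capacity) {
    used_capacity = used_capacity ? used_capacity * 2 : 64;
    used_addresses = realloc(used_addresses, used_capacity * sizeof(void *));
  }
  used_addresses[used_count++] = ptr;
}

/* Allocate a block at an address never used before in this session. Rejected
 * candidates are kept alive until a fresh address is found (otherwise the
 * allocator would immediately hand the same address back), then freed. The
 * bound of 64 retries is arbitrary, just to keep the sketch simple. */
void *id_alloc_unique_address(size_t size)
{
  void *rejected[64];
  int num_rejected = 0;
  void *ptr = malloc(size);

  while (address_was_used(ptr) && num_rejected < 64) {
    printf("Allocating ID re-used memory address %p, trying again (%d)...\n",
           ptr, num_rejected + 1);
    rejected[num_rejected++] = ptr;
    ptr = malloc(size);
  }
  for (int i = 0; i < num_rejected; i++) {
    free(rejected[i]);
  }
  remember_address(ptr);
  return ptr;
}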

Diff Detail

Repository
rB Blender
Branch
id-ensure-unique-memory-address
Build Status
Buildable 6782
Build 6782: arc lint + arc unit

Event Timeline

I do not think that simply reallocating until a unique address is found is a viable option. This leads to quadratic runtime complexity, in theory and in practice.

This can easily be seen by running this script in Blender when the patch is applied:

import bpy

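# Repeatedly create and delete a mesh datablock; each iteration allocates a new ID.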
for i in range(10000):
    print(i)
    mesh = bpy.data.meshes.new("mesh")
    bpy.data.meshes.remove(mesh)

Depending on which underlying allocator is used, the same memory buffer may be reused when you call malloc soon after free.
I tested this on Linux and it takes a couple thousand iterations until the first collision, but then I started getting a lot of them (hundreds, if not thousands, per ID allocation).

On Windows it is even worse. With just ten iterations I get this output:

0
1
Allocating ID re-used memory address 0000029AD3B00528, trying again (1)...
2
Allocating ID re-used memory address 0000029AD3B00528, trying again (1)...
Allocating ID re-used memory address 0000029AD3B25148, trying again (2)...
3
Allocating ID re-used memory address 0000029AD3B00528, trying again (1)...
Allocating ID re-used memory address 0000029AD3B25148, trying again (2)...
Allocating ID re-used memory address 0000029ABD01A648, trying again (3)...
4
5
6
Allocating ID re-used memory address 0000029ACA5CD238, trying again (1)...
7
Allocating ID re-used memory address 0000029ACA5CD238, trying again (1)...
8
Allocating ID re-used memory address 0000029ACA5CD888, trying again (1)...
9
Allocating ID re-used memory address 0000029ACA5CDED8, trying again (1)...
Allocating ID re-used memory address 0000029ACA5CE528, trying again (2)...
Allocating ID re-used memory address 0000029ACA5CCBE8, trying again (3)...
Allocating ID re-used memory address 0000029ACA5CD888, trying again (4)...
Allocating ID re-used memory address 0000029ACA5CD238, trying again (5)...

I was even able to reproduce the issue by deleting the default cube and adding it back multiple times.
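The reuse behaviour itself is easy to reproduce outside Blender; a tiny standalone C snippet (unrelated to the patch) shows that many allocators hand back the just-freed address for a new allocation of the same size:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  void *a = malloc(256);
  uintptr_t a_addr = (uintptr_t)a;
  free(a);

  /* On many allocators this is served from the slot that was just freed,
   * i.e. it gets the exact same address. */
  void *b = malloc(256);
  printf("first: %p, second: %p, reused: %s\n",
         (void *)a_addr, b, ((uintptr_t)b == a_addr) ? "yes" : "no");
  free(b);
  return 0;
}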

I do not fully understand the problem you are trying to solve, but I wonder if unique addresses are really necessary or if another run-time ID (e.g. based on a simple counter) could work as well. Maybe you'd need a separate array that maps these IDs to pointers.

Agree that trying to ensure memory addresses are globally unique is fragile.

Can't we solve this by having a map between the datablock pointer and the pointer in undo memory? This would be updated when you save a datablock to undo memory, or when you create a new datablock from undo memory. That would also keep this localized to the undo code, and not affect ID allocation outside of it.

The whole point of this patch is to allow keeping the address-based identification/remapping process throughout our readfile code base. It has some limitations, but it is also a very easy, simple and efficient way of handling the problem.

I do not think that simply reallocating until a unique address is found is a viable option. This leads to quadratic runtime complexity, in theory and in practice.

As said in the description, the current algorithm is very simple and naive. We can imagine many ways to make it more robust (e.g. allocating a randomized amount of memory in between ID re-allocations, or extending the MEM_ allocator system itself to support some kind of extra buffering (e.g. 8 bytes) that could be used to return a different address in case of collision, without actually re-allocating that much, etc.).

Also, Python scripts creating and deleting lots of temporary data in Main are very bad for many other reasons than just that allocation issue; this will have to be addressed anyway, one day or another.

I do not fully understand the problem you are trying to solve, but I wonder if unique addresses are really necessary or if another run-time ID (e.g. based on a simple counter) could work as well. Maybe you'd need a separate array that maps these IDs to pointers.

So far, we have been using IDs' memory addresses as UUIDs, which is very efficient, but also fairly risky, and has already been a serious pain in the past in some cases. I am personally fine with adding a real UUID system to IDs; that would make many things much, much simpler, and the cost would not be very high (numeric mapping is not that expensive). But there are some very strong opinions against that in the Blender core team AFAIK, and I am not really willing to open that Pandora's box again currently, hence the workaround with unique pointer addresses...

Can't we solve this by having a map between the datablock pointer and the pointer in undo memory? This would be updated when you save a datablock to undo memory, or when you create a new datablock from undo memory. That would also keep this localized to the undo code, and not affect ID allocation outside of it.

Short answer is no. Or at least, I cannot think of any practical way to do this. Undo/redo can create and delete many different IDs several times, which means that without this patch, you will end up with cases where different IDs have had the same pointer at different points in history, and undoing will then not be able to tell which is which. Again, unlike the current system where pointer collisions are impossible, with the new one they will happen, even very often depending on the memory allocator.

I just cannot wrap my head around a working system using a mapping from old addresses to current valid ones that could cope with those pointer collisions. AFAICS, for this to work we'd need to:

  • Store a mapping from 'old' to valid addresses in each undo step (this should not be too hard).
  • Propagate changes of the 'valid' address value, every time we end up somehow re-allocating a given ID, into the mapping of every existing undo step. This I believe would be very hard to get working in a reliable way.

After discussing with @Bastien Montagne (mont29), we think storing a globally unique 64-bit UUID in each datablock can work instead of making the datablock pointer globally unique.

Indeed, so abandoning that for now.
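For context, a minimal sketch of what such a counter-based session UUID could look like; the struct layout and function names here are hypothetical, not Blender's actual implementation:

#include <stdint.h>

typedef struct ID {
  /* ... existing datablock members ... */
  uint64_t session_uuid; /* Unique within the current editing session, 0 = unset. */
} ID;

/* Counter shared by the whole session; 0 is reserved for "unset". A real
 * implementation would need an atomic increment if IDs can be created from
 * multiple threads. */
static uint64_t g_session_uuid_counter = 1;

void id_session_uuid_ensure(ID *id)
{
  if (id->session_uuid == 0) {
    id->session_uuid = g_session_uuid_counter++;
  }
}

Undo remapping could then look datablocks up by session_uuid instead of by pointer value, which stays stable no matter how often the ID is re-allocated.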