
Add aligned malloc to guardedalloc
Closed, Invalid · Public


Scenario - recent additions (mainly libmv) have started adding local variations of aligned memory allocation calls.

Solution - add aligned memory allocation to intern/guardedalloc, ideally with the benefits provided by the existing guardedalloc routines.

The attached patch takes one approach to that solution, merging aligned allocation into the existing guardedalloc structure.

Cost -
Memory usage - adds a short to the MemHead struct and varying amounts of padding to meet alignment requirements.

Speed - a couple of calculations are added when computing the size of MemHead, to account for its size being rounded up to an alignment boundary so that the user data starts at memh + mem_sizeof_MemHead(align) and meets the alignment request. This affects make_memhead_header, which is used by all allocations.

It also adds some overhead when locating the MemHead for a block, to compensate for MemHead being padded to the alignment rather than being exactly sizeof(MemHead). This mainly affects MEM_allocN_len, MEM_dupallocN, MEM_reallocN and MEM_freeN. The cost is minimal in most cases but increases with blocks using larger units of alignment.
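The padded-header lookup described above can be sketched as follows. This is an illustrative reconstruction, not the patch itself: the struct layout and the mem_sizeof_memhead / mem_find_memhead names are assumptions, and a real implementation would use an aligned system allocator rather than plain malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative MemHead with the extra alignment short; not the real
 * guardedalloc struct. */
typedef struct MemHead {
    size_t len;
    short alignment;
} MemHead;

/* Round the header size up to the requested (power-of-2) alignment,
 * reserving room for a duplicated alignment short just before the
 * user data. */
static size_t mem_sizeof_memhead(size_t align)
{
    size_t base = sizeof(MemHead) + sizeof(short);
    return (base + align - 1) & ~(align - 1);
}

static void *mem_malloc_aligned(size_t len, short align)
{
    size_t hsize = mem_sizeof_memhead((size_t)align);
    /* A real implementation would use an aligned system allocator here;
     * plain malloc only guarantees 16-byte alignment on common platforms. */
    char *raw = malloc(hsize + len);
    if (raw == NULL)
        return NULL;
    MemHead *memh = (MemHead *)raw;
    memh->len = len;
    memh->alignment = align;
    /* Duplicate the alignment just before the user data so the header
     * can be located again without knowing the alignment up front. */
    ((short *)(void *)(raw + hsize))[-1] = align;
    return raw + hsize;
}

/* The extra work MEM_freeN and friends would have to do: read back the
 * alignment, then step over the padded header. */
static MemHead *mem_find_memhead(void *ptr)
{
    short align = ((short *)ptr)[-1];
    return (MemHead *)((char *)ptr - mem_sizeof_memhead((size_t)align));
}
```

The duplicated short is the cost paid so that MEM_freeN can recover the header without a separate lookup table.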

As implemented, all allocations are aligned on a power-of-2 boundary with a minimum of 2 (debate on the best minimum can come later); on FreeBSD this minimum is raised to sizeof(void*) to meet posix_memalign requirements. OSX aligns all allocations to 16-byte boundaries (except valloc, which aligns to page boundaries), so specific alignment requests are ignored there. I think 4, 8 and 16 would be the main alignment requests and would keep alignments to powers of 2, but I don't know whether other platforms have alignment limits or requirements.
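The alignment rules above (power-of-2 alignments, a configurable minimum, FreeBSD's posix_memalign floor) might look something like this sketch; MEM_MIN_ALIGN and mem_clamp_alignment are illustrative names, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for whatever minimum alignment is eventually
 * chosen. */
#define MEM_MIN_ALIGN 2

static int is_power_of_2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Raise a requested alignment to the platform floor described above. */
static size_t mem_clamp_alignment(size_t align)
{
    if (align < MEM_MIN_ALIGN)
        align = MEM_MIN_ALIGN;
#if defined(__FreeBSD__)
    /* posix_memalign requires a multiple of sizeof(void *) */
    if (align < sizeof(void *))
        align = sizeof(void *);
#endif
    return align;
}
```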

As memory allocation can affect performance, this may or may not be a worthwhile trade-off. If not, an alternative is a group of aligned guardedalloc functions that track aligned blocks in a separate list, leaving existing allocations as they are. The biggest issue there would be ensuring aligned_free is used for each aligned_malloc.

Platform variations I have used are based on code within libmv/tracking.

Testing - at the top of mallocn.c I have added TEST_ALIGNED_MALLOC, which alters MEM_mallocN and MEM_callocN to call MEM_aligned_mallocN instead of the normal malloc/calloc calls, so the theory can be tested without affecting the entire codebase. I also ran with MEM_MIN_ALIGN_SIZE set to 128 to try to expose any areas that make assumptions about MemHead placement and so on.
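A minimal sketch of that redirection follows, with MEM_aligned_mallocN replaced by a portable hand-aligning stub (the real patch would call its own guarded aligned function; MEM_aligned_freeN is likewise an illustrative stand-in):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the patch's MEM_aligned_mallocN: over-allocate and align
 * by hand, stashing the raw pointer just before the returned block. */
static void *MEM_aligned_mallocN(size_t size, size_t align, const char *str)
{
    (void)str;  /* guardedalloc tags blocks with a name string */
    void *raw = malloc(size + align + sizeof(void *));
    if (raw == NULL)
        return NULL;
    uintptr_t user = ((uintptr_t)raw + sizeof(void *) + align - 1)
                     & ~(uintptr_t)(align - 1);
    ((void **)user)[-1] = raw;  /* remember the raw pointer for free */
    return (void *)user;
}

static void MEM_aligned_freeN(void *ptr)
{
    free(((void **)ptr)[-1]);
}

#define MEM_MIN_ALIGN_SIZE 16
#define TEST_ALIGNED_MALLOC

/* The redirection itself: with TEST_ALIGNED_MALLOC defined, every
 * MEM_mallocN call becomes an aligned allocation. */
#ifdef TEST_ALIGNED_MALLOC
#  define MEM_mallocN(size, str) MEM_aligned_mallocN((size), MEM_MIN_ALIGN_SIZE, (str))
#else
#  define MEM_mallocN(size, str) malloc(size)
#endif
```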

With that set I have made a small animation with two particle sets - one halo, one object - without any issues.



Event Timeline

Although I see the issue, I fail to see why so much code is needed for it. Proper alignment is better handled in malloc itself, at the OS level or via system calls.

The extra code is there to find the MemHead before the known address.

It can be simplified by specifying that all alignments are on, say, 8-byte boundaries. We can then pad the MemHead so it is always a specific byte count before the known address.
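A sketch of that simplification, under the assumption of a fixed 8-byte alignment; PaddedMemHead and the function names are illustrative, not from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* With all alignments fixed at 8 bytes, the header can be padded to a
 * multiple of 8 and always sits exactly sizeof(PaddedMemHead) bytes
 * before the user pointer. */
typedef struct PaddedMemHead {
    size_t len;  /* sizeof(PaddedMemHead) is already a multiple of 8 on
                  * typical 64-bit platforms */
} PaddedMemHead;

static void *fixed_offset_malloc(size_t len)
{
    PaddedMemHead *memh = malloc(sizeof(PaddedMemHead) + len);
    if (memh == NULL)
        return NULL;
    memh->len = len;
    return memh + 1;  /* user data directly after the padded header */
}

/* No alignment-dependent search needed: the header is always at a
 * constant offset before the user pointer. */
static PaddedMemHead *fixed_offset_head(void *ptr)
{
    return (PaddedMemHead *)ptr - 1;
}
```

The constant offset is what removes the lookup overhead discussed in the cost section, at the price of losing per-block alignment choices.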

The other option is to add a simple aligned allocation call that isn't guarded but provides a single entry point, Blender-wide, to choose the correct system call.

The second patch is a simple addition of an aligned allocation function and a matching free. There is no guarded alloc in this - it is simply a central place to choose the appropriate system calls.
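The second patch isn't shown here, but a central chooser along those lines might look like this sketch (function names are illustrative; it assumes _aligned_malloc on Windows and posix_memalign elsewhere):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(_WIN32)
#  include <malloc.h>
#endif

/* One Blender-wide entry point that picks the platform's aligned
 * allocation call. */
void *aligned_malloc_sketch(size_t size, size_t alignment)
{
#if defined(_WIN32)
    return _aligned_malloc(size, alignment);
#else
    void *ptr = NULL;
    /* posix_memalign requires a power-of-2 alignment that is also a
     * multiple of sizeof(void *) */
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);
    if (posix_memalign(&ptr, alignment, size) != 0)
        return NULL;
    return ptr;
#endif
}

/* Blocks from _aligned_malloc must go back through _aligned_free;
 * posix_memalign blocks use plain free - hence the matching free
 * function mentioned above. */
void aligned_free_sketch(void *ptr)
{
#if defined(_WIN32)
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}
```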

Shane Ambler (sambler) closed this task as Invalid. Nov 15 2014, 4:23 AM
Shane Ambler (sambler) claimed this task.