Targets AVX2 and the full SIMD width.
Measured performance improvements of up to 10-15% on complex scenes.
Summer School/KapiWow <email@example.com>, thanks.
I found a few bugs, see comments.
Wrong size of array: float8 data;
Wrong count for num_unaligned_nodes: BVH_STAT_UNALIGNED_INNER_QNODE_COUNT should be BVH_STAT_UNALIGNED_INNER_ONODE_COUNT, and the implementation of BVH_STAT_UNALIGNED_INNER_ONODE_COUNT is missing.
Bad index: c is only int4.
Updating to the latest local patches
Hopefully @Max (maxim_d33) will help with the memory footprint.
@Milan Jaros (jar091), since the code is in a branch now, maybe it's easier
if you apply the fixes directly there?
Original BVH8 patch from Anton Gavrikov.
Batched triangle intersection from Victoria Zhislina.
Extra work, tests, and fixes from Maxym Dmytrychenko.
Fine for the branch and WIP status,
but it would need to be adjusted later, at the next opportunity.
My numbers for a simple (bmw) and a complex (victor) scene show the following memory footprints:
Fra:1 Mem:62.75M (0.00M, Peak 401.95M) | Time:01:47.93 | Sce: Scene Ve:0 Fa:0 La:0
Fra:1 Mem:62.75M (0.00M, Peak 404.44M) | Time:01:39.81 | Sce: Scene Ve:0 Fa:0 La:0
Fra:1 Mem:948.63M (0.00M, Peak 13459.96M) | Time:16:02.12 | Sce: Scene Ve:0 Fa:0 La:0
Fra:1 Mem:780.46M (0.00M, Peak 14095.58M) | Time:14:22.89 | Sce: Scene Ve:0 Fa:0 La:0
So around a 4.7% increase in peak memory, and essentially only for the complex scene (the simple scene's peak grows by about 0.6%).
Any other simple/heavy scenes to consider?
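
For anyone who wants to re-check the peak memory percentages against the log lines above, here is a minimal sketch of the arithmetic. The helper name `percent_change` is made up for illustration and is not part of the patch; the inputs are the Peak values copied from the four log lines, taking the first line of each pair as the baseline.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical helper (not from the patch): relative change from
// `before` to `after`, expressed in percent.
static double percent_change(double before, double after)
{
  assert(before != 0.0);
  return (after - before) / before * 100.0;
}

// Prints the peak memory change for both scenes, using the Peak values
// (in MB) copied from the log lines above.
static void print_peak_changes()
{
  std::printf("simple  (bmw):    %+.2f%%\n", percent_change(401.95, 404.44));
  std::printf("complex (victor): %+.2f%%\n", percent_change(13459.96, 14095.58));
}
```

Calling `print_peak_changes()` should report roughly +0.62% for the simple scene and +4.72% for the complex one, which matches the figure quoted above.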