initial BVH8 implementation
Abandoned, Public

Authored by Sergey Sharybin (sergey) on Oct 9 2017, 9:55 AM.

Details

Summary

Targets AVX2 and the full SIMD width.
Measured performance improves by up to 10-15% on complex scenes.
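
For readers unfamiliar with the idea: an 8-wide BVH node stores up to eight child bounding boxes so that one ray can be tested against all of them in a single pass, which maps onto the eight float lanes of an AVX2 register. The sketch below is not taken from the patch. It is a plain Python illustration of the slab test over a struct-of-arrays node, and the names `Node8` and `intersect_node8` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

WIDTH = 8  # children per node; matches the 8 float lanes of an AVX2 register


@dataclass
class Node8:
    """Hypothetical struct-of-arrays node: one list per box coordinate."""
    lo: List[List[float]]  # lo[axis][child], axis in 0..2
    hi: List[List[float]]  # hi[axis][child]


def intersect_node8(node, origin, inv_dir, t_near, t_far):
    """Slab test of one ray against all child boxes; returns a hit bitmask.

    inv_dir holds 1/direction per axis and is assumed finite here
    (a ray parallel to an axis needs extra care that this sketch omits).
    """
    mask = 0
    for child in range(WIDTH):  # a SIMD version handles these 8 lanes at once
        t0, t1 = t_near, t_far
        for axis in range(3):
            ta = (node.lo[axis][child] - origin[axis]) * inv_dir[axis]
            tb = (node.hi[axis][child] - origin[axis]) * inv_dir[axis]
            t0 = max(t0, min(ta, tb))  # latest entry across the slabs
            t1 = min(t1, max(ta, tb))  # earliest exit across the slabs
        if t0 <= t1:
            mask |= 1 << child  # bit i set means child i is hit
    return mask
```

The returned bitmask plays the role a SIMD comparison mask would play in a vectorized traversal: the caller only descends into children whose bit is set.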

Summer School/KapiWow <5158887@mail.ru>, thanks.

Diff Detail

Repository
rB Blender
Branch
cycles_bvh8
Build Status
Buildable 1290
Build 1290: arc lint + arc unit
Max (maxim_d33) created this revision. Oct 9 2017, 9:55 AM

Very interesting! Here are some tests on an i7-4790K, Linux, GCC 7.2, with BVH build time excluded.

It's faster, though only by a few percent. Maybe it depends on the CPU, compiler, or test scenes?

I found a few bugs; see the comments.

intern/cycles/bvh/bvh8.cpp
Line 190

Wrong array size: float8 data[14];

Line 259

Wrong count for num_unaligned_nodes: BVH_STAT_UNALIGNED_INNER_QNODE_COUNT should be BVH_STAT_UNALIGNED_INNER_ONODE_COUNT, and the implementation of BVH_STAT_UNALIGNED_INNER_ONODE_COUNT is missing.

Line 479

Bad index: c is only an int4.

Updating to the latest local patches

  • Updated against latest master
  • Actually based on cycles_bvh8 branch

Motivation points:

  1. The work is done, and it gives a speedup (small, but real).

    Until the Embree-based implementation is fully ready, I do not see anything against doing a smaller optimization, especially since the work is basically done.
  2. It could be a rather interesting basis for wider BVHs and bucketed intersection on non-CPU devices, where Embree is not applicable.
  3. It is always nice to have various implementations ready, for comparisons and such.

Hopefully @Max (maxim_d33) will help with memory footprint
comparison.

@Milan Jaros (jar091), since the code is in the branch now, maybe it's easier
if you apply the fixes directly there?

Original BVH8 patch from Anton Gavrikov.
Batched triangle intersection from Victoria Zhislina.
Extra work, tests, and fixes from Maxym Dmytrychenko.

Fine for the branch and WIP status,
but it would need to be adjusted later, at the next opportunity.

My numbers for a simple scene (bmw) and a complex scene (victor) show the following memory footprints:
bmw27_cpu.blend
BVH4
Fra:1 Mem:62.75M (0.00M, Peak 401.95M) | Time:01:47.93 | Sce: Scene Ve:0 Fa:0 La:0
Saved: '/tmp/0001.png'

BVH8
Fra:1 Mem:62.75M (0.00M, Peak 404.44M) | Time:01:39.81 | Sce: Scene Ve:0 Fa:0 La:0
Saved: '/tmp/0001.png'

victor.blend
BVH4
Fra:1 Mem:948.63M (0.00M, Peak 13459.96M) | Time:16:02.12 | Sce: Scene Ve:0 Fa:0 La:0
Saved: '/tmp/0001.png'

BVH8
Fra:1 Mem:780.46M (0.00M, Peak 14095.58M) | Time:14:22.89 | Sce: Scene Ve:0 Fa:0 La:0
Saved: '/tmp/0001.png'

So there is around a 4.7% increase in peak memory, and only for the complex scene (the bmw peak changes by about 0.6%).
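
The percentages can be rechecked directly from the status lines. The snippet below is a small sketch, not part of the patch: it assumes the `Time:` field is in `MM:SS.ss` form (true for the four renders above, not for renders longer than an hour), and the helper names `parse`, `percent_change` and `compare` are made up for illustration.

```python
import re

# Status lines copied from the render logs above (trailing fields trimmed).
LOGS = {
    ("bmw27", "BVH4"): "Fra:1 Mem:62.75M (0.00M, Peak 401.95M) | Time:01:47.93",
    ("bmw27", "BVH8"): "Fra:1 Mem:62.75M (0.00M, Peak 404.44M) | Time:01:39.81",
    ("victor", "BVH4"): "Fra:1 Mem:948.63M (0.00M, Peak 13459.96M) | Time:16:02.12",
    ("victor", "BVH8"): "Fra:1 Mem:780.46M (0.00M, Peak 14095.58M) | Time:14:22.89",
}

PATTERN = re.compile(
    r"Peak (?P<peak>[\d.]+)M\) \| Time:(?P<minutes>\d+):(?P<seconds>[\d.]+)"
)


def parse(line):
    """Return (peak memory in MB, render time in seconds) from a status line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognized status line: " + line)
    peak = float(match.group("peak"))
    seconds = int(match.group("minutes")) * 60 + float(match.group("seconds"))
    return peak, seconds


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means lower)."""
    return (new - old) / old * 100.0


def compare(scene):
    """Return (peak memory change %, render time change %) for BVH4 -> BVH8."""
    peak4, time4 = parse(LOGS[(scene, "BVH4")])
    peak8, time8 = parse(LOGS[(scene, "BVH8")])
    return percent_change(peak4, peak8), percent_change(time4, time8)


if __name__ == "__main__":
    for scene in ("bmw27", "victor"):
        peak, duration = compare(scene)
        print(f"{scene}: peak {peak:+.2f}%, time {duration:+.2f}%")
```

Under these assumptions the numbers come out at roughly +4.7% peak memory and -10.3% render time for victor, and roughly +0.6% peak memory and -7.5% render time for bmw27.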

Any other simple or heavy scenes to consider?

Sergey Sharybin (sergey) abandoned this revision.

This patch is in master now. Closing (I think I have to commandeer it for this... Eh, Phabricator...)