
Cycles: Split BVH nodes storage into inner and leaf nodes
Closed, Public

Authored by Sergey Sharybin (sergey) on Apr 13 2015, 8:10 PM.

Details

Summary

This gets rid of the inefficient memory usage caused by the bounding box part
of a BVH node being allocated for leaf nodes even though they never use it.
The split saves 6 float4 values per leaf node for QBVH and 3 float4 values
per leaf node for regular BVH.
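
The per-leaf saving can be sketched roughly as below. This is only an illustration, not the patch itself: the node sizes (4 and 7 float4 values per inner node for BVH and QBVH, 1 float4 value per leaf) are assumptions chosen to match the savings of 3 and 6 float4 values stated above, and `NodeLayout`, `unified_bytes`, `split_bytes` and `saved_bytes` are hypothetical names.

```cpp
#include <cstddef>

// Size of one float4 in bytes (four floats).
constexpr std::size_t kFloat4Bytes = 4 * sizeof(float);

// Hypothetical description of a BVH node layout, measured in float4 values.
// The concrete sizes below are assumptions, not taken from the patch.
struct NodeLayout {
  std::size_t inner_size;  // float4 values per inner node (bounds + child data)
  std::size_t leaf_size;   // float4 values a leaf node actually needs
};

constexpr NodeLayout kBVH = {4, 1};   // assumed sizes for regular BVH
constexpr NodeLayout kQBVH = {7, 1};  // assumed sizes for QBVH

// Unified storage: every node, leaf or not, takes a full inner-node slot.
constexpr std::size_t unified_bytes(NodeLayout l, std::size_t inner, std::size_t leaves) {
  return (inner + leaves) * l.inner_size * kFloat4Bytes;
}

// Split storage: inner nodes and leaf nodes live in separate arrays,
// so a leaf only pays for the data it uses.
constexpr std::size_t split_bytes(NodeLayout l, std::size_t inner, std::size_t leaves) {
  return (inner * l.inner_size + leaves * l.leaf_size) * kFloat4Bytes;
}

// Memory saved by the split; it depends only on the number of leaves.
constexpr std::size_t saved_bytes(NodeLayout l, std::size_t inner, std::size_t leaves) {
  return unified_bytes(l, inner, leaves) - split_bytes(l, inner, leaves);
}
```

Under these assumptions `saved_bytes(kQBVH, inner, leaves)` is 96 bytes per leaf and `saved_bytes(kBVH, inner, leaves)` is 48 bytes per leaf, regardless of the number of inner nodes.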

This translates into the following memory savings for 01.01.01.G rendered
without hair:

                    Device memory size   Device memory peak   Global memory peak
Before the patch:   4957                 5051                 7668
With the patch:     4467                 4562                 7332
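
The absolute and relative savings implied by these figures can be computed directly. The sketch below only restates the measured numbers; the units are whatever the original measurement used, and `Measurement`, `saved`, `saved_percent` and `print_savings` are hypothetical names.

```cpp
#include <cstdio>

// One before/after pair from the measurements above.
struct Measurement {
  const char *name;
  int before;
  int after;
};

// Absolute saving, in the same units as the measurement.
constexpr int saved(Measurement m) { return m.before - m.after; }

// Saving relative to the value before the patch, in percent.
constexpr double saved_percent(Measurement m) { return 100.0 * saved(m) / m.before; }

constexpr Measurement kMeasurements[] = {
  {"Device memory size", 4957, 4467},
  {"Device memory peak", 5051, 4562},
  {"Global memory peak", 7668, 7332},
};

// Print one line per measurement with its absolute and relative saving.
void print_savings() {
  for (const Measurement &m : kMeasurements) {
    std::printf("%-20s saved %d (%.1f%%)\n", m.name, saved(m), saved_percent(m));
  }
}
```

The differences are 490, 489 and 336, which is roughly 9.9%, 9.7% and 4.4% of the values before the patch.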

The measurements were done against current master. Speed tests still need to
be run, and it's hard to predict whether this is faster or not: on the one
hand, leaf nodes are now much more coherent in cache; on the other hand, they
are no longer as coherent with regular nodes.

Diff Detail

Repository
rB Blender

Event Timeline

Sergey Sharybin (sergey) retitled this revision to Cycles: Split BVH nodes storage into inner and leaf nodes. Apr 13 2015, 8:10 PM
Sergey Sharybin (sergey) updated this object.
Sergey Sharybin (sergey) updated this revision to Diff 3971.

Ran speed tests. This change actually fixes the speed regression reported in T44337, and with the barcelona file it also gives around percent of speedup.

Will apply this patch to the memory branch first, and then, once it's tested in the studio, patches from that branch will go to master.

Fixes for dynamic viewport BVH and sm_2x cuda kernels

This revision was automatically updated to reflect the committed changes.