Cycles: Embree integration
Needs Review (Public)

Authored by Stefan Werner (swerner) on Mon, Sep 10, 1:59 PM.

Details

Summary

This patch adds Intel's Embree as an option for ray acceleration.

I'm not expecting this patch to be accepted right away; I'm publishing it here because I think it's very useful even when not 100% perfect.

Quick FAQ:

  • Will this work with GPUs? No.
  • Will this work with AMD CPUs? Yes.
  • Benchmark X renders slower than before. Is it broken? No. I don't expect gains for simple scenes. The benefits of this are mostly visible with big and complex scenes with heavy motion blur.
  • Will this work with GPUs? Still no.

To use this, a build of Embree 3.2.0 or newer is required with intersection filtering enabled. Note that Embree 3.2.0 has a known bug with ribbon curves that is expected to be fixed in newer releases (currently in alpha). The scripts for building dependencies are not updated to fulfil this dependency yet.

There is room for improvement.
Thick line segments have no native support in Embree. Vertex and index buffers could be shared with the rest of Cycles to reduce the memory footprint. Embree has native support for quad meshes, displacements and subdivisions, none of which is leveraged by this patch. The CPU split kernel could be modified to call Embree with a batch of rays for intersections.
One could probably implement traversal of Embree's BVH for GPUs. This will probably be a longer discussion, but we could abstract ray intersections in general to allow Cycles to use whatever backend is best for the hardware at hand: Embree on Intel CPUs, RTX on Nvidia GPUs and RadeonRays on AMD GPUs.

Nonetheless, an earlier version of this (based on Embree 2.x) was used by Tangent Animation for Next Gen, so I have faith that this patch is reasonably robust.

Diff Detail

Repository
rB Blender
Branch
cycles_embree
Build Status
Buildable 2029
Build 2029: arc lint + arc unit
This comment was removed by Stefan Werner (swerner).

Please disregard the excessive changes in platform_win32.cmake - that must have been a merge gone wrong.

Also add to the list of "known issues": Embree's object motion blur only does linear interpolation, not arc interpolation like Cycles' BVH. The blur of fast rotations will look different.
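
To illustrate why fast rotations blur differently, here is a minimal 2D sketch. It is not taken from Embree or Cycles; the names `lerp_position`, `arc_position` and `radius` are hypothetical. A point rotating a quarter turn is sampled at mid-shutter, once by interpolating linearly between its endpoint positions and once by rotating along the arc.

```cpp
#include <cmath>

struct Vec2 { double x, y; };

// Linear interpolation between the start and end positions (moves along the chord).
Vec2 lerp_position(Vec2 a, Vec2 b, double t) {
    return { (1.0 - t) * a.x + t * b.x, (1.0 - t) * a.y + t * b.y };
}

// Arc interpolation: rotate the start position by the fraction t of the total angle.
Vec2 arc_position(Vec2 a, double total_angle, double t) {
    const double c = std::cos(total_angle * t);
    const double s = std::sin(total_angle * t);
    return { c * a.x - s * a.y, s * a.x + c * a.y };
}

// Distance of a point from the rotation center (the origin).
double radius(Vec2 v) { return std::sqrt(v.x * v.x + v.y * v.y); }
```

For a start point of (1, 0) and a total rotation of 90 degrees, the arc sample at t = 0.5 stays at distance 1 from the center, while the linearly interpolated sample lands at (0.5, 0.5), about 0.707 from the center. The linearly interpolated object cuts the corner, which is the visible difference in the blur.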

Stefan Werner (swerner) retitled this revision from Cycles: Laid some foundation for embree ray accelerator: flags, UI and CMake to Cycles: Embree integration. Mon, Sep 10, 2:37 PM
Stefan Werner (swerner) edited the summary of this revision. Mon, Sep 10, 2:40 PM
Stefan Werner (swerner) edited the summary of this revision.

I'm not sure that having an abstraction over ray intersection is a great idea: it is preferable to preserve the render result (or at least the result the render converges to) across all supported devices.

As for memory usage and speed in complex scenes with motion blur, I wonder whether the gain is due to STBVH [1]. And in general, is it due to better quality of the BVH itself, or is something really different happening in the traversal (compared with BVH8 + bucketed triangles from Max/Via/Anton)? I still think using the Embree builder would make a huge difference on GPU (there seems to be a bug in our OOBB/BVH split that produces a rather poor quality BVH, which I never had time to fully investigate).

Anyway, that is all technical, and maybe it'll be easier to talk about in person at the conference :) For now, I think it's fair to ask for some numbers ;) For example, the benchmark suite plus something with motion blur, rendered with both the current master BVH8 and Embree, reporting device/peak memory and render times.

Adding Maxym here as well.

[1] https://embree.github.io/papers/2017-HPG-msmblur.pdf

All BVHs are going to give identical results, unless they're broken. Likewise, separate implementations of a given ray/triangle intersector (Möller–Trumbore, Woop, ...) will differ only in floating-point rounding. As long as we're compiling with fast-math turned on, our results will not be bit-perfect across compilers anyway. Hair curves have small differences (caps, self-intersection epsilon); if we need perfection, we could just implement the same intersection everywhere (Embree and DXR/OptiX allow for user-defined geometry).
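
For reference, a minimal sketch of the Möller–Trumbore test mentioned above, written in double precision. This is a textbook version, not the code used by Cycles or Embree; the names `Vec3`, `Hit` and `intersect_moller_trumbore` and the epsilon value are assumptions made for illustration.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Hit { double t, u, v; };  // ray distance and barycentric coordinates

static Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Returns true and fills `hit` if the ray org + t * dir (t >= 0) hits triangle v0 v1 v2.
bool intersect_moller_trumbore(Vec3 org, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, Hit &hit) {
    const double eps = 1e-12;  // assumed tolerance for the parallel-ray test
    const Vec3 e1 = sub(v1, v0);
    const Vec3 e2 = sub(v2, v0);
    const Vec3 p = cross(dir, e2);
    const double det = dot(e1, p);
    if (std::fabs(det) < eps) return false;  // ray is parallel to the triangle plane
    const double inv_det = 1.0 / det;
    const Vec3 s = sub(org, v0);
    const double u = dot(s, p) * inv_det;
    if (u < 0.0 || u > 1.0) return false;
    const Vec3 q = cross(s, e1);
    const double v = dot(dir, q) * inv_det;
    if (v < 0.0 || u + v > 1.0) return false;
    const double t = dot(e2, q) * inv_det;
    if (t < 0.0) return false;  // intersection lies behind the ray origin
    hit = Hit{ t, u, v };
    return true;
}
```

Any other correct intersector computes the same t, u and v mathematically; only the order of the floating-point operations, and therefore the rounding, differs.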

Regarding numbers:
Agent 327, 07_04_F.lighting.blend, frame 162. Dual Xeon 2660v2
No Embree: 16m03s, 7142MB
With Embree: 4m48s, 6161MB