
Add OptiX device implementation to Cycles
ClosedPublic

Authored by Patrick Mours (pmoursnv) on Jul 29 2019, 2:49 PM.

Details

Summary

This patch adds a new device implementation to Cycles that uses NVIDIA OptiX to support hardware-accelerated ray tracing on NVIDIA RTX GPUs.
The diff also includes a few other optimizations tailored towards OptiX:

  • The kernel code was rearranged in some places to avoid conditional trace calls (by evaluating the condition separate from the trace itself), since currently those tend to hurt code generation in OptiX. This should not affect other APIs negatively.
  • It also has some variables initialized to zero to help the compiler, as it generated more spills otherwise.
  • Most functions were inlined to avoid call overhead in OptiX (but were left as before for other APIs).
  • The SVM decode_node_uchar4 function was split up, since doing so caused the compiler to generate better code.
  • The BVH class was modified slightly to allow it to access the device class. OptiX acceleration structure building requires access to the OptiX context, so it must be piped through the device.
  • An old CUDA workaround was removed in scene_intersect, since I couldn't reproduce a bug here and it caused worse code generation.
  • Some missing kernel feature checks were added as it failed to compile with those disabled.
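The first bullet (evaluating the condition separately from the trace) can be pictured with a minimal sketch. Everything here is hypothetical: `Ray`, `trace_shadow`, and both `shade_*` functions are stand-ins invented for illustration, not Cycles or OptiX code, and the patch may restructure the kernels differently.

```cpp
// Hypothetical stand-ins for illustration only; not the Cycles or OptiX API.
struct Ray {
  float t_max;  // maximum ray length
};

// Stub "trace": pretends an occluder sits at distance 1 along every ray.
inline bool trace_shadow(const Ray &ray)
{
  return ray.t_max > 1.0f;
}

// Before: the trace call is nested inside a branch.
inline float shade_branchy(bool light_visible, const Ray &ray)
{
  if (light_visible) {
    if (!trace_shadow(ray)) {
      return 1.0f;
    }
  }
  return 0.0f;
}

// After: the condition is evaluated first and the trace call site is
// unconditional. When the light is not visible the ray is shortened to
// zero length, so the stub reports no hit and the result is masked out.
inline float shade_hoisted(bool light_visible, Ray ray)
{
  if (!light_visible) {
    ray.t_max = 0.0f;
  }
  const bool blocked = trace_shadow(ray);
  return (light_visible && !blocked) ? 1.0f : 0.0f;
}
```

Both functions return the same value for every input; only the position of the trace call relative to the branch differs.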

To build:

  1. You first need to install the CUDA 10.1 toolkit and OptiX SDK 7.0 to the default location (or set the OPTIX_INCLUDE_DIR CMake variable manually in the following steps). The download page may require you to register a developer account at NVIDIA.
  2. Set up a Blender build environment as usual but download and apply this patch to the Git repository (Download Raw Diff on the right via Save Link As and then run git apply D5363.diff with the downloaded file in your local repository after syncing to latest master branch).
  3. Continue following the usual Blender build instructions (run make full). It will build with OptiX by default (see also the WITH_CYCLES_DEVICE_OPTIX CMake variable).

To execute:

  1. Download and install the NVIDIA Driver 435 Pre-Release (you'll need that specific version for now). Again, the download page may require you to register a developer account.
  2. Launch the Blender executable you built previously and head over to the preferences (Edit > Preferences > System > Cycles Render Devices) to enable OptiX. Then select one or more of the supported GPUs in your system in the list below that option (only RTX GPUs are supported at this point and they will only show up with the driver from above installed).
  3. Everything is now set up. Next time you start a render in Cycles, and you have a supported GPU, it will be accelerated using OptiX (provided that the scene has the Device type set to GPU Compute in its render settings). Note: The very first start may take a minute or two longer, since OptiX must compile and cache the render kernels once.

If you run into issues, starting Blender with --debug-cycles turns on log output from OptiX, which can help with troubleshooting.

Diff Detail

Repository
rB Blender

Event Timeline

intern/cycles/blender/blender_session.cpp
197 ↗(On Diff #16880)

Use parentheses around comparisons. Also other places.

intern/cycles/device/device.h
410

Maybe call this build_bvh instead?

intern/cycles/device/device_optix.cpp
1953

Could an override for this be added to the debug panel? That would make it easy to test / debug the OptiX code paths with non-RTX cards without needing to patch over these lines.

intern/cycles/kernel/geom/geom_subd_triangle.h
102 ↗(On Diff #16880)

Is there a reason this change isn't an ifdef like all the other cases?

intern/cycles/kernel/kernel_path.h
366 ↗(On Diff #16880)

Use braces even for single lines. Other places as well.

intern/cycles/kernel/kernels/optix/kernel_optix.cu
26 ↗(On Diff #16880)

Maybe use snake_case instead of camelCase for these functions? They aren't part of the OptiX API but look like they are. I think it would be better to make it more apparent that these are defined in Cycles by staying consistent with that style.

This revision now requires changes to proceed. Aug 8 2019, 1:50 PM

Fixes some of the formatting issues that were brought up. The #ifdefs in geom_subd_triangle.h were missing by accident.

I'd prefer using a different solution than all these #ifdefs around noinline functions too. I didn't want to redefine the ccl_device_noinline macro to force inline though, since that would only cause confusion. Maybe we could instead introduce macros like ccl_device_prefer_noinline (which would inline on OptiX) and ccl_device_force_noinline (which would noinline everywhere)?
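A rough sketch of how the two proposed macros could behave. The macro names come from the proposal above; `KERNEL_OPTIX_SKETCH`, the `SKETCH_*` helpers, and the two example functions are assumptions made up for illustration and are not existing Cycles definitions.

```cpp
// Compiler-specific spellings; SKETCH_* names are hypothetical helpers.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_NOINLINE __attribute__((noinline))
#  define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define SKETCH_NOINLINE __declspec(noinline)
#  define SKETCH_FORCEINLINE __forceinline
#else
#  define SKETCH_NOINLINE
#  define SKETCH_FORCEINLINE inline
#endif

// KERNEL_OPTIX_SKETCH is an assumed build flag standing in for
// "this translation unit is compiled for OptiX".
#ifdef KERNEL_OPTIX_SKETCH
#  define ccl_device_prefer_noinline SKETCH_FORCEINLINE
#else
#  define ccl_device_prefer_noinline SKETCH_NOINLINE
#endif

// Never inlined, regardless of the target.
#define ccl_device_force_noinline SKETCH_NOINLINE

// Example functions (hypothetical) tagged with each macro.
ccl_device_prefer_noinline int heavy_closure_eval(int x)
{
  return x * x + 1;
}

ccl_device_force_noinline int rarely_called_path(int x)
{
  return x - 1;
}
```

With this arrangement the call sites stay free of #ifdefs, and the per-target inlining policy lives in one place.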

I checked the changes to kernel_branched_path_surface_connect_light against lots of scenes and couldn't identify any differences, so they should check out. More eyes always help though.

Adding a debug flag to force listing of all devices under OptiX would be nice, but it seems the DebugFlags() mechanism is currently evaluated after devices are listed, so such a flag would have no effect on the listing.

As for merging CUDA/OptiX device selection: In the long run this is probably a good idea. For now though the OptiX device comes with some limitations that prevent this (e.g. you cannot select a CPU device with it, since that would require multiple BVH layouts to be built which Cycles cannot do currently).

Patrick Mours (pmoursnv) marked 4 inline comments as done. Aug 8 2019, 3:05 PM
Alex Fuller (mistaed) requested changes to this revision. Aug 8 2019, 9:16 PM

Hi,

I had this posted a day or two ago, but I didn't understand how Phabricator's Differential works. I've made a basic inline comment in bvh_embree.h: the new function needs fixing to compile properly:

virtual void upload(Progress &progress, DeviceScene *dscene) override;

Cheers, and great work!

intern/cycles/bvh/bvh_embree.h
39 ↗(On Diff #16880)

Hey! I am compiling GafferCycles https://github.com/boberfly/GafferCycles with Embree, and I had a compile error. This line should be:

virtual void upload(Progress &progress, DeviceScene *dscene) override;

Cheers!

This revision now requires changes to proceed. Aug 8 2019, 9:16 PM

  • Fixed a compile error when building with Embree (thanks @Alex Fuller (mistaed))
  • Fixed an OptiX warning during pipeline creation
  • Reduced number of used attributes in OptiX pipeline to two
Patrick Mours (pmoursnv) marked an inline comment as done. Aug 12 2019, 6:15 PM

Fixed performance regression that caused BMW benchmark scene to only be 1.1x faster instead of 1.6x

Just thinking aloud, this might be an opportunity for a small refactor of cycles: currently, Cycles builds the BVH before it uploads geometry vertices to the device. Most ray tracing APIs (DXR, OptiX, Embree) work the other way round. Before we introduce an OptiX specific workaround, should we change Cycles to do vertices first, BVH second? That would also help the Embree backend and potentially a future DXR/Vulkan backend when we see more vendors deliver ray tracing hardware.
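The ordering constraint described here can be sketched as follows. `DeviceSketch` and `sync_geometry` are hypothetical names invented for illustration; they do not correspond to the actual Cycles Device or BVH classes.

```cpp
#include <string>
#include <vector>

// Hypothetical device that, like the ray tracing APIs mentioned above,
// needs vertex data resident on the device before it can build a BVH.
struct DeviceSketch {
  std::vector<std::string> log;  // records the order of operations
  bool vertices_uploaded = false;

  void upload_vertices(const std::vector<float> &verts)
  {
    vertices_uploaded = !verts.empty();
    log.push_back("upload_vertices");
  }

  // Returns false if called before any vertices were uploaded.
  bool build_bvh()
  {
    if (!vertices_uploaded) {
      return false;
    }
    log.push_back("build_bvh");
    return true;
  }
};

// Proposed order: vertices first, BVH second.
inline bool sync_geometry(DeviceSketch &device, const std::vector<float> &verts)
{
  device.upload_vertices(verts);
  return device.build_bvh();
}
```

A backend written this way has no need for a special path that feeds vertices to the acceleration structure build ahead of the normal upload.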

Did a simple test to see if branched path tracing is affected by this, and in fact it is, as can be seen in these images. Note that this test was CPU only, so this is a regression and is not limited to the new device backend. The branched functions will need to be corrected. I haven't looked too deeply at it yet, but it looks like some lines from the original are missing from the new versions of these functions. I'll do a more thorough look through and more testing later.

[Attachments not preserved: render from master, render with this diff, and the test scene.]

Fixed modified branched connect to light functions

They were missing a check for mesh lights. Seems to produce the correct result in that test scene now when run with CPU.

Also improved performance slightly by again fixing some unnecessary spills around trace calls and adjusting some noinline specifiers.

The instructions reference the pre-release drivers.

Can the generally released drivers (436.02) be used with this patch?

Cleaned up some unnecessary changes

Also checked against OpenCL again, since there were some issues there with the initial patch. It now seems to work correctly there, as before.

Rebase against master and fix merge conflicts

Fixed some missing changes to "decode_node_uchar4" calls after rebase
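For readers unfamiliar with the decode_node_uchar4 split listed in the summary, here is one way such a split could look. The function names and signatures below are hypothetical and chosen for illustration; they are not taken from the patch.

```cpp
#include <cstdint>

// Combined form: every output is optional, so each store is guarded
// by a pointer check that the compiler has to reason about.
inline void decode_uchar4_optional(
    uint32_t i, uint32_t *x, uint32_t *y, uint32_t *z, uint32_t *w)
{
  if (x) *x = i & 0xFF;
  if (y) *y = (i >> 8) & 0xFF;
  if (z) *z = (i >> 16) & 0xFF;
  if (w) *w = (i >> 24) & 0xFF;
}

// Split form: one function per arity, all outputs required, no checks.
inline void unpack_uchar2(uint32_t i, uint32_t *x, uint32_t *y)
{
  *x = i & 0xFF;
  *y = (i >> 8) & 0xFF;
}

inline void unpack_uchar3(uint32_t i, uint32_t *x, uint32_t *y, uint32_t *z)
{
  *x = i & 0xFF;
  *y = (i >> 8) & 0xFF;
  *z = (i >> 16) & 0xFF;
}

inline void unpack_uchar4(
    uint32_t i, uint32_t *x, uint32_t *y, uint32_t *z, uint32_t *w)
{
  *x = i & 0xFF;
  *y = (i >> 8) & 0xFF;
  *z = (i >> 16) & 0xFF;
  *w = (i >> 24) & 0xFF;
}
```

A split like this changes the function name at every call site, which is why calls can be missed after a rebase.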

I normally try to refrain from polluting diff reviews with "works for me!" but I think this patch fixes/compensates for what I see as a rendering deficiency in Cycles when it comes to motion blur.

Basically, tiles with motion blur render more than an order of magnitude faster than with CUDA + CPU rendering. An example of this is a specific 720p rendered frame where the rendering always appeared to get "stuck" on tiles with motion blur. With this patch, I witnessed a 26X rendering speed increase (10m -> 22s) to complete a full frame. The hardware used for the rendering is an RTX 2080 + i9 9900K.

Just thinking aloud, this might be an opportunity for a small refactor of cycles: currently, Cycles builds the BVH before it uploads geometry vertices to the device. Most ray tracing APIs (DXR, OptiX, Embree) work the other way round. Before we introduce an OptiX specific workaround, should we change Cycles to do vertices first, BVH second? That would also help the Embree backend and potentially a future DXR/Vulkan backend when we see more vendors deliver ray tracing hardware.

Sounds good to me.

I committed part of the kernel changes now, mainly those that potentially affect CPU/CUDA/OpenCL. This will cause some merge conflicts; I can rebase this patch if needed.

The overall effect of the committed changes is maybe 1-2% faster CUDA rendering in the benchmarks.

The ambient occlusion changes have not been committed. They cause a 3-4% performance regression here; I'll find a solution for that.

Rebase on latest master.

Rebase on latest master.

Rebase on latest master.

At this point most of the code that could affect devices other than OptiX has been committed. That gets us close to being able to commit the remainder as an experimental option to master, where we can continue to make this stable and feature complete.

Still to be looked at:

  • The ambient occlusion changes cause a performance regression for non-OptiX devices. We can tweak the code so the conditionals are added back except for OptiX.
  • For building it would be convenient if we could include optix_stubs.h and other headers it needs in extern/optix/include, so the SDK does not need to be installed. But we may need some license changes to the header.
  • The preferences need to display a warning to indicate that OptiX is still experimental and that not all features are supported.
  • Ideally unsupported features would also be hidden or greyed out in the user interface (mainly branched path and baking?)
  • We need to create a task on developer.blender.org with remaining work, overview of which features are supported, etc.

Did another run of performance tests, and it looks like the AO change now performs worse in some scenes under OptiX too, and in others it no longer gives a noticeable speedup compared to when I first implemented it (on an older OptiX version). So it's likely safe to skip it and leave things as they currently are.

  • Revert changes to AO code
  • Add missing "ccl_device_noinline_cpu" definition for OptiX
  • Hide branched path tracing and baking options in UI when OptiX is active

Added warning message to OptiX device selection and fixed an invalid assert on the GPU

I get this error:

D:\blender-git\blender> git apply D5363.diff
error: patch failed: intern/cycles/util/util_math.h:619
error: intern/cycles/util/util_math.h: patch does not apply


Same here, impossible to apply the diff...

I plan to commit this tomorrow.

This is enabled by default for release and buildbot builds. For your own builds,
instructions are here:
https://wiki.blender.org/wiki/Building_Blender/CUDA#Optix

This revision was not accepted when it landed; it landed in state Needs Review. Sep 13 2019, 11:55 AM
This revision was automatically updated to reflect the committed changes.