Add OptiX device implementation to Cycles
Needs Review · Public

Authored by Patrick Mours (pmoursnv) on Mon, Jul 29, 2:49 PM.

Details

Summary

This patch adds a new device implementation to Cycles which uses NVIDIA OptiX to support hardware accelerated ray tracing on NVIDIA RTX GPUs.
The diff also includes a few other optimizations tailored towards OptiX:

  • The kernel code was rearranged in some places to avoid conditional trace calls (by evaluating the condition separately from the trace itself), since currently those tend to hurt code generation in OptiX. This should not affect other APIs negatively.
  • Some variables are also initialized to zero to help the compiler, since it generated more register spills otherwise.
  • Most functions were inlined to avoid call overhead in OptiX (but were left as before for other APIs).
  • The SVM decode_node_uchar4 function was split up, since doing so caused the compiler to generate better code.
  • The BVH class was modified slightly to allow it to access the device class. OptiX acceleration structure building requires access to the OptiX context, so it must be piped through the device.
  • An old CUDA workaround was removed in scene_intersect, since I couldn't reproduce the bug it worked around and it caused worse code generation.
  • Some missing kernel feature checks were added, as the kernel failed to compile with those features disabled.
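To illustrate the decode_node_uchar4 split, here is a minimal C++ sketch. It assumes the original helper took four optional output pointers, each guarded by a null check, and that the split provides fixed-arity variants that always write their outputs. The names unpack_uchar2/3/4 are hypothetical and may not match the patch.

```cpp
#include <cassert>
#include <cstdint>

// Generic form: every output is optional, so each byte extraction
// is guarded by a null-pointer branch.
inline void decode_node_uchar4(uint32_t i, uint32_t *x, uint32_t *y, uint32_t *z, uint32_t *w)
{
  if (x) {
    *x = (i & 0xFF);
  }
  if (y) {
    *y = ((i >> 8) & 0xFF);
  }
  if (z) {
    *z = ((i >> 16) & 0xFF);
  }
  if (w) {
    *w = ((i >> 24) & 0xFF);
  }
}

// Hypothetical split variants: fixed arity, outputs always written,
// so the compiler sees straight-line code with no per-byte branches.
inline void unpack_uchar2(uint32_t i, uint32_t *x, uint32_t *y)
{
  *x = (i & 0xFF);
  *y = ((i >> 8) & 0xFF);
}

inline void unpack_uchar3(uint32_t i, uint32_t *x, uint32_t *y, uint32_t *z)
{
  unpack_uchar2(i, x, y);
  *z = ((i >> 16) & 0xFF);
}

inline void unpack_uchar4(uint32_t i, uint32_t *x, uint32_t *y, uint32_t *z, uint32_t *w)
{
  unpack_uchar3(i, x, y, z);
  *w = ((i >> 24) & 0xFF);
}
```

Callers that previously passed nullptr for unused outputs would pick the variant with the matching number of outputs instead, which is one plausible reason the split produced better code.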

To build:

  1. You first need to install the CUDA 10.1 toolkit and OptiX SDK 7.0 to the default location (or set the OPTIX_INCLUDE_DIR CMake variable manually in the following steps). The download page may require you to register a developer account at NVIDIA.
  2. Set up a Blender build environment as usual, but download this patch and apply it to the Git repository: after syncing your local repository to the latest master branch, save the patch via Download Raw Diff on the right (Save Link As), then run git apply D5363.diff with the downloaded file in your local repository.
  3. Continue following the usual Blender build instructions (run make full). It will build with OptiX by default (see also the WITH_CYCLES_DEVICE_OPTIX CMake variable).

To execute:

  1. Download and install the NVIDIA Driver 435 Pre-Release (you'll need that specific version for now). Again, the download page may require you to register a developer account.
  2. Launch the Blender executable you built previously and head over to the preferences (Edit > Preferences > System > Cycles Render Devices) to enable OptiX. Then select one or more of the supported GPUs in your system in the list below that option (only RTX GPUs are supported at this point and they will only show up with the driver from above installed).
  3. Everything is now set up. Next time you start a render in Cycles, and you have a supported GPU, it will be accelerated using OptiX (provided that the scene has the Device type set to GPU Compute in its render settings). Note: The very first start may take a minute or two longer, since OptiX must compile and cache the render kernels once.

In case you run into issues, starting Blender with --debug-cycles will turn on log output from OptiX, which can help with troubleshooting.

Diff Detail

Repository: rB Blender
Branch: cycles_optix (branched from master)
Build Status: Buildable 4464
Build 4464: arc lint + arc unit

Event Timeline

Nice work, thanks!

I'm wondering: does the rendering speed increase with the bucket size, as it was a while ago, or is there a tipping point?

Hello, the build is fantastic: I get 1.7x more render speed with both of my RTX 2080 Ti cards. However, I got a lot of OptiX CUDA crashes and black screens: Blender crashes, then my monitors lose the input from my GPU for 10 seconds.

The OptiX crash message:
OptiX CUDA error CUDA_ERROR_LAUNCH_FAILED in cuStreamSynchronize(cuda_stream[thread_index]), line 610
OptiX CUDA error CUDA_ERROR_LAUNCH_FAILED in cuMemcpyDtoH( (char *)mem.host_pointer + offset, (CUdeviceptr)mem.device_pointer + offset, size), line 1334
Error: OptiX CUDA error CUDA_ERROR_LAUNCH_FAILED in cuStreamSynchronize(cuda_stream[thread_index]), line 610

Also, a tile size that is too big sometimes causes CUDA crashes or black screens as described above.

Thanks a lot for your great work! Amazing!
Best regards.

Another question: is this just for Windows users?

This patch works on Linux too.

I’m wondering:

When CUDA is active, there's a CPU + GPU option. Is this a CUDA-only option, or could the CPU be added to the OptiX method as well? If so, I hope that will follow soon for even more rendering speed.

CPU + GPU doesn't currently work with this because OptiX manages its own BVH format, but Cycles only builds a single BVH for everything (with CPU + CUDA this is BVH2, for example). The CPU cannot make use of the OptiX BVH, since that one is designed for RT Cores and is not usable from the outside. So Cycles would have to be extended to allow building and using multiple BVHs simultaneously, which was out of scope for this initial implementation (it is something to keep in mind for the future, though).
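The constraint can be sketched as a check over the selected devices: a scene that builds only one BVH works only if every device traverses the same layout. This is a hypothetical C++ illustration; BVHLayout, DeviceInfo and the helper functions are not Cycles' actual types.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for the layouts involved: CPU and CUDA traverse
// a BVH2, while OptiX uses its own opaque format.
enum class BVHLayout { BVH2, OptiX };

struct DeviceInfo {
  const char *name;
  BVHLayout layout;  // the layout this device is able to traverse
};

// Distinct layouts needed to serve every selected device.
std::set<BVHLayout> required_layouts(const std::vector<DeviceInfo> &devices)
{
  std::set<BVHLayout> layouts;
  for (const DeviceInfo &device : devices) {
    layouts.insert(device.layout);
  }
  return layouts;
}

// With exactly one BVH per scene, all devices must agree on its layout;
// otherwise the scene would need multiple BVHs built side by side.
bool single_bvh_sufficient(const std::vector<DeviceInfo> &devices)
{
  return required_layouts(devices).size() <= 1;
}
```

Under this model CPU + CUDA passes (both BVH2), while CPU + OptiX needs two layouts and therefore two BVHs.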

I downloaded the diff file and put it into C:\blender-git\blender but when i do the git apply command i get

error: patch failed: intern/cycles/render/svm.cpp:75
error: intern/cycles/render/svm.cpp: patch does not apply

How do i fix this? Right now the patch just refuses to apply at all

I propose you remove this specific file from the patch and make the changes manually (line by line). Possibly your checkout is at a revision the patch doesn't fit.

On Linux Mint, patching, installing, etc. was a breeze. At first I just wondered where to put the OptiX files.
Inside the Blender build libs seemed like the right place, so I put them there.
Building also went through fine.
With my two 2080s I typically get a 1.6x speedup at the moment.

Jens

Adding --reject to git apply worked for me. I haven't had any problems yet.
I think the error is caused by this part of the diff:

- global_svm_nodes->resize(global_nodes_size + svm_nodes.size());
+  global_svm_nodes->resize(global_nodes_size + (svm_nodes.size() - 1));

The problem is that there's nothing similar to the first line in my file

This is awesome, thanks for the effort and continued support! But I'm having issues: after installing the new drivers, OptiX, etc., Blender is unable to recognize my RTX 2080. I did a clean install and removed the old drivers, and the RTX card works with other software like Maya and Houdini, but not in Blender, not even in the official release. So I reinstalled the last driver that used to work, and it isn't working either ("no compatible GPU"). This is happening to me with different branches, so I'm stuck with no working Blender at all. Is anybody else having this issue? Any idea what could be happening? Some help, please! Thanks!

This is my first time here and my first time trying to install a nightly build. In doing so I forgot to apply the patch. Blender is installed just fine and running, but it's not recognizing my two RTX 2080 Tis. It says: "No compatible GPUs found for path tracing Cycles will render on the CPU"
Any way to install the patch file after the fact? cmd.exe is still open, showing: C:\blender-git\blender>

Thx,
-Dave

P.S.

I don't see how to apply it anyway, as the instructions don't say at which point to type the command into cmd.

Hey Dave, I had the same issue. I had to delete the config files, run/install Blender again and set up my presets again, and then it worked for me. Before doing that, you probably want to back up your add-ons folder and paste it back after Blender has created the new folders. In case you don't know, on Windows the path is: C:\Users\User\AppData\Roaming\Blender Foundation.

Hope it works for you too!

Thx for the fast response,

I'm a little lost, first time trying this. All I have in the config folder in that location is a bookmarks.txt file and a userpref.blend file. Which part am I supposed to "run/install it again"?

Dave, did you compile Blender or just install a nightly build? This patch needs to be installed prior to compiling Blender, it's not something you can do to an already-compiled build after downloading it.

I believe I compiled it. I followed the instructions from the top of this page "Set up a Blender build environment"

Ok, you apply the patch after downloading the sources, but before building. So in the same prompt where you ran the commands from the "Download Sources and Libraries" section of those instructions, you'd do the git apply right after running git submodule foreach git pull --rebase origin master. So assuming you successfully built a regular nightly (my best guess of where you are from your comments), go back to your cmd prompt, run git apply <path to diff>, and then recompile.

(to answer the question in the initial comment, no you can't patch after the fact. You'll need to recompile. It should be faster this time though, since you don't need to download anything besides the diff or rebuild files the patch doesn't change)

So after I "git submodule foreach git pull --rebase origin master" and before I "make full" I would "git apply"?
I take it there is a file path involved in that?

Regarding the config folder: Blender automatically creates that folder the first time it runs or is installed. All your configuration is saved there, including third-party add-ons. So if you have any of those, back up the "scripts" folder (this folder is accessible to all Blender builds you run). Then delete the whole "Blender Foundation" folder; when you run Blender again, it will start as if it were the first time you used it, and those folders will be created again. WARNING: this means you have to configure your preferences again, and paste your add-ons back into the new folder if you had any.

Before you do that, check this:

- Are you using display driver 435.80? It only works with this display driver.
- Is OptiX showing up as an option in Blender? That tells you whether it was built correctly.

Yeah, display driver 435.80, but no OptiX.

This is an already-built version; it works for me and should work for you too. But I had to do all the stuff I wrote before. I'm not a hardcore coder, so I didn't know which line in the config was bad. So I deleted everything, waited for new files to be created, and set it up again.
https://blender.community/5d3f001dd3ac8b4215373abd/download/5d3f0602d3ac8b42458323d1.

Hey, it's there! Right on, thanks for the help. Let me mess with this for a while and I'll get back to you.

Thx again,
Dave

Let's keep this code review for code review, if you need help building Blender go to https://devtalk.blender.org/.

Rebase against master and fix merge conflicts

Is there a link to a build of this for Windows or Linux we can download this from?

If there isn't a build out there now, is there anything preventing someone from making a usable build from this patch?

I'm trying to work through all the Cycles and other patches, apologies for this taking a while. I will try to take a close look at this next week since the rest of this week I have time off.

I would like to merge this into master rather quickly, maybe next week or the week after, but not consider it a fully supported feature immediately. Then we can work through some of the bugs and things to improve in the implementation in master, and see how close to stable it is for 2.81, or if it needs to wait for 2.82.

Steps could be as follows:

  • I commit some of the non-Optix specific changes in this patch to master, ensuring there are no performance regressions.
  • Make a task here with remaining issues. Things like CPU + GPU support, fallback to CPU memory like CUDA, etc.
  • Commit rest of the patch, with Optix building disabled by default.
  • Give developers and platform maintainers time to solve build issues and set up builds. Then enable on buildbot.
  • Work on remaining issues and handle bug reports.

Note that I would like this to replace the CUDA backend eventually as the officially supported way to render on NVIDIA GPUs. There's not much point maintaining multiple backends if we can get to feature parity and support all the same cards.

Already with the current implementation it would make sense to share more code between the CUDA and Optix backends. I imagine the code for things like denoising and fallback to CPU memory can be shared.
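One way to share that code is to let the OptiX device inherit the CUDA device's common logic and override only the ray tracing parts. The C++ sketch below is hypothetical: the class names, the capacity model and the fallback rule are illustrative, with allocation fallback to host memory standing in for the shared functionality.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class MemLocation { Device, Host };

// Hypothetical base class holding logic both backends could share, here
// reduced to allocation that falls back to host memory once the device
// budget is exhausted.
class CUDADeviceBase {
 public:
  explicit CUDADeviceBase(size_t device_capacity) : device_capacity_(device_capacity) {}
  virtual ~CUDADeviceBase() = default;

  MemLocation mem_alloc(size_t size)
  {
    if (device_used_ + size <= device_capacity_) {
      device_used_ += size;
      return MemLocation::Device;
    }
    host_used_ += size;  // fallback: keep this buffer in host memory
    return MemLocation::Host;
  }

  size_t host_used() const
  {
    return host_used_;
  }

  // Only the ray tracing backend differs between the two devices.
  virtual std::string backend_name() const
  {
    return "CUDA";
  }

 private:
  size_t device_capacity_;
  size_t device_used_ = 0;
  size_t host_used_ = 0;
};

// The OptiX device reuses allocation (and, in principle, denoising)
// unchanged and overrides only what is OptiX specific.
class OptiXDeviceSketch : public CUDADeviceBase {
 public:
  using CUDADeviceBase::CUDADeviceBase;

  std::string backend_name() const override
  {
    return "OptiX";
  }
};
```

Inheritance is only one option here; the same sharing could also be done by composition with a common helper object.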

Sounds good!
Just a note regarding the CUDA backend: I don't think this can fully replace that one, since OptiX only works on Maxwell and up, with full performance only on Turing, so the CUDA backend would still be necessary to support older cards.
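This hardware split can be expressed as a small selection policy based on the CUDA compute capability (Kepler is 3.x, Maxwell 5.x, Pascal 6.x, Turing 7.5). The policy itself and the GPUInfo/choose_backend names are assumptions for illustration, not how Cycles decides.

```cpp
#include <cassert>

enum class Backend { CUDA, OptiX };

struct GPUInfo {
  int cc_major;       // compute capability major version: 3 = Kepler, 5 = Maxwell, 7 = Volta/Turing
  bool has_rt_cores;  // true for RTX (Turing) cards
};

// Per the note above: OptiX works on Maxwell and newer.
bool optix_supported(const GPUInfo &gpu)
{
  return gpu.cc_major >= 5;
}

// Hypothetical policy: use OptiX only where RT cores give full performance,
// and keep the CUDA backend everywhere else, including pre-Maxwell cards.
Backend choose_backend(const GPUInfo &gpu)
{
  if (optix_supported(gpu) && gpu.has_rt_cores) {
    return Backend::OptiX;
  }
  return Backend::CUDA;
}
```

Under this policy a Kepler card can only use CUDA, a Pascal card could run OptiX but stays on CUDA, and an RTX card gets OptiX.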

@Michael March (cowmix): LazyDodo did a build with this patch at https://blender.community/c/graphicall/Cfbbbc/

Laid groundwork for baking support

Moved all shader evaluation kernels into an OptiX pipeline.
Baking is still disabled because the relevant code isn't very well suited for inlining (which is necessary in OptiX) and therefore runs slow.

Mai Lavelle (maiself) requested changes to this revision. Thu, Aug 8, 1:50 PM

Thanks for the contribution!

The OptiX device implementation seems to be working well, which is good. There's potential with these changes for breaking the other implementations, however, so we should take care to test each before merging this fully.

A few things to consider in addition to the inline comments:

  • Can the Cuda and Optix device selections be merged? If Optix is available for a device wouldn't it make sense to select that over Cuda? The distinction could be hidden from the user and the best option used automatically.
  • Using an ifdef around nearly every ccl_device_noinline seems a little silly; maybe there's a better way to express this if it's basically the default? In the future functions may need to be marked as noinline, but it would be easy to forget to add the ifdef, so finding something better could be a good idea.
  • There are a few cases where you've moved a branch into a function (kernel_path_ao for example) for performance; however, some of the conditionals from these differ (compare regular path tracing to baking). We should check that nothing is broken by this.
  • Similarly, there was quite a bit of rearranging of some branched path tracing functions (kernel_branched_path_surface_connect_light for example). I'm not sure these are equivalent to the unmodified versions, this should be looked at more closely.
  • This patch seems to have broken the split kernel somehow, this needs investigation.
intern/cycles/blender/blender_session.cpp
197–202

Use parentheses around comparisons. Also other places.

intern/cycles/device/device.h
410

Maybe call this build_bvh instead?
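A sketch of what such a device-level build_bvh hook could look like, assuming the default returns false so the caller falls back to the regular host-side build, while an OptiX device handles the build itself because it owns the OptiX context. All types are simplified, hypothetical stand-ins.

```cpp
#include <cassert>
#include <string>

// Minimal stand-in; the real BVH class carries far more state.
struct BVH {
  std::string built_by;
};

class Device {
 public:
  virtual ~Device() = default;

  // Hook for devices that build their own acceleration structure.
  // Returning false means "not handled", so the caller builds it generically.
  virtual bool build_bvh(BVH & /*bvh*/)
  {
    return false;
  }
};

class OptiXDeviceSketch : public Device {
 public:
  bool build_bvh(BVH &bvh) override
  {
    // A real implementation would call optixAccelBuild with the OptiX
    // context owned by this device; here we only record who built it.
    bvh.built_by = "optix";
    return true;
  }
};

// Caller side: give the device the first chance, then fall back.
void build_scene_bvh(Device &device, BVH &bvh)
{
  if (!device.build_bvh(bvh)) {
    bvh.built_by = "host";  // generic BVH2 build on the CPU
  }
}
```

Routing the build through the device this way keeps OptiX-specific state out of the BVH class itself.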

intern/cycles/device/device_optix.cpp
1953

Could an override for this be added to the debug panel? That would make it easy to test / debug the OptiX code paths with non-RTX cards without needing to patch over these lines.

intern/cycles/kernel/geom/geom_subd_triangle.h
102 (On Diff #16880)

Is there a reason this change isn't an ifdef like all the other cases?

intern/cycles/kernel/kernel_path.h
373–374

Use braces even for single lines. Other places as well.

intern/cycles/kernel/kernels/optix/kernel_optix.cu
27

Maybe use snake_case instead of camelCase for these functions? These aren't part of the OptiX API, yet they look like they are. I think it'd be better if it were more apparent that they are defined in Cycles, by keeping consistent with that style.

This revision now requires changes to proceed. Thu, Aug 8, 1:50 PM

Fixes some of the formatting issues that were brought up. The #ifdefs in geom_subd_triangle.h were missing by accident.

I'd prefer using a different solution than all these #ifdefs around noinline functions too. I didn't want to redefine the ccl_device_noinline macro to force inline though, since that would only cause confusion. Maybe we could instead introduce macros like ccl_device_prefer_noinline (which would inline on OptiX) and ccl_device_force_noinline (which would noinline everywhere)?
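The two proposed macros could be defined roughly as follows. This sketch assumes __KERNEL_OPTIX__ is defined when compiling the OptiX kernel, and it defines stand-ins for ccl_device_inline and ccl_device_noinline using GCC/Clang attributes so it compiles on its own; the real Cycles definitions differ per backend.

```cpp
#include <cassert>

// Stand-ins so the sketch compiles on its own (GCC/Clang attribute syntax).
#define ccl_device_inline static inline __attribute__((always_inline))
#define ccl_device_noinline static __attribute__((noinline))

// Proposed: noinline by default, but inlined under OptiX to avoid call overhead.
#ifdef __KERNEL_OPTIX__
#  define ccl_device_prefer_noinline ccl_device_inline
#else
#  define ccl_device_prefer_noinline ccl_device_noinline
#endif

// Proposed: noinline on every backend, for functions that really need it.
#define ccl_device_force_noinline ccl_device_noinline

// Hypothetical example functions using the new qualifiers.
ccl_device_prefer_noinline int shade_sample(int x)
{
  return x * 2;
}

ccl_device_force_noinline int rarely_called(int x)
{
  return x + 1;
}
```

With this, the backend decision lives in one place, so individual functions no longer need their own #ifdef and a newly added noinline function cannot silently miss it.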

I checked the changes to kernel_branched_path_surface_connect_light against lots of scenes and couldn't identify any differences, so they should check out. More eyes always help though.

Adding a debug flag to force listing of all devices under OptiX would be nice, but it seems the DebugFlags() mechanism is currently evaluated after devices are listed, so they don't have an effect on it.

As for merging CUDA/OptiX device selection: In the long run this is probably a good idea. For now though the OptiX device comes with some limitations that prevent this (e.g. you cannot select a CPU device with it, since that would require multiple BVH layouts to be built which Cycles cannot do currently).

Patrick Mours (pmoursnv) marked 4 inline comments as done. Thu, Aug 8, 3:05 PM
Alex Fuller (mistaed) requested changes to this revision. Thu, Aug 8, 9:16 PM

Hi,

I posted this a day or two ago, but I didn't understand how Phabricator's Differential works. I've made a basic inline comment in bvh_embree.h, where the new function's declaration needs fixing to compile properly:

virtual void upload(Progress &progress, DeviceScene *dscene) override;

Cheers, and great work!

intern/cycles/bvh/bvh_embree.h
39

Hey! I am compiling GafferCycles https://github.com/boberfly/GafferCycles with Embree, and I had a compile error. This line should be:

virtual void upload(Progress &progress, DeviceScene *dscene) override;

Cheers!

This revision now requires changes to proceed. Thu, Aug 8, 9:16 PM

  • Fixed a compile error when building with Embree (thanks @Alex Fuller (mistaed))
  • Fixed an OptiX warning during pipeline creation
  • Reduced number of used attributes in OptiX pipeline to two

Patrick Mours (pmoursnv) marked an inline comment as done. Mon, Aug 12, 6:15 PM

Fixed performance regression that caused BMW benchmark scene to only be 1.1x faster instead of 1.6x

Just thinking aloud, this might be an opportunity for a small refactor of cycles: currently, Cycles builds the BVH before it uploads geometry vertices to the device. Most ray tracing APIs (DXR, OptiX, Embree) work the other way round. Before we introduce an OptiX specific workaround, should we change Cycles to do vertices first, BVH second? That would also help the Embree backend and potentially a future DXR/Vulkan backend when we see more vendors deliver ray tracing hardware.
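The proposed ordering (vertices first, BVH second) can be sketched as below. The types and function names are hypothetical; the point is only that the BVH build consumes geometry that is already resident on the device, as OptiX's optixAccelBuild does with vertex buffers passed as CUdeviceptr.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical device buffer: tracks whether data has been uploaded.
struct DeviceBuffer {
  std::vector<float> data;
  bool uploaded = false;
};

struct SceneUpdate {
  std::vector<std::string> log;  // records the order of the steps

  void upload_vertices(const std::vector<float> &verts, DeviceBuffer &buf)
  {
    buf.data = verts;
    buf.uploaded = true;
    log.push_back("upload_vertices");
  }

  // The build reads device-resident geometry, so it fails if the
  // vertices are not on the device yet.
  bool build_bvh(const DeviceBuffer &buf)
  {
    if (!buf.uploaded) {
      return false;
    }
    log.push_back("build_bvh");
    return true;
  }
};

// Vertices first, BVH second.
bool update_scene(SceneUpdate &scene, const std::vector<float> &verts, DeviceBuffer &buf)
{
  scene.upload_vertices(verts, buf);
  return scene.build_bvh(buf);
}
```

In the current order the BVH would be built before the upload step, which is exactly the case this sketch rejects.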

Did a simple test to see if branched path tracing is affected by this, and in fact it is, as can be seen in these images. Note that this test was CPU only, so this is a regression and not limited to the new device backend. The branched functions will need to be corrected. I haven't looked too deeply at it yet, but it looks like some lines from the original are missing from the new versions of these functions. I'll take a more thorough look and do more testing later.

(Attached images not preserved: render from master, render from this diff, and the test scene.)

Fixed modified branched connect to light functions

They were missing a check for mesh lights. Seems to produce the correct result in that test scene now when run with CPU.

Also improved performance slightly by fixing some unnecessary spills around trace calls again + some noinline specifiers.