First of all, this is nowhere near ready for master, but it is a working proof of concept that I'd like some feedback on.
nvcc is notoriously picky about the version of the host compiler. That is annoying because most of the Cycles code has no dependency on the host headers at all, and we don't use mixed CUDA/native code either, so the host compiler should be mostly irrelevant for Cycles. On Windows things are worse, because NVIDIA tends to drag its heels on supporting newer Visual Studio versions. It took over 7 months to support MSVC 2017, and once support came out, it covered the initial release, with Update 1 support 'in beta'; by the time CUDA 9 came out, MSVC was already on Update 5. This makes it pretty much impossible for us to use a recent compiler.
This patch adds a CLI compiler called cycles_cubin_cc that is pretty much a replacement for nvcc, built on the nvrtc library.
- I have only tested it on Windows.
- nvrtc is only available on x64, so for x86 builds we'll have to build a 64-bit cycles_cubin_cc. However, since I was lazy, the proof of concept depends on cycles_util and OIIO's parameter parsing, and we can't expect users to check out two lib folders just to build this thing.
- nvrtc is used through cuew; I think license-wise we should be able to ship it (if we choose to do so).
- Currently it only supports build-time cubin generation. I guess it could support on-demand compilation as well, but I haven't implemented that yet.
- The CUDA toolkit still needs to be fully installed for it to work, for the following reasons:
  - Cycles needs cuda.h.
  - nvrtc lacks a linker (it outputs PTX), so the tool shells out to ptxas to produce the final cubin. We could do this through the CUDA driver API, but ptxas also works on hosts without the NVIDIA driver installed (like our buildbots), so it's the better choice.
- There are probably a ton of code style violations, sorry!
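To illustrate the two-stage pipeline described above (nvrtc emits PTX for a virtual `compute_XX` architecture, then ptxas assembles it into a cubin for the matching `sm_XX` target), here is a minimal sketch that only builds the option lists for both stages. The function names and the include-path parameter are hypothetical and not taken from the patch; the actual nvrtc calls and process spawning are left out so the sketch compiles without the CUDA toolkit.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: map a real target such as "sm_30" to the virtual
// architecture nvrtc compiles PTX for ("compute_30").
std::string virtual_arch_for(const std::string &sm_arch)
{
  const std::string prefix = "sm_";
  if (sm_arch.size() <= prefix.size() || sm_arch.compare(0, prefix.size(), prefix) != 0) {
    throw std::invalid_argument("expected an architecture like sm_30, got: " + sm_arch);
  }
  return "compute_" + sm_arch.substr(prefix.size());
}

// Stage 1: options that would be handed to nvrtcCompileProgram() to emit PTX.
std::vector<std::string> nvrtc_options(const std::string &sm_arch,
                                       const std::vector<std::string> &include_dirs)
{
  std::vector<std::string> opts;
  opts.push_back("--gpu-architecture=" + virtual_arch_for(sm_arch));
  for (const std::string &dir : include_dirs) {
    opts.push_back("--include-path=" + dir);
  }
  return opts;
}

// Stage 2: command line for ptxas, which assembles the PTX file into a cubin
// and, unlike the driver API route, does not need the NVIDIA driver on the host.
std::vector<std::string> ptxas_args(const std::string &sm_arch,
                                    const std::string &ptx_path,
                                    const std::string &cubin_path)
{
  return {"ptxas", "--gpu-name", sm_arch, "--output-file", cubin_path, ptx_path};
}
```

In a real tool, the stage 1 strings would be converted to a `const char *` array for `nvrtcCompileProgram(prog, numOptions, options)`, and the PTX retrieved with `nvrtcGetPTXSize`/`nvrtcGetPTX` would be written to `ptx_path` before running the ptxas command.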
TODO:
- Clean up the cycles_util/OIIO dependencies so it can be built for an x86 build.
- Preferably take the same arguments as nvcc, so it'll be a drop-in replacement.
- NVIDIA added a version number to the nvrtc DLL filename; we should probably find a nicer way to deal with this inside cuew.
- Test on Linux.
- I would like this to become the standard compiler on Windows, so we are no longer held hostage by CUDA's MSVC support (or lack thereof).
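For the versioned nvrtc DLL filename mentioned above, one possible approach inside cuew is to probe a list of candidate names, newest first. This sketch assumes the `nvrtc64_<major><minor>.dll` pattern used by the CUDA 8.0/9.x Windows toolkits (e.g. `nvrtc64_90.dll`) and an unversioned `libnvrtc.so` elsewhere; the function names and the version list are hypothetical, and other toolkit releases may name the file differently.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: candidate nvrtc library names, newest toolkit first.
// Assumes the Windows naming pattern nvrtc64_<major><minor>.dll (e.g. nvrtc64_90.dll
// for CUDA 9.0); other platforms fall back to the unversioned libnvrtc.so.
std::vector<std::string> nvrtc_library_candidates(
    const std::vector<std::pair<int, int>> &versions_newest_first, bool windows)
{
  std::vector<std::string> names;
  if (!windows) {
    names.push_back("libnvrtc.so");
    return names;
  }
  for (const auto &version : versions_newest_first) {
    names.push_back("nvrtc64_" + std::to_string(version.first) +
                    std::to_string(version.second) + ".dll");
  }
  return names;
}

// Return the first candidate that try_load accepts, or an empty string if none do.
// try_load stands in for LoadLibraryA()/dlopen() so the probing logic can be
// exercised without the CUDA toolkit installed.
std::string first_loadable(const std::vector<std::string> &candidates,
                           const std::function<bool(const std::string &)> &try_load)
{
  for (const std::string &name : candidates) {
    if (try_load(name)) {
      return name;
    }
  }
  return std::string();
}
```

A real loader would pass a lambda wrapping `LoadLibraryA` (Windows) or `dlopen` (Linux) as `try_load`, which keeps the list of known versions in one place instead of hardcoding a single filename.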
Performance (on an sm_30 card): given that this is a rather polluted dev box, I'd say the performance is more or less identical to nvcc.