1.5x to 2x faster interiors rendering with optimized AO approx for cycles
Closed, Public

Authored by mathieu menuet (bliblubli) on Sep 3 2017, 9:02 AM.

Details

Summary

This patch reacts smartly to materials, making the AO approximation option really usable in interiors. It works on all devices (CPU, CUDA and OpenCL). The speedup varies between platforms, however. It is largest on the platform the patch was developed on (OpenCL): from 1.5x faster for Classroom to 2x faster for the Scandinavian interior from Chocofur.

Edit: removed things that were not related to code directly.

Diff Detail

Repository
rB Blender
Brecht Van Lommel (brecht) requested changes to this revision. Edited Sep 3 2017, 12:28 PM

I'm not sure what the whole funding thing is about, I'll focus on the patch. By submitting this patch you're contributing it under the Apache 2 license, if you wrote this code there's no copyright problem.

So what this does is change AO bounces to still always continue all transmission rays and 1 glossy ray. That makes it more practical as an option for final renders, at the cost of slower renders if you want to use this purely for very fast preview renders. But glossy and transmission being much too dark isn't that useful for preview either. So it seems reasonable. There's some compatibility breaking, but the result is more accurate, so not a big deal for master I think.

Please deduplicate this logic in a utility function in kernel_path_state.h like this:

ccl_device_inline bool path_state_ao_bounce(KernelGlobals *kg, ccl_addr_space PathState *state)
{
    if(state->bounce <= kernel_data.integrator.ao_bounces) {
        return false;
    }

    int bounce = state->bounce - state->transmission_bounce - (state->glossy_bounce > 0);
    return (bounce > kernel_data.integrator.ao_bounces);
}

Perhaps it is sufficient to just allow transmission and glossy bounce if those are the first bounces, and not behind some diffuse or volume bounce (caustics)? Not sure.

ccl_device_inline bool path_state_ao_bounce(KernelGlobals *kg, ccl_addr_space PathState *state)
{
    if(state->bounce <= kernel_data.integrator.ao_bounces) {
        return false;
    }

    return (state->bounce - state->transmission_bounce - (state->glossy_bounce > 0)) > 0;
}
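
To compare the two tests proposed above, here is a minimal standalone C sketch that re-implements them outside the Cycles kernel. MiniPathState, the names ao_bounce_variant_a and ao_bounce_variant_b, and passing the limit as a plain ao_bounces argument are assumptions made for illustration; in the kernel the state is PathState and the limit is kernel_data.integrator.ao_bounces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the three PathState counters the test reads. */
typedef struct MiniPathState {
    int bounce;              /* total bounces so far */
    int transmission_bounce; /* how many of those were transmission */
    int glossy_bounce;       /* how many of those were glossy */
} MiniPathState;

/* First variant: transmission bounces and one glossy bounce are not
 * counted against the user limit. */
static bool ao_bounce_variant_a(const MiniPathState *state, int ao_bounces)
{
    assert(state != NULL);
    if (state->bounce <= ao_bounces) {
        return false;
    }
    int bounce = state->bounce - state->transmission_bounce - (state->glossy_bounce > 0);
    return bounce > ao_bounces;
}

/* Second variant: once past the user limit, switch to AO as soon as any
 * bounce remains after discounting transmission and one glossy bounce. */
static bool ao_bounce_variant_b(const MiniPathState *state, int ao_bounces)
{
    assert(state != NULL);
    if (state->bounce <= ao_bounces) {
        return false;
    }
    return (state->bounce - state->transmission_bounce - (state->glossy_bounce > 0)) > 0;
}
```

With a limit of 2, a path with one transmission bounce followed by two diffuse bounces (bounce = 3, transmission_bounce = 1) keeps tracing under the first variant but switches to AO under the second. A path at its first bounce never switches in either variant, because the early return applies whenever bounce is not above the limit.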
This revision now requires changes to proceed. Sep 3 2017, 12:28 PM

I agree to deduplicate, although doing it as part of https://developer.blender.org/D2644 would maybe make more sense?

Regarding the proposition, I would have to try it, but looking at the code, it could return true on the first bounce if it's a transmission or glossy bounce (bounce and transmission_bounce == 1, so subtracted it's 0)? And it ignores the user setting of AO approx? My concentration is near 0 too at the moment, so sorry if I'm wrong. I tried to have no parameter to make it more user friendly, but as it does change the output and I couldn't find a value/equation that would work in all cases, I think it's better to let the user decide what value is best for his/her scene. Most of the time, AO approx at 2 just works, but in rare cases, 1 is enough or 3 will be needed.

I tried many other solutions before keeping this one. Glossy or transmission rays behind a diffuse ray were important in nearly all architectural renderings I did. Not so much for the caustics, but to keep shadows lighter. This trick has a tendency to make contrast higher by giving harder shadows. It's actually a good thing in many cases, as it keeps the eye concentrated on the bright parts, but if it's too strong, it will look fake or remove too much detail in shadows. Of course, it's all based on client reactions and is highly subjective, but we do images for humans. And if I remember correctly, the additional speedup from approximating more rays sooner was below 5% in all our scenes.

I agree to deduplicate, although doing it as part of https://developer.blender.org/D2644 would maybe make more sense?

No need to wait for that, we can do it right immediately.

Regarding the proposition, I would have to try it, but looking at the code, it could return true on the first bounce if it's a transmission or glossy bounce (bounce and transmission_bounce == 1, so subtracted it's 0)? And it ignores the user setting of AO approx? My concentration is near 0 too at the moment, so sorry if I'm wrong. I tried to have no parameter to make it more user friendly, but as it does change the output and I couldn't find a value/equation that would work in all cases, I think it's better to let the user decide what value is best for his/her scene. Most of the time, AO approx at 2 just works, but in rare cases, 1 is enough or 3 will be needed.

I corrected a mistake in the proposed code now, but either way it keeps the user parameter.

I tried many other solutions before keeping this one. Glossy or transmission rays behind a diffuse ray were important in nearly all architectural renderings I did. Not so much for the caustics, but to keep shadows lighter. This trick has a tendency to make contrast higher by giving harder shadows. It's actually a good thing in many cases, as it keeps the eye concentrated on the bright parts, but if it's too strong, it will look fake or remove too much detail in shadows. Of course, it's all based on client reactions and is highly subjective, but we do images for humans. And if I remember correctly, the additional speedup from approximating more rays sooner was below 5% in all our scenes.

Ok, I guess the test is fine as is, if you want rays to escape through e.g. a glass window. Just need to deduplicate the code.

Updated the diff with the required changes. As Nvidia now also supports OCL 2.0, I suppose it will make sense to switch to it for 2.8x. So I made 2 functions to work around the address space problem of OCL 1.2. When we switch to 2.0, removing the second function will be easy, and until then, the code to change is bundled in one place. If you prefer another workaround, I'm open to it.

There's no OpenCL 2.0 in sight for macOS unfortunately, so it's not so easy to switch.

I'm not sure why state is copied onto the stack in kernel_scene_intersect, that seems inefficient to me. Unless it gives a measurable performance improvement we can probably just avoid that and have a single function.
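
The two options being weighed here can be sketched in plain C. In this sketch ccl_addr_space is defined as an empty macro so that the code compiles as ordinary C; under OpenCL 1.2 a macro like this would expand to an address-space qualifier such as __global, and pointers into different address spaces are distinct types there. MiniPathState and all three function names are hypothetical, and this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: empty here so the sketch builds as plain C. */
#define ccl_addr_space

/* Hypothetical stand-in for the PathState counters used by the test. */
typedef struct MiniPathState {
    int bounce;
    int transmission_bounce;
    int glossy_bounce;
} MiniPathState;

/* Option 1: a single function that reads the counters directly through
 * the qualified pointer. */
static bool ao_bounce_direct(ccl_addr_space const MiniPathState *state, int ao_bounces)
{
    assert(state != NULL);
    if (state->bounce <= ao_bounces) {
        return false;
    }
    int bounce = state->bounce - state->transmission_bounce - (state->glossy_bounce > 0);
    return bounce > ao_bounces;
}

/* Helper that only accepts an unqualified (private) pointer. */
static bool ao_bounce_private(const MiniPathState *state, int ao_bounces)
{
    if (state->bounce <= ao_bounces) {
        return false;
    }
    int bounce = state->bounce - state->transmission_bounce - (state->glossy_bounce > 0);
    return bounce > ao_bounces;
}

/* Option 2: copy the state onto the stack first, then call the private
 * helper. This costs one struct copy per call. */
static bool ao_bounce_via_copy(ccl_addr_space const MiniPathState *state, int ao_bounces)
{
    assert(state != NULL);
    MiniPathState local = *state;
    return ao_bounce_private(&local, ao_bounces);
}
```

Both options return the same result for any state; they differ only in whether the struct is copied before it is read, which is the cost questioned above.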

With one function, the Scandinavian scene from Chocofur renders in 471 s against 426 s with the current version. So the current version renders 10.6% more frames in the same time (471 / 426 ≈ 1.106).
Here is the diff to test:


So I think the best solution is to keep the patch as it is, but follow the naming convention of existing workarounds like lcg_state_init_addrspace by adding _addrspace instead of _ocl. @Brecht Van Lommel (brecht) What do you think?

The difference looked far too large, so I tested again. The time differences were due to a throttling issue. With throttling removed, both versions render at the same speed. I'll update the diff with the new one.

Only use one function; performance stays the same.

This revision was automatically updated to reflect the committed changes.

Thanks for the changes, committed now.