Cycles Denoising: Part 3: Denoiser Kernels
ClosedPublic

Authored by Lukas Stockner (lukasstockner97) on Mar 31 2017, 2:25 AM.

Diff Detail

Brecht Van Lommel (brecht) requested changes to this revision.Apr 9 2017, 6:37 PM

Why aren't the filters kernels in kernel/? All this duplicated code in CMakeLists.txt, filter_compat_*.h, kernel_config.h is going to make it harder to maintain things.

This revision now requires changes to proceed.Apr 9 2017, 6:37 PM

Moved the kernels back into kernel/.
The intuition back then was to have them as a separate complation unit to allow for faster development iteration, but that's of course also possible when they're inside kernel/, so it was kind of pointless.

Brecht Van Lommel (brecht) requested changes to this revision.Apr 22 2017, 2:48 AM

Much better, thanks!

I'm not going to have time to review these filter kernels in detail before next bcon date, but I guess that's fine and we can go ahead committing this anyway. The way they are integrated with the rest of the code seems ok to me, so I'll only leave some minor comments here.

I think there's potential here for deduplicating code, switching to float4 once that is unified with ssef. I feel like we can also better abstract away differences between CUDA/OpenCL/CPU still, not really specific to the denoising code though. But that's for another time I think.

intern/cycles/kernel/filter/filter_defines.h
21 ↗(On Diff #8686)

This doesn't seem to be used.

intern/cycles/kernel/filter/filter_nlm_cpu.h
17 ↗(On Diff #8686)

This seems to be unused.

intern/cycles/kernel/kernel_compat_cpu.h
45 ↗(On Diff #8686)

Can we rename this to ccl_restrict_ptr? I think this has a specific meaning and "readonly" doesn't quite cover it.

I'm curious if you measured any performance improvements from using restrict? Every time I try it, it doesn't seem to help, but maybe because we already try to write code to avoid reading from the same memory location multiple times.

This revision now requires changes to proceed.Apr 22 2017, 2:48 AM

Addressed review.

I don't remember the exact figures, but at least back when I added ccl_readonly_ptr, it improved performance on CUDA significantly, at least by 20%.
The denoiser kernels are much more memory-bound compared to the path tracing kernels (to the point where moving the feature vector to shared memory doubles performance), so it makes sense that adding const restrict (which allows the CUDA compiler to use the L1 cache iirc) helps more for them.

Only quite superficially reviewed, but looks ok to commit.

This revision is now accepted and ready to land.Apr 23 2017, 4:00 PM

Just a rebase, no real changes.