Cycles: Added Cryptomatte passes.
Needs ReviewPublic

Authored by Stefan Werner (swerner) on Jul 12 2018, 10:37 PM.
Tags
Tokens
"Love" token, awarded by monio."Love" token, awarded by maxon."100" token, awarded by Scaredyfish."Love" token, awarded by Tanguy."Love" token, awarded by vitorbalbio."Love" token, awarded by SteffenD."Love" token, awarded by Noss."Love" token, awarded by aliasguru.

Details

Summary

This patch introduces Cryptomatte output passes to Cycles.

There are two modes:
"Accurate" mode, which is CPU only, tracks all
IDs encountered in a pixel before it sorts them by coverage and strips
any that exceed the number of selected Cryptomatte layers. Since that
relies on dynamic memory allocations, it is CPU only.
The other mode keeps track of no more IDs than the requested number of
layers. When the number of IDs for a given pixel exceed the number of
layers, it is possible that an ID with a higher coverage gets rejected
in favor of an ID with lower coverage, if the latter was sampled first.
It would be good improvement in the future to allow GPU renders too to
track any number of IDs per pixel, regardless of the number of
Cryptomatte layers.

In order to support a user-selectable number of Cryptomatte layers,
Cycles passses are now not only distinguished by type but also by name.
That way, any number of passes of type PASS_CRYPTOMATTE can be added
to a render. Going forward, this could become useful for other types too,
should we introduce light path expressions or custom AOVs.

Diff Detail

Branch
cycles_cryptomatte
Build Status
Buildable 1923
Build 1923: arc lint + arc unit
There are a very large number of changes, so older changes are hidden. Show Older Changes
intern/cycles/kernel/kernel_passes.h
266

This doesn't work reliably for CUDA and OpenCL, threads can be rendering multiple samples in the same pixels simultaneously.

How to implement kernel_write_id_slots with atomic ops I'm not sure of though, seems rather tricky.

intern/cycles/kernel/kernel_path_branched.h
561 ↗(On Diff #11413)

Why is this matte weight needed? It seems to be throughput but with transparency averaged per surface, it's not clear to me why that is better than averaging the throughput when writing the pass, any specific cases where that failed?

intern/cycles/util/util_murmurhash.h
76

Functions should have lower case names, also don't think we need the x86_32 postfix if it's going to be used for all CPU architectures.

I think the implementation can also be moved into a util_murmurhash.cpp, not need to have it in the header.

This revision now requires changes to proceed.Jul 19 2018, 1:21 AM
intern/cycles/kernel/kernel_passes.h
266

That's a good one. I wrote this code initially against an older code base and tested only the CUDA side, where this was never an issue. I would think that there should be a way of doing this using atomic compare and swaps.

Sergey Sharybin (sergey) requested changes to this revision.Jul 20 2018, 12:19 PM
Sergey Sharybin (sergey) added inline comments.
intern/cycles/blender/blender_sync.cpp
522

string_startswith() ?

553–557

Indentation.

626

std::min -> this is what util_algorithm.h is for.

intern/cycles/blender/blender_sync.h
68–71

Indentation.

intern/cycles/device/device_cpu.cpp
57

Include statements are supposed to be sorted alphabetically. See render/buffers.h above.

intern/cycles/kernel/kernel_passes.h
271–308

I'm sure this madness of ifdef is easily avoidable.

intern/cycles/render/buffers.cpp
239

First you go away from string to char* when calling this function, then you go away from string to char* here in the pass? Whats' the point? Just compare string to string.

intern/cycles/util/util_murmurhash.h
127–128

util_algorithm.h

Stefan Werner (swerner) marked 8 inline comments as done.
  • Cycles: Cryptomatte writing now uses atomics on GPUs to avoid race conditions.
  • Cycles: Code style, fixed typos moving murmurhash to its own cpp file.

I think I found a way to write id passes using atomics.

  • Cycles: Compile fixes for OpenCL kernel.
intern/cycles/kernel/kernel_globals.h
73–75

I benchmarked the Victor scene at 50% size, all Cryptomatte options enabled.
Compiled with Visual Studio 2015 running on 2x Xeon E5-2660v2, that is 40 threads.

Accurate mode: 8m21s
Stochastic mode: 8m18s
Cryptomatte off: 8m10s

So STL appears to be a measurable overhead, but it is below 1% of total render time for this scene. A custom linked list from a memory pool still sounds good though, since that could also be a way of getting the accurate mode for GPUs.

I think I found a way to write id passes using atomics.

That seems like it would work, as long as there are enough free ID slots. If there are not enough it's random which object ID get skipped, but I guess that is acceptable for now.

Without resorting to dynamic memory allocations inside the kernel or grossly over allocating memory, there will always be a chance of randomly skipped data.

Without resorting to dynamic memory allocations inside the kernel or grossly over allocating memory, there will always be a chance of randomly skipped data.

I mean random in that the results could be different between renders, so the result is not exactly repeatable. A solution could be to write the minimum sample number next to each ID and prioritize lower samples, but I'm not sure it's worth it in practice.

Stefan Werner (swerner) marked 6 inline comments as done.
  • Merge branch 'master' of git.blender.org:blender into cycles_cryptomatte
  • Merge branch 'master' of git.blender.org:blender into cycles_cryptomatte
  • Cycles: Implemented full sorting for Cryptomatte layers for all possible renderign devices.
  • Cycles: Refactoring to remove duplicate Cyryptomatte id pass code.
  • Cycles: Addressed some of Sergey's comments on D3538
  • Cycles: Stochastic cryptomatte writing now adds coverage to last ID when it runs out of slots.

Addressed some of Sergey's comment.s

  • Cycles: A little refactor of the cryptomatte ID pass writting to reduce #ifdef clutter
  • Cycles: Removed PathState::matte_weight, using throughput instead for Cryptomatte

Some more inline comments.

intern/cycles/kernel/kernel_id_passes.h
34

Since CAS always returns the old value, we could actually detect the case that another thread already allocated this slot for the same ID here and get rid of the "--slot;" part.

49

For consistency, I'd prefer to also test for slot ID == ID_NONE here.

intern/cycles/kernel/kernel_passes.h
270

Could we move 2 * kernel_data.film.cryptomatte_depth into the macro? It's always the same and it would make this part easier to read.

intern/cycles/kernel/kernel_path_branched.h
560 ↗(On Diff #11499)

I think this change can be removed?

intern/cycles/kernel/kernels/cuda/kernel.cu
54

Why is the sorting part of the kernel? I'd prefer to just move it to buffers.cpp, that would simplify the code a lot.

Also, sample is uninitialized if work_index >= total_work_size.

intern/cycles/render/coverage.cpp
32

The code here could be deduplicated by having a wrapper function (or maybe function pointer?) that calls either flatten_buffer or sort_buffer.

Stefan Werner (swerner) marked 12 inline comments as done.
  • Cycles: More Cryptomatte code review changes.
  • Cycles: Skip calls to CPU kernel coverage buffer when accurate cryptomatte layers are turned off.
Stefan Werner (swerner) marked 2 inline comments as done.Aug 17 2018, 10:17 PM
Stefan Werner (swerner) marked 2 inline comments as done.

Looks good to me except for one detail.

intern/cycles/kernel/kernel_id_passes.h
49

This should check .x instead of .y I guess? In that case, the assert needs to be changed to .y == 0.0f.

Stefan Werner (swerner) marked an inline comment as done.
  • Cycles: Fixed typos in Crytpomatte code.
  • Fixed indentation.
Sergey Sharybin (sergey) requested changes to this revision.Aug 21 2018, 5:16 PM
Sergey Sharybin (sergey) added inline comments.
intern/cycles/blender/blender_sync.cpp
626

Narrowing from int. Is really needed to be unsigned short ?

intern/cycles/kernel/kernels/cuda/kernel.cu
51–62

Move to a function, call from the if() statement above.

53

This is quite scary.

intern/cycles/kernel/split/kernel_buffer_update.h
83

Need update? Is being updated? A function pointer to function which updates the ray?

intern/cycles/render/buffers.cpp
385–388

Clearly, only scale y and w. Without comment ;)

intern/cycles/render/coverage.cpp
18–24

The convention is to:

  • Have own include first.
  • System includes.
  • Rest of includes, in alphabetical order.
28

There is pair in ccl namespace, see util_map.h.

69

Either use int for pixel_index or do a cast of tile.w to`size_t` to make math happen in size_t. Currently it's happening in int.

102

Why this index is in int?

intern/cycles/render/coverage.h
36–38

Can we typedef unordered_map<float, float> for the clarity.

Will shorten so much code!

intern/cycles/render/film.h
80

Make a named enum (ideally), and do not use use_ for not-a-boolean.

81

Just int.

intern/cycles/util/util_murmurhash.cpp
24

Can we have this file more cycles-styled?

This revision now requires changes to proceed.Aug 21 2018, 5:16 PM
Stefan Werner (swerner) marked 9 inline comments as done.
  • Cycles: Addressed Sergey's cryptomatte omments.
  • Cycles: Fixed OpenCL build
  • Cycles: Code styling
Stefan Werner (swerner) marked 2 inline comments as done.
  • Cycles: Code styling
  • Merge branch 'cycles_cryptomatte' of github.com:skwerner/blender into cycles_cryptomatte

Some more updates, addressing Sergey's comments.

intern/cycles/kernel/kernels/cuda/kernel.cu
54

Moving it to buffers.cpp excludes it from GPU acceleration.

Putting it in buffers.cpp isn't straight forward. RenderBuffers is agnostic of total sample count, and sorting every time get_pass_rect is called is wasteful. RenderTile knows about total sample count but is just a structure of data without methods. Turning it into its own RenderTile::Task (similar to denoising) seems like overkill to me.