After the addition of custom and light group AOVs in D3607, kernel_accumulate clamps immediately and also writes light groups directly to the render buffer. The rest of the code here could be refactored further to work the same way, to simplify and hopefully optimize the kernel.
All light passes (except perhaps combined) could be written directly to the render buffer, rather than stored intermediately in PathRadiance. This would reduce kernel memory usage and data copying on the GPU, at the cost of more atomic writes. For CPU rendering, which does not need atomics, this is likely a clear win. For the GPU it will have to be tested, but the intermediate copies it does now are not cheap either.
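As a rough illustration of the direct-write idea, here is a minimal sketch. All names in it (Float3, PassType, RenderBuffer, write_pass_direct) are hypothetical stand-ins and not the actual Cycles kernel types; the buffer layout is an assumption chosen for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins. These are NOT the real Cycles types.
struct Float3 {
  float x, y, z;
};

enum PassType { PASS_DIFFUSE_DIRECT = 0, PASS_GLOSSY_DIRECT, PASS_NUM };

// Assumed layout: for each pixel, PASS_NUM passes of 3 floats each.
struct RenderBuffer {
  int num_pixels;
  std::vector<float> data;

  explicit RenderBuffer(int n) : num_pixels(n), data(size_t(n) * PASS_NUM * 3, 0.0f) {}

  // Pointer to the first float of the given pass for the given pixel.
  float *pass_ptr(int pixel, PassType pass)
  {
    assert(pixel >= 0 && pixel < num_pixels);
    return &data[(size_t(pixel) * PASS_NUM + size_t(pass)) * 3];
  }
};

// Accumulate a light contribution straight into the render buffer, with no
// intermediate per-path storage. This is the CPU variant with plain adds;
// a GPU variant would presumably need atomic adds here instead.
inline void write_pass_direct(RenderBuffer &buffer, int pixel, PassType pass, Float3 value)
{
  float *p = buffer.pass_ptr(pixel, pass);
  p[0] += value.x;
  p[1] += value.y;
  p[2] += value.z;
}
```

The trade-off described above shows up in write_pass_direct: every contribution becomes a buffer write, but nothing has to be carried along the path and copied out at the end.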
As a second step, a state machine could be added that is updated on every light bounce. Each state would have a list of active AOVs to write to. This state machine would be created on the CPU side, and initially would just be for built-in AOVs and light groups, but could support light path expressions in the future too.
Further, we have a number of built-in non-light AOVs. These could be implemented as part of shader execution like custom AOVs, to simplify the kernel side code. This would be like a set of shading nodes that is automatically executed after all shaders. For users it would even be useful if they could specify a shader node group like that in the view layer, to write custom AOVs for all materials without having to add AOV output nodes to each material.
For features like adaptive sampling, we could also use an easy way to loop over all AOVs to rescale them in the kernel. Right now the list of passes to rescale is hardcoded.
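One way to make that loop generic is a small table describing where each AOV lives in a pixel. The sketch below assumes such a table exists; AOVDesc and rescale_aovs are hypothetical names, and the flat per-pixel float layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical descriptor: where an AOV sits inside one pixel's floats.
struct AOVDesc {
  int offset;          // index of the AOV's first float within the pixel
  int num_components;  // 1 for value AOVs, 3 or 4 for color AOVs
};

// Rescale every listed AOV of a single pixel by the same factor, for example
// to compensate for a pixel that stopped sampling early. Floats that are not
// covered by any descriptor are left untouched.
inline void rescale_aovs(float *pixel, const std::vector<AOVDesc> &aovs, float scale)
{
  assert(pixel != nullptr);
  for (const AOVDesc &aov : aovs) {
    for (int i = 0; i < aov.num_components; i++) {
      pixel[aov.offset + i] *= scale;
    }
  }
}
```

With a table like this, adding a new AOV only means adding a descriptor, rather than touching the rescaling code.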