This task gathers ideas for optimizing the Cycles split kernel.
There are a few major things to work on:
- Reorganize split kernels so that shader evaluation and ray-tracing are not duplicated in multiple kernels, or not duplicated as many times. This could reduce kernel compile times and register pressure, and improve coherence. Alternatively, shader evaluation for background, shadows, and volumes could be specialized so that it does not include nodes not used in any such shader (for example, volumes don't need BSDFs).
- This would likely involve some rethinking of how we handle transparent shadows, and maybe light and background shaders.
- Replace the queues and atomics used for scheduling work with sorting between split kernels, as done by some other renderers.
- Reduce the size of the state that needs to remain in memory between split kernels. The number of rays that can be active is limited by this. A large state can result in low occupancy or not enough memory left for scene data.
- Render passes could be written directly to memory to make PathRadiance much smaller (T72293)
- ShaderData: there are ways to reduce the size of shader globals, such as computing some members on demand rather than storing them, compression (lossless or lossy), and simpler differentials.
- It may be possible to structure the kernels so that only one closure needs to be stored at a time, or for a shorter time, though this may come with some noise trade-offs.
- If shader evaluation is isolated to one or a few kernels, sorting rays by material ID can improve coherence. Similar sorting may help other split kernels.
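As a rough illustration of the sorting idea in the list above, here is a minimal CPU-side sketch of ordering active paths by shader ID with a stable counting sort, instead of pushing them onto per-kernel queues with atomics. The function name `sort_paths_by_shader` and its parameters are hypothetical and not part of the Cycles code; on a GPU the three loops would presumably become a parallel histogram, a prefix sum, and a scatter pass.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not a Cycles API.
// shader_ids[i] is the shader (material) ID that path i will evaluate next.
// Returns path indices ordered so that paths with the same shader are adjacent.
std::vector<uint32_t> sort_paths_by_shader(const std::vector<uint32_t> &shader_ids,
                                           uint32_t num_shaders)
{
  // Histogram: count the active paths per shader, shifted by one slot.
  std::vector<uint32_t> offsets(num_shaders + 1, 0);
  for (uint32_t id : shader_ids) {
    assert(id < num_shaders);
    offsets[id + 1]++;
  }

  // Running sum: offsets[s] becomes the first output slot for shader s.
  for (uint32_t s = 0; s < num_shaders; s++) {
    offsets[s + 1] += offsets[s];
  }

  // Scatter: write each path index into the next free slot of its shader.
  // Visiting paths in order keeps the sort stable.
  std::vector<uint32_t> sorted(shader_ids.size());
  for (uint32_t path = 0; path < shader_ids.size(); path++) {
    sorted[offsets[shader_ids[path]]++] = path;
  }
  return sorted;
}
```

A shader evaluation kernel could then assign consecutive threads to consecutive entries of the returned array, so that neighbouring threads mostly execute the same shader. Whether the cost of the sort pays off would need to be measured.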
I would consider removing branched path tracing for GPU rendering entirely (T52725), since this is not particularly suitable for GPUs and complicates the code. It would be easier to refactor without this.