Shadow Pipeline

All shadow casters subscribe to this pipeline. It is submitted every frame, once per view that needs shadows. This is because update detection is done on the GPU by the Shadow Module.

The rendering uses the multiview feature of the draw manager. The views are initialized on the GPU and instances are generated only for valid views.

The framebuffer has no attachments and uses an empty framebuffer configuration the size of a virtual shadow map (8192 x 8192 px). This is because rendering with a depth buffer of up to 64 x 8192 x 8192 px (~16 GB of VRAM) is not feasible. Instead, we rasterize every polygon and use atomic min operations to populate the shadow atlas directly. This is why a separate tile Page Clear phase is needed prior to rendering.
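The memory figure above can be checked with a short computation. This is only a sketch: it assumes the 64 refers to the number of shadow map layers and that each depth texel takes 4 bytes (a 32-bit depth format), neither of which is stated explicitly. The function name is made up for illustration.

```cpp
#include <cstdint>

// Size in bytes of a layered square depth buffer.
// `layer_count`, `resolution` and `bytes_per_texel` are illustrative parameters.
constexpr uint64_t depth_buffer_bytes(uint64_t layer_count,
                                      uint64_t resolution,
                                      uint64_t bytes_per_texel)
{
  return layer_count * resolution * resolution * bytes_per_texel;
}

// Assumption: 64 layers of 8192 x 8192 texels at 4 bytes per texel.
constexpr uint64_t full_depth_bytes = depth_buffer_bytes(64, 8192, 4);

// 2^6 * 2^13 * 2^13 * 2^2 = 2^34 bytes, which is 16 GiB.
static_assert(full_depth_bytes == (uint64_t(1) << 34), "expected 16 GiB");
```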

Since the size of the rendered regions can vary a lot (from 1x1 to 32x32 tiles wide), we leverage the multi-viewport rendering capabilities of the GPU. With this extension, we can use a different viewport size for each view, which prevents the rasterizer from spawning a lot of unneeded fragment shader invocations. The specification allows for 16 viewports, which leaves room for a lot of variation in sizes. Note that one viewport can be reused by multiple views, since we use atomic operations to manually output to the correct region.
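One possible way to map a region size to a viewport is sketched below. It assumes, purely for illustration, that the viewports are square with power-of-two sizes from 1 to 32 tiles (6 viewports, well under the limit of 16). The actual viewport layout is not described here, and `viewport_select` and `max_viewport_index` are hypothetical names.

```cpp
// Assumption: viewport `i` covers (1 << i) tiles per side, for i in [0, 5].
constexpr int max_viewport_index = 5;

// Return the index of the smallest assumed viewport that covers a square
// region of `region_size_in_tiles` tiles per side (expected range: 1 to 32).
inline int viewport_select(int region_size_in_tiles)
{
  int index = 0;
  while (index < max_viewport_index && (1 << index) < region_size_in_tiles) {
    index++;
  }
  return index;
}
```

Under this assumption, a view covering 3x3 tiles would use the 4x4 viewport, so the rasterizer only spawns fragments for 16 tiles instead of the full 32x32 region.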

Each instance is generated with an associated drw_view_id (draw manager view), which allows fetching the associated ShadowRenderView. Using this data, we can output the geometry to the correct viewport and get the shadow atlas page indirection data inside the fragment shader.

The fragment shader outputs either the radial distance to the shadow cube center (for punctual shadows) or the distance to the near plane (for directional shadows).

For punctual shadows, we do a software clip (using discard) of fragments closer than a certain distance from the shadow origin. This prevents shadow casters inside sphere lights from casting shadows outside the sphere.
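The two output modes and the software clip can be sketched on the CPU as follows. This is not the actual shader code: the `Vec3` type, the function names, and the parameters (`clip_near`, `near_plane_point`, `light_forward`) are hypothetical, and the directional case assumes `light_forward` is a unit vector pointing away from the near plane.

```cpp
#include <cmath>

struct Vec3 {
  float x, y, z;
};

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Punctual shadows: output the radial distance to the shadow origin.
// Returns false when the fragment is closer than `clip_near`, which stands in
// for the `discard` of the software clip.
inline bool punctual_shadow_depth(Vec3 frag_pos, Vec3 shadow_origin, float clip_near, float &r_depth)
{
  const Vec3 delta = sub(frag_pos, shadow_origin);
  const float radial_distance = std::sqrt(dot(delta, delta));
  if (radial_distance < clip_near) {
    return false;
  }
  r_depth = radial_distance;
  return true;
}

// Directional shadows: output the distance to the near plane, measured along
// the (unit length) light direction.
inline float directional_shadow_depth(Vec3 frag_pos, Vec3 near_plane_point, Vec3 light_forward)
{
  return dot(sub(frag_pos, near_plane_point), light_forward);
}
```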

The output is written directly at the physical page location inside the Shadow Atlas texture using atomicMin operations. The pages are cleared to FLT_MAX. This works because we only write positive float bits, and the binary representation of increasing positive floats is also increasing when compared as unsigned integers.
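The ordering property can be illustrated with a small CPU-side sketch. It uses `std::atomic<uint32_t>` with a compare-exchange loop as a stand-in for the GPU atomicMin on a texel. The helper names (`float_bits`, `bits_to_float`, `atomic_min_u32`, `store_depth`) are made up for this example and are not taken from the shadow module.

```cpp
#include <atomic>
#include <cfloat>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its raw 32-bit pattern.
inline uint32_t float_bits(float value)
{
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Reinterpret a raw 32-bit pattern as a float.
inline float bits_to_float(uint32_t bits)
{
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// CPU stand-in for an atomic min on an unsigned integer texel.
inline void atomic_min_u32(std::atomic<uint32_t> &texel, uint32_t candidate)
{
  uint32_t current = texel.load(std::memory_order_relaxed);
  // Retry until the candidate is stored or another writer stored a smaller value.
  while (candidate < current &&
         !texel.compare_exchange_weak(current, candidate, std::memory_order_relaxed)) {
  }
}

// Store a positive depth, keeping the smallest value written so far.
// Only valid for non-negative floats: their bit patterns sort like the floats.
inline void store_depth(std::atomic<uint32_t> &texel, float depth)
{
  atomic_min_u32(texel, float_bits(depth));
}
```

A texel initialized with `float_bits(FLT_MAX)` plays the role of a cleared page: any finite positive depth written afterwards compares lower and replaces it.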