Tasks
For tracking Metal device development, targeting 3.1 release:
- Kernel changes required to compile for MSL (Metal Shading Language):
- Explicit address space qualifiers
- Adapt cross-platform layer (device/gpu) for MSL
- MetalRT kernel changes
- Host changes:
- Support specification of explicit dispatch argument types
- Add core Metal implementation (device_impl, device_queue, etc.)
- Min OS / device version checks
- Buildbot changes
- Upgrade Xcode on buildbot machines to macOS 12.0 SDK
- Arm
- Intel
- Regression tests (CYCLES_TEST_DEVICES=CPU;METAL)
- Enable on Arm buildbot, with just a single test to verify the basic kernel works
- Fix broken tests
- Bake tests have empty output
- Missing shadows(?) in many volume tests
- Missing point cloud support (D13632)
- Ambient occlusion only local fails
- Enable running of all tests on the buildbot
- MetalRT regression test failures
- Point clouds
- Ambient occlusion only local
- Misc other failures
- hair transmission
- openvdb smoke
- shadow catcher pt transparent lamp only 1.0
- visibility particles
- Hardware support
- Apple Silicon
- AMD
- Intel (requires buildbot update to macOS 13, see D16253)
Performance - see T101931
Metal GPU Cycles Implementation
Overview of initial steps taken to get the implementation working.
There are two broad phases to enabling the Metal GPU implementation of Cycles:
- Adapting the kernel code so that it can be compiled under MSL (Metal Shading Language)
- Adding the new Metal host-side device
The kernel code changes represent the largest chunk of work in terms of interaction with existing implementations. Because MSL is closely based on C++14, much of the kernel code already compiles as-is. The remaining steps required to fully compile the kernels with MSL are:
Explicit address space qualifiers
- Metal requires that all pointer types be declared with explicit address space attributes (device, thread, etc.). There is already precedent for this in Cycles' address space macros (ccl_global, ccl_private, etc.), so the first step of MSL enablement is to apply these consistently. Line-for-line, this is the largest change required to enable MSL. Applying it first simplifies later patches, and it has the side benefit of making the kernel code more self-descriptive, since every pointer states where its memory lives.
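As a rough illustration of the idea, the sketch below shows how address space macros can expand to Metal qualifiers under MSL and to nothing on a host C++ compiler. The macro names follow the ones mentioned above, but the definitions, the use of `__METAL_VERSION__` as the switch, and the `example_sum_values` function are assumptions made for this example rather than the actual Cycles code.

```cpp
// Assumed definitions: under MSL the macros map to Metal address spaces,
// on a host C++ compiler they expand to nothing.
#ifdef __METAL_VERSION__
#  define ccl_global device
#  define ccl_private thread
#else
#  define ccl_global
#  define ccl_private
#endif

// Hypothetical kernel helper: every pointer parameter carries an explicit
// address space macro, so the same source compiles on both targets.
inline float example_sum_values(const ccl_global float *values,
                                int count,
                                ccl_private float *scratch)
{
  *scratch = 0.0f;
  for (int i = 0; i < count; i++) {
    *scratch += values[i];
  }
  return *scratch;
}
```

On the host the qualifiers vanish and this is ordinary C++; under MSL the same signature reads as `const device float *` and `thread float *`.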
Cross-platform layer (device/gpu)
- Cycles is architected such that the majority of the kernel code is shared, with a small surface area of platform-specific entrypoints and an associated compatibility adapter header ("compat.h"). Metal is no exception, and almost all required features can be adapted through a "metal/compat.h", including thread indexing accessors, texture sampling adapters, math intrinsics, and SIMD-group functions (analogous to CUDA's warp-level primitives).
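A minimal sketch of the adapter pattern, assuming a hypothetical `example_rsqrt` name that each platform's compatibility header would define. Shared code then calls the adapter name and contains no platform conditionals. The names and the host fallback are illustrative only and are not taken from the real "metal/compat.h".

```cpp
// Platform-specific part (would live in a per-platform compat header).
#ifdef __METAL_VERSION__
#  include <metal_stdlib>
#  define example_rsqrt(x) metal::rsqrt(x)
#else
#  include <cmath>
#  define example_rsqrt(x) (1.0f / std::sqrt(x))
#endif

// Shared part: written once, relies only on the adapter name.
// Returns the factor that scales a vector of the given squared length to
// unit length, or 0 for a zero-length vector.
inline float example_normalize_scale(float length_squared)
{
  return (length_squared > 0.0f) ? example_rsqrt(length_squared) : 0.0f;
}
```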
There are a couple of small exceptions which require minimal modification elsewhere:
- float3 as used by some integrator state arrays, requires the use of Metal's packed_float3 to achieve tight packing
- Lambda expressions are not available in MSL; however, all instances (chiefly volume-stepping) can be easily adapted to use function objects, either directly or through adapter macros
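Both exceptions can be sketched in host C++. The two structs below are stand-ins (not Metal's real types) that show why a packed three-float type matters for tightly packed arrays, and `ExampleStepAccumulator` shows one way a capturing lambda might be rewritten as a function object. All names here are hypothetical, and the 16-byte alignment of the unpacked type is an assumption about GPU-style float3 layout.

```cpp
// Stand-in for a 16-byte aligned float3 (assumed GPU-style layout).
struct alignas(16) example_float3 {
  float x, y, z;
};

// Stand-in for a packed three-float type: no padding between array elements.
struct example_packed_float3 {
  float x, y, z;
};

static_assert(sizeof(example_float3) == 16, "aligned variant pads to 16 bytes");
static_assert(sizeof(example_packed_float3) == 12, "packed variant is 12 bytes");

// Lambda form, which MSL does not accept:
//   auto step = [&](float step_size) { accum += density * step_size; };
// Equivalent function object: captured variables become members.
struct ExampleStepAccumulator {
  float density;
  float accum;
  void operator()(float step_size)
  {
    accum += density * step_size;
  }
};

// Hypothetical stepping loop that accepts any callable, lambda or functor.
template<typename StepFunction>
inline void example_volume_march(float t_max, int num_steps, StepFunction &step_fn)
{
  const float step_size = t_max / num_steps;
  for (int i = 0; i < num_steps; i++) {
    step_fn(step_size);
  }
}
```

Because the stepping loop is templated on the callable type, call sites that previously passed a lambda only need to construct the function object and pass it instead.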
Resource encoding and access
- The Cycles kernels make use of several device-accessible resources - more than is practical to pass directly through kernel dispatches. Metal offers a convenient way to encode the required resources into argument buffers which we can structure as required for device access. Since MSL is based on C++, we can wrap all of the shared kernel code in a context class containing the correctly structured resources. This way, kernel resources can be accessed using the usual methods: INTEGRATOR_STATE, kernel_tex_fetch, etc...
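The context class idea can be sketched as follows. `ExampleKernelResources` stands in for the resources encoded into an argument buffer, and the `EXAMPLE_*` macros play the role of accessors such as INTEGRATOR_STATE and kernel_tex_fetch. Every name and the struct layout are hypothetical and chosen only to show the pattern.

```cpp
// Hypothetical bundle of device-accessible resources (stand-in for the
// contents of an argument buffer).
struct ExampleKernelResources {
  const float *tex_values;
  int *state_path_flags;
};

// Accessor macros resolve against the context's `res` member, so shared
// kernel code can use them without passing resources explicitly.
#define EXAMPLE_TEX_FETCH(index) (res.tex_values[index])
#define EXAMPLE_STATE_FLAGS(path) (res.state_path_flags[path])

class ExampleKernelContext {
 public:
  explicit ExampleKernelContext(const ExampleKernelResources &resources) : res(resources)
  {
  }

  // Shared kernel code becomes a member function of the context class.
  float example_shade(int path_index, int tex_index)
  {
    EXAMPLE_STATE_FLAGS(path_index) |= 1;         /* Mark the path as shaded. */
    return EXAMPLE_TEX_FETCH(tex_index) * 0.5f;   /* Read a resource value. */
  }

 private:
  ExampleKernelResources res;
};
```

The design choice illustrated here is that the accessor call sites stay unchanged: only the macro definitions and the enclosing class know where the resources actually come from.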
Beyond this, the bulk of the remaining work is on the host side, where adding a Metal device has minimal interaction with existing host-side code.