
Cycles Metal device
Confirmed, Normal, Public, TO DO

Assigned To: None
Authored By: Michael Jones (michael_jones), Oct 14 2021, 2:49 PM

Description

Tasks

For tracking Metal device development, targeting 3.1 release:

  • Kernel changes required to compile for MSL (Metal Shading Language):
    • Explicit address space qualifiers
    • Adapt cross-platform layer (device/gpu) for MSL
    • MetalRT kernel changes
  • Host changes:
    • Support specification of explicit dispatch argument types
    • Add core Metal implementation (device_impl, device_queue, etc.)
    • Min OS / device version checks
  • Buildbot changes:
    • Upgrade Xcode on buildbot machines to the macOS 12.0 SDK
      • Arm
      • Intel
  • Regression tests (CYCLES_TEST_DEVICES=CPU;METAL)
    • Enable on Arm buildbot, with just a single test to verify the basic kernel works
    • Fix broken tests
      • Bake tests have empty output
      • Missing shadows(?) in many volume tests
      • Missing point cloud support (D13632)
      • Ambient occlusion only local fails
    • Enable running of all tests on the buildbot
  • MetalRT regression test failures
    • Point clouds
    • Ambient occlusion only local
    • Misc other failures
      • hair transmission
      • openvdb smoke
      • shadow catcher pt transparent lamp only 1.0
      • visibility particles
  • Hardware support
    • Apple Silicon
    • AMD
    • Intel (requires buildbot update to macOS 13, see D16253)

Performance - see T101931


Metal GPU Cycles Implementation

Overview of initial steps taken to get the implementation working.

There are two broad phases to enabling the Metal GPU implementation of Cycles:

  • Adapting the kernel code so that it can be compiled under MSL (Metal Shading Language)
  • Adding the new Metal host-side device

The kernel code changes represent the largest chunk of work in terms of interaction with existing implementations. MSL is closely based on C++14, so much of the kernel code already compiles as-is. Here we describe the remaining steps required to fully compile the kernels with MSL:

Explicit address space qualifiers

  • Metal requires that all pointer types be declared with explicit address space attributes (device, thread, etc.). Cycles already has precedent for this in its address space macros (ccl_global, ccl_private, etc.), so the first step of MSL enablement is to apply these consistently. Line for line, this is the largest change required to enable MSL. Applying it first will simplify future patches, and it has the side benefit of making the address space of every pointer explicit in the code.
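To illustrate why consistently applied qualifiers let one source serve both targets, here is a minimal sketch. It assumes macro definitions along these lines. The real definitions live in Cycles' per-platform compat headers and may differ. The helper `accumulate_weights` is hypothetical. `__METAL_VERSION__` is the macro the Metal compiler defines.

```cpp
#ifndef __METAL_VERSION__
#  include <cassert>  // host-side checks only; MSL has no <cassert>
#endif

// Hypothetical sketch, not copied from Cycles: the macros expand to MSL
// address-space qualifiers under the Metal compiler and to nothing in C++.
#ifdef __METAL_VERSION__
#  define ccl_global device   /* memory in a device buffer */
#  define ccl_private thread  /* per-thread memory */
#else
#  define ccl_global
#  define ccl_private
#endif

// Illustrative helper: every pointer parameter states its address space,
// so the same source is valid C++ and valid MSL.
inline void accumulate_weights(ccl_global const float *weights,
                               const int n,
                               ccl_private float *out_sum)
{
  float sum = 0.0f;
  for (int i = 0; i < n; i++) {
    sum += weights[i];
  }
  *out_sum = sum;
}
```

On the CPU the parameter types collapse to plain `const float *` and `float *`. Under MSL they become `device const float *` and `thread float *`, and the function body does not change.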

Cross-platform layer (device/gpu)

  • Cycles is architected such that the majority of the kernel code is shared, with a small surface area of platform-specific entry points and an associated compatibility adapter header ("compat.h"). Metal is no exception: almost all required features can be adapted through a "metal/compat.h", including thread indexing accessors, texture sampling adapters, math intrinsics, and SIMD-group functions (analogous to CUDA's warp-level primitives).

    There are a couple of small exceptions which require minimal modification elsewhere:
    • float3, as used by some integrator state arrays, requires Metal's packed_float3 to achieve tight packing (in MSL, float3 occupies 16 bytes while packed_float3 occupies 12)
    • Lambda expressions are not available in MSL; however, all instances (chiefly volume stepping) can easily be adapted to use function objects, either directly or through adapter macros
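The sketch below shows both exceptions in plain C++. The two structs only mimic MSL's `float3` and `packed_float3` layouts on the CPU. `integrate_steps` and `LinearDensity` are hypothetical names, not Cycles' volume code. They show the general pattern of replacing a lambda with a function object passed as a template parameter, which MSL supports.

```cpp
#include <cassert>  // assert() for host-side checks

// 1) Storage layout. These structs mimic MSL's vector types on the CPU:
//    MSL float3 is 16 bytes (16-byte aligned); packed_float3 is 12 bytes.
struct alignas(16) AlignedFloat3Sketch {
  float x, y, z;
};
struct PackedFloat3Sketch {
  float x, y, z;
};
static_assert(sizeof(AlignedFloat3Sketch) == 16, "one float of padding per element");
static_assert(sizeof(PackedFloat3Sketch) == 12, "tightly packed");

// 2) Lambda replacement. A hypothetical volume-stepping loop takes its
//    per-step callback as a template parameter, so a function object works
//    wherever CPU/CUDA code might have passed a lambda.
template<typename StepFunc>
float integrate_steps(const int num_steps, const float step_size, StepFunc step)
{
  float accum = 0.0f;
  for (int i = 0; i < num_steps; i++) {
    const float t = (i + 0.5f) * step_size;  // midpoint of step i
    accum += step(t) * step_size;
  }
  return accum;
}

// Function object standing in for a capturing lambda such as
//   [density](float t) { return density * t; }
struct LinearDensity {
  float density;
  float operator()(const float t) const
  {
    return density * t;
  }
};
```

For example, `integrate_steps(4, 0.5f, LinearDensity{2.0f})` evaluates to 4.0, the midpoint-rule integral of 2t over [0, 2]. For a state array of N elements, packed storage saves 4 bytes per element.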

Resource encoding and access

  • The Cycles kernels make use of several device-accessible resources, more than is practical to pass directly through kernel dispatches. Metal offers a convenient way to encode the required resources into argument buffers, which we can structure as required for device access. Since MSL is based on C++, we can wrap all of the shared kernel code in a context class containing the correctly structured resources. This way, kernel resources can be accessed using the usual methods: INTEGRATOR_STATE, kernel_tex_fetch, etc.
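Here is a minimal sketch of the context-class idea in plain C++. All names are hypothetical: `KernelResourcesSketch` stands in for an argument-buffer layout and `sketch_tex_fetch` stands in for accessors like kernel_tex_fetch. It does not reproduce Cycles' actual signatures. The point is that an accessor macro expanded inside member functions can reach the resources through `this`, so the shared code does not need to change.

```cpp
#include <cassert>  // assert() for host-side checks

// Hypothetical resource bundle. On Metal the host would encode these into an
// argument buffer; here it is a plain struct of pointers for illustration.
struct KernelResourcesSketch {
  const float *lookup_table;
  int lookup_table_size;
  float *render_buffer;
};

// Hypothetical accessor macro. It is only expanded inside member functions,
// so it can forward to the context object.
#define sketch_tex_fetch(index) this->tex_fetch(index)

// Shared kernel code lives inside a context class, so resources are reached
// through `this` instead of through globals or extra kernel parameters.
class KernelContextSketch {
 public:
  explicit KernelContextSketch(const KernelResourcesSketch &resources) : res_(resources) {}

  float tex_fetch(const int index) const
  {
    assert(index >= 0 && index < res_.lookup_table_size);
    return res_.lookup_table[index];
  }

  // Example of "shared kernel code" written only in terms of the accessor.
  void shade_pixel(const int pixel, const int table_index) const
  {
    res_.render_buffer[pixel] += sketch_tex_fetch(table_index);
  }

 private:
  KernelResourcesSketch res_;  // pointers are copied; the data is not
};
```

The shared function `shade_pixel` never names the resource struct directly. Only the context class and the accessor macro would need a Metal-specific definition.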

Beyond this, the bulk of the remaining work is on the host-side, where the addition of a Metal device will have very minimal interaction with existing host-side code.


Links:

Revisions and Commits

rC Cycles
D13632
D13877
D13503
D13423
D13353
D13357
D13241
D13263
D13243
D13234
D13109
D12864
rB Blender
Abandoned
D16042
D16253
D13632
D13877
D13503
D13423
D13353
D13357
D13241
D13263
D13243
D13234
D13109
D12864

Event Timeline


(@Brecht Van Lommel (brecht))

Metal status update:

  1. I have a workaround for these bake failures, which involves special-casing RenderBuffers to use managed storage instead of shared storage (possibly a synchronisation issue):
    • Bake tests have empty output
  2. These are the result of a driver issue which should be fixed in macOS 12.2:
    • Missing shadows(?) in many volume tests
  3. ...and as a result we should update the OS version check; however, I was planning to do that closer to the 3.1 release, if that makes sense?
    • Min OS / device version checks

Regarding multi-device support, I'm aware that CPU+GPU performance on Metal is not generally giving an uplift over GPU-only render times. @Brecht Van Lommel (brecht) , are you aware of any general issues with CPU+GPU load-balancing that could be affecting results here?

Artur (ArtWeb) added a comment. Edited Jan 26 2022, 4:39 PM

From what I’ve tested, CPU+GPU doesn’t help in CUDA and OptiX either. That wasn’t the case before Cycles X was introduced, so it probably has something to do with the recent load-balancing changes.

> 1. I have a workaround for these bake failures, which involves special-casing RenderBuffers to use managed storage instead of shared storage (possibly a synchronisation issue):

It's a bit odd since the init_from_bake kernel doesn't do anything very different from init_from_camera, and all the other kernels are shared. But a workaround is fine.

> 2. These are the result of a driver issue which should be fixed in macOS 12.2:

Sounds good.

> 3. ...and as a result we should update the OS version check; however, I was planning to do that closer to the 3.1 release, if that makes sense?

Fine with me, not much point checking for macOS 12.2 if it's not out yet.

> are you aware of any general issues with CPU+GPU load-balancing that could be affecting results here?

Yes, we have T89833: Cycles: multi-device rendering performance.

I think it does generally tend to help for more complex scenes that take a while to render, while for simpler/quick renders the balancing overhead/slowness is too much. There's also the overhead of having to build two BVHs, for example. This is not really specific to Metal.
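The trade-off described above can be made concrete with a deliberately simplified cost model. It is an assumption for illustration, not Cycles' scheduler. Multi-device rendering adds a fixed overhead, such as building a second BVH, and then splits the work perfectly in proportion to device throughput. Solving work / g = overhead + work / (g + c) gives the break-even amount of work. All function names here are hypothetical.

```cpp
#include <cassert>  // assert() for host-side checks

// Simplified, hypothetical cost model: `work` units to render, devices that
// process `gpu_rate` and `cpu_rate` units per second, and a fixed
// `overhead_s` paid only when both devices are used.
double single_device_time(const double work, const double gpu_rate)
{
  return work / gpu_rate;
}

double multi_device_time(const double work,
                         const double gpu_rate,
                         const double cpu_rate,
                         const double overhead_s)
{
  // Assumes perfect proportional splitting, which real balancing won't reach.
  return overhead_s + work / (gpu_rate + cpu_rate);
}

// Work at which both options take equally long:
//   work / g = overhead + work / (g + c)  =>  work = overhead * g * (g + c) / c
double break_even_work(const double gpu_rate, const double cpu_rate, const double overhead_s)
{
  return overhead_s * gpu_rate * (gpu_rate + cpu_rate) / cpu_rate;
}
```

Take the made-up rates g = 100, c = 25 and a 5 s overhead. A 1000-unit render then takes 13 s on CPU+GPU versus 10 s on the GPU alone. Only renders larger than 2500 units benefit from adding the CPU, which matches the observation that quick renders do not gain.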

It helps for some people in the feedback thread at least:
https://devtalk.blender.org/t/cycles-apple-metal-device-feedback/21868/68

I hope it is appropriate to comment on performance regressions in this thread. Over the past few builds I have noticed that render times for the BMW GPU "benchmark" file are consistently slower on an MBP M1 Max (32 GB, on a power adapter, with Energy Mode set to "High Power"). I was getting an average of 42 secs in early January and I'm currently getting ~47 secs. I know that some things changed over the last couple of weeks, first with the crashing and then with the CPU not being used when the CPU+GPU option was selected, as mentioned in this thread: https://devtalk.blender.org/t/cycles-apple-metal-device-feedback/21868/96.

I'm aware that this might not be an issue (and might even be inverted) with more complex scenes, but it is consistent and it's not negligible, so I thought I'd mention it.

> Fine with me, not much point checking for macOS 12.2 if it's not out yet.

This actually just came out moments ago.

@Isaac Arias (ike) BMW is actually not a great test. After the BVH2 patch it is the only demo scene that is slower for me.
Monster Under the Bed and Classroom, for example, are faster.

What I did notice, however, is that memory use has gone through the roof, especially with Sinosauropteryx Prima: before the patch I could easily render it at 2048 tiles, but after the patch that was a no-go. Memory use while rendering was also about twice as high as before BVH2.

@Brecht Van Lommel (brecht) do you think that is normal?

> @Isaac Arias (ike) BMW is actually not a great test. After the BVH2 patch it is the only demo scene that is slower for me.
> Monster Under the Bed and Classroom, for example, are faster.

This matches my observations: after the bvh2 switch, all of the benchmark scenes are faster except BMW, which seems to be a bit of an outlier in general.

> What I did notice, however, is that memory use has gone through the roof, especially with Sinosauropteryx Prima: before the patch I could easily render it at 2048 tiles, but after the patch that was a no-go. Memory use while rendering was also about twice as high as before BVH2.

I am investigating this and hope to have insight soon. While it's great that benchmarking scenes render faster with bvh2, we don't want to regress on larger scenes like Sinosauropteryx. One possibility could be to expose a separate MetalRT device, in the same way that Cycles has separate CUDA and OptiX devices. There are still some (seemingly minor) correctness issues with MetalRT, so it would probably make sense for this to be opt-in and marked as experimental. @Brecht Van Lommel (brecht), does this seem reasonable?

> One possibility could be to expose a separate MetalRT device, in the same way that Cycles has separate CUDA and OptiX devices. There are still some (seemingly minor) correctness issues with MetalRT, so it would probably make sense for this to be opt-in and marked as experimental. @Brecht Van Lommel (brecht), does this seem reasonable?

We can add an option for MetalRT, but I don't think it needs to be a new device type. A boolean like peer_memory for CUDA/OptiX seems enough to me.
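Here is a minimal sketch of the suggested design: one Metal device type with an opt-in boolean, rather than a separate MetalRT device. Every name here is hypothetical and only shows the shape of the option. None of them are Cycles' actual types or fields.

```cpp
#include <cassert>  // assert() for host-side checks

// Hypothetical names for illustration; not Cycles' actual types.
enum class BVHLayoutSketch { BVH2, MetalRT };

// A single Metal device type carrying an opt-in flag, in the spirit of a
// per-device boolean like peer_memory for CUDA/OptiX.
struct MetalDeviceInfoSketch {
  bool use_metalrt = false;  // experimental, off by default
};

// The flag only changes which acceleration structure is built and traversed.
BVHLayoutSketch choose_bvh_layout(const MetalDeviceInfoSketch &info)
{
  return info.use_metalrt ? BVHLayoutSketch::MetalRT : BVHLayoutSketch::BVH2;
}
```

Keeping MetalRT behind a flag leaves device enumeration and the UI device list unchanged, while still letting users opt in to the experimental path.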

Benjamin (Benup) added a comment. Edited Feb 27 2022, 12:19 PM

Hello,

Blender 3.1.0 beta (Apple Silicon) and 3.2.0 alpha (Apple Silicon) are unusable with UV Paint: they are very slow and laggy with a graphics tablet. The Intel build of 3.0.1, however, works perfectly on an Apple Silicon Mac. Is a fix possible?

Thanks for your work !

@Benjamin (Benup) please report potential bugs through this form. This task is not related to texture painting.
https://developer.blender.org/maniphest/task/edit/form/1/

@Brecht Van Lommel (brecht) OK, I'll post the bug report later today ;-)

Did Metal get slower in the 3.1.0 release candidate compared to beta/alpha?

I've been using 3.1.0 beta a lot and have been impressed by render speeds on the M1 Max, however the release candidate seems slower and the GPU settings look different.

I don't have data on this yet, but I'm curious whether anyone else has noticed this.

Something is wrong with macOS 12.3.
There is an error in the ATI drivers for 5- and 6-series AMD cards. The AMD driver is 4.1 ATI-4.8.13, and it has half the performance of the previous version. As I understand it, Michael Jones said they worked on the AMD drivers for Cycles X, but 12.3 reduces the performance of the EEVEE engine and of the system overall. Please keep an eye on that.

This is unrelated to Blender, the bug affects the whole system. For more detailed information, see this article from AppleInsider: macOS 12.3 update causing problems for some PCI-E GPU owners.

Please keep this task on topic! This task is for developers only and should focus on code design and review. Please report bugs in the dedicated place. Thank you for your understanding.