Cycles: Tweak inlining policy on Metal

This patch optimises the Metal inlining policy. It gives a small speedup (2-3% on M1 Max) with no notable increase in compilation time compared with what is already in master. The compilation slowdowns noted previously (reported in T100102) were caused by forcing inlining for `ccl_device`; relying on the compiler's heuristics for those functions avoids the slowdown and also gives better rendering performance.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16081
Michael Jones 2022-09-27 17:01:17 +01:00 committed by Michael Jones
parent fc604a0be3
commit 2b88ee50fb
Notes: blender-bot 2023-02-14 06:45:14 +01:00
Referenced by issue #100749, Blender LTS: Maintenance Task 3.3
1 changed file with 6 additions and 21 deletions

@@ -29,28 +29,13 @@ using namespace metal::raytracing;
 /* Qualifiers */
-/* Inline everything for Apple GPUs. This gives ~1.1x speedup and 10% spill
- * reduction for integator_shade_surface. However it comes at the cost of
- * longer compile times (~4.5 minutes on M1 Max) and is disabled for that
- * reason, until there is a user option to manually enable it. */
-#if 0 // defined(__KERNEL_METAL_APPLE__)
-# define ccl_device __attribute__((always_inline))
-# define ccl_device_inline __attribute__((always_inline))
-# define ccl_device_forceinline __attribute__((always_inline))
-# define ccl_device_noinline __attribute__((always_inline))
+#define ccl_device
+#define ccl_device_inline ccl_device __attribute__((always_inline))
+#define ccl_device_forceinline ccl_device __attribute__((always_inline))
+#if defined(__KERNEL_METAL_APPLE__)
+# define ccl_device_noinline ccl_device
 #else
-# define ccl_device
-# define ccl_device_inline ccl_device
-# define ccl_device_forceinline ccl_device
-# if defined(__KERNEL_METAL_APPLE__)
-# define ccl_device_noinline ccl_device
-# else
-# define ccl_device_noinline ccl_device __attribute__((noinline))
-# endif
+# define ccl_device_noinline ccl_device __attribute__((noinline))
 #endif
 #define ccl_device_noinline_cpu ccl_device
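
As a rough host-side illustration of the resulting policy (not part of the patch), the sketch below applies the post-patch qualifiers to three hypothetical helper functions. It assumes a GCC or Clang host compiler. The `inline` keyword in the two always-inline macros is an addition for host builds only, and `square`, `sum_of_squares`, `scaled` and `check_qualifiers` are made-up names, not Cycles functions.

```cpp
#include <cassert>

// Qualifier names as they appear after the patch. The extra `inline` keyword
// is an assumption for host compilers (it silences GCC's always_inline
// warning) and is not in the Metal header.
#define ccl_device
#define ccl_device_inline ccl_device inline __attribute__((always_inline))
#define ccl_device_forceinline ccl_device inline __attribute__((always_inline))
#if defined(__KERNEL_METAL_APPLE__)
#  define ccl_device_noinline ccl_device
#else
#  define ccl_device_noinline ccl_device __attribute__((noinline))
#endif

// Hypothetical helper: inlining is requested at every call site.
ccl_device_inline float square(float x)
{
  return x * x;
}

// Hypothetical helper: plain ccl_device carries no attribute, so the
// compiler's own heuristics decide whether to inline it.
ccl_device float sum_of_squares(float a, float b)
{
  return square(a) + square(b);
}

// Hypothetical helper: kept out of line unless __KERNEL_METAL_APPLE__ is
// defined, in which case the macro expands to nothing.
ccl_device_noinline float scaled(float a, float b, float s)
{
  return s * sum_of_squares(a, b);
}

// The qualifiers only affect code generation, never the computed result.
inline void check_qualifiers()
{
  assert(scaled(3.0f, 4.0f, 2.0f) == 50.0f);
}
```

With `__KERNEL_METAL_APPLE__` undefined, as on a host build, `ccl_device_noinline` expands to `__attribute__((noinline))`. Defining it before the macros makes `ccl_device_noinline` expand to nothing, which leaves the decision to the compiler, as in the Apple GPU branch of the patch.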