Cycles: Fix bvh2 gen on Apple Silicon and use it to speed up renders

This patch fixes a correctness issue in the `int4 select(...)` function, discovered on Apple Silicon machines, which causes bad bvh2 builds. Although the generated bvh2s give correct renders, the resulting runtime performance is terrible. The SSE branch of `select` no longer converts the mask to float with `_mm_cvtepi32_ps`; it now performs the and/andnot/or select directly on the integer lanes.

This fix allows us to switch over to bvh2 on Apple Silicon, giving a significant performance uplift for many of the standard benchmarking assets. The per-vendor `use_metalrt` overrides in the `MetalDevice` constructor are removed, so Apple GPUs are no longer forced onto MetalRT. The change also fixes some unit test failures stemming from the use of MetalRT, and trivially enables the new pointcloud primitive.

Ref T92212

Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D13877
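The scalar sketch below shows, one 32-bit lane at a time, why converting an integer mask to float before a bitwise select is unsafe. The helper names `float_bits`, `select_bits` and `mask_after_cvt` are hypothetical and not part of Cycles; `mask_after_cvt` assumes a plain `int32_t` to `float` conversion behaves like one lane of `_mm_cvtepi32_ps`. It only illustrates the bit-level effect and does not try to explain why the symptom was observed on Apple Silicon specifically.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a 32-bit unsigned integer
// (what _mm_castps_si128 does for each lane).
inline uint32_t float_bits(float f)
{
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}

// One-lane version of the and/andnot/or select used by both the old and new code:
// take bits of `a` where `mask` has a 1 bit, bits of `b` where it has a 0 bit.
inline uint32_t select_bits(uint32_t mask, uint32_t a, uint32_t b)
{
  return (mask & a) | (~mask & b);
}

// Hypothetical helper: the lane mask as the removed code used it. The integer
// mask is first converted numerically to float (emulating _mm_cvtepi32_ps),
// and the resulting float's bit pattern is then used as the select mask.
inline uint32_t mask_after_cvt(int32_t mask_lane)
{
  return float_bits(static_cast<float>(mask_lane));
}
```

With a "true" lane of -1, the integer mask `0xFFFFFFFF` selects 7 from `a`, but the converted mask is the bit pattern of -1.0f, `0xBF800000`, which shares no bits with 7 and lets 42 through from `b`, so the true lane returns the wrong operand. Lanes that are 0 are unaffected, because 0 converts to 0.0f, whose bits are all zero.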
Referenced by issue #92212, Cycles Metal device
2022-01-20 10:11:58 +00:00 · 2022-01-20 10:11:58 +00:00 · f6c8a78ac6 · 2023-02-14 09:21:21 +01:00
parent 9315215b20
commit f6c8a78ac6
2 changed files with 1 addition and 7 deletions
--- a/intern/cycles/device/metal/device_impl.mm
+++ b/intern/cycles/device/metal/device_impl.mm
@@ -87,17 +87,14 @@ MetalDevice::MetalDevice(const DeviceInfo &info, Stats &stats, Profiler &profile
    default:
      break;
    case METAL_GPU_INTEL: {
-      use_metalrt = false;
      max_threads_per_threadgroup = 64;
      break;
    }
    case METAL_GPU_AMD: {
-      use_metalrt = false;
      max_threads_per_threadgroup = 128;
      break;
    }
    case METAL_GPU_APPLE: {
-      use_metalrt = true;
      max_threads_per_threadgroup = 512;
      break;
    }
--- a/intern/cycles/util/math_int4.h
+++ b/intern/cycles/util/math_int4.h
@@ -131,10 +131,7 @@ ccl_device_inline int4 clamp(const int4 &a, const int4 &mn, const int4 &mx)
 ccl_device_inline int4 select(const int4 &mask, const int4 &a, const int4 &b)
 {
 #  ifdef __KERNEL_SSE__
-  const __m128 m = _mm_cvtepi32_ps(mask);
-  /* TODO(sergey): avoid cvt. */
-  return int4(_mm_castps_si128(
-      _mm_or_ps(_mm_and_ps(m, _mm_castsi128_ps(a)), _mm_andnot_ps(m, _mm_castsi128_ps(b)))));
+  return int4(_mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b)));
 #  else
  return make_int4(
      (mask.x) ? a.x : b.x, (mask.y) ? a.y : b.y, (mask.z) ? a.z : b.z, (mask.w) ? a.w : b.w);
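A standalone sketch of the patched SSE branch follows, next to a scalar reference that mirrors the non-SSE fallback. `Int4`, `load`, `store`, `select_int4` and `select_scalar` are hypothetical names, not the Cycles API; the intrinsics are standard SSE2 from `<emmintrin.h>`, and a portable branch with the same bit-level semantics is used when `__SSE2__` is not defined. It assumes every mask lane is either 0 or -1.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#  include <emmintrin.h>
#endif

// Hypothetical stand-in for Cycles' int4: four 32-bit integer lanes.
struct Int4 {
  int32_t x, y, z, w;
};

// Scalar reference mirroring the non-SSE fallback: any nonzero lane counts as true.
inline Int4 select_scalar(const Int4 &mask, const Int4 &a, const Int4 &b)
{
  return {mask.x ? a.x : b.x, mask.y ? a.y : b.y, mask.z ? a.z : b.z, mask.w ? a.w : b.w};
}

#if defined(__SSE2__)
// Pack an Int4 into a register; _mm_set_epi32 takes lanes from highest to lowest.
inline __m128i load(const Int4 &v)
{
  return _mm_set_epi32(v.w, v.z, v.y, v.x);
}

// Unpack a register back into an Int4 via an aligned temporary.
inline Int4 store(__m128i v)
{
  alignas(16) int32_t lanes[4];
  _mm_store_si128(reinterpret_cast<__m128i *>(lanes), v);
  return {lanes[0], lanes[1], lanes[2], lanes[3]};
}
#endif

// Bitwise select on whole integer lanes, as in the patched branch:
// (mask & a) | (~mask & b). No float conversion of the mask is involved.
inline Int4 select_int4(const Int4 &mask, const Int4 &a, const Int4 &b)
{
#if defined(__SSE2__)
  const __m128i m = load(mask);
  // _mm_andnot_si128(m, b) computes (~m) & b.
  return store(_mm_or_si128(_mm_and_si128(m, load(a)), _mm_andnot_si128(m, load(b))));
#else
  // Portable fallback computing the same per-lane bit select.
  auto lane = [](int32_t m, int32_t p, int32_t q) {
    const uint32_t um = static_cast<uint32_t>(m);
    return static_cast<int32_t>((um & static_cast<uint32_t>(p)) |
                                (~um & static_cast<uint32_t>(q)));
  };
  return {lane(mask.x, a.x, b.x), lane(mask.y, a.y, b.y), lane(mask.z, a.z, b.z),
          lane(mask.w, a.w, b.w)};
#endif
}
```

For `mask = {-1, 0, -1, 0}`, `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, `select_int4` gives `{1, 20, 3, 40}`, the same as `select_scalar`. The two only agree when every mask lane is 0 or -1, which is what SSE2 integer comparisons such as `_mm_cmpgt_epi32` produce; the scalar branch treats any nonzero lane as true, while the bitwise branch would mix bits for other values.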