Cycles: Fix bvh2 gen on Apple Silicon and use it to speed up renders

This patch fixes a correctness issue discovered in the `int4 select(...)` function on Apple Silicon machines, which causes bad bvh2 builds. Although the generated bvh2s give correct renders, the resulting runtime performance is terrible. This fix allows us to switch over to bvh2 on Apple Silicon giving a significant performance uplift for many of the standard benchmarking assets. It also fixes some unit test failures stemming from the use of MetalRT, and trivially enables the new pointcloud primitive.

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D13877
This commit is contained in:
Michael Jones (Apple) 2022-01-20 10:11:58 +00:00
parent 9315215b20
commit f6c8a78ac6
Notes: blender-bot 2023-02-14 09:21:21 +01:00
Referenced by issue #92212, Cycles Metal device
2 changed files with 1 additions and 7 deletions

View File

@ -87,17 +87,14 @@ MetalDevice::MetalDevice(const DeviceInfo &info, Stats &stats, Profiler &profile
default:
break;
case METAL_GPU_INTEL: {
use_metalrt = false;
max_threads_per_threadgroup = 64;
break;
}
case METAL_GPU_AMD: {
use_metalrt = false;
max_threads_per_threadgroup = 128;
break;
}
case METAL_GPU_APPLE: {
use_metalrt = true;
max_threads_per_threadgroup = 512;
break;
}

View File

@ -131,10 +131,7 @@ ccl_device_inline int4 clamp(const int4 &a, const int4 &mn, const int4 &mx)
ccl_device_inline int4 select(const int4 &mask, const int4 &a, const int4 &b)
{
# ifdef __KERNEL_SSE__
const __m128 m = _mm_cvtepi32_ps(mask);
/* TODO(sergey): avoid cvt. */
return int4(_mm_castps_si128(
_mm_or_ps(_mm_and_ps(m, _mm_castsi128_ps(a)), _mm_andnot_ps(m, _mm_castsi128_ps(b)))));
return int4(_mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b)));
# else
return make_int4(
(mask.x) ? a.x : b.x, (mask.y) ? a.y : b.y, (mask.z) ? a.z : b.z, (mask.w) ? a.w : b.w);