Page MenuHome

Cycles/Tests/MSVC: cycles_denoise tests are failing
Closed, ResolvedPublic

Description

System Information
Operating system: Windows
Graphics card: N/A

Blender Version
Broken: Latest master
Worked: Before C++17 switch

Short description of error

The cycles_denoise and cycles_denoise_animation are failing on SSE4.1 and AVX architectures on windows.

could be a codegen bug, have not investigated yet.

Exact steps for others to reproduce the error

Run tests

Event Timeline

Ray molenkamp (LazyDodo) changed the task status from Needs Triage to Confirmed.Fri, Jun 19, 10:38 PM
Ray molenkamp (LazyDodo) triaged this task as High priority.
Ray molenkamp (LazyDodo) created this task.
Ray molenkamp (LazyDodo) updated the task description. (Show Details)
Ray molenkamp (LazyDodo) edited projects, added Cycles; removed BF Blender.

The first one is hiding in fast_exp2f4 managed to extract a repro, but it's not quite "minimal" yet, so unsure on the exact issue. Not convinced there are not other issues as well, we'd really should have landed D5602 for at least the cpu architectures before switching compilers.

@Ray molenkamp (LazyDodo), are you looking into this or want me to help with something?

Looking into this, but it's a jumpy little bastard, every-time you edit some seemingly unrelated code it may or may not go into hiding, given it only seems to rear it's head inside fast_exp2f4 disabling the optimizer for just that function seems to sidestep the issue (and given its only used in the denoiser and only in a single pass, I don't feel super bad about a possible perf hit).

Given extracting a good repro case will take a bit of time, as will reporting it and getting it fixed (guessing months, the race condition we had in the CRT last august is still open , although an asan bug recently got dealt with swiftly ) i think the way to go right now would be something like

diff --git a/intern/cycles/util/util_math_fast.h b/intern/cycles/util/util_math_fast.h
index dbed83ab84d..c046601f48e 100644
--- a/intern/cycles/util/util_math_fast.h
+++ b/intern/cycles/util/util_math_fast.h
@@ -446,6 +446,10 @@ ccl_device_inline float fast_expf(float x)
 }

 #ifndef __KERNEL_GPU__
+/* MSVC seems to have a codegen bug here in SSE41/AVX */
+#ifdef _MSC_VER
+#  pragma optimize( "", off )
+#endif
 ccl_device float4 fast_exp2f4(float4 x)
 {
   const float4 one = make_float4(1.0f);
@@ -461,6 +465,9 @@ ccl_device float4 fast_exp2f4(float4 x)
   r = madd4(x, r, make_float4(1.0f));
   return __int4_as_float4(__float4_as_int4(r) + (m << 23));
 }
+#ifdef _MSC_VER
+#  pragma optimize( "", on )
+#endif

 ccl_device_inline float4 fast_expf4(float4 x)
 {

A more fine grained approach by singling out the SSE4/AVX kernels is possible,but given how jumpy it is, i'm not sure AVX2 isn't affected, it may just not be affected "right now", and I would lean on the side of caution there and just disable the optimizer for all kernels.

That being said, having these kind of issues (also having codegen issues in last week in GCC's update) is making me somewhat uncomfortable, expanding our testing surface to cover all CPU architectures should be fast-tracked.

Update: Issue reported , i'll land the work around tomorrow.