Crash during path tracing, particle system (master)
Closed, Resolved, Public

Description

macOS 10.12.6, MacBook Air, NVIDIA GeForce 320M 256 MB

worked: 2.78c, 2.79rc2
broken: master, daily build 2017-09-02, 32e36a17824

Rendering the image from the viewport or the command line, or setting the viewport shading to rendered, causes a crash during path tracing.



Bastien Montagne (mont29) triaged this task as Incomplete priority. Sep 4 2017, 11:02 AM

Please follow our submission template and guidelines, also read these tips about bug reports, and make a complete, valid bug report, with required info, precise description of the issue, precise steps to reproduce it, small and simple .blend and/or other files to do so if needed, etc.

Can’t confirm any crash here on linux, what kernel are you using for the render? CPU? CUDA? (backtrace seems to point to CPU, but…)

This is indeed a CPU render. The issue is consistent and repeatable, and has affected the master daily builds for at least a week. I have checked this issue with today's daily build (2017-09-03, 718af8e8b35) and the issue remains.

Instructions to repeat issue:

(1) Render the supplied .blend file via the command line e.g. blender -d -b test.blend -f 1

or

(2) Please open the supplied .blend file in the Blender GUI; to trigger the issue you may:

(2a) Press F12 or click Render -> Render Image
or
(2b) Set Viewport Shading to Rendered

I couldn't repro this on either macOS or Linux, testing with the same 718af8e8b35 build from builder.blender.org.

Can you attach the output of Help > System Info? Rendering from the command line with the --debug-cycles option might also give insight.

From the backtrace it's not clear what the issue could be, perhaps some threading issue but then I would expect the crash to be more random. It could be related to the specific CPU model since this is the place we call different kernels depending on what's supported, but I didn't find issues running the various kernels manually with the debug options. Or maybe the backtrace is a bit deceptive and the actual issue is elsewhere.

I've attached the debug info and backtrace when using --debug-cycles

I actually can't provide you with the system info from the daily build because that option is missing from the menu (it's there on 2.78c and 2.79rc2) - Is it possible to trigger the system info dump from the Python console? I've attached the system info from 2.79rc2 in case that provides some helpful background information.

To illustrate, the upper screen capture is from 2.79rc2, the lower capture is from the daily build. I downloaded a fresh copy of daily 718af8e8b35 to double check the help menu: it behaved identically.

This comment was removed by Alan Taylor (skororu).
Brecht Van Lommel (brecht) raised the priority of this task from Incomplete to Confirmed. Sep 6 2017, 11:46 PM

Thanks for the detailed info, I managed to repro the crash now.

AVX2 being set to True in the --debug-cycles output is not a problem, but it is confusing: it only means that the AVX kernel has not been disabled by a debug option. The CPU capabilities are still checked after that.
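
A rough sketch of that two-stage check, for illustration only. The names DebugFlags, CpuCaps and select_kernel are hypothetical and are not taken from the Cycles source; the sketch only shows that a flag reading True means "not disabled", and that a kernel is used only when the CPU also reports support for it.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins, not the actual Cycles types.
struct DebugFlags {
    bool avx2 = true;   // true means "not disabled by a debug option"
    bool sse41 = true;
};

struct CpuCaps {
    bool avx2 = false;  // what the CPU itself reports
    bool sse41 = false;
};

// A kernel is picked only if it is both allowed and supported.
std::string select_kernel(const DebugFlags &dbg, const CpuCaps &cpu)
{
    if (dbg.avx2 && cpu.avx2) return "avx2";
    if (dbg.sse41 && cpu.sse41) return "sse41";
    return "generic";
}

// Usage: the debug output shows AVX2 = True, but the CPU lacks AVX2,
// so the SSE4.1 kernel is the one that actually runs.
void example_kernel_choice()
{
    DebugFlags dbg;
    CpuCaps cpu;
    cpu.sse41 = true;
    assert(select_kernel(dbg, cpu) == "sse41");
}
```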

System info being missing from the menu is fixed in master already, will be solved in the next macOS build.

This seems to be caused by {dfae3de6bdf5d3e63d34281c840b9e568d0da613}. I guess it's the combination of an SSE4.1 kernel and hair (unaligned nodes) that is failing.

I'm not sure which exact case that commit solved, was it CPU / OpenCL / .. ? Either way we have to be careful backporting this to 2.79.

So for NaNs in node intersection to work here we want them to be preserved through max4() / min4(), so that tnear <= tfar fails and we skip the NaN nodes.
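
A scalar sketch of why NaN preservation matters for the tnear <= tfar test. The helper names are hypothetical and this is not the Cycles max4() / min4() code; it only contrasts a NaN-propagating min/max with std::fmax / std::fmin, which return the non-NaN argument.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical NaN-propagating helpers (not the Cycles implementation).
inline float max_nan(float a, float b)
{
    if (std::isnan(a) || std::isnan(b))
        return std::numeric_limits<float>::quiet_NaN();
    return a > b ? a : b;
}

inline float min_nan(float a, float b)
{
    if (std::isnan(a) || std::isnan(b))
        return std::numeric_limits<float>::quiet_NaN();
    return a < b ? a : b;
}

// NaN survives, so tnear <= tfar is false and the node is skipped.
inline bool node_hit_preserving(float t_lo, float t_hi, float t_max)
{
    const float tnear = max_nan(0.0f, t_lo);
    const float tfar = min_nan(t_max, t_hi);
    return tnear <= tfar;
}

// std::fmax / std::fmin drop the NaN, so a NaN node looks like a hit.
inline bool node_hit_dropping(float t_lo, float t_hi, float t_max)
{
    const float tnear = std::fmax(0.0f, t_lo);
    const float tfar = std::fmin(t_max, t_hi);
    return tnear <= tfar;
}

void example_nan_node()
{
    const float nan = std::numeric_limits<float>::quiet_NaN();
    assert(!node_hit_preserving(nan, nan, 100.0f)); // skipped, as intended
    assert(node_hit_dropping(nan, nan, 100.0f));    // false positive
}
```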

When running this code we get different results in the SSE4.1 kernel (nan nan) and AVX2 kernel (0.0 0.0).

printf("%f\n", ssef(_mm_max_ps(_mm_set_ps1(0.0f), _mm_set_ps1(0.0f/0.0f)))[0]);
printf("%f\n", ssef(_mm_max_ps(_mm_set_ps1(0.0f/0.0f), _mm_set_ps1(0.0f)))[0]);

It seems -ffast-math is to blame for that.
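
For comparison, Intel's instruction reference documents MAXPS as returning the second (source) operand whenever either input is NaN. Assuming strict IEEE handling, the two printf lines above would therefore be expected to differ from each other, which neither kernel's output does. A scalar model of that documented rule, with a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Scalar model of one MAXPS lane: the result is `a` only when a > b is true.
// Any NaN makes the comparison false, so the second operand `b` is returned.
inline float maxps_lane(float a, float b)
{
    return a > b ? a : b;
}

void example_operand_order()
{
    const float nan = std::numeric_limits<float>::quiet_NaN();
    assert(std::isnan(maxps_lane(0.0f, nan))); // NaN in the second operand is kept
    assert(maxps_lane(nan, 0.0f) == 0.0f);     // NaN in the first operand is dropped
}
```

With fast-math enabled the compiler may assume NaNs never occur and is free to reorder the operands, which would be consistent with each kernel printing the same value twice.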

@Brecht Van Lommel (brecht), the initial fix was needed for AVX2 CPUs, where robust intersection was giving false positive results on empty children (multiplying FLT_MAX by difl would make it in).

Ideas:

  • Get rid of fast math; nowadays it seems to cause more issues than it helps.
  • Use finite fast math (don't remember if we do it on host or for kernel as well)
  • Try to fix min/max, but how?
  • Revert the change and use FLT_MAX/100 so we don't cause inf.
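
A small sketch of the overflow behind the last idea. The helper widen and the value chosen for difl are assumptions for illustration, not the actual Cycles code; the sketch only shows that scaling FLT_MAX by a factor slightly above 1 overflows to inf, while FLT_MAX/100 leaves enough headroom.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical helper: widen a bound by a relative tolerance `difl`.
inline float widen(float bound, float difl)
{
    return bound * (1.0f + difl);
}

void example_overflow()
{
    const float difl = 1e-3f; // illustrative value only
    assert(std::isinf(widen(FLT_MAX, difl)));             // overflows to inf
    assert(std::isfinite(widen(FLT_MAX / 100.0f, difl))); // stays finite
}
```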

@Sergey Sharybin (sergey), I tried some code tweaks but could not get it to work reliably on all architectures.

I think disabling -ffast-math is the way to go, see D2828: Cycles: disable fast math flags, only use a subset.

@Brecht Van Lommel (brecht), I've committed a fix to the 2.79 release branch. It's not fully ideal, but it is closer to what 2.78 was doing and should fix both the initial Mai bug and this one. However, I couldn't reproduce any of the bugs on my machine here, so please give it a test and see if we're good for 2.79.

Brecht Van Lommel (brecht) closed this task as Resolved. Sep 8 2017, 3:11 PM
Brecht Van Lommel (brecht) claimed this task.

I can confirm it solves the crash on macOS; koro and victor also still seem to render OK.

@Brecht Van Lommel (brecht), the change is only done in the release branch, master is still broken... For master I'd prefer to get rid of -ffast-math, but I might also apply the same fix there for the time being. Any strong preference from your side?

I'll commit the fast math changes in a minute, so no need to apply the same fix in master I think.