
GGX distribution produces black pixels
Closed, Resolved, Public

Description

System Information
Debian AMD64, fglrx-driver

Blender Version
Broken: 2.72 and later
Worked: 2.71

Short description of error
The GGX distribution on the Glossy, Glass and Refraction nodes produces black pixels in some cases.
The roughness value seems to play a major role.
In the attached picture, it produces black lines.

Exact steps for others to reproduce the error
A .blend file is attached to the task.


I'm driving the roughness value with a noise texture remapped by the formula e^(x*y), where x is the noise texture value and y = -10.
Open the file in 2.71 and in the current version, set the viewport to "Rendered", and you should see the difference.

Thank you!

Event Timeline

Robert M (robertm) raised the priority of this task to Needs Triage by Developer.
Robert M (robertm) updated the task description.
Robert M (robertm) set Type to Bug.

Can't reproduce here (2.76, cfc109e).

Can you confirm that it still happens in 2.76?

Yes, this still happens with the latest git HEAD.
I use the CPU renderer on an AMD Richland APU.

I'm using Gooseberry 2.75 (e96e452) and it works fine on CPU (i7) and CUDA (780, 970), in both supported and experimental mode.

Hmm, maybe the issue is OS- or compile-flag-dependent.
I use the 64-bit Linux builds from the buildbot.
@Carlo Bergonzini (Carlo) Andreacchio, @mohamed (mohamed) Sakr: what OS are you using?

I can reproduce it on OS X. There's some special code to handle very small roughness values, since they cause floating-point precision issues. That code may be failing for some values that are close to, but not exactly, zero.

Brecht Van Lommel (brecht) lowered the priority of this task from Needs Triage by Developer to Confirmed, Medium. Oct 15 2015, 11:38 PM

I experimented with clamping the roughness to zero when it is less than 0.000173.
It seems to work well.
It's not a real fix, just a workaround.

Can't really reproduce the issue. Does it happen in both the viewport and the final render? Can someone attach system-info.txt from an affected system?

@Sergey Sharybin (sergey) This happens in both the viewport and the final render. It worked fine in 2.71.
Here is my system-info.txt (attached).

I cannot reproduce this on OS X either. I also tested with the AVX (instead of AVX2) kernel, and without any SSE at all. I just opened the blend file and started rendering in the viewport.

I can't reproduce this anymore on OS X. Maybe it's because I upgraded Xcode (clang 3.6 before, 3.7 now). Buildbot builds never had the issue for me.

Do you mind downloading the latest buildbot build and testing again?

I still have this issue with the buildbot version on Debian (build date: 2015-11-27, hash: bafccb0).

Sergey Sharybin (sergey) lowered the priority of this task from Confirmed, Medium to Needs Information from User. Jan 19 2016, 6:48 PM

I wasn't able to reproduce the issue on any of my machines, which makes it really difficult to troubleshoot.

Things to try:

  • Grab latest build from builder.blender.org
  • Set Debug Value to 256
  • Go to the Debug panel in the Render context and start toggling CPU options there (the render has to be restarted after each change in that panel)
  • See if any of the options has any effect on the correctness of the render result

@Sergey Sharybin (sergey), I tried your suggestion with the Debug panel, tested every combination of SIMD flags, and without any SIMD flags, with and without QBVH, but unfortunately nothing has changed.
I can't test the OpenCL kernels because I have a pre-GCN (Graphics Core Next) APU, which is not supported by Blender.
A strange problem really, with no solution in sight. I'll try to test it on some other systems and report back.

Yes, testing on other systems will indeed give some hints. One more idea: try booting a LiveCD on the same machine and see if you can reproduce the issue there.

I tested build b336124 on the same hardware (AMD Llano APU), but different OS:

- Windows 10: no bug
- Debian Jessie: bug

Maybe it is a glibc issue?
Is the standard C library statically linked in the buildbot builds?
My libc6 version is 2.19.

libc is not linked statically, but it's unlikely to be the problem, since I did tests on Debian Jessie as well.

Try some other LiveCD Linux distro and see how it goes.

No news for a week…

Finally, I'm able to build Blender myself.
The cause of this issue seems to be an optimization done by GCC.

I tested various optimization levels with gcc 4.9:
-O0 and -Og: OK
-O1, -O2, -O3, -Ofast: black pixels

"-DNDEBUG -O1 -march=native -s -flto -ffat-lto-objects": OK
"-DNDEBUG -O2 -march=native -s -flto -ffat-lto-objects": OK
"-DNDEBUG -O3 -march=native -s -flto -ffat-lto-objects": OK
"-DNDEBUG -Ofast -march=native -s -flto -ffat-lto-objects": black pixels

I needed to use -march=native and -ffat-lto-objects to produce link-time optimized code. Without them, errors occur during the build.

LTO seems to help a bit.
I can't test with gcc 5 because of the C++ ABI change. The upgrade would break my system.

Were the compiler version or compile parameters changed between 2.71 and 2.72?

There have been no changes in optimization flags for quite some time now. However, changes in the sources could push the optimizer's heuristics one way or another.

The glibc 2.19 builds from https://builder.blender.org/download/ are compiled with GCC 5.3, so you might want to test them.

@Sergey Sharybin (sergey), I just tested the GCC 5.3 / glibc 2.19 builds. Unfortunately, there is no improvement.

After upgrading to GCC 5 and GCC 6, I was no longer able to produce correct renderings (with the -flto trick I used with GCC 4.9), so I looked at the code in intern/cycles/kernel/closure/bsdf_microfacet.h.

In line 453 (for the glossy shader), I changed:

if(fmaxf(alpha_x, alpha_y) <= 1e-4f)

to

if(fmaxf(alpha_x, alpha_y) <= 0.000173f)

and in line 542 (for the glass and refraction shaders), I changed:

if(fmaxf(alpha_x, alpha_y) <= 1e-4f || fabsf(m_eta - 1.0f) < 1e-4f)

to

if(fmaxf(alpha_x, alpha_y) <= 0.000173f || fabsf(m_eta - 1.0f) < 1e-4f)

Threshold values smaller than 0.000173 still produce the black pixels.

If someone has an AMD APU with Debian, please test this. (I tested on Llano, Trinity and Richland, and all of these platforms show the black pixel issue.)

So I "fixed" it for myself, but does it have a chance of going into master?

I'm a bit skeptical about just applying such changes. With the current epsilon, alpha^2 should be well within the range of single-precision floats.

If you're building Blender yourself, could you step through the corresponding functions and see which expression causes the numerical instability?

I made some progress:
After inserting some printfs in the block starting at line 464, I made these observations:

  1. 'D' becomes inf whenever black pixels are produced.
  2. Whenever I tried to printf 'tanThetaM2', the issue suddenly went away, probably because of the implicit promotion to double: 'printf("tanThetaM2=%f\n", (double)tanThetaM2);'
  3. Changing

    float tanThetaM2 = 1/(cosThetaM2) - 1;

to

    float tanThetaM2 = (float)((1.0/(double)cosThetaM2) - 1.0);

solves the issue by doing the calculation in double precision.

All in all I think there are some unsafe divisions.

Oh, it's the alpha^4 in the equation for D that gets quite close to the limits of single precision.
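For reference, the textbook isotropic GGX distribution shows where alpha^4 comes from. The Cycles kernel may arrange the terms differently, so this is the standard form rather than the exact kernel code:

```latex
D(\mathbf{m}) = \frac{\alpha^2}{\pi \cos^4\theta_m \left(\alpha^2 + \tan^2\theta_m\right)^2},
\qquad
D\big|_{\theta_m = 0} = \frac{\alpha^2}{\pi\,\alpha^4} = \frac{1}{\pi\,\alpha^2}
```

Near theta_m = 0, the denominator is dominated by alpha^4. In this form, alpha^2 is also added directly to tan^2(theta_m). An absolute error in tan^2(theta_m) on the order of 1e-7 would therefore change the denominator by a large factor when alpha^2 is around 1e-8.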

Do you mind trying the following: replace if(fmaxf(alpha_x, alpha_y) <= 1e-4f) { with if(alpha_x*alpha_y <= 1e-7f) { ?

The new if statement is working for me :)

Do you think it makes sense to replace all occurrences of (fmaxf(alpha_x, alpha_y) <= 1e-4f) with the new condition (also for Beckmann)?

I think that would be reasonable.

@Brecht Van Lommel (brecht), do you see any downsides to such a change? It seems more robust to me, but I might be missing something.

Seems fine to me.