GGX distribution produces black pixels #46492

Closed
opened 2015-10-15 04:54:25 +02:00 by Robert M · 42 comments

System Information
Debian AMD64, fglrx-driver

Blender Version
Broken: 2.72 and later
Worked: 2.71

Short description of error
The GGX distribution on the Glossy, Glass and Refraction Nodes produces black pixels in some cases.
The roughness value seems to play a major role.
In this picture it produces black lines:
[GGX_bug.png](https://archive.blender.org/developer/F245063/GGX_bug.png)

Exact steps for others to reproduce the error
Here is a .blend file:
[GGX_bug.blend](https://archive.blender.org/developer/F245065/GGX_bug.blend)
The roughness value is driven by a noise texture passed through the formula e^(x*y), where x is the noise texture value and y = -10.
Open the file in 2.71 and in the current version. Set the viewport to "Rendered" and you should see the difference.

Thank you!
Author

Changed status to: 'Open'

Author

Added subscriber: @robertm

Added subscriber: @candreacchio

Can't reproduce here (2.76 - cfc109e)

Can you confirm that it still happens in 2.76?
Author

Yes, this still happens with the latest git HEAD.
I use the CPU renderer on an AMD Richland APU.

Added subscriber: @MohamedSakr

I'm using gooseberry 2.75 e96e452 and it works fine on CPU "i7" and CUDA "780, 970", in both supported and experimental mode.
Author

Hmm, maybe the issue is OS or compile-flag dependent.
I use the 64-bit Linux builds from buildbot.
@Carlo Andreacchio, @Baron2020 Sakr: what OS are you using?

win 8 64 bit

Added subscriber: @brecht

I can reproduce it on OS X. There's some special code to deal with very small roughness values, since they give float precision issues; that code may be failing for some values close to but not quite zero.
Author

I experimented with clamping roughness to zero if it is less than 0.000173.
Seems to work well.
Not a real fix but a workaround.
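
A minimal sketch of that workaround in C, for illustration only. The helper `clamp_small_roughness` and the constant name `ROUGHNESS_CUTOFF` are hypothetical and are not part of Cycles; only the cutoff value 0.000173 comes from the experiment described above.

```c
#include <assert.h>

/* Hypothetical cutoff name; the value is the one found by experiment. */
#define ROUGHNESS_CUTOFF 0.000173f

/* Hypothetical helper: snap very small roughness values to exactly zero
 * so that the shader treats the surface as perfectly sharp. */
static float clamp_small_roughness(float roughness)
{
    assert(roughness >= 0.0f);
    return (roughness < ROUGHNESS_CUTOFF) ? 0.0f : roughness;
}
```

Values at or above the cutoff pass through unchanged, so only the narrow band just above zero is affected.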

Sergey Sharybin was assigned by Thomas Dinges 2015-10-23 13:01:30 +02:00

Can't really reproduce the issue. Does it happen with both viewport and final render? Can someone attach system-info.txt from a buggy system?

Author

@Sergey This happens in both viewport and final render. It worked fine in 2.71.
Here is my system-info.txt:
[system-info.txt](https://archive.blender.org/developer/F254032/system-info.txt)
Member

Added subscriber: @MartijnBerger

Member

I cannot reproduce on OS X

Added subscriber: @ThomasDinges

I cannot reproduce this on OS X either. I also tested with AVX (instead of AVX2) kernel, and without any SSE at all. Just opened the blend file and started rendering in the Viewport.

I can't reproduce this anymore on OS X. Maybe it's because I upgraded Xcode (clang 3.6 before, 3.7 now). Buildbot builds never had the issue for me.

Do you guys mind downloading latest buildbot and testing again?

Author

I still have this issue with the buildbot version on Debian. Build date: 2015-11-27, hash: bafccb0

I wasn't able to reproduce the issue on any of the machines, which makes it really difficult to troubleshoot.

Things to try:

  • Grab the latest build from builder.blender.org
  • Set Debug Value to 256
  • Go to the Debug panel in the Render context and start toggling the CPU options there (the render has to be restarted after changes in that panel)
  • See if any of the options has any effect on the correctness of the render result
Author

@Sergey, I tried your suggestion with the Debug panel, tested every combination of SIMD flags, and without any SIMD flags, with and without QBVH, but unfortunately nothing has changed.
I can't test the OpenCL kernels because I have a pre GCN (graphics core next) APU, which is not supported by Blender.
A strange problem really, with no solution in sight. I'll try to test it on some other systems and report back.

Yes, testing on other systems will indeed give some hints. One more idea -- try booting a LiveCD on the same machine and see if you can reproduce the issue there.
Author

I tested build b336124 on the same hardware (AMD Llano APU), but different OS:

  • Windows 10: no bug
  • Debian Jessie: bug

Maybe it is a glibc issue?
Is the standard C library statically linked in the buildbot builds?
My libc6 version is 2.19

libc is not linked statically, but it's unlikely to be the problem since I did tests on Debian Jessie as well.

Try using some other LiveCD Linux distro and see how it goes.

Added subscriber: @mont29

Changed status from 'Open' to: 'Archived'

No news for a week…
Author

Finally I'm able to build Blender myself.
The cause of this issue seems to be optimization done by gcc.

I tested various optimization levels with gcc 4.9:

  • -O0 and -Og: OK
  • -O1, -O2, -O3, -Ofast: black pixels

"-DNDEBUG -O1 -march=native -s -flto -ffat-lto-objects": OK
"-DNDEBUG -O2 -march=native -s -flto -ffat-lto-objects": OK
"-DNDEBUG -O3 -march=native -s -flto -ffat-lto-objects": OK
"-DNDEBUG -Ofast -march=native -s -flto -ffat-lto-objects": black pixels

I needed to use -march=native and -ffat-lto-objects to produce link-time-optimized code. Without these, errors occur during the build.

LTO seems to help a bit.
I can't test with gcc 5 because of the C++ ABI change. The upgrade would break my system.

Were the compiler version or compile parameters changed between 2.71 and 2.72?

There have been no changes in optimization flags for quite some time now. However, changes in the sources could make the optimizer's heuristics go one way or another.

The glibc-219 builds from https://builder.blender.org/download/ are compiled with GCC-5.3, so you might want to test them.
Author

@Sergey, I just tested the gcc5.5 glibc 2.19 builds. There is no improvement unfortunately.

Author

After upgrading to gcc-5 and gcc-6 I was no longer able to produce correct renderings (even with the -flto trick I used with gcc-4.9).
So I looked at the code in intern/cycles/kernel/closure/bsdf_microfacet.h

In line 453 (for the glossy shader) I changed:

```
if(fmaxf(alpha_x, alpha_y) <= 1e-4f)
```

to

```
if(fmaxf(alpha_x, alpha_y) <= 0.000173f)
```

and in line 542 (for the glass and refraction shaders) I changed:

```
if(fmaxf(alpha_x, alpha_y) <= 1e-4f || fabsf(m_eta - 1.0f) < 1e-4f)
```

to

```
if(fmaxf(alpha_x, alpha_y) <= 0.000173f || fabsf(m_eta - 1.0f) < 1e-4f)
```

Thresholds smaller than 0.000173 still produce those black pixels.

If someone has an AMD APU + Debian, please test this. (I tested on Llano, Trinity and Richland. On all of these platforms I get the black pixels.)

So I "fixed" it for myself, but does it have a chance to go to master?

I'm a bit skeptical about just applying such changes. `alpha^2` should be well within single-precision floating point range with the current epsilon.

If you're building Blender yourself, can you step through the corresponding functions and see which expression causes the numerical instability?
Author

I made some progress:
After inserting some printfs in the block starting at line 464, I made these observations:

  1. 'D' becomes inf whenever black pixels are produced.

  2. Whenever I tried to printf 'tanThetaM2', the issue was suddenly gone, probably because of the promotion to double in 'printf("tanThetaM2=%f\n", (double)tanThetaM2);'

  3. Changing

```
float tanThetaM2 = 1/(cosThetaM2) - 1;
```

to

```
float tanThetaM2 = (float)((1.0/(double)cosThetaM2) - 1.0);
```

solves the issue by doing the calculation in double precision.

All in all, I think there are some unsafe divisions.
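
The two variants from observation 3 can be compared in isolation with a small C sketch. This is only an illustration of how sensitive `1/cos² - 1` is to rounding when `cos²` is very close to 1; it is not a diagnosis of what the compiler did in the affected builds. The function names are hypothetical, and the input is simply the largest float below 1.0.

```c
#include <assert.h>
#include <float.h>

/* tan^2(theta) from cos^2(theta), evaluated entirely in single precision. */
static float tan2_single(float cos2)
{
    assert(cos2 > 0.0f);
    return 1.0f / cos2 - 1.0f;
}

/* Same identity, but the division and subtraction are done in double
 * and only the final result is rounded back to float. */
static float tan2_via_double(float cos2)
{
    assert(cos2 > 0.0f);
    return (float)(1.0 / (double)cos2 - 1.0);
}

/* Largest float strictly below 1.0f: a microfacet normal that is almost
 * perfectly aligned with the shading normal. */
static float cos2_just_below_one(void)
{
    return 1.0f - FLT_EPSILON / 2.0f;
}
```

On a target that evaluates float expressions in IEEE 754 single precision, `tan2_single(cos2_just_below_one())` comes out at about 1.19e-7 while `tan2_via_double(cos2_just_below_one())` gives about 5.96e-8, so the single-precision result is off by roughly a factor of two. Both numbers are of the same order as alpha² for roughness values around 1e-4, which is why small changes in rounding can matter here.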

Oh, it's `alpha^4` in the equation for `D` which is getting quite close to the boundary of single precision.

Do you mind trying the following: replace `if(fmaxf(alpha_x, alpha_y) <= 1e-4f) {` with `if(alpha_x*alpha_y <= 1e-7f) {`?
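
To see how the two conditions differ, here is a small C sketch that wraps each one in a function. The names `is_sharp_old` and `is_sharp_new` are hypothetical; only the two comparisons are taken from the thread. The maximum is written with a ternary instead of `fmaxf` purely to keep the sketch free of a libm dependency.

```c
#include <assert.h>
#include <stdbool.h>

/* Existing test: compare the larger of the two roughness values to 1e-4. */
static bool is_sharp_old(float alpha_x, float alpha_y)
{
    assert(alpha_x >= 0.0f && alpha_y >= 0.0f);
    float alpha_max = (alpha_x > alpha_y) ? alpha_x : alpha_y;
    return alpha_max <= 1e-4f;
}

/* Proposed test: compare the product alpha_x * alpha_y to 1e-7. */
static bool is_sharp_new(float alpha_x, float alpha_y)
{
    assert(alpha_x >= 0.0f && alpha_y >= 0.0f);
    return alpha_x * alpha_y <= 1e-7f;
}
```

For isotropic roughness the new cutoff corresponds to alpha ≈ 3.16e-4 (the square root of 1e-7), which is above both the old 1e-4 and the experimentally found 0.000173. For example, `alpha_x = alpha_y = 2e-4f` is not treated as sharp by the old test but is by the new one.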
Author

The new if statement is working for me :)

Do you think it makes sense to replace all occurrences of `(fmaxf(alpha_x, alpha_y) <= 1e-4f)` (also for Beckmann?) with the new one?

I think that would be reasonable.

@brecht, do you see any downsides to such a change? To me it seems more robust, but I might be missing something.

Seems fine to me.

This issue was referenced by blender/cycles@5c8918439f3ababe8baa89061fda37f92fa61981

This issue was referenced by 7ac126e728a5b395a00bee6b7f657a843970492b

Changed status from 'Archived' to: 'Resolved'

Reference: blender/blender#46492