Floating point exceptions, infinity arithmentic #39772

Closed
opened 2014-04-17 13:46:54 +02:00 by jrp · 11 comments
jrp commented 2014-04-17 13:46:54 +02:00 (Migrated from localhost:3001)

The attached patch fp.txt includes some code in util_task.cpp that can be adjusted to allow different types of floating point exception to be exposed when Cycles is running, to check for bugs, unwanted behaviour, etc. It can also be used to set the flush to zero and denormals are zero flags. (By default, Cycles runs in /fast fp mode that masks all fp exceptions, so it may be necessary to compile cycles using, eg, /fp:precise, to see anything.)

Different mask or FTZ/DAZ settings may be required for

  • x86 v x64 (as the latter will use SIMD)

  • KERNEL_SSE (which obviously uses SIMD)

I have played around a bit with this and find that there is some room for improving the consistency of the way that Cycles treats what would be floating point exceptions if they were not masked.

In a number of places, division by zero happens when running regular test examples. This need not be a problem as the basic operations and math functions all accept infinity and NaN and produce sensible output. Infinities propagate through calculations as one would expect: for example, 2 + Inf; = Inf;, 4/Inf; = 0, atan (Inf) = pi/2. NaN, on the other hand, infects any calculation that involves it. Unless the calculation would produce the same result no matter what real value replaced NaN, the result is NaN. In comparison operations, positive infinity is larger than all values except itself and NaN, and negative infinity is smaller than all values except itself and NaN. NaN is unordered: it is not equal to, greater than, or less than anything, including itself. x == x is false if the value of x is NaN.

  • Divide by 0 produces +/- INF, except 0/0 which results in NaN.
  • log of (+/-) 0 produces -INF. log of a negative value (other than -0) produces NaN.
  • Reciprocal square root (rsq) or square root (sqrt) of a negative number produces NaN. The exception is -0; sqrt(-0) produces -0, and rsq(-0) produces -INF.
  • INF - INF = NaN
  • (+/-)INF / (+/-)INF = NaN
  • (+/-)INF * 0 = NaN
  • NaN (any OP) any-value = NaN
  • The comparisons EQ, GT, GE, LT, and LE, when either or both operands is NaN returns FALSE.
  • Comparisons ignore the sign of 0 (so +0 equals -0).
  • The comparison NE, when either or both operands is NaN returns TRUE.
  • Comparisons of any non-NaN value against +/- INF return the correct result.

Wachter [2004] show how a triangle intersection test can be rearranged to produce the right answer even in the presence of NaNs and, indeed, how the floating point comparison can be turned into a faster integer one that also works with Nans and Infs.

At present, some of the Cycles code tries to avoid relying on such infinity arithmetic by perturbing zero values before divisions, for example. I suspect that a better approach would be to either remove the underlying issue (degenerates, parallel rays / faces, etc) and assume that infinity arithmetic is in play.

The specific areas that I thought could be tidied up are set out below. However, I have not just gone ahead and done so because it may well be that the issues found are just the symptom of another problem that I have not identified. Instead, I put in a temporary work-around so that I could continue to test.

  • when building the bvh, an Empty BoundBox does not pass the "valid" test; the size of an Empty box is -2 * MAX_FLT (-Inf). Why not revise BoundBox so that Empty box goes from 0 to zero, rather than MAX_FLT to -MAX_FLT and avoid all the safe_ members by assuming that -/+ Inf is a possible min/max extent.

  • the binning code assumes that the cent_bounds don't have a zero dimension and that binSize is non-zero.

  • the microfacet code has a number of areas where a cos of something is generated that can turn out to be > 1. (eg, cosNI = dot(N, *omega_in). Subsequently trying to calculate ai = 1.0f / (m_ab * safe_sqrtf((1.0f - cosNI * cosNI) / (cosNI * cosNI))) results in a division by zero, because of the safe_sqrt.

  • in the hair code, geom_curve.h, cubic curves can turn out to be quadratic (p3 close to zero) and calculating cyla can underflow

  • in geom_triangle.h, Dz can be zero

  • in kernel_compat_cpu.h it is possible to try to assign too large a float to an int in the wrapping and clamping code

  • in kernel_projection.h, dir.z needs to be scaled, to avoid taking the acos of a value > 1

  • normalizing zero length vectors: what is the right result?

  • switching on KERNEL_SSE exposes a number of further potential issues, particularly with the comparison operators.

What does the panel think?

The attached patch [fp.txt](https://archive.blender.org/developer/F85072/fp.txt) includes some code in util_task.cpp that can be adjusted to allow different types of floating point exception to be exposed when Cycles is running, to check for bugs, unwanted behaviour, etc. It can also be used to set the flush to zero and denormals are zero flags. (By default, Cycles runs in /fast fp mode that masks all fp exceptions, so it may be necessary to compile cycles using, eg, /fp:precise, to see anything.) Different mask or FTZ/DAZ settings may be required for - x86 v x64 (as the latter will use SIMD) - __KERNEL_SSE__ (which obviously uses SIMD) I have played around a bit with this and find that there is some room for improving the consistency of the way that Cycles treats what would be floating point exceptions if they were not masked. In a number of places, division by zero happens when running regular test examples. This need not be a problem as the basic operations and math functions all accept infinity and NaN and produce sensible output. Infinities propagate through calculations as one would expect: for example, 2 + Inf; = Inf;, 4/Inf; = 0, atan (Inf) = pi/2. NaN, on the other hand, infects any calculation that involves it. Unless the calculation would produce the same result no matter what real value replaced NaN, the result is NaN. In comparison operations, positive infinity is larger than all values except itself and NaN, and negative infinity is smaller than all values except itself and NaN. NaN is unordered: it is not equal to, greater than, or less than anything, including itself. x == x is false if the value of x is NaN. - Divide by 0 produces +/- INF, except 0/0 which results in NaN. - log of (+/-) 0 produces -INF. log of a negative value (other than -0) produces NaN. - Reciprocal square root (rsq) or square root (sqrt) of a negative number produces NaN. The exception is -0; sqrt(-0) produces -0, and rsq(-0) produces -INF. - INF - INF = NaN - (+/-)INF / (+/-)INF = NaN - (+/-)INF * 0 = NaN - NaN (any OP) any-value = NaN - The comparisons EQ, GT, GE, LT, and LE, when either or both operands is NaN returns FALSE. - Comparisons ignore the sign of 0 (so +0 equals -0). - The comparison NE, when either or both operands is NaN returns TRUE. - Comparisons of any non-NaN value against +/- INF return the correct result. Wachter [2004] show how a triangle intersection test can be rearranged to produce the right answer even in the presence of NaNs and, indeed, how the floating point comparison can be turned into a faster integer one that also works with Nans and Infs. At present, some of the Cycles code tries to avoid relying on such infinity arithmetic by perturbing zero values before divisions, for example. I suspect that a better approach would be to either remove the underlying issue (degenerates, parallel rays / faces, etc) and assume that infinity arithmetic is in play. The specific areas that I thought could be tidied up are set out below. However, I have not just gone ahead and done so because it may well be that the issues found are just the symptom of another problem that I have not identified. Instead, I put in a temporary work-around so that I could continue to test. - when building the bvh, an Empty BoundBox does not pass the "valid" test; the size of an Empty box is -2 * MAX_FLT (-Inf). Why not revise BoundBox so that Empty box goes from 0 to zero, rather than MAX_FLT to -MAX_FLT and avoid all the safe_ members by assuming that -/+ Inf is a possible min/max extent. - the binning code assumes that the cent_bounds don't have a zero dimension and that binSize is non-zero. - the microfacet code has a number of areas where a cos of something is generated that can turn out to be > 1. (eg, cosNI = dot(N, *omega_in). Subsequently trying to calculate ai = 1.0f / (m_ab * safe_sqrtf((1.0f - cosNI * cosNI) / (cosNI * cosNI))) results in a division by zero, because of the safe_sqrt. - in the hair code, geom_curve.h, cubic curves can turn out to be quadratic (p3 close to zero) and calculating cyla can underflow - in geom_triangle.h, Dz can be zero - in kernel_compat_cpu.h it is possible to try to assign too large a float to an int in the wrapping and clamping code - in kernel_projection.h, dir.z needs to be scaled, to avoid taking the acos of a value > 1 - normalizing zero length vectors: what is the right result? - switching on _KERNEL_SSE_ exposes a number of further potential issues, particularly with the comparison operators. What does the panel think?
jrp commented 2014-04-17 13:46:54 +02:00 (Migrated from localhost:3001)
Author

Changed status to: 'Open'

Changed status to: 'Open'
Brecht Van Lommel was assigned by blender-admin 2014-04-17 13:46:54 +02:00
jrp commented 2014-04-17 13:46:54 +02:00 (Migrated from localhost:3001)
Author

Added subscribers: @jrp, @ThomasDinges, @broadstu, @dfelinto, @MartijnBerger, @Lockal

Added subscribers: @jrp, @ThomasDinges, @broadstu, @dfelinto, @MartijnBerger, @Lockal

Added subscriber: @mont29

Added subscriber: @mont29

Thanks for looking into this, here is my reasoning for the way the code works now. I didn't check your patch closely yet.

At present, some of the Cycles code tries to avoid relying on such infinity arithmetic by perturbing zero values before divisions, for example. I suspect that a better approach would be to either remove the underlying issue (degenerates, parallel rays / faces, etc) and assume that infinity arithmetic is in play.

I'm not sure I understand, how can you remove the underlying issue? Things like parallel rays / faces are unavoidable. When possible I try to avoid having explicit checks for best performance and prefer to implement things such that there may be a temporary inf, but it should never propagate outside of functions, it's only allowed to be a local thing. Once they propagate you can't control it well anymore.

So I do not consider float division by zero something that needs to be fixed.

when building the bvh, an Empty BoundBox does not pass the "valid" test; the size of an Empty box is -2 * MAX_FLT (-Inf). Why not revise BoundBox so that Empty box goes from 0 to zero, rather than MAX_FLT to -MAX_FLT and avoid all the safe_ members by assuming that -/+ Inf is a possible min/max extent.

For the BVH building this would be more convenient, in other cases we want to detect the invalid case rather than assuming the bounding box is zero. I didn't think it was worth creating a separate bounding box class for this case.

the binning code assumes that the cent_bounds don't have a zero dimension and that binSize is non-zero.

Did you find that this breaks anything? As far as I know this survives ok in degenerate cases even if there are temporary infinities.

the microfacet code has a number of areas where a cos of something is generated that can turn out to be > 1. (eg, cosNI = dot(N, *omega_in). Subsequently trying to calculate ai = 1.0f / (m_ab * safe_sqrtf((1.0f - cosNI * cosNI) / (cosNI * cosNI))) results in a division by zero, because of the safe_sqrt.

I wouldn't say the division by zero is caused by safe_sqrt, it happens also when cosNI is 1 which can happen regardless. ai can be inf, but in the computation of G1i that case is handled so it goes no further as far as I can tell.

in the hair code, geom_curve.h, cubic curves can turn out to be quadratic (p3 close to zero) and calculating cyla can underflow

Did you find this breaks anything? It's one of these performance sensitive areas where I prefer to avoid extra checks unless necessary, but I admit this curve intersection code may well cause problems in degenerate cases, I never check that closely, just tested empirically.

in kernel_projection.h, dir.z needs to be scaled, to avoid taking the acos of a value > 1

This should have used safe_acosf, I will change that. Not sure why it needs to be scale, these functions should be given normalized vectors, if there's a case where it is not normalized then the caller should do it.

normalizing zero length vectors: what is the right result?

The result should ideally be a zero vector, but this is performance sensitive so I did not add a check for that case in the normalize function.

Thanks for looking into this, here is my reasoning for the way the code works now. I didn't check your patch closely yet. ``` At present, some of the Cycles code tries to avoid relying on such infinity arithmetic by perturbing zero values before divisions, for example. I suspect that a better approach would be to either remove the underlying issue (degenerates, parallel rays / faces, etc) and assume that infinity arithmetic is in play. ``` I'm not sure I understand, how can you remove the underlying issue? Things like parallel rays / faces are unavoidable. When possible I try to avoid having explicit checks for best performance and prefer to implement things such that there may be a temporary inf, but it should never propagate outside of functions, it's only allowed to be a local thing. Once they propagate you can't control it well anymore. So I do not consider float division by zero something that needs to be fixed. ``` when building the bvh, an Empty BoundBox does not pass the "valid" test; the size of an Empty box is -2 * MAX_FLT (-Inf). Why not revise BoundBox so that Empty box goes from 0 to zero, rather than MAX_FLT to -MAX_FLT and avoid all the safe_ members by assuming that -/+ Inf is a possible min/max extent. ``` For the BVH building this would be more convenient, in other cases we want to detect the invalid case rather than assuming the bounding box is zero. I didn't think it was worth creating a separate bounding box class for this case. ``` the binning code assumes that the cent_bounds don't have a zero dimension and that binSize is non-zero. ``` Did you find that this breaks anything? As far as I know this survives ok in degenerate cases even if there are temporary infinities. ``` the microfacet code has a number of areas where a cos of something is generated that can turn out to be > 1. (eg, cosNI = dot(N, *omega_in). Subsequently trying to calculate ai = 1.0f / (m_ab * safe_sqrtf((1.0f - cosNI * cosNI) / (cosNI * cosNI))) results in a division by zero, because of the safe_sqrt. ``` I wouldn't say the division by zero is caused by safe_sqrt, it happens also when cosNI is 1 which can happen regardless. ai can be inf, but in the computation of G1i that case is handled so it goes no further as far as I can tell. ``` in the hair code, geom_curve.h, cubic curves can turn out to be quadratic (p3 close to zero) and calculating cyla can underflow ``` Did you find this breaks anything? It's one of these performance sensitive areas where I prefer to avoid extra checks unless necessary, but I admit this curve intersection code may well cause problems in degenerate cases, I never check that closely, just tested empirically. ``` in kernel_projection.h, dir.z needs to be scaled, to avoid taking the acos of a value > 1 ``` This should have used safe_acosf, I will change that. Not sure why it needs to be scale, these functions should be given normalized vectors, if there's a case where it is not normalized then the caller should do it. ``` normalizing zero length vectors: what is the right result? ``` The result should ideally be a zero vector, but this is performance sensitive so I did not add a check for that case in the normalize function.
jrp commented 2014-04-17 20:06:59 +02:00 (Migrated from localhost:3001)
Author

Added subscriber: @brecht

Added subscriber: @brecht
jrp commented 2014-04-17 20:06:59 +02:00 (Migrated from localhost:3001)
Author

Thanks, @brecht.

I agree the div by zero does not always need to be fixed if, as you say, you are using what I call infinity arithmetic for shorthand. My point it that it is not used uniformly. For example, we have the likes of bvh_clamp_direction which suggested to me that it was not being used. Even where it is being used, the NaN cases may still need to be dealt with. While I can see that you may not want to do that in the Release builds, it may be worth trapping them in debug builds. More generally, it may be worth asserting the preconditions for performance-sensitive routines to aid debugging, and make the code easier to follow.

For example, it's not clear to me whether degenerate cases (eg, 2-D BVH boxes / ranges, 1D triangles, etc) are expected. If they are, could they be handled by faster specializations, or culled? Or, as you say, certain routines (eg, in kernel_projection.h or microfaceting or the acos) expect their inputs to be within a certain range, but they don't always seem to be.

Anyway, the key thing is that there is some code in util_tasks in the patch that allows you to choose to unmask floating point exceptions and change flushing behaviour to reveal corner cases that it may be preferable to handle in a specific way, or which indicate a bug in some other part of the code. It's easy enough to use the render examples in the cycles test suite library to reveal a number of such cases.

Thanks, @brecht. I agree the div by zero does not always need to be fixed if, as you say, you are using what I call infinity arithmetic for shorthand. My point it that it is not used uniformly. For example, we have the likes of bvh_clamp_direction which suggested to me that it was not being used. Even where it is being used, the NaN cases may still need to be dealt with. While I can see that you may not want to do that in the Release builds, it may be worth trapping them in debug builds. More generally, it may be worth asserting the preconditions for performance-sensitive routines to aid debugging, and make the code easier to follow. For example, it's not clear to me whether degenerate cases (eg, 2-D BVH boxes / ranges, 1D triangles, etc) are expected. If they are, could they be handled by faster specializations, or culled? Or, as you say, certain routines (eg, in kernel_projection.h or microfaceting or the acos) expect their inputs to be within a certain range, but they don't always seem to be. Anyway, the key thing is that there is some code in util_tasks in the patch that allows you to choose to unmask floating point exceptions and change flushing behaviour to reveal corner cases that it may be preferable to handle in a specific way, or which indicate a bug in some other part of the code. It's easy enough to use the render examples in the cycles test suite library to reveal a number of such cases.

Infinity arithmetic is used locally in a few places, and indeed I wouldn't mind that being made more explicit, and adding asserts in various places to check if things are still valid.

Degenerate cases are expected sometimes, 2D bounding boxes are valid, but 1D triangles could be culled because they are useless. Specialized code to handle them can be ok, but in performance sensitive code branching is costly so we need to try to exploit infinity arithmetic or other tricks there when possible.

Anyway, to move this patch forward, if this is going to be committed we need fixes only for the cases that cause problems, and leave the changes where infinity arithmetic works fine, except for perhaps some comments or asserts.

Infinity arithmetic is used locally in a few places, and indeed I wouldn't mind that being made more explicit, and adding asserts in various places to check if things are still valid. Degenerate cases are expected sometimes, 2D bounding boxes are valid, but 1D triangles could be culled because they are useless. Specialized code to handle them can be ok, but in performance sensitive code branching is costly so we need to try to exploit infinity arithmetic or other tricks there when possible. Anyway, to move this patch forward, if this is going to be committed we need fixes only for the cases that cause problems, and leave the changes where infinity arithmetic works fine, except for perhaps some comments or asserts.
jrp commented 2014-04-18 14:20:04 +02:00 (Migrated from localhost:3001)
Author

Thanks. Let's take it case-by-case:

In triangle_refine, it is possible for the ray to be parallel to the triangle (and so InvDz turns to Inf) and the Inf will get propagated back to shader_setup_from_ray.

This looks as if it breaks one of your precepts.

Thanks. Let's take it case-by-case: In triangle_refine, it is possible for the ray to be parallel to the triangle (and so InvDz turns to Inf) and the Inf will get propagated back to shader_setup_from_ray. This looks as if it breaks one of your precepts.

Rays parallel to triangles are possible. However I think triangle_intersect will catch that case with the if(t > 0.0f && t < isect->t) check?

There is the triangle_refine case still where it does not check, however that only happens once triangle_intersect has succeeded with the same v00 and ray->D so it seems safe to me.

Rays parallel to triangles are possible. However I think `triangle_intersect` will catch that case with the `if(t > 0.0f && t < isect->t)` check? There is the `triangle_refine` case still where it does not check, however that only happens once `triangle_intersect` has succeeded with the same `v00` and `ray->D` so it seems safe to me.

Changed status from 'Open' to: 'Archived'

Changed status from 'Open' to: 'Archived'

Closing this since there seems to be no progress here.

Closing this since there seems to be no progress here.
Sign in to join this conversation.
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#39772
No description provided.