Floating point exceptions, infinity arithmentic #39772

New Issue

Brecht Van Lommel · 2014-04-17T13:46:54+02:00

jrp commented

2014-04-17 13:46:54 +02:00

(Migrated from localhost:3001)

The attached patch fp.txt includes some code in util_task.cpp that can be adjusted to allow different types of floating point exception to be exposed when Cycles is running, to check for bugs, unwanted behaviour, etc. It can also be used to set the flush to zero and denormals are zero flags. (By default, Cycles runs in /fast fp mode that masks all fp exceptions, so it may be necessary to compile cycles using, eg, /fp:precise, to see anything.)

Different mask or FTZ/DAZ settings may be required for

x86 v x64 (as the latter will use SIMD)
KERNEL_SSE (which obviously uses SIMD)

I have played around a bit with this and find that there is some room for improving the consistency of the way that Cycles treats what would be floating point exceptions if they were not masked.

In a number of places, division by zero happens when running regular test examples. This need not be a problem as the basic operations and math functions all accept infinity and NaN and produce sensible output. Infinities propagate through calculations as one would expect: for example, 2 + Inf; = Inf;, 4/Inf; = 0, atan (Inf) = pi/2. NaN, on the other hand, infects any calculation that involves it. Unless the calculation would produce the same result no matter what real value replaced NaN, the result is NaN. In comparison operations, positive infinity is larger than all values except itself and NaN, and negative infinity is smaller than all values except itself and NaN. NaN is unordered: it is not equal to, greater than, or less than anything, including itself. x == x is false if the value of x is NaN.

Divide by 0 produces +/- INF, except 0/0 which results in NaN.
log of (+/-) 0 produces -INF. log of a negative value (other than -0) produces NaN.
Reciprocal square root (rsq) or square root (sqrt) of a negative number produces NaN. The exception is -0; sqrt(-0) produces -0, and rsq(-0) produces -INF.
INF - INF = NaN
(+/-)INF / (+/-)INF = NaN
(+/-)INF * 0 = NaN
NaN (any OP) any-value = NaN
The comparisons EQ, GT, GE, LT, and LE, when either or both operands is NaN returns FALSE.
Comparisons ignore the sign of 0 (so +0 equals -0).
The comparison NE, when either or both operands is NaN returns TRUE.
Comparisons of any non-NaN value against +/- INF return the correct result.

Wachter [2004] show how a triangle intersection test can be rearranged to produce the right answer even in the presence of NaNs and, indeed, how the floating point comparison can be turned into a faster integer one that also works with Nans and Infs.

At present, some of the Cycles code tries to avoid relying on such infinity arithmetic by perturbing zero values before divisions, for example. I suspect that a better approach would be to either remove the underlying issue (degenerates, parallel rays / faces, etc) and assume that infinity arithmetic is in play.

The specific areas that I thought could be tidied up are set out below. However, I have not just gone ahead and done so because it may well be that the issues found are just the symptom of another problem that I have not identified. Instead, I put in a temporary work-around so that I could continue to test.

when building the bvh, an Empty BoundBox does not pass the "valid" test; the size of an Empty box is -2 * MAX_FLT (-Inf). Why not revise BoundBox so that Empty box goes from 0 to zero, rather than MAX_FLT to -MAX_FLT and avoid all the safe_ members by assuming that -/+ Inf is a possible min/max extent.
the binning code assumes that the cent_bounds don't have a zero dimension and that binSize is non-zero.
the microfacet code has a number of areas where a cos of something is generated that can turn out to be > 1. (eg, cosNI = dot(N, *omega_in). Subsequently trying to calculate ai = 1.0f / (m_ab * safe_sqrtf((1.0f - cosNI * cosNI) / (cosNI * cosNI))) results in a division by zero, because of the safe_sqrt.
in the hair code, geom_curve.h, cubic curves can turn out to be quadratic (p3 close to zero) and calculating cyla can underflow
in geom_triangle.h, Dz can be zero
in kernel_compat_cpu.h it is possible to try to assign too large a float to an int in the wrapping and clamping code
in kernel_projection.h, dir.z needs to be scaled, to avoid taking the acos of a value > 1
normalizing zero length vectors: what is the right result?
switching on KERNEL_SSE exposes a number of further potential issues, particularly with the comparison operators.

What does the panel think?

The attached patch [fp.txt](https://archive.blender.org/developer/F85072/fp.txt) includes some code in util_task.cpp that can be adjusted to allow different types of floating point exception to be exposed when Cycles is running, to check for bugs, unwanted behaviour, etc. It can also be used to set the flush to zero and denormals are zero flags. (By default, Cycles runs in /fast fp mode that masks all fp exceptions, so it may be necessary to compile cycles using, eg, /fp:precise, to see anything.) Different mask or FTZ/DAZ settings may be required for - x86 v x64 (as the latter will use SIMD) - __KERNEL_SSE__ (which obviously uses SIMD) I have played around a bit with this and find that there is some room for improving the consistency of the way that Cycles treats what would be floating point exceptions if they were not masked. In a number of places, division by zero happens when running regular test examples. This need not be a problem as the basic operations and math functions all accept infinity and NaN and produce sensible output. Infinities propagate through calculations as one would expect: for example, 2 + Inf; = Inf;, 4/Inf; = 0, atan (Inf) = pi/2. NaN, on the other hand, infects any calculation that involves it. Unless the calculation would produce the same result no matter what real value replaced NaN, the result is NaN. In comparison operations, positive infinity is larger than all values except itself and NaN, and negative infinity is smaller than all values except itself and NaN. NaN is unordered: it is not equal to, greater than, or less than anything, including itself. x == x is false if the value of x is NaN. - Divide by 0 produces +/- INF, except 0/0 which results in NaN. - log of (+/-) 0 produces -INF. log of a negative value (other than -0) produces NaN. - Reciprocal square root (rsq) or square root (sqrt) of a negative number produces NaN. The exception is -0; sqrt(-0) produces -0, and rsq(-0) produces -INF. - INF - INF = NaN - (+/-)INF / (+/-)INF = NaN - (+/-)INF * 0 = NaN - NaN (any OP) any-value = NaN - The comparisons EQ, GT, GE, LT, and LE, when either or both operands is NaN returns FALSE. - Comparisons ignore the sign of 0 (so +0 equals -0). - The comparison NE, when either or both operands is NaN returns TRUE. - Comparisons of any non-NaN value against +/- INF return the correct result. Wachter [2004] show how a triangle intersection test can be rearranged to produce the right answer even in the presence of NaNs and, indeed, how the floating point comparison can be turned into a faster integer one that also works with Nans and Infs. At present, some of the Cycles code tries to avoid relying on such infinity arithmetic by perturbing zero values before divisions, for example. I suspect that a better approach would be to either remove the underlying issue (degenerates, parallel rays / faces, etc) and assume that infinity arithmetic is in play. The specific areas that I thought could be tidied up are set out below. However, I have not just gone ahead and done so because it may well be that the issues found are just the symptom of another problem that I have not identified. Instead, I put in a temporary work-around so that I could continue to test. - when building the bvh, an Empty BoundBox does not pass the "valid" test; the size of an Empty box is -2 * MAX_FLT (-Inf). Why not revise BoundBox so that Empty box goes from 0 to zero, rather than MAX_FLT to -MAX_FLT and avoid all the safe_ members by assuming that -/+ Inf is a possible min/max extent. - the binning code assumes that the cent_bounds don't have a zero dimension and that binSize is non-zero. - the microfacet code has a number of areas where a cos of something is generated that can turn out to be > 1. (eg, cosNI = dot(N, *omega_in). Subsequently trying to calculate ai = 1.0f / (m_ab * safe_sqrtf((1.0f - cosNI * cosNI) / (cosNI * cosNI))) results in a division by zero, because of the safe_sqrt. - in the hair code, geom_curve.h, cubic curves can turn out to be quadratic (p3 close to zero) and calculating cyla can underflow - in geom_triangle.h, Dz can be zero - in kernel_compat_cpu.h it is possible to try to assign too large a float to an int in the wrapping and clamping code - in kernel_projection.h, dir.z needs to be scaled, to avoid taking the acos of a value > 1 - normalizing zero length vectors: what is the right result? - switching on _KERNEL_SSE_ exposes a number of further potential issues, particularly with the comparison operators. What does the panel think?

jrp commented

2014-04-17 13:46:54 +02:00

(Migrated from localhost:3001)

Changed status to: 'Open'

Brecht Van Lommel was assigned by blender-admin

2014-04-17 13:46:54 +02:00

jrp commented

2014-04-17 13:46:54 +02:00

(Migrated from localhost:3001)

Added subscribers: @jrp, @ThomasDinges, @broadstu, @dfelinto, @MartijnBerger, @Lockal

Bastien Montagne commented

2014-04-17 14:06:00 +02:00

Added subscriber: @mont29

Brecht Van Lommel commented

2014-04-17 14:39:49 +02:00

Thanks for looking into this, here is my reasoning for the way the code works now. I didn't check your patch closely yet.

At present, some of the Cycles code tries to avoid relying on such infinity arithmetic by perturbing zero values before divisions, for example. I suspect that a better approach would be to either remove the underlying issue (degenerates, parallel rays / faces, etc) and assume that infinity arithmetic is in play.

I'm not sure I understand, how can you remove the underlying issue? Things like parallel rays / faces are unavoidable. When possible I try to avoid having explicit checks for best performance and prefer to implement things such that there may be a temporary inf, but it should never propagate outside of functions, it's only allowed to be a local thing. Once they propagate you can't control it well anymore.

So I do not consider float division by zero something that needs to be fixed.

when building the bvh, an Empty BoundBox does not pass the "valid" test; the size of an Empty box is -2 * MAX_FLT (-Inf). Why not revise BoundBox so that Empty box goes from 0 to zero, rather than MAX_FLT to -MAX_FLT and avoid all the safe_ members by assuming that -/+ Inf is a possible min/max extent.

For the BVH building this would be more convenient, in other cases we want to detect the invalid case rather than assuming the bounding box is zero. I didn't think it was worth creating a separate bounding box class for this case.

the binning code assumes that the cent_bounds don't have a zero dimension and that binSize is non-zero.

Did you find that this breaks anything? As far as I know this survives ok in degenerate cases even if there are temporary infinities.

the microfacet code has a number of areas where a cos of something is generated that can turn out to be > 1. (eg, cosNI = dot(N, *omega_in). Subsequently trying to calculate ai = 1.0f / (m_ab * safe_sqrtf((1.0f - cosNI * cosNI) / (cosNI * cosNI))) results in a division by zero, because of the safe_sqrt.

I wouldn't say the division by zero is caused by safe_sqrt, it happens also when cosNI is 1 which can happen regardless. ai can be inf, but in the computation of G1i that case is handled so it goes no further as far as I can tell.

in the hair code, geom_curve.h, cubic curves can turn out to be quadratic (p3 close to zero) and calculating cyla can underflow

Did you find this breaks anything? It's one of these performance sensitive areas where I prefer to avoid extra checks unless necessary, but I admit this curve intersection code may well cause problems in degenerate cases, I never check that closely, just tested empirically.

in kernel_projection.h, dir.z needs to be scaled, to avoid taking the acos of a value > 1

This should have used safe_acosf, I will change that. Not sure why it needs to be scale, these functions should be given normalized vectors, if there's a case where it is not normalized then the caller should do it.

normalizing zero length vectors: what is the right result?

The result should ideally be a zero vector, but this is performance sensitive so I did not add a check for that case in the normalize function.

Thanks for looking into this, here is my reasoning for the way the code works now. I didn't check your patch closely yet. ``` At present, some of the Cycles code tries to avoid relying on such infinity arithmetic by perturbing zero values before divisions, for example. I suspect that a better approach would be to either remove the underlying issue (degenerates, parallel rays / faces, etc) and assume that infinity arithmetic is in play. ``` I'm not sure I understand, how can you remove the underlying issue? Things like parallel rays / faces are unavoidable. When possible I try to avoid having explicit checks for best performance and prefer to implement things such that there may be a temporary inf, but it should never propagate outside of functions, it's only allowed to be a local thing. Once they propagate you can't control it well anymore. So I do not consider float division by zero something that needs to be fixed. ``` when building the bvh, an Empty BoundBox does not pass the "valid" test; the size of an Empty box is -2 * MAX_FLT (-Inf). Why not revise BoundBox so that Empty box goes from 0 to zero, rather than MAX_FLT to -MAX_FLT and avoid all the safe_ members by assuming that -/+ Inf is a possible min/max extent. ``` For the BVH building this would be more convenient, in other cases we want to detect the invalid case rather than assuming the bounding box is zero. I didn't think it was worth creating a separate bounding box class for this case. ``` the binning code assumes that the cent_bounds don't have a zero dimension and that binSize is non-zero. ``` Did you find that this breaks anything? As far as I know this survives ok in degenerate cases even if there are temporary infinities. ``` the microfacet code has a number of areas where a cos of something is generated that can turn out to be > 1. (eg, cosNI = dot(N, *omega_in). Subsequently trying to calculate ai = 1.0f / (m_ab * safe_sqrtf((1.0f - cosNI * cosNI) / (cosNI * cosNI))) results in a division by zero, because of the safe_sqrt. ``` I wouldn't say the division by zero is caused by safe_sqrt, it happens also when cosNI is 1 which can happen regardless. ai can be inf, but in the computation of G1i that case is handled so it goes no further as far as I can tell. ``` in the hair code, geom_curve.h, cubic curves can turn out to be quadratic (p3 close to zero) and calculating cyla can underflow ``` Did you find this breaks anything? It's one of these performance sensitive areas where I prefer to avoid extra checks unless necessary, but I admit this curve intersection code may well cause problems in degenerate cases, I never check that closely, just tested empirically. ``` in kernel_projection.h, dir.z needs to be scaled, to avoid taking the acos of a value > 1 ``` This should have used safe_acosf, I will change that. Not sure why it needs to be scale, these functions should be given normalized vectors, if there's a case where it is not normalized then the caller should do it. ``` normalizing zero length vectors: what is the right result? ``` The result should ideally be a zero vector, but this is performance sensitive so I did not add a check for that case in the normalize function.

jrp commented

2014-04-17 20:06:59 +02:00

(Migrated from localhost:3001)

Added subscriber: @brecht

jrp commented

2014-04-17 20:06:59 +02:00

(Migrated from localhost:3001)

Thanks, @brecht.

I agree the div by zero does not always need to be fixed if, as you say, you are using what I call infinity arithmetic for shorthand. My point it that it is not used uniformly. For example, we have the likes of bvh_clamp_direction which suggested to me that it was not being used. Even where it is being used, the NaN cases may still need to be dealt with. While I can see that you may not want to do that in the Release builds, it may be worth trapping them in debug builds. More generally, it may be worth asserting the preconditions for performance-sensitive routines to aid debugging, and make the code easier to follow.

For example, it's not clear to me whether degenerate cases (eg, 2-D BVH boxes / ranges, 1D triangles, etc) are expected. If they are, could they be handled by faster specializations, or culled? Or, as you say, certain routines (eg, in kernel_projection.h or microfaceting or the acos) expect their inputs to be within a certain range, but they don't always seem to be.

Anyway, the key thing is that there is some code in util_tasks in the patch that allows you to choose to unmask floating point exceptions and change flushing behaviour to reveal corner cases that it may be preferable to handle in a specific way, or which indicate a bug in some other part of the code. It's easy enough to use the render examples in the cycles test suite library to reveal a number of such cases.

Thanks, @brecht. I agree the div by zero does not always need to be fixed if, as you say, you are using what I call infinity arithmetic for shorthand. My point it that it is not used uniformly. For example, we have the likes of bvh_clamp_direction which suggested to me that it was not being used. Even where it is being used, the NaN cases may still need to be dealt with. While I can see that you may not want to do that in the Release builds, it may be worth trapping them in debug builds. More generally, it may be worth asserting the preconditions for performance-sensitive routines to aid debugging, and make the code easier to follow. For example, it's not clear to me whether degenerate cases (eg, 2-D BVH boxes / ranges, 1D triangles, etc) are expected. If they are, could they be handled by faster specializations, or culled? Or, as you say, certain routines (eg, in kernel_projection.h or microfaceting or the acos) expect their inputs to be within a certain range, but they don't always seem to be. Anyway, the key thing is that there is some code in util_tasks in the patch that allows you to choose to unmask floating point exceptions and change flushing behaviour to reveal corner cases that it may be preferable to handle in a specific way, or which indicate a bug in some other part of the code. It's easy enough to use the render examples in the cycles test suite library to reveal a number of such cases.

Brecht Van Lommel commented

2014-04-18 12:33:39 +02:00

Infinity arithmetic is used locally in a few places, and indeed I wouldn't mind that being made more explicit, and adding asserts in various places to check if things are still valid.

Degenerate cases are expected sometimes, 2D bounding boxes are valid, but 1D triangles could be culled because they are useless. Specialized code to handle them can be ok, but in performance sensitive code branching is costly so we need to try to exploit infinity arithmetic or other tricks there when possible.

Anyway, to move this patch forward, if this is going to be committed we need fixes only for the cases that cause problems, and leave the changes where infinity arithmetic works fine, except for perhaps some comments or asserts.

Infinity arithmetic is used locally in a few places, and indeed I wouldn't mind that being made more explicit, and adding asserts in various places to check if things are still valid. Degenerate cases are expected sometimes, 2D bounding boxes are valid, but 1D triangles could be culled because they are useless. Specialized code to handle them can be ok, but in performance sensitive code branching is costly so we need to try to exploit infinity arithmetic or other tricks there when possible. Anyway, to move this patch forward, if this is going to be committed we need fixes only for the cases that cause problems, and leave the changes where infinity arithmetic works fine, except for perhaps some comments or asserts.

jrp commented

2014-04-18 14:20:04 +02:00

(Migrated from localhost:3001)

Thanks. Let's take it case-by-case:

In triangle_refine, it is possible for the ray to be parallel to the triangle (and so InvDz turns to Inf) and the Inf will get propagated back to shader_setup_from_ray.

This looks as if it breaks one of your precepts.

Thanks. Let's take it case-by-case: In triangle_refine, it is possible for the ray to be parallel to the triangle (and so InvDz turns to Inf) and the Inf will get propagated back to shader_setup_from_ray. This looks as if it breaks one of your precepts.

Brecht Van Lommel commented

2014-04-18 14:28:38 +02:00

Rays parallel to triangles are possible. However I think triangle_intersect will catch that case with the if(t > 0.0f && t < isect->t) check?

There is the triangle_refine case still where it does not check, however that only happens once triangle_intersect has succeeded with the same v00 and ray->D so it seems safe to me.

Rays parallel to triangles are possible. However I think `triangle_intersect` will catch that case with the `if(t > 0.0f && t < isect->t)` check? There is the `triangle_refine` case still where it does not check, however that only happens once `triangle_intersect` has succeeded with the same `v00` and `ray->D` so it seems safe to me.

Brecht Van Lommel commented

2014-11-09 15:18:30 +01:00

Changed status from 'Open' to: 'Archived'

Brecht Van Lommel closed this issue

2014-11-09 15:18:30 +01:00

Brecht Van Lommel commented

2014-11-09 15:18:30 +01:00

Closing this since there seems to be no progress here.

Sign in to join this conversation.

No Label

Download

What's New

Blender Studio

Manual

Developers Blog

Documentation

Benchmark

Blender Conference

Development Fund

One-time Donations

Floating point exceptions, infinity arithmentic #39772