
sRGB OETF: Unused and wasted performance via the SPI sRGB OETF
Closed, Archived · Public

Description

With the addition of the sRGB inverse transform, Blender is building on an entirely unused set of values inherited from the cannibalized SPI transform.

Given that Blender will not and cannot use display-referred values that extend into negatives or above 1.0, because of the inability to store transforms into EXR files, the vast majority of the gargantuan custom SPI sRGB transform is unused and wasted. Worse, given the numerical quantisation range, the linear-to-nonlinear transform is inaccurate, representing a mere 1620 values.

That is, of the 65531 values, only 1620 are utilised by Blender, or roughly 2.47% of the entire lookup.
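As a quick check of the arithmetic above (the 1620 and 65531 counts are taken as quoted in this report, not independently re-derived):

```python
# Fraction of the SPI sRGB lookup that Blender can actually use,
# using the counts quoted in this report.
used = 1620    # entries covering the 0.0..1.0 display range
total = 65531  # entries in the full SPI sRGB lookup
fraction = used / total
print(f"{fraction:.2%}")  # 2.47%
```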

To repair this, replace the sRGB transform with a proper sRGB OETF that defines only the 0.0 to 1.0 range at a high-grade set of 4096 intervals, and remove the inverse. The speedup and quality difference achieved should offset the need for an inverse LUT transform.

An example sRGB OETF that would be plug-and-play: https://github.com/sobotka/filmic-blender/blob/master/luts/sRGB_OETF_to_Linear.spi1d.

The following Google Spreadsheet visualizes the extent of this transform. Yellow marks values that are impossible to utilize, representing non-standard SPI-specific value ranges. https://docs.google.com/spreadsheets/d/1UMIieLJguMDXsSq6-XbYl1o9CV9xaLnxsUgaht58LeA/edit?usp=sharing

Details

Type
Bug

Event Timeline

Sergey Sharybin (sergey) closed this task as Archived.
Sergey Sharybin (sergey) claimed this task.

I don't know what the point is here. You are basically clipping the working range from a somewhat dynamic range to 0..1 without any obvious need for that.

I have no idea why you consider the sRGB OETF the way to go. We have always used this transform [1], and it is still used in Cycles and all GLSL code. It perfectly supports the full dynamic range.
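The transform [1] is the familiar piecewise sRGB function; a minimal sketch (the function names are mine, not Blender's):

```python
def srgb_encode(x: float) -> float:
    """Linear -> sRGB per the piecewise definition in [1]. The power
    branch extends naturally above 1.0, which is why this form
    supports values beyond the display range."""
    if x <= 0.0031308:
        return 12.92 * x
    return 1.055 * x ** (1.0 / 2.4) - 0.055

def srgb_decode(y: float) -> float:
    """sRGB -> linear, the exact inverse of the above."""
    if y <= 0.04045:
        return y / 12.92
    return ((y + 0.055) / 1.055) ** 2.4
```

Round-tripping any value through both directions recovers it to within floating-point error, and `srgb_encode(2.0)` yields a value above 1.0 rather than clipping.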

Other points here are:

  • You mention our current transform is inaccurate. Did you make any comparison with yours? What is the mean error, and what is the maximum error? Which one is closer to the formula-calculated value?
  • While an inverted lookup is 2x faster with 4k entries compared to 64k, it is still 8 steps of binary search. That is still much slower than a direct lookup (a direct lookup does not involve any loops, and keeps maximum thread and cache coherency).

I also don't see how this is a bug in Blender; there was never a measurable difference from the transform denoted by [1]. If your transform has no measurable changes, then it'll only introduce a slowdown to the linear->sRGB conversion, which I'm not going to accept.
If there is something measurable, then the proposed change should be reformulated in terms of preserving compatibility with files.

In any case, this report does not belong in the bug tracker. Feel free to continue the discussion on the ML, explaining exactly what is wrong and giving some numbers to show where the inaccuracy is coming from.

[1] https://en.wikipedia.org/wiki/SRGB#Specification_of_the_transformation

Ok, sorry for the spam, but here are some calculations.

First of all, make it 12 steps, not 8, in the comment above.

1620 values gives you a step of ~6e-4, which I believe translates to a max absolute error of ~1e-7 on the 0..1 range compared to the formula we used before. This figure is on the boundary of machine epsilon, and something which will be totally negligible in any byte or half-float file format.
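That error estimate can be checked numerically; a sketch, assuming a uniformly sampled 1620-entry table of the standard sRGB decode formula and measuring the linear-interpolation error at interval midpoints (where it is largest):

```python
# Build a 1620-entry table of the standard sRGB -> linear formula on
# 0..1 and measure the worst linear-interpolation error against the
# exact formula, sampled at interval midpoints.
N = 1620

def decode(y):
    return y / 12.92 if y <= 0.04045 else ((y + 0.055) / 1.055) ** 2.4

max_err = 0.0
for i in range(N - 1):
    y0, y1 = i / (N - 1), (i + 1) / (N - 1)
    mid = 0.5 * (y0 + y1)                     # worst case near the midpoint
    interp = 0.5 * (decode(y0) + decode(y1))  # linear interpolation
    max_err = max(max_err, abs(interp - decode(mid)))

print(max_err)  # on the order of 1e-7, matching the estimate above
```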

What does that mean? Well, maybe:

  • You can't really increase accuracy in a measurable way.
  • You could probably use something like 2048 or 1024 values for the 0..1 range, but that would still need 11 or 10 steps (respectively) for the inverse lookup, so it doesn't really help here.
  • The only effective part of the original suggestion would be the clip to the 0..1 range for sRGB space, and I don't see why we should consider that a benefit.
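The step counts quoted in this thread follow directly from the binary-search depth, ceil(log2(n)) for an n-entry table; a quick check:

```python
import math

# Binary-search steps for an inverse lookup into an n-entry table:
# ceil(log2(n)), matching the figures quoted in this thread.
steps = {n: math.ceil(math.log2(n)) for n in (65536, 4096, 2048, 1024)}
print(steps)  # {65536: 16, 4096: 12, 2048: 11, 1024: 10}
```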

Spam again!

Just to state it explicitly: the original 64k values require 16 steps of binary search during the inverse lookup. 4k values would make it 12 steps, which is only about 25% faster. You can maybe make it 2x faster before you start measuring undesirable error. But that is still many more CPU ticks than with an inverse table, which only needs a couple of if statements followed by interpolation (interpolation is common to both directions, btw, so effectively you are replacing 8-12 loop iterations with 2 conditional statements).
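The two lookup styles being compared can be sketched as follows (a toy 1D LUT for illustration only; Blender/OCIO internals differ in detail, and the table and helper names are mine):

```python
import bisect

# A monotonic 4096-entry toy table (illustrative; not Blender's LUT).
table = [(i / 4095) ** 2.2 for i in range(4096)]

def direct_lookup(table, x):
    """Forward direction: pure index arithmetic plus interpolation,
    no loop, so thread and cache coherency stay intact."""
    pos = min(max(x, 0.0), 1.0) * (len(table) - 1)
    i = min(int(pos), len(table) - 2)
    t = pos - i
    return table[i] * (1.0 - t) + table[i + 1] * t

def inverse_lookup(table, y):
    """Inverse direction: binary search (~log2(n) steps) to find the
    bracketing interval, then the same interpolation."""
    i = min(max(bisect.bisect_right(table, y) - 1, 0), len(table) - 2)
    t = (y - table[i]) / (table[i + 1] - table[i])
    return (i + t) / (len(table) - 1)
```

Round-tripping a value through both directions recovers it to within interpolation error, but the inverse path pays for the binary search on every sample while the forward path does not.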