
Cycles: Use curve approximation for blackbody instead of lookup table
Closed, Public

Authored by Sv. Lockal (lockal) on May 4 2015, 5:33 PM.

Details

Summary

The color is now calculated in the range 800..12000 K using an approximation a/t + bt + c for R and G, and ((at + b)t + c)t + d for B.
The maximum absolute error of each RGB channel relative to the exact (non-LUT) function is less than 0.0001, which is enough to get the same 8-bit-per-channel color as OSL, with a noticeable performance gain.
However, there is a slight visible difference from the previous non-OSL implementation, caused by its lookup-table interpolation and an off-by-one mistake.
The previous implementation returned black outside the soft range (t > 12000 K); the new one returns the same color as for 12000 K.

Also, a Blackbody node whose input is not connected is now converted to a constant value at shader compile time.

Diff Detail

Repository
rB Blender
Branch
cycles_blackbody_approx

Event Timeline

Sv. Lockal (lockal) retitled this revision to "Cycles: Use curve approximation for blackbody instead of lookup table". May 4 2015, 5:33 PM
Sv. Lockal (lockal) updated this object.
Sv. Lockal (lockal) updated this revision to Diff 4151.

I really like it.

Can we get some performance numbers? I am sold on the idea already, but it would still be good to know what it does performance-wise.

Code looks good apart from one minor thing.

intern/cycles/kernel/kernel_types.h, line 987:

Needs a pad3 now.

Generally seems fine, but did anyone compare the PTX output of the CUDA kernel, or measure the best-case performance gain (rendering a plane with a blackbody shader that covers the whole camera view)?

intern/cycles/kernel/svm/svm_math_util.h, line 141:

In Cycles code style, there is no space between a keyword and the opening parenthesis.

Difference between the original (non-LUT) function and the approximation:

There is no visible difference in the RGB channels:

Tab-delimited file generator: P222

Added @Sergey Sharybin (sergey) as a reviewer because the diff is similar to D1219.

Added @Thomas Dinges (dingto) as a reviewer as the author of the original implementation in Cycles.

Sv. Lockal (lockal) added a comment. (Edited) May 4 2015, 5:50 PM

A quick CPU test with a plane gave 25.45 s (LUT) vs 20.26 s (approx). The input socket was, of course, connected.

With CUDA (7.0) the difference is smaller: 00:52.86 (LUT) vs 00:51.91 (approx).

Attachments: "3 benchmarks (CPU)" and "images".

@Sv. Lockal (lockal), once the padding and style are fixed, I think it will be fine to commit.

Thomas Dinges (dingto) accepted this revision.

Agreed. Great work! :)

This revision is now accepted and ready to land. May 4 2015, 8:32 PM

The fire render in my test looks different. Is that an issue?

Hm, right, the two images in nutel's zip archive have a visible difference.

@Thomas Dinges (dingto), it seems to me that the current implementation is affected by an off-by-one mistake somewhere around here: https://developer.blender.org/diffusion/B/browse/master/intern/cycles/render/blackbody.cpp$96. I had already noticed that our current implementation gives different colors from OSL, and my approximation is based on the unbiased (exact) function.

Example color for t = 4605 K:

Channel   CPU/GPU LUT   CPU/GPU approx   OSL
R         1.29752       1.28041          1.28133
G         0.94609       0.94926          0.94899
B         0.65798       0.67661          0.67706

Right, the current implementation does not match OSL perfectly; I couldn't get it any closer to the OSL one while working on it. I always suspected some issue with the lookup table offset, but didn't investigate it further. :/

I would take the OSL implementation as the reference, and since your approximation matches it more closely, that is fine.

Feel free to commit it. :)

Sv. Lockal (lockal) updated this revision to Diff 4152.

Fix if() formatting and add pad3.

This revision was automatically updated to reflect the committed changes.