
Cycles: use native saturate function for CUDA

Authored by Sv. Lockal (lockal) on Apr 7 2015, 10:47 PM.



nvcc does not optimize clamp(x, 0.0f, 1.0f) into a single instruction and emits 4 instructions instead. Using the builtin saturate function makes the code a little cleaner and better optimized.
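As a rough illustration of the idea (a sketch, not the code from this diff): a wrapper can call CUDA's `__saturatef()` intrinsic in device code and fall back to the min/max form everywhere else. The names `saturate_sketch` and `SKETCH_HD` are made up for this example and are not taken from the patch.

```cpp
#include <cmath>

// SKETCH_HD is a hypothetical macro so the same source builds with
// nvcc (host + device) and with a plain C++ compiler (host only).
#ifdef __CUDACC__
#  define SKETCH_HD __host__ __device__
#else
#  define SKETCH_HD
#endif

// Hypothetical helper: clamp x to the range [0, 1].
SKETCH_HD inline float saturate_sketch(float x)
{
#ifdef __CUDA_ARCH__
    // Device code path: use the CUDA saturate intrinsic.
    return __saturatef(x);
#else
    // Host (or non-CUDA) path: the explicit max/min form.
    return std::fmin(std::fmax(x, 0.0f), 1.0f);
#endif
}
```

Call sites that previously wrote `clamp(x, 0.0f, 1.0f)` would then call the wrapper instead, so the CUDA-specific intrinsic stays in one place.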

Diff Detail

rB Blender

Event Timeline

Sv. Lockal (lockal) retitled this revision to Cycles: use native saturate function for CUDA. Apr 7 2015, 10:47 PM
Sv. Lockal (lockal) updated this object.
Sv. Lockal (lockal) added a project: Cycles.
Sv. Lockal (lockal) updated this revision to Diff 3925.

Typical diff in the generated PTX (SASS shows the same change), 92 replacements:

< 	mov.f32 	%f6781, 0f00000000;
< 	max.ftz.f32 	%f6782, %f1500, %f6781;
< 	mov.f32 	%f6783, 0f3F800000;
< 	min.ftz.f32 	%f10110, %f6782, %f6783;
> 	cvt.ftz.sat.f32.f32	%f10029, %f1500;

Generally looks fine, with one inline question about the uchar conversion.

It's also a good idea to include some numbers in the commit message, such as the percentage speedup you got on a benchmark scene.


Did you test if saturating the whole vector and then converting to uchar gives better performance?
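For reference, the two orderings in question can be sketched as below. This is only an illustration: the `Float4` and `UChar4` structs, the function names, and the truncating `* 255.0f` conversion are assumptions made for the example, not the actual Cycles code. Both variants return the same values, so the question is only which one the compiler turns into better code, and that has to be measured.

```cpp
#include <cmath>

// Hypothetical stand-ins for the vector types used by the kernel.
struct Float4 { float x, y, z, w; };
struct UChar4 { unsigned char x, y, z, w; };

// Hypothetical scalar saturate (host form): clamp v to [0, 1].
inline float saturate_host_sketch(float v)
{
    return std::fmin(std::fmax(v, 0.0f), 1.0f);
}

// Variant A: saturate each component at the point of conversion.
inline UChar4 to_uchar4_per_component(Float4 c)
{
    UChar4 r;
    r.x = (unsigned char)(saturate_host_sketch(c.x) * 255.0f);
    r.y = (unsigned char)(saturate_host_sketch(c.y) * 255.0f);
    r.z = (unsigned char)(saturate_host_sketch(c.z) * 255.0f);
    r.w = (unsigned char)(saturate_host_sketch(c.w) * 255.0f);
    return r;
}

// Variant B: saturate the whole vector first, then convert.
inline Float4 saturate4_sketch(Float4 c)
{
    return Float4{saturate_host_sketch(c.x), saturate_host_sketch(c.y),
                  saturate_host_sketch(c.z), saturate_host_sketch(c.w)};
}

inline UChar4 to_uchar4_vector_first(Float4 c)
{
    Float4 s = saturate4_sketch(c);
    UChar4 r;
    r.x = (unsigned char)(s.x * 255.0f);
    r.y = (unsigned char)(s.y * 255.0f);
    r.z = (unsigned char)(s.z * 255.0f);
    r.w = (unsigned char)(s.w * 255.0f);
    return r;
}
```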

This revision was automatically updated to reflect the committed changes.