
Cycles: use native saturate function for CUDA
Closed, Public

Authored by Sv. Lockal (lockal) on Apr 7 2015, 10:47 PM.

Details

Summary

nvcc cannot optimize clamp(x, 0.0f, 1.0f) into a single instruction and emits 4 instructions instead. Using the built-in saturate function makes the code a little cleaner and better optimized.
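A minimal sketch of how a shared saturate helper could dispatch to CUDA's `__saturatef()` intrinsic on the device and fall back to a plain clamp elsewhere. The name `saturate_sketch` and the `SKETCH_HOST_DEVICE` macro are hypothetical and not taken from the patch. `__CUDACC__` and `__CUDA_ARCH__` are the standard nvcc macros.

```cpp
#include <cmath>

// Hypothetical macro: lets the same helper compile under nvcc and plain C++.
#if defined(__CUDACC__)
#  define SKETCH_HOST_DEVICE __host__ __device__ inline
#else
#  define SKETCH_HOST_DEVICE inline
#endif

// Hypothetical helper (illustrative name): clamps x to [0, 1].
SKETCH_HOST_DEVICE float saturate_sketch(float x)
{
#if defined(__CUDA_ARCH__)
  // Device pass: CUDA's saturate intrinsic. The PTX diff in this review
  // shows the saturate path as a single cvt.ftz.sat.f32.f32.
  return __saturatef(x);
#else
  // Host pass: explicit clamp. std::fmax returns the non-NaN operand,
  // so NaN maps to 0, the same result cvt.sat gives for NaN.
  return std::fmin(std::fmax(x, 0.0f), 1.0f);
#endif
}
```

Keeping the clamp on the host path means CPU results stay as before; only the device pass changes instruction selection.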

Diff Detail

Repository
rB Blender

Event Timeline

Sv. Lockal (lockal) retitled this revision to Cycles: use native saturate function for CUDA. Apr 7 2015, 10:47 PM
Sv. Lockal (lockal) updated this object.
Sv. Lockal (lockal) added a project: Cycles.
Sv. Lockal (lockal) updated this revision to Diff 3925.

The most common change in the PTX diff (the SASS diff shows the same pattern), 92 replacements in total:

< 	mov.f32 	%f6781, 0f00000000;
< 	max.ftz.f32 	%f6782, %f1500, %f6781;
< 	mov.f32 	%f6783, 0f3F800000;
< 	min.ftz.f32 	%f10110, %f6782, %f6783;
---
> 	cvt.ftz.sat.f32.f32	%f10029, %f1500;
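The replacement is only safe if the old max-then-min sequence and the single saturating cvt agree. The host-side C++ sketch below models both: `clamp_max_min` mirrors PTX max.f32/min.f32, which return the non-NaN operand just as std::fmax/std::fmin do, and `sat_reference` follows the PTX description of .sat (clamp to [0.0, 1.0], NaN becomes +0.0). All names are hypothetical, and the .ftz flush of denormal inputs is not modeled, so only normal and special values are checked.

```cpp
#include <cmath>
#include <limits>

// Models the removed sequence: max(x, 0.0f) followed by min(result, 1.0f).
float clamp_max_min(float x)
{
  float lo = std::fmax(x, 0.0f);  // corresponds to max.ftz.f32
  return std::fmin(lo, 1.0f);     // corresponds to min.ftz.f32
}

// Models the added instruction: saturate to [0.0, 1.0], NaN becomes +0.0.
float sat_reference(float x)
{
  if (std::isnan(x)) return 0.0f;
  if (x < 0.0f) return 0.0f;
  if (x > 1.0f) return 1.0f;
  return x;
}

// True if both models agree on a set of normal and special inputs.
// Denormals are skipped because flush-to-zero is not modeled here.
bool sequences_agree()
{
  const float inf = std::numeric_limits<float>::infinity();
  const float nan = std::numeric_limits<float>::quiet_NaN();
  const float inputs[] = {-inf, -2.0f, -1e-3f, 0.0f, 0.5f,
                          0.999f, 1.0f, 1.5f, inf, nan};
  for (float x : inputs) {
    if (clamp_max_min(x) != sat_reference(x)) return false;
  }
  return true;
}
```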

Generally looks fine; I left one inline question about the uchar conversion.

It would also be a good idea to include some numbers in the commit message, such as the percentage speedup you measured on a benchmark scene.

intern/cycles/kernel/kernel_film.h, line 40

Did you test whether saturating the whole vector and then converting to uchar gives better performance?
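The two orderings in the question can be sketched as follows. `Float4`, `UChar4`, `sat` and both conversion functions are hypothetical stand-ins, not Cycles' types or the code in kernel_film.h, and `sat` uses the host-side clamp form. Both variants yield identical bytes, so any difference is purely in code generation and would have to be measured. On CUDA a float4 is a plain struct, so a "vector" saturate is still four scalar saturates; on a CPU with SIMD a vector clamp could become a single min/max pair across all lanes.

```cpp
#include <cmath>

// Hypothetical stand-ins for a 4-component float color and an 8-bit pixel.
struct Float4 { float x, y, z, w; };
struct UChar4 { unsigned char x, y, z, w; };

// Scalar clamp to [0, 1]; on the CUDA device path this would be the saturate intrinsic.
inline float sat(float v) { return std::fmin(std::fmax(v, 0.0f), 1.0f); }

// Variant A: saturate, scale and convert each component in one expression.
inline UChar4 to_byte_per_component(Float4 c)
{
  return UChar4{static_cast<unsigned char>(sat(c.x) * 255.0f),
                static_cast<unsigned char>(sat(c.y) * 255.0f),
                static_cast<unsigned char>(sat(c.z) * 255.0f),
                static_cast<unsigned char>(sat(c.w) * 255.0f)};
}

// Variant B: saturate the whole vector first, then scale and convert.
inline Float4 sat4(Float4 c) { return Float4{sat(c.x), sat(c.y), sat(c.z), sat(c.w)}; }

inline UChar4 to_byte_vector_first(Float4 c)
{
  Float4 s = sat4(c);
  // Values are in [0, 255] here, so the truncating cast is well defined.
  return UChar4{static_cast<unsigned char>(s.x * 255.0f),
                static_cast<unsigned char>(s.y * 255.0f),
                static_cast<unsigned char>(s.z * 255.0f),
                static_cast<unsigned char>(s.w * 255.0f)};
}
```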

This revision was automatically updated to reflect the committed changes.