This gives a minor speedup (~5%) in viewport rendering of simple scenes.
It relies on the fact that the rastertocamera transform involves no rotation or shear, and that the perspective coefficient is never zero (how could it be?).
Also, it looks like transform_perspective under MSVC is affected by a problem similar to the one fixed in rBbf11e362c5418ec40dc0437d329efceb225eb5ef. Anyway, this patch replaces the full matrix transformation with 2 FMAs and removes 3 divisions from the critical path (the division unit on Sandy Bridge and Haswell is overloaded in camera_sample).
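To make the idea concrete, here is a minimal sketch of it, not the patch itself. All names (Float2, Float3, Mat4, ScaleOffset2D, perspective_full, make_scale_offset, perspective_fast) are hypothetical and are not Cycles API. It assumes a row-major 4x4 matrix whose rotation/shear entries and perspective row x/y entries are zero, and whose bottom-right coefficient is nonzero.

```cpp
#include <cmath>

// Hypothetical stand-ins for illustration only; not the Cycles types.
struct Float2 { float x, y; };
struct Float3 { float x, y, z; };
struct Mat4 { float m[4][4]; };  // row-major

// Precomputed scale/offset form of the transform (hypothetical).
struct ScaleOffset2D { float sx, sy, ox, oy, z; };

// Reference path: transform the point (p.x, p.y, 0) by the full matrix,
// then do the perspective divide (3 divisions per call).
inline Float3 perspective_full(const Mat4 &t, Float2 p)
{
  const float x = t.m[0][0] * p.x + t.m[0][1] * p.y + t.m[0][3];
  const float y = t.m[1][0] * p.x + t.m[1][1] * p.y + t.m[1][3];
  const float z = t.m[2][0] * p.x + t.m[2][1] * p.y + t.m[2][3];
  const float w = t.m[3][0] * p.x + t.m[3][1] * p.y + t.m[3][3];
  return {x / w, y / w, z / w};
}

// One-time setup: fold the constant 1/w into scale and offset.
// Assumes no rotation/shear terms and m[3][3] != 0.
inline ScaleOffset2D make_scale_offset(const Mat4 &t)
{
  const float inv_w = 1.0f / t.m[3][3];
  return {t.m[0][0] * inv_w, t.m[1][1] * inv_w,
          t.m[0][3] * inv_w, t.m[1][3] * inv_w,
          t.m[2][3] * inv_w};
}

// Per-sample path: two FMAs, no divisions; z is a constant.
inline Float3 perspective_fast(const ScaleOffset2D &s, Float2 p)
{
  return {std::fma(p.x, s.sx, s.ox), std::fma(p.y, s.sy, s.oy), s.z};
}
```

The division happens once in make_scale_offset rather than per sample, which is where the saving would come from if the divider really is the bottleneck.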
Several things here:
- This code runs once per pixel per sample, so it's really weird that the change gives a 5% speedup.
- I don't see any measurable difference on either CPU or GPU here.
- The Cycles API actually prefers passing float2/float3/float4 values as arguments instead of passing separate arguments.
- Is this 2d3d naming commonly used to denote a transformation limited to translation/scale?
- move is a new term here; I guess it should be translation?