
Several double variants for BLI_math

Authored by YimingWu (NicksBest) on Aug 16 2019, 5:24 AM.



These variants are implemented mainly for LANPR to run smoothly with its double precision internal calculations.

The matrix multiply function needs a new __SSE2__ implementation.


Event Timeline

YimingWu (NicksBest) edited the summary of this revision. Aug 16 2019, 5:28 AM
YimingWu (NicksBest) edited the summary of this revision.

I guess this is just a matter of __m128 -> __m128d and _ps -> _pd everywhere?


B is a float[4][4] matrix rather than a double one, so doing it this way might break _mm_mul_ps.

I'm changing my matrix to use double only, for simplicity, but some other problems popped up as well. I'm probably going to add a mul_m4_m4m4_db_uniq() where all three matrices are double precision. Maybe use the SSE instructions only in the pure-double case and leave this one without SSE? (Or I could remove this variant completely.)


Right, it can copy float to double without SSE, and then run the multiplication with doubles only.
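A minimal sketch of the suggestion above, in plain C without intrinsics: widen the float matrix into a double copy, then multiply using doubles only. The function names are hypothetical and are not the ones in this revision, and the row-major convention R[i][j] = sum over k of A[i][k] * B[k][j] is an assumption that may differ from the indexing BLI_math uses.

```c
/* Hypothetical helper: widen a float 4x4 matrix into a double 4x4 matrix. */
static void copy_m4db_m4fl_sketch(double R[4][4], const float M[4][4])
{
  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 4; j++) {
      R[i][j] = (double)M[i][j];
    }
  }
}

/* Hypothetical sketch: R = A * B, where A is double and B is float.
 * Assumed convention: R[i][j] = sum_k A[i][k] * B[k][j]. */
static void mul_m4db_m4db_m4fl_sketch(double R[4][4],
                                      const double A[4][4],
                                      const float B[4][4])
{
  double B_db[4][4]; /* double copy of B, so all arithmetic is in double */
  double T[4][4];    /* temporary result, so R may alias A */

  copy_m4db_m4fl_sketch(B_db, B);

  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 4; j++) {
      double sum = 0.0;
      for (int k = 0; k < 4; k++) {
        sum += A[i][k] * B_db[k][j];
      }
      T[i][j] = sum;
    }
  }

  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 4; j++) {
      R[i][j] = T[i][j];
    }
  }
}
```

Once B has been widened, the multiply no longer mixes element types, so the same double-only code path could later be vectorized or left to the compiler.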


OK. I'll make a double copy first and use SSE. Thanks!

Updated two functions, mul_m4_m4m4_db_uniq() and mul_m4db_m4db_m4fl_uniq(), for __SSE2__ support.

YimingWu (NicksBest) marked an inline comment as done. Aug 18 2019, 8:32 AM
YimingWu (NicksBest) updated this revision to Diff 17239. Edited Aug 18 2019, 9:47 AM

__m128d is double[2] instead of double[4], and __m256d needs AVX support. I removed the SIMD instructions for those functions and left the optimization to the compiler. The double version of the matrix multiply is not called frequently, so this should have little performance impact.
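For illustration only, here is roughly what an SSE2 double multiply would involve. This is a sketch, not the code from this revision, which dropped the intrinsics. Because an __m128d register holds two doubles, each four-element row has to be split into a low half and a high half. The function name is hypothetical, the row-major convention R[i][j] = sum over k of A[i][k] * B[k][j] is an assumption, and R must not alias A or B.

```c
#ifdef __SSE2__
#  include <emmintrin.h> /* SSE2 double-precision intrinsics */
#endif

/* Hypothetical sketch: R = A * B for double 4x4 matrices.
 * Assumed convention: R[i][j] = sum_k A[i][k] * B[k][j].
 * R must not alias A or B. */
static void mul_m4db_sse2_sketch(double R[4][4],
                                 const double A[4][4],
                                 const double B[4][4])
{
#ifdef __SSE2__
  for (int i = 0; i < 4; i++) {
    __m128d lo = _mm_setzero_pd(); /* accumulates R[i][0..1] */
    __m128d hi = _mm_setzero_pd(); /* accumulates R[i][2..3] */
    for (int k = 0; k < 4; k++) {
      /* Broadcast A[i][k] into both lanes, then scale row k of B. */
      const __m128d a = _mm_set1_pd(A[i][k]);
      lo = _mm_add_pd(lo, _mm_mul_pd(a, _mm_loadu_pd(&B[k][0])));
      hi = _mm_add_pd(hi, _mm_mul_pd(a, _mm_loadu_pd(&B[k][2])));
    }
    _mm_storeu_pd(&R[i][0], lo);
    _mm_storeu_pd(&R[i][2], hi);
  }
#else
  /* Scalar fallback; the compiler is free to vectorize this itself. */
  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 4; j++) {
      double sum = 0.0;
      for (int k = 0; k < 4; k++) {
        sum += A[i][k] * B[k][j];
      }
      R[i][j] = sum;
    }
  }
#endif
}
```

The SSE2 path needs two registers and twice as many multiply and add instructions per row as a float version built on __m128 would, which is part of why leaving the scalar loop to the compiler is a reasonable trade for a rarely called function.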

This can be committed to master.

This revision is now accepted and ready to land. Aug 19 2019, 2:57 PM