
[POC] Kernel: Lattice Deform Performance
Abandoned, Public

Authored by Jeroen Bakker (jbakker) on Sep 2 2020, 5:01 PM.

Details

Reviewers
None
Summary

I am working on a presentation about writing code that executes faster, and I have chosen lattice_deform as the testing ground for it.
This patch is the result of several experiments to increase the execution performance of lattice deformation.

  1. Adds test cases to compare the new implementation against the old one. The tests differ in the number of vertices to transform and in the batch size.
  2. The old implementation calculated one vertex at a time and released static data that could have been shared with other vertices. This data is now precalculated once.
  3. Uses branchless code to minimize branching.
  4. Uses a phased approach to reduce inner-loop complexity.
  5. Uses batching to reduce pressure on the memory caches.
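The phased and batched approach (items 4 and 5) can be illustrated with a minimal sketch. It is not the patch's code: it assumes a hypothetical one-dimensional lattice with linear interpolation between two neighbouring control points, whereas lattice_deform works in three dimensions with interpolation weights over neighbouring control points on each axis. The names `deform_batched`, `phase_cells`, `phase_apply` and `MAX_BATCH` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scratch-buffer capacity; not taken from the patch. */
#define MAX_BATCH 1024

/* Phase 1: for every vertex in the batch, find the lattice cell and the
 * interpolation factor inside it. Coordinates are assumed to already be in
 * lattice space, where 0.0 is the first control point and n_points - 1 the
 * last one. Out-of-range coordinates are clamped to the lattice bounds. */
static void phase_cells(const float *co, size_t count, int n_points,
                        int *r_cell, float *r_fac)
{
  const float max_co = (float)(n_points - 1);
  for (size_t i = 0; i < count; i++) {
    float x = co[i];
    x = x < 0.0f ? 0.0f : x;
    x = x > max_co ? max_co : x;
    int cell = (int)x;
    /* Keep cell + 1 valid; at the last point this gives fac == 1.0. */
    cell = cell > n_points - 2 ? n_points - 2 : cell;
    r_cell[i] = cell;
    r_fac[i] = x - (float)cell;
  }
}

/* Phase 2: blend the displacements of the two neighbouring control points.
 * The loop body is only loads, arithmetic and a store. */
static void phase_apply(float *co, size_t count, const float *disp,
                        const int *cell, const float *fac)
{
  for (size_t i = 0; i < count; i++) {
    const float d0 = disp[cell[i]];
    const float d1 = disp[cell[i] + 1];
    co[i] += d0 + (d1 - d0) * fac[i];
  }
}

/* Deform 'total' coordinates in batches of 'batch_size'. The per-batch
 * scratch arrays are intended to stay small enough to remain in cache
 * between the two phases. */
void deform_batched(float *co, size_t total, const float *disp, int n_points,
                    size_t batch_size)
{
  int cell[MAX_BATCH];
  float fac[MAX_BATCH];
  assert(n_points >= 2);
  assert(batch_size > 0 && batch_size <= MAX_BATCH);
  for (size_t start = 0; start < total; start += batch_size) {
    const size_t count = (total - start < batch_size) ? total - start : batch_size;
    phase_cells(co + start, count, n_points, cell, fac);
    phase_apply(co + start, count, disp, cell, fac);
  }
}
```

Calling `deform_batched` with different batch sizes gives identical results; only the memory access pattern changes, which is the variable the `_batch*` test cases vary.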

Old implementation

[ RUN      ] lattice_deform_performance.performance_no_dvert_1
[       OK ] lattice_deform_performance.performance_no_dvert_1 (0 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_1000
[       OK ] lattice_deform_performance.performance_no_dvert_1000 (0 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000
[       OK ] lattice_deform_performance.performance_no_dvert_10000 (4 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_100000
[       OK ] lattice_deform_performance.performance_no_dvert_100000 (32 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_1000000
[       OK ] lattice_deform_performance.performance_no_dvert_1000000 (319 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000
[       OK ] lattice_deform_performance.performance_no_dvert_10000000 (3167 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch1
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch1 (3197 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch10
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch10 (3206 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch100
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch100 (3200 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch1000
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch1000 (3202 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch10000
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch10000 (3182 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch100000
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch100000 (3165 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch1000000
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch1000000 (3153 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch10000000
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch10000000 (3204 ms)

Current progress

[ RUN      ] lattice_deform_performance.performance_no_dvert_1
[       OK ] lattice_deform_performance.performance_no_dvert_1 (0 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_1000
[       OK ] lattice_deform_performance.performance_no_dvert_1000 (1 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000
[       OK ] lattice_deform_performance.performance_no_dvert_10000 (3 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_100000
[       OK ] lattice_deform_performance.performance_no_dvert_100000 (24 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_1000000
[       OK ] lattice_deform_performance.performance_no_dvert_1000000 (199 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000
[       OK ] lattice_deform_performance.performance_no_dvert_10000000 (1959 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch1
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch1 (1788 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch10
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch10 (1768 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch100
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch100 (1732 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch1000
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch1000 (1692 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch10000
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch10000 (1737 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch100000
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch100000 (1767 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch1000000
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch1000000 (1808 ms)
[ RUN      ] lattice_deform_performance.performance_no_dvert_10000000_batch10000000
[       OK ] lattice_deform_performance.performance_no_dvert_10000000_batch10000000 (1949 ms)
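To compare the two logs, the speedup of a test is its old time divided by its new time. The small sketch below does that arithmetic for a few of the reported cases; `BenchPair`, `speedup` and `time_saved` are illustrative names, and each timing is a single run, so treat the ratios as rough. The 1 and 1000 vertex cases are left out because they report 0 or 1 ms, below the log's millisecond resolution.

```c
#include <assert.h>
#include <stddef.h>

/* Single-run timings (ms) copied from the two logs above. */
typedef struct BenchPair {
  const char *test;
  double old_ms;
  double new_ms;
} BenchPair;

static const BenchPair bench_pairs[] = {
    {"performance_no_dvert_100000", 32.0, 24.0},
    {"performance_no_dvert_1000000", 319.0, 199.0},
    {"performance_no_dvert_10000000", 3167.0, 1959.0},
    {"performance_no_dvert_10000000_batch1000", 3202.0, 1692.0},
};

static const size_t bench_count = sizeof(bench_pairs) / sizeof(bench_pairs[0]);

/* Speedup factor: old time divided by new time (> 1.0 means faster). */
double speedup(const BenchPair *p)
{
  assert(p->new_ms > 0.0);
  return p->old_ms / p->new_ms;
}

/* Reduction in run time as a fraction of the old time. */
double time_saved(const BenchPair *p)
{
  assert(p->old_ms > 0.0);
  return 1.0 - p->new_ms / p->old_ms;
}
```

For the 10,000,000 vertex case this gives 3167 / 1959 ≈ 1.62x (about 38% less time), and for `_batch1000` 3202 / 1692 ≈ 1.89x (about 47% less time). The old implementation stays between 3153 and 3206 ms for every batch size, while in the new code the `_batch1000` run is the fastest.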
NOTE: To be more useful, the weight should be stored per vertex. Instead of a vec3 we could use a vec4 whose 4th element is the weight (including the dvert weight of the target). This might require adding more, smaller structs so data isn't scattered too much around memory.
NOTE: This is a PoC; it needs more work before it actually benefits users. This patch is part of a presentation about writing code that runs faster.
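A minimal sketch of the per-vertex weight layout described in the note, assuming a hypothetical `WeightedCo` record that packs the position and the weight into four floats (the vec4 idea). How the weight is combined from the modifier factor and the dvert weight is an assumption, as is the name `apply_weighted_offsets`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packed vertex: position plus its final weight in one
 * 16-byte record, so both are adjacent in memory and typically share
 * a cache line. */
typedef struct WeightedCo {
  float co[3];
  /* Assumed to already combine all influences, e.g. the modifier factor
   * multiplied by the dvert (vertex group) weight. */
  float weight;
} WeightedCo;

_Static_assert(sizeof(WeightedCo) == 4 * sizeof(float),
               "WeightedCo should pack like a vec4");

/* Add each vertex's lattice offset, scaled by that vertex's own weight. */
void apply_weighted_offsets(WeightedCo *verts, const float (*offsets)[3], size_t count)
{
  for (size_t i = 0; i < count; i++) {
    const float w = verts[i].weight;
    for (int a = 0; a < 3; a++) {
      verts[i].co[a] += offsets[i][a] * w;
    }
  }
}
```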

Diff Detail

Repository
rB Blender
Branch
arcpatch-D8784 (branched from master)
Build Status
Buildable 10056
Build 10056: arc lint + arc unit

Event Timeline

Jeroen Bakker (jbakker) requested review of this revision. Sep 2 2020, 5:01 PM
Jeroen Bakker (jbakker) created this revision.

Finished the first round of optimizations:

  • precalc static data
  • remove branches
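The "remove branches" step can be sketched as follows. This is an illustration, not the patch's code: both functions are hypothetical and sum four weighted neighbours along one axis, where some indices may fall outside the lattice. The branchless version turns the range test into a 0/1 factor and clamps the index so the load always stays in bounds.

```c
#include <assert.h>

/* Reference version: skips out-of-range points with a branch. */
float accumulate_branchy(const float *data, int n, int base, const float w[4])
{
  float sum = 0.0f;
  for (int k = 0; k < 4; k++) {
    const int idx = base + k;
    if (idx >= 0 && idx < n) {
      sum += w[k] * data[idx];
    }
  }
  return sum;
}

/* Branchless version: the range test becomes a 0/1 mask that scales the
 * weight, and the index is clamped so the load is always valid. */
float accumulate_branchless(const float *data, int n, int base, const float w[4])
{
  assert(n > 0);
  float sum = 0.0f;
  for (int k = 0; k < 4; k++) {
    const int idx = base + k;
    /* '&' instead of '&&' avoids short-circuit evaluation. */
    const int inside = (idx >= 0) & (idx < n);
    /* Clamp to [0, n - 1]; compilers usually lower these to conditional moves. */
    int safe = idx < 0 ? 0 : idx;
    safe = safe > n - 1 ? n - 1 : safe;
    sum += (float)inside * w[k] * data[safe];
  }
  return sum;
}
```

Both functions return the same sums for finite data. Whether the ternaries really become conditional moves depends on the compiler and flags, so it is worth checking the generated assembly; the mask also costs a multiply and a load for every skipped point, which only pays off when the branch would otherwise mispredict.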
Jeroen Bakker (jbakker) edited the summary of this revision. Sep 4 2020, 3:47 PM
Jeroen Bakker (jbakker) edited the summary of this revision. Sep 4 2020, 5:02 PM
Jeroen Bakker (jbakker) edited the summary of this revision.
  • First try on implementing batching
Jeroen Bakker (jbakker) edited the summary of this revision. Sep 7 2020, 12:24 PM
  • First try on implementing batching
  • Vectorization
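Vectorization is easiest for the compiler when the data is laid out as one array per axis and the inner loop has no branches. The sketch below assumes a hypothetical structure-of-arrays batch (`CoordsSoA`) and a simplified lattice-space mapping with a per-axis origin and spacing; it is not the patch's code. It also shows the "precalc static data" idea by computing the reciprocals once per batch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure-of-arrays batch: one contiguous array per axis
 * instead of interleaved xyz triples. */
typedef struct CoordsSoA {
  float *x;
  float *y;
  float *z;
  size_t count;
} CoordsSoA;

/* Unit-stride loop with no branches and no calls. 'restrict' tells the
 * compiler that dst and src do not overlap, which makes the loop a
 * candidate for auto-vectorization at higher optimization levels. */
static void axis_to_lattice(float *restrict dst, const float *restrict src,
                            size_t count, float origin, float inv_spacing)
{
  for (size_t i = 0; i < count; i++) {
    dst[i] = (src[i] - origin) * inv_spacing;
  }
}

/* Map a batch of coordinates into lattice space, where control points
 * sit at integer positions. 'in' and 'out' must not share arrays. */
void batch_to_lattice_space(const CoordsSoA *in, CoordsSoA *out,
                            const float origin[3], const float spacing[3])
{
  assert(in->count == out->count);
  /* Reciprocals are computed once per batch so the inner loops multiply
   * instead of dividing per vertex; this can differ from a true division
   * in the last bit. */
  const float inv_x = 1.0f / spacing[0];
  const float inv_y = 1.0f / spacing[1];
  const float inv_z = 1.0f / spacing[2];
  axis_to_lattice(out->x, in->x, in->count, origin[0], inv_x);
  axis_to_lattice(out->y, in->y, in->count, origin[1], inv_y);
  axis_to_lattice(out->z, in->z, in->count, origin[2], inv_z);
}
```

With GCC, adding `-O3 -fopt-info-vec` reports which loops were vectorized, which is a quick way to confirm the layout change had the intended effect.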
Jeroen Bakker (jbakker) edited the summary of this revision. Sep 8 2020, 4:10 PM
Jeroen Bakker (jbakker) edited the summary of this revision.
  • Enabled new evaluation for particle lattice
Jeroen Bakker (jbakker) retitled this revision from [WIP] Kernel: Lattice Deform Performance to [POC] Kernel: Lattice Deform Performance. Sep 16 2020, 5:00 PM
Jeroen Bakker (jbakker) edited the summary of this revision.