
WIP: Cycles: merge primitive information to improve memory access
Needs Review, Public

Authored by Sv. Lockal (lockal) on Aug 5 2014, 5:43 PM.

Details

Summary

Target speedup: 10-12%.

My Ubuntu installation is not in good shape: Ubuntu 14.10 with the shiny 3.16 kernel, CUDA 6.5 and GCC 4.9. Something strange is going on with Cycles: for example, the BMW scene on the master branch now renders in 2:10 for me (previously 1:40). Unfortunately, the Intel profiler does not work with my current setup, so I am not trying to find the root cause of this problem. Still, this patch improves Cycles performance by up to 15% on various files.

CUDA performance has not been checked for this patch yet.
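The idea in the title can be illustrated with a minimal sketch. It assumes, hypothetically, that each primitive carries four int attributes named type, visibility, index and object (the names proposed later in this thread); the struct and function names are illustrative only, not the actual Cycles code. Instead of four parallel arrays, each primitive's attributes are packed into one 16-byte record, so reading all of them takes one aligned load instead of four separate lookups.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical "before" layout: one array per primitive attribute.
// Reading all attributes of primitive i touches four separate arrays,
// i.e. potentially four different cache lines.
struct PrimArraysSeparate {
    std::vector<int> type, visibility, index, object;
};

// Hypothetical "after" layout: the four attributes packed into one
// 16-byte, 16-byte-aligned record that a single vector load can fetch.
struct alignas(16) PrimInfo {
    int type, visibility, index, object;
};
static_assert(sizeof(PrimInfo) == 16, "PrimInfo should fit in one 16-byte load");

// Convert the separate arrays into the merged layout.
// Assumes all four input arrays have the same length.
std::vector<PrimInfo> merge_prim_info(const PrimArraysSeparate& s) {
    std::vector<PrimInfo> merged(s.type.size());
    for (std::size_t i = 0; i < merged.size(); ++i) {
        merged[i] = PrimInfo{s.type[i], s.visibility[i], s.index[i], s.object[i]};
    }
    return merged;
}
```

Whether this actually helps depends on the access pattern: it pays off when BVH traversal needs several of these attributes for the same primitive at once, which is the case the patch targets.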


Event Timeline

Sv. Lockal (lockal) retitled this revision to "Cycles: merge primitive information to improve memory access".
Sv. Lockal (lockal) updated this object.
intern/cycles/bvh/bvh.cpp, line 207:

This copy is unused: none of the pack_nodes() virtual overrides use the first parameter.

Sv. Lockal (lockal) retitled this revision from "Cycles: merge primitive information to improve memory access" to "WIP: Cycles: merge primitive information to improve memory access" (Aug 5 2014, 6:55 PM).
intern/cycles/kernel/geom/geom_bvh_shadow.h, line 209:

kernel_tex_fetch(__prim_info, primAddr).x/y/z/w should probably be replaced with kernel_fetch_prim_type(primAddr), kernel_fetch_prim_visibility(primAddr), etc.
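A minimal sketch of what such named accessors could look like. Everything here is a stand-in: the int4 struct, the KernelData container and fetch_prim_info() emulate kernel_tex_fetch(__prim_info, primAddr) on the CPU, the explicit kg parameter replaces the kernel's implicit globals, and kernel_fetch_prim_index()/kernel_fetch_prim_object() are hypothetical names for the "etc.". The component mapping (x = type, y = visibility, z = index, w = object) is an assumption taken from the union proposed later in this thread.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for Cycles' int4 type.
struct alignas(16) int4 { int x, y, z, w; };

// Hypothetical stand-in for the __prim_info kernel texture.
struct KernelData {
    std::vector<int4> prim_info;
};

// Emulates kernel_tex_fetch(__prim_info, primAddr) for this sketch only.
inline int4 fetch_prim_info(const KernelData& kg, int primAddr) {
    return kg.prim_info[static_cast<std::size_t>(primAddr)];
}

// Named accessors; the x/y/z/w mapping is an assumption (see above).
inline int kernel_fetch_prim_type(const KernelData& kg, int primAddr) {
    return fetch_prim_info(kg, primAddr).x;
}
inline int kernel_fetch_prim_visibility(const KernelData& kg, int primAddr) {
    return fetch_prim_info(kg, primAddr).y;
}
inline int kernel_fetch_prim_index(const KernelData& kg, int primAddr) {
    return fetch_prim_info(kg, primAddr).z;
}
inline int kernel_fetch_prim_object(const KernelData& kg, int primAddr) {
    return fetch_prim_info(kg, primAddr).w;
}
```

One design caveat: each accessor performs its own fetch, so code that needs several fields at once may prefer to fetch the int4 once and read its members, rather than relying on the compiler to merge the loads.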

Sv. Lockal (lockal) added a comment (edited Aug 5 2014, 8:27 PM).

Render times (m:ss.ss) for common benchmark files:

File                 | Tiles       | Patched     | Vanilla
koro_final           |             | 2:32.70     | 2:43.04
barcelone 1.4        |             | 1:25.32     | 1:24.80
BMW1M-MikePan        | 120*67      | 1:37.58     | 1:50.42
BMW1M-MikePan        | 12*12       | 1:39.68     | 1:40.92
cycles_cornell_bench | 270*270 (1) | 9:33.17 (2) | 7:28.60
cycles_cornell_bench | 16x16       | 5:13.26     | 5:18.11

(1) Half of the threads stay idle.
(2) ?! Should check CPU time here.
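To put the raw timings in perspective, the relative change can be computed as (vanilla - patched) / vanilla. A small helper for that, assuming the times are m:ss.ss wall-clock values as in the table above; the function names are illustrative only.

```cpp
#include <cstddef>
#include <string>

// Parse a time written as "m:ss.ss" (e.g. "2:32.70" or "01:25.32") into seconds.
// A string without a colon is treated as plain seconds.
double parse_time(const std::string& t) {
    const std::size_t colon = t.find(':');
    if (colon == std::string::npos) {
        return std::stod(t);
    }
    const double minutes = std::stod(t.substr(0, colon));
    const double seconds = std::stod(t.substr(colon + 1));
    return minutes * 60.0 + seconds;
}

// Percentage reduction in render time of the patched build relative to vanilla.
// Positive means the patched build is faster; negative means it is slower.
double time_reduction_percent(const std::string& patched, const std::string& vanilla) {
    const double p = parse_time(patched);
    const double v = parse_time(vanilla);
    return (v - p) / v * 100.0;
}
```

For example, koro_final comes out roughly 6.3% faster and BMW1M-MikePan with 120*67 tiles roughly 11.6% faster, while barcelone 1.4 and the 270*270 Cornell run are slower with the patch.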

Hardware: Intel Core i7-4771 (Haswell), Gigabyte GA-Z87P-D3, Kingston 9905403-831.A00LF 8GB x 2

Software: Ubuntu 14.10, Linux Kernel 3.16, GCC 4.9

Bao 2 (bao2) removed a subscriber: Bao 2 (bao2).
Bao 2 (bao2) added a comment (edited Aug 6 2014, 4:50 PM).

I tested this diff: it renders at the same speed as before with CUDA, and also at the same speed on the CPU (my old AMD quad-core).

Since I don't see any loss in speed and the new code is smaller, I would commit it.

But instead of having these ".x", ".y", ".z", ".w" accessors in the code, I would change

	struct ccl_try_align(16) int4 {

in util_types.h so that it contains this union instead:

	union {
		__m128i m128;
		struct { int x, y, z, w; };
		struct { int type, visibility, index, object; };
	};

and then replace all those ".x", ".y", ".z", ".w" accesses with the names ".type", ".visibility", etc., which makes the code more readable.
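A self-contained sketch of that union, using a simplified stand-in for Cycles' int4 (plain alignas(16) instead of ccl_try_align(16), and the __m128i member only when SSE2 is available). Two caveats worth knowing: anonymous structs inside a union are a compiler extension (accepted by GCC, Clang and MSVC, but not ISO C++), and reading a union member other than the one last written is formally undefined in ISO C++, although GCC explicitly permits this kind of type punning through a union.

```cpp
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Simplified stand-in for Cycles' int4: the same 16 bytes can be read
// as x/y/z/w or under the primitive-info names proposed above.
struct alignas(16) int4 {
    union {
#if defined(__SSE2__)
        __m128i m128;  // raw SSE register view
#endif
        struct { int x, y, z, w; };                       // generic names
        struct { int type, visibility, index, object; };  // primitive-info names
    };
};
static_assert(sizeof(int4) == 16, "int4 must stay 16 bytes");
```

Usage: after `int4 info = kernel_tex_fetch(__prim_info, primAddr);`, code could read `info.type` instead of `info.x`, with no change in memory layout. A downside of this approach is that every int4 in the codebase gains the prim-specific names, which the separate accessor functions avoid.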

Is this patch still relevant after all the optimizations and changes that have been made since?