Page MenuHome

Cycles: AVX implantation of Perlin noise.
Needs ReviewPublic

Authored by Omar Emara (OmarSquircleArt) on Mon, Jan 27, 9:49 AM.

Details

Summary

This patch adds an AVX implementation of Perlin noise in Cycles.
An avxi type was also added as a utility based on the respective
type in Intel Embree.

Only 3D and 4D noise were implemented, there is no benefit for
utilizing AVX in 1D and 2D noise. The SSE trilinear interpolation
function was used in the AVX implementation because there is no
benefit from using AVX in interpolating the last three dimensions.

I couldn't measure any actual performance gains on a Zen1 CPU.
It could be that the extra setup cost canceled with any gains.
But any pointers as to why this is the case would be appreciated.

Diff Detail

Repository
rB Blender
Branch
avx-perlin-noise
Build Status
Buildable 6633
Build 6633: arc lint + arc unit

Event Timeline

@Max (maxim_d33), someyhing interesting for you to have a look? :)

@Max (maxim_d33), someyhing interesting for you to have a look? :)

indeed, thanks for including.

can we agree how do we run ?
so we can better talk about performance etc...

I am not sure if there is a better way to measure this. But I use the following scene. About 40% of the CPU cycles are spent in the noise code.


I also used the Class Room scene Blender demo as a production test.

thanks @Omar Emara (OmarSquircleArt)

do you have any compilation warrning/error ?

with VS2019 I see

31>D:\blender-git\blender\intern\cycles\util/util_avxf.h(269): warning C4002: too many arguments for function-like macro invocation '_mm256_cvtss_f32' (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\filter_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(473): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'ccl::avxb' (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\filter_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(474): note: No constructor could take the source type, or constructor overload resolution was ambiguous (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\filter_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(507): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'ccl::avxb' (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\filter_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(508): note: No constructor could take the source type, or constructor overload resolution was ambiguous (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\filter_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(541): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'ccl::avxb' (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\filter_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(542): note: No constructor could take the source type, or constructor overload resolution was ambiguous (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\filter_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxf.h(269): warning C4002: too many arguments for function-like macro invocation '_mm256_cvtss_f32' (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxf.h(269): warning C4002: too many arguments for function-like macro invocation '_mm256_cvtss_f32' (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_split_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(473): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'ccl::avxb' (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_split_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(474): note: No constructor could take the source type, or constructor overload resolution was ambiguous (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_split_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(473): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'ccl::avxb' (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(474): note: No constructor could take the source type, or constructor overload resolution was ambiguous (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(507): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'ccl::avxb' (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_split_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(508): note: No constructor could take the source type, or constructor overload resolution was ambiguous (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_split_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(541): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'ccl::avxb' (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_split_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(542): note: No constructor could take the source type, or constructor overload resolution was ambiguous (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_split_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(507): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'ccl::avxb' (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(508): note: No constructor could take the source type, or constructor overload resolution was ambiguous (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(541): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'ccl::avxb' (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_avx.cpp)
31>D:\blender-git\blender\intern\cycles\util/util_avxi.h(542): note: No constructor could take the source type, or constructor overload resolution was ambiguous (compiling source file D:\blender-git\blender\intern\cycles\kernel\kernels\cpu\kernel_avx.cpp)

@Max (maxim_d33) No. I compile with clang 9. I will investigate.

Ok. My bad, it turned out I was compiling with WITH_CYCLES_NATIVE_ONLY. Will update the patch.

  • Merge branch 'master' into avx-perlin-noise
  • Fix avxi for AVX platforms.

now it is compilable with VS2019 and beside stated questions - seems to be quite reasonable to use.

intern/cycles/util/util_avxi.h
16

where does it come from ?

30

do you really need to have this for an union (and other KERNEL_AVX2checks ) in this file?

intern/cycles/util/util_avxi.h
16

It is from the Intel Embree library. From here:

https://github.com/embree/embree-renderer/tree/master/common/simd

30

What do you propose? Do you want to put those into a separate file?

Max (maxim_d33) added inline comments.Mon, Feb 3, 4:27 PM
intern/cycles/util/util_avxi.h
30

skip it as not needed

avxi should be used with KERNEL_AVX2 at full
or based on __m128i / complete not used , if below.

intern/cycles/util/util_avxi.h
30

Not sure I follow.
We currently use AVX2 at full if KERNEL_AVX2 is defined. The low and high __m128i types in the union only comes at play when AVX is defined but AVX2 isn't.

Max (maxim_d33) added inline comments.Mon, Feb 3, 5:10 PM
intern/cycles/util/util_avxi.h
30

so you want to have always

__m256i m256;

but only sometime

__m128i l, h;

?

intern/cycles/util/util_avxi.h
30

Yes.
AVX already contains some instructions to act on __m256i. So we make use of them in AVX. For the other instructions that are only supported on AVX2, we emulate it using AVX/SSE instructions.

intern/cycles/util/util_avxi.h
30

I would leave m128i to be declared always
m256i can be questionable if not KERNEL_AVX2 and here I would focus on "emulation" , as you said above

intern/cycles/util/util_avxi.h
30

So you just want to remove the preprocessor directive from the union?

Why is m256i questionable if not KERNEL_AVX2? There is no penalty from SSE-AVX mixing because all instructions will be VEX.

I can measure about 10% speedup on my Xeon E5-2699 v4 (i've put number of samples to 1024 to get more reliable timings and master renders in 50.74258 sec, this patch in 46.47736 sec).

As for avxi, stick to Embree as close as possible, at least for now.
In a longer term would be nice to consolidate float4 and ssef, float8 with avxf and so on. And that's where some emulation might become helpful, but should always be very careful with that.

@Sergey Sharybin (sergey) It seems I can measure similar performance gains on my setup as well. I can also measure about 23% performance gain in 4D noise.

To clarify, util_avxi.h is exactly the same as the Embree one, aside from some additional needed functions like cast and ^=. Also, Embree choose a certain header based on support for AVX2:

#if defined (__AVX_I__)
#include "simd/avxi.h"
#else
#include "simd/avxi_emu.h"
#endif

All I did was combine avxi.h and avxi_emu.h in the same file with inline preprocessor checks. Because both files share a lot of the contents and I didn't want to duplicate code.

One question though. In the util files, we have:

#ifndef __UTIL_AVXI_H__
#  define __UTIL_AVXI_H__

CCL_NAMESPACE_BEGIN

...
...
...

#endif

CCL_NAMESPACE_END

Why does CCL_NAMESPACE_END exist outside of the ifndef block?

Max (maxim_d33) added inline comments.Wed, Feb 5, 4:14 PM
intern/cycles/util/util_avxi.h
30

yes

if not KERNEL_AVX2 - it doesnt make sense to have __m256i m256;

intern/cycles/util/util_avxi.h
30

But why? Why would we emulate __m256i instructions using SEE instructions?

Max (maxim_d33) added inline comments.Wed, Feb 5, 4:31 PM
intern/cycles/util/util_avxi.h
30

__m256i is data type mappable to HW, ignore any emulation here.

I still dont understand why union sometimes excludes __m128i
which is always present.

Why does CCL_NAMESPACE_END exist outside of the ifndef block?

Looks like a mistake.

  • Move CCL_NAMESPACE_END inside ifndef block in util_avx{b, f, i}.h.