Page MenuHome

Minor code cleanups for Visual Studio 2013, which now has a better standard C++ library
Closed, ArchivedPublic

Description

I've put blender, straight out of the git repository, through Visual Studio 2013 and cleaned up some warnings, included annotating the code with the warnings where I did not change the source code.

Blender x64 compiles for me. I have not yet got it to link in Debug mode as I have not yet set up the all the libraries. The Release mode links but the result crashes immediately in wrap_oal.dll, probably because I am running WIndows 8.1 and need a different version of the library.

Perhaps someone like @Martijn Berger (juicyfruit) who has all the libraries set up can give this a whirl, or put it into the experimental buildbot system for checking.

Here's the patchfile:

Details

Type
Patch

Event Timeline

John Pavel (jrp) updated the task description. (Show Details)
John Pavel (jrp) raised the priority of this task from to Needs Triage by Developer.
John Pavel (jrp) claimed this task.
John Pavel (jrp) set Type to Patch.
Thomas Dinges (dingto) triaged this task as Normal priority.Feb 8 2014, 9:37 PM

Ill look at this asap. On first glance it looks good.

Edit:
on second glance.

  • I am not sure I like the warnings in comments about stack size. Although it is useful information I don't know if that is the place for it.
  • Some edits to libraries that are external to blender will most likely be mergable in their current form due to the external nature of those libraries. It might be good to post the edits to upstream.
  • There are some really nice visual studio 2013 / improved c lib specific ifdefs that I think should just go in.

I'll leave it to your discretion (and indeed to apply the patches to the source, as I don't have commit rights).

Various compilation flags ought also to be updated (and should work with some earlier versions of MSVC):

Well I have now managed to get blender to run (it needed a couple of files from the VC2012 OAL library - warp_oal.dll and a dx file).

I'm not sure how to run an appropriate set of regression tests. The built in ones produce:

1>------ Build started: Project: RUN_TESTS, Configuration: Release x64 ------
1>  Test project D:/BlenderGIT/build_cmake
1>        Start  1: script_load_keymap
1>   1/39 Test  #1: script_load_keymap ...................   Passed    1.95 sec
1>        Start  2: script_load_addons
1>   2/39 Test  #2: script_load_addons ...................***Failed    4.47 sec
1>        Start  3: script_load_modules
1>   3/39 Test  #3: script_load_modules ..................***Failed    0.59 sec
1>        Start  4: script_run_operators
1>   4/39 Test  #4: script_run_operators .................***Failed   16.00 sec
1>        Start  5: script_pyapi_mathutils
1>   5/39 Test  #5: script_pyapi_mathutils ...............   Passed    0.62 sec
1>        Start  6: bevel
1>   6/39 Test  #6: bevel ................................   Passed    0.42 sec
1>        Start  7: import_obj_cube
1>   7/39 Test  #7: import_obj_cube ......................***Failed    0.45 sec
1>        Start  8: import_obj_nurbs_cyclic
1>   8/39 Test  #8: import_obj_nurbs_cyclic ..............***Failed    0.42 sec
1>        Start  9: import_obj_makehuman
1>   9/39 Test  #9: import_obj_makehuman .................***Failed    0.42 sec
1>        Start 10: export_obj_cube
1>  10/39 Test #10: export_obj_cube ......................   Passed    0.41 sec
1>        Start 11: export_obj_nurbs
1>  11/39 Test #11: export_obj_nurbs .....................   Passed    0.41 sec
1>        Start 12: export_obj_all_objects
1>  12/39 Test #12: export_obj_all_objects ...............   Passed    0.42 sec
1>        Start 13: import_ply_cube
1>  13/39 Test #13: import_ply_cube ......................***Failed    0.42 sec
1>        Start 14: import_ply_bunny
1>  14/39 Test #14: import_ply_bunny .....................***Failed    0.42 sec
1>        Start 15: import_ply_small_holes
1>  15/39 Test #15: import_ply_small_holes ...............***Failed    0.42 sec
1>        Start 16: export_ply_cube_all_data
1>  16/39 Test #16: export_ply_cube_all_data .............   Passed    0.42 sec
1>        Start 17: export_ply_suzanne_all_data
1>  17/39 Test #17: export_ply_suzanne_all_data ..........   Passed    0.41 sec
1>        Start 18: export_ply_vertices
1>  18/39 Test #18: export_ply_vertices ..................   Passed    0.42 sec
1>        Start 19: import_stl_cube
1>  19/39 Test #19: import_stl_cube ......................***Failed    0.43 sec
1>        Start 20: import_stl_conrod
1>  20/39 Test #20: import_stl_conrod ....................***Failed    0.42 sec
1>        Start 21: import_stl_knot_max_simplified
1>  21/39 Test #21: import_stl_knot_max_simplified .......***Failed    0.43 sec
1>        Start 22: export_stl_cube_all_data
1>  22/39 Test #22: export_stl_cube_all_data .............   Passed    0.41 sec
1>        Start 23: export_stl_suzanne_all_data
1>  23/39 Test #23: export_stl_suzanne_all_data ..........   Passed    0.41 sec
1>        Start 24: export_stl_vertices
1>  24/39 Test #24: export_stl_vertices ..................   Passed    0.42 sec
1>        Start 25: import_x3d_cube
1>  25/39 Test #25: import_x3d_cube ......................***Failed    0.45 sec
1>        Start 26: import_x3d_teapot
1>  26/39 Test #26: import_x3d_teapot ....................***Failed    0.45 sec
1>        Start 27: import_x3d_suzanne_material
1>  27/39 Test #27: import_x3d_suzanne_material ..........***Failed    0.49 sec
1>        Start 28: export_x3d_cube
1>  28/39 Test #28: export_x3d_cube ......................   Passed    0.42 sec
1>        Start 29: export_x3d_nurbs
1>  29/39 Test #29: export_x3d_nurbs .....................   Passed    0.41 sec
1>        Start 30: export_x3d_all_objects
1>  30/39 Test #30: export_x3d_all_objects ...............   Passed    0.42 sec
1>        Start 31: import_3ds_cube
1>  31/39 Test #31: import_3ds_cube ......................***Failed    0.43 sec
1>        Start 32: import_3ds_hierarchy_lara
1>  32/39 Test #32: import_3ds_hierarchy_lara ............***Failed    0.42 sec
1>        Start 33: import_3ds_hierarchy_greek_trireme
1>  33/39 Test #33: import_3ds_hierarchy_greek_trireme ...***Failed    0.42 sec
1>        Start 34: export_3ds_cube
1>  34/39 Test #34: export_3ds_cube ......................   Passed    0.41 sec
1>        Start 35: export_3ds_nurbs
1>  35/39 Test #35: export_3ds_nurbs .....................   Passed    0.42 sec
1>        Start 36: export_3ds_all_objects
1>  36/39 Test #36: export_3ds_all_objects ...............   Passed    0.41 sec
1>        Start 37: export_fbx_cube
1>  37/39 Test #37: export_fbx_cube ......................   Passed    0.41 sec
1>        Start 38: export_fbx_nurbs
1>  38/39 Test #38: export_fbx_nurbs .....................   Passed    0.42 sec
1>        Start 39: export_fbx_all_objects
1>  39/39 Test #39: export_fbx_all_objects ...............   Passed    0.41 sec
1>  
1>  54% tests passed, 18 tests failed out of 39
1>  
1>  Total Test time (real) =  38.29 sec
1>  
1>  The following tests FAILED:
1>  	  2 - script_load_addons (Failed)
1>  	  3 - script_load_modules (Failed)
1>  	  4 - script_run_operators (Failed)
1>  	  7 - import_obj_cube (Failed)
1>  	  8 - import_obj_nurbs_cyclic (Failed)
1>  	  9 - import_obj_makehuman (Failed)
1>  	 13 - import_ply_cube (Failed)
1>  	 14 - import_ply_bunny (Failed)
1>  	 15 - import_ply_small_holes (Failed)
1>  	 19 - import_stl_cube (Failed)
1>  	 20 - import_stl_conrod (Failed)
1>  	 21 - import_stl_knot_max_simplified (Failed)
1>  	 25 - import_x3d_cube (Failed)
1>  	 26 - import_x3d_teapot (Failed)
1>  	 27 - import_x3d_suzanne_material (Failed)
1>  	 31 - import_3ds_cube (Failed)
1>  	 32 - import_3ds_hierarchy_lara (Failed)
1>  	 33 - import_3ds_hierarchy_greek_trireme (Failed)
1>  Errors while running CTest

Some further findings:

  • It seems harder to set up profile-guided optimization because some projects require link-time code generation, but some projects such as datoc_icon (it is not clear where they are set in the various CMake files) need to compile fully and run to produce other bits of source code
  • (re)building the libraries, even with the provided built.bat scripts, seems a bit tricky. It is not always clear whether static or dynamic libraries are required and the scripts don't always seem to produce the right flavour. Eg, Zlib and png (which have both been updated upstream) seem to be built in the wrong flavours (or at least not with the expected names and even when they are renamed, linking fails because some symbols are not found); the debug variants some of the libraries are not built;
  • libraries that have been updated upstream include freetype, png, zlib, openjpeg (fails a couple of tests in its own test suite with VC2012 and 2013)); ffmpeg has been updated and requires the new names to be set in the CMakefile; an updated llvm 3.4 seems to compile up OK but I have not figured out how to get it included successfully probably because not all the patches provided can be applied to the new release; Eigen34 slots in OK after some CMakefile tweaking.
John Pavel (jrp) added a comment.EditedFeb 22 2014, 2:13 PM

Recent versions of MSVC have unordered map. (Earlier versions also have a version.)

diff --git a/intern/cycles/util/util_map.h b/intern/cycles/util/util_map.h
  index 77500e4..c45b7e2 100644
  --- a/intern/cycles/util/util_map.h
  +++ b/intern/cycles/util/util_map.h
  @@ -18,13 +18,21 @@
   #define __UTIL_MAP_H__
  
   #include <map>
  +#if defined(_MSC_VER) && _MSC_VER >= 1700
  +#include <unordered_map>
  +#else
   #include <boost/tr1/unordered_map.hpp>
  +#endif
  
   CCL_NAMESPACE_BEGIN
  
   using std::map;
   using std::pair;
  +#if defined(_MSC_VER) && _MSC_VER >= 1700
  +using std::unordered_map;
  +#else
   using std::tr1::unordered_map;
  +#endif
  
   CCL_NAMESPACE_END

diff --git a/intern/cycles/util/util_set.h b/intern/cycles/util/util_set.h
index 6078114..e171410 100644

  • a/intern/cycles/util/util_set.h

+++ b/intern/cycles/util/util_set.h
@@ -18,12 +18,19 @@
#define UTIL_SET_H

#include <set>
+#if defined(_MSC_VER) && (_MSC_VER < 1800)
#include <boost/tr1/unordered_set.hpp>

+#else
+#include <unordered_set>
+#endif
CCL_NAMESPACE_BEGIN

using std::set;
+#if defined(_MSC_VER) && (_MSC_VER < 1800)
using std::tr1::unordered_set;
+#else
+using std::unordered_set;
+#endif

CCL_NAMESPACE_END

Also, a tiny optimization at line 68 of BLI_utildefines.h, the MIN3 macro can be redefined as

#define MIN3(x, y, z)		((x) < (y) ? MIN2((x), (z)) : MIN2((y), (z)))

[QUOTE: Original by jrp]
//Various compilation flags ought also to be updated (and should work with some earlier versions of MSVC):

1 The /Ox compiler option produces code that favors execution speed over smaller size. http://msdn.microsoft.com/en-us/library/59a3b321.aspx
2 /favor:INTEL64 (default) or /favor:AMD64 (configurable), and /arch:AVX (configurable; I think that this may already be in, but I don't see it in the main CMAKE settings)
3 The current setup in Debug mode, at least, generates a lot of distracting warning D9030: '/Gm' is incompatible with multiprocessing; ignoring /MP switch" messages
4 For the Release, we ought to set the /Gw (whole program optimization) switch http://blogs.msdn.com/b/vcblog/archive/2013/09/11/introducing-gw-compiler-switch.aspx
5 /GA (http://msdn.microsoft.com/en-us/library/yetwazx6.aspx)

//
[END QUOTE]

I have some info on some of these ;)

  1. Should work...
  2. /favor:blend is the default in MSVC (MSDN Article) We should leave it that way ;)
  3. Should be ok
  4. /Gw doesn't work well because it takes huge amounts of memory to do this. I have 16GB of ram on my dev machine and that's not enough.
  5. /GA is a good idea, we should test this.

I see that someone has enabled /f arch avx, etc, for the cycles files. They should also do so for the Eigen files.

The whole program optimization is tricky because it relies on link time code generation but some of the projects fail if they don't compile and run because they generate program segments that are used in other projects. Is there an option to link at function level that disregards all the many unused functions and so reducing memory requirements?

We cannot use the AVX switch in general, the binary would not run on non SSE/AVX systems then.

The other flag that it would be good to enable, particularly for internal parts of the code using SIMD is /Gv (http://blogs.msdn.com/b/vcblog/archive/2013/07/12/introducing-vector-calling-convention.aspx) for x64 builds at least. The trick will be figuring out where not to use it because it won't work with pre-built libraries.

Hmm...

Good News everyone ;)
Microsoft introduces the AVX2 extensions in VS2013 Update 2...
Just in case we have use for it in cycles...

The /Gv Switch might be useful in the SSE/AVX parts of cycles but besides that we won't need it.
There are other things that bug me more than These optimization things.

For example:

  1. There is a min macro definitions in blenlib/intern/sort.c that is not needed in MSVC2012/13 because it is already defined in stdlib.h.
  2. There is a va_copy macro definition in imbuf/intern/util.c that is not needed in MSVC2012/13 because it is already defined in stdarg.h
  3. We still use GetVersionEX calls to determine the Windows Version, These API calls are marked deprecated.
  4. There are many macro redefinition warnings when building blender in MSVC2013 that could possibly be avoided.
  5. We defniately have many things within Blenders codebase that use a #ifdef _MSVC_VER block that does things that were needed to build with MSVC2008 but are unnecessary in MSVC2013

I could continue counting things, but that would waste our time. We should get to cleaning up that code a bit and use language Standards where possible instead of relying on old vc9 workarounds.

We have 277 cases of #ifdef _MSC_VER in 136 files, i'll get to evaluating them if they are really needed for building with MSVC 2013.

;)

Good news in regard to AVX2. :)

I agree with those issues, we should clean that up. Ideally (yes I am a fan of such radical solutions :D ) we drop MSVC 2008 support completely. This way we don't end up with hundreds of #ifdefs all over the place.

Before we can drop vc2008 though, all core devs (Brecht, Sergey, Campbell..) who sometimes debug on Windows should check though if vc2013 (Express) is fine for them.

I forgot the link ;)
You'll find the AVX2 announcement under "Code Generation"
Microsoft KB for Update 2 CTP 2

I agree, dropping this old stuff would be a great relief for all of us.
I had a good time looking at some of these #ifdefs ... there is even one VC6 Workaround still left from ancient times *g*

VS Express is quite feature complete. I think they'll be ok with it...
Except the missing CUDA plugins from NVidia :(
I got myself the professional version, because I needed the MFC for porting a very old program to Win7/8 so I know both. The profiler in vs 2013 pro is quite ok, but there are free tools out there that do the Job too.

A few random observations:

Yes, the CUDA absence is a pity.

AVX2 support will be no bad thing, if the vectorizer works well; however, switching on SIMD instructions still seems to slow Cycles down massively. (The code is particularly confusing with similar but different #ifdefs for KERNEL_SSEx and BUILD_CYCLES_WITH_SSEx_OPTIMIZATION ...). More fundamentally, the build systems don't allow you to switch on even AVX cleanly, unfortunately. (I've hacked the main CMakefile and switched on vectorization, /Ox.)

Because Blender is built using a lot of different code bases and external libraries, it has multiple definitions of similar but different basic building blocks like vectors, noise functions. Refactoring these would be a big task, presumably with little immediate benefit other than easier maintainability. Howver, VS2013 standard library now has support for threading and includes set maps. So the boost versions are no longer needed.

You'll never quite get rid of all the old _MSVC workarounds, because some of the external libraries have them for VS2010 and VS2012 for example. I've just put in Eigen 3.2.1. Drops in fine once the additional files are added to the CMakefile. (Again, you need to add definition flags to get it to enable vectorization / SIMD because the Core file can't detect support automatically with (any) MSVC).

There are various redefinitions such as
#define _WIN32_WINNT 0x501 // require Windows XP or newer
which should also be reviewed.

FInally, here's a small patch that removes a warning about mixing 32-bit and 64-bit items when shifting.

Hmmm. Found that MS recommend /O2 over /Ox (enable buffer checks and function level linking, otherwise the same).

VC++ 2013 now has a std:: implementation of threads and mutexes.

This inserts quite easily. The following patch

should, if enabled, work with other compilers that have a modern std::. It avoids the need to import the ancient posix pThreads. A proper change would include extra files those that test for std::unorderedset/map.

That threads patch could be made more generic with respect to c++11 style threads vs boost threads.
There are better ways to detect c++11 threading support other then "#if defined( _MSC_VER ) && (_MSC_VER >= 1800)"
It might be better to use one define for setting threading behaviour and that to enabled if we have a modern standards compliant msvc version like 2013/18.0

It is really good work and these kinds of things Is why I wanted to work on msvc 2013 support in the first place.

Thanks. You are, of course, right. I have just tried to make the minimum change to the code to test the concept. It demonstrates that the new std:: library approach works and, as such, i "#if defined( _MSC_VER ) && (_MSC_VER >= 1800)" should be replaced by some more generic flag that switches on C++11 library support as an alternative to Boost. The C++11 approach might allow us to write more C++ like code (getting rid of pointers to threads, etc).

A slightly more invasive set of changes was required to replace boosts function binding with std:: function.

This patch has all the differences between my working copy and the git repository. (I see that I put in an updated glew, which makes the patch humongous, I'm afraid.)

In cycles, as well as

  • replacing boost function and thread and posix time with the std:: versions, if CCL_USE_CPP11 is set,
  • there are quite a few simd optimizations, including float3 and 4 versions of what I see @Sv. Lockal (lockal) has added (although the KERNEL_SSE enabled run time is still more than double the scalar version, because of all the shuffle_swapping in the bvh_intersect routines, perhaps). Setting flush to zeros and denormals to zero seems to make little difference, from a quick play. Settling for less accurate reciprocals and reciprocal square root may help a bit, but I have not tried.

I'll have a look at the task scheduler next (for refactoring into std:: idiom versions).

@John Pavel (jrp)
I have looked over your patch but I have some comments.
Please do not mix updates of things like glew with compiler settings and fixes.
We cannot set the default compile flags to include ssse3, AVX and flags like that as there are still people who want to use blender in a context where they do not have access to cpu that has this. Similarly we cannot use these for math libraries unless there is a runtime detection mechanism with multiple code-path’s.

About the cycles SIMD fixes / (hackes for msvc's benefit) I do not understand well enough why and how they work so I am happy if @Sv. Lockal (lockal) can take a look at those.

Lets get some of this in a merge-able state so it can go in. I am happy about the CCL_USE_CPP11 part for example only it is so hard to factor that out of the huge patch.

  1. The slowdown from enabling KERNEL_SSE occurs mostly because compilers are unable to reassociate and simplify math expressions, written with intrinsics (except for icc). However it does not prevents us from adding sse_float4 or sse_float3 classes with overloaded operators. In this way we could control places, where compiler uses intrinsics, and where it tries to optimize automatically with autovectorization.
  1. We can't simply use pass-by-reference idiom in kernel code, because it breaks OpenCL. However if there are any performance changes with references, we could use someting like: #ifdef KERNEL_OPENCL typedef float4 float4arg; #else typedef const float4& float4arg; #endif

... and pass via float4arg.

Settling for less accurate reciprocals and reciprocal square root may help a bit, but I have not tried.

In most cases it leads to visible image artifacts.

If you find any compiler flags that make cycles faster without introducing artifacts, I would recommend you to split them in a separate patch. I don't think that whole-program optimization would help though. It does not help with gcc. Cycles is already one huge whole program module, where all hot functions are combined together. GCC with flto just distributes these hot functions over full blender binary which leads to ~15% slowdown.

Thanks for the pointers. I've now tried with OpenCL and see what you mean. I'll clean things up so that it works with OpenCL again.

@Sv. Lockal (lockal), just so that I get this right, can you please tell me what the different KERNEL defines are used to distinguish (KERNEL_SSE, KERNEL_SSE2, KERNEL_OPENCL_, KERNEL_GPU)? Thanks.

Cycles provides multiple entry points (a.k.a. kernel functions in OpenCL or global functions in CUDA) for different CPUs and platforms. These entry points are located in files:

  1. kernel.cl (+ kernel_compat_opencl.h)
  2. kernel.cu (+ kernel_compat_cuda.h)
  3. kernel_avx.cpp (+ kernel_compat_cpu.h)
  4. kernel_sse2.cpp (+ kernel_compat_cpu.h)
  5. kernel_sse3.cpp (+ kernel_compat_cpu.h)
  6. kernel_sse41.cpp (+ kernel_compat_cpu.h)

__KERNEL_SSE2__, __KERNEL_SSE3__, __KERNEL_SSSE3__ and __KERNEL_SSE41__ are defined in files 3-6 (and also intern/cycles/util/util_optimization.h) and can be used to determine, which version of SSE is available for specific entry point.

__KERNEL_SSE__ could be used to enable SSE intrinsics for int4/float4/float3 classes, but disabled for now, because it leads to slowdown for many reasons.

__KERNEL_GPU__ is defined in files 1 and 2.

__KERNEL_CUDA__ and __KERNEL_OPENCL__ are defined in files 1 and 2 accordingly.

Thanks. That's v helpful. I am very tempted to change KERNEL_SSE to something like CCL_USE_SIMD to make it clearer that is a different type of animal from the other __KERNEL_SSEs.

I hadn't taken the need for C compatibility for OPENCL into account properly. This cycles patch below, starts to use C++1 STL and also includes a simplex noise generator that seems to work OK, as well as including some SIMD tweaks.

I think that theKERNEL_SSE version (which is very slow) seems to be twice as fast at building the bvh, but it really stalls when trying to find the various intersections, for reasons that I don't understand.

The code in util_math is a bit messy. It might be preferable to keep two versions (OPENCL/native and the SIMD version) in separate files. The SIMD version can then be quite compact, looking at the embree version of the types ssef, etc. (There is no need to use the pass by reference idiom because most of the functions are inlined, but is looks more C++-like.)

I'll have a further look to see if there are other optimizations possible.

... missed a couple of uses of thread / mutex in the device_network

Not sure what has changed, recently, but I can now compile cycles with whole program optimization (/GL) and it gives a just noticeable speedup (2:14 v 2:23 on the BMW test).

FFTW 3.3.4 has been released. This set of project files allows you to build it in VS2013, providing AVX support if your cpu does.

A patch to pick up standard versions of isnan, isfinite, isinfinite

And another quickie to set fast transcendentals

It is no longer necessary, if it ever was, to include explicit function binds when calling thread.

storage.c has a call to _wstat that translates to a call to _wstat64i32 which is not right, as that expects a _stat64*, not stat* arg. The following removes the warning by calling stat

#if defined(WIN32) && (_MSC_VER < 1800)
int BLI_stat(const char *path, struct stat *buffer)
{
	int r;
	UTF16_ENCODE(path);

	/* workaround error in MinGW64 headers, normally, a _wstat should work */
#ifndef __MINGW64__ 
	r = _wstat(path_16, buffer); /* Warning: incompatible types - from 'stat *' to '_stat64 *' */
#else
	r = _wstati64(path_16, buffer);
#endif

	UTF16_UN_ENCODE(path);
	return r;
}
#else
int BLI_stat(const char *path, struct stat *buffer)
{
	return stat(path, buffer);
}
#endif

Visual Studio 20013 Update 2 is here. Supports AVX2, so time for a new Cycles kernel??

I already added an AVX2 kernel in my branch (soc-2014-cycles).

What is the status of this ticket ?

@John Pavel (jrp) there are so many little patches and tweaks in here but this whole ticket is not in a usable state for me.. should we close this ?

Go ahead and close it. I was using it to keep track of things that I had found.

I would actually like to fix some of the stuff looking through it and I think blender would be better for it but kinda was hoping you would filter / bundle / update it into a patch for me :)

Thx, for the effort but this ticket currently does not contain anything I can directly use. closing for now