just activating osl, without using it, makes any render 3 to 4 times slower
Open, Normal, Public

Description

System Information
Win7 x64

Blender Version
Broken: 2.76b

Short description of error
Take any production render on CPU and just activate OSL in the render tab: it will render 3 to 4 times slower, even though OSL is not used in the scene. Compared to a GPU render, which is already 5 times faster than a CPU render, that makes it 15 to 20 times slower. OSL is unusable with this bug. Note that other renderers with OSL don't have this speed hit.

Exact steps for others to reproduce the error
With CPU and 16x16 tiles, render this scene: http://download.blender.org/demo/test/pabellon_barcelona_v1.scene_.zip
Then just activate OSL, render again and compare times. Although OSL isn't used anywhere in the scene, the render slows to a crawl.
(You can also lower the samples to 10 to make both renders faster.)
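For anyone who would rather script the comparison than time it by hand, here is a minimal sketch of a timing harness. The helper names `time_render` and `compare_backends` are made up for illustration, and the `bpy` calls mentioned in the comments are an assumption about the 2.7x Python API, so check them against your build.

```python
import time

def time_render(render, repeats=1):
    """Return the best wall-clock time in seconds over `repeats` calls to render()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        render()
        best = min(best, time.perf_counter() - start)
    return best

def compare_backends(render, set_osl, repeats=1):
    """Time a render with OSL off (SVM), then with OSL on.

    Returns (svm_seconds, osl_seconds, osl_seconds / svm_seconds).
    """
    set_osl(False)
    svm = time_render(render, repeats)
    set_osl(True)
    osl = time_render(render, repeats)
    return svm, osl, osl / svm

# Inside Blender the two callables could look roughly like this (assumed API):
#   render  = lambda: bpy.ops.render.render()
#   set_osl = lambda on: setattr(bpy.context.scene.cycles, "shading_system", on)
```

The harness takes plain callables so it can be tried outside Blender with stand-in functions before being pointed at a real scene.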

Details

Type
To Do

OSL is expected to be slower, but 3-4 times seems far too much. Currently looking into another bug, will investigate this one later.

Sergey Sharybin (sergey) triaged this task as "Normal" priority. Dec 15 2015, 12:24 PM

Note that this bug doesn't trigger on simple scenes like the default cube, and I couldn't find a simple test case so far. Even when using a simple OSL shader instead of the default diffuse on the cube, render times are comparable in my test (maybe 2% slower, but that's within measurement noise).
The strange part is that merely activating the feature, without any OSL node in the scene, is enough to make it 3-4 times slower, so it's maybe not even an OSL bug but something else in Cycles.

Once you activate OSL, all the nodes are evaluated through their OSL shaders, no matter whether you've got an OSL Script node or not. It's simply impossible to mix the SVM and OSL shading backends.

There are various known sources of slowdown, which aren't considered bugs for now:

  • Converting closures from Cycles to OSL
  • Mipmapping which happens in OSL

One important question though: is such a difference in render times between SVM and OSL on CPU consistent across all previous Blender versions?

At least the few times I tested OSL in the past, I didn't notice these slowdowns. But as I said, only certain scenes show this huge slowdown, so maybe I just didn't trigger the bug in past versions. I'm not at home at the moment, but I can at least say that other renderers have no problem with OSL and complex scenes, so even if this nasty bug has been in Cycles for a long time, I don't see a reason to keep it. A 4x slowdown makes it unusable even for still rendering.

After searching on different mailing lists and on this bug tracker, I found this: http://lists.blender.org/pipermail/bf-cycles/2013-July/001500.html
So it's a long-standing bug. The profiling over there shows that OSL should only bring a 25% slowdown, and the filtering a bit more, but nothing like a 400% slowdown. It looks like OIIO is responsible for it? Hope it helps.

This is a bug tracker where we address issues in the code; we don't speculate on whether something is worth having or not. Just for the record: OSL/OIIO are doing much more in terms of image filtering compared to what SVM is doing. Some degree of slowdown is inevitable.

In the particular example with the Barcelona file:

  • There's some slowdown caused by the closure copy code
  • Autotiles and such are generated on initial texture load. Using .tx files avoids this initial hiccup on texture load but doesn't make much difference to overall render time.
  • A huge amount of the slowdown is caused by OIIO texture sampling. Simply enforcing "builtin" images (which behave similarly to SVM) speeds up the render by a factor of 2 (making OSL only about 20% slower than SVM).

There's still some investigation to do before reaching a verdict here.

Your first results look nice, Sergey. Would it be possible to have an option to enforce "builtin" images with OSL?
Solid Angle seems to have a cache to solve the problem: https://support.solidangle.com/display/AFMUG/Image+Based+Lighting
No idea if it is related, but this guy also proposes a solution for a 3x speedup: https://github.com/OpenImageIO/oiio/issues/969
I'm doing my best to help, but it's most certainly beyond my coding skills and available free time.

To get some recent values, tested with latest GIT:
With SVM: 29sec


With OSL: 102sec (3.5x slower). Noise is different but not better to my eyes.

With OSL and packed textures: 71sec (2.5x slower)

The slowdown with packed images is still very significant (a movie needing 4 weeks to render would need more than 2 months with OSL), but it's already a very good improvement.
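The arithmetic behind these figures, as a quick sketch. The three timings are the ones quoted above; the 4-week baseline is the hypothetical movie, and the extrapolation assumes total render time scales linearly with the per-frame slowdown.

```python
def slowdown(baseline_s, other_s):
    """How many times slower `other_s` is than `baseline_s`."""
    return other_s / baseline_s

svm_s, osl_s, osl_packed_s = 29.0, 102.0, 71.0

osl_ratio = slowdown(svm_s, osl_s)            # roughly 3.5x
packed_ratio = slowdown(svm_s, osl_packed_s)  # roughly 2.4x to 2.5x

# Hypothetical 4-week SVM render, scaled by the packed-texture slowdown.
weeks_with_osl_packed = 4 * packed_ratio      # a bit under 10 weeks

print(f"OSL: {osl_ratio:.2f}x, OSL + packed: {packed_ratio:.2f}x, "
      f"4 weeks becomes {weeks_with_osl_packed:.1f} weeks")
```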

You probably wouldn't see a difference between the SVM and OSL backends on such an image. The only possible differences are slight lighting changes caused by different alpha transparency and ray termination due to different filtering. You would see a difference in scenes like the one reported in T43495 (that's a bit of an extreme case, but it shows the idea).

Now, from the investigation and benchmarks:

Timing:

  • SVM: 0:53
  • OSL: 2:20
  • Clipping derivatives: 2:15
  • Packed images: 1:13

Number of texture lookups:

  • SVM: 101822593
  • OSL: 109830193

So, as you can see, using packed images gives about a 2x speedup here, which is what I would expect, and which is the part I'm not sure about in your timings.

Clipping derivatives might work for the guy you linked, but the gain here is nowhere near the claimed 3x speedup. That's probably simply because his derivatives are at a different scale (much bigger than ours; which one is correct is hard to tell at the moment, and that's another story). Such clipping would also affect cases like the one reported in T43495.

The number of texture lookups is nearly the same for both shading backends, so the slowdown is unlikely to be caused by a different number of bounces (which is also easy to confirm by comparing the heat map of the ray traversal steps debug pass).
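To make the ratios in these benchmarks explicit, here is a small sketch that converts the m:ss timings to seconds and compares them. The numbers are copied from the lists above; the helper name `to_seconds` is just for illustration.

```python
def to_seconds(stamp):
    """Convert an 'm:ss' string such as '2:20' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)

timings = {
    "SVM": "0:53",
    "OSL": "2:20",
    "Clipping derivatives": "2:15",
    "Packed images": "1:13",
}
seconds = {name: to_seconds(stamp) for name, stamp in timings.items()}

# Render time of each configuration relative to SVM.
relative = {name: s / seconds["SVM"] for name, s in seconds.items()}

# Gain from each change, relative to plain OSL.
packed_speedup = seconds["OSL"] / seconds["Packed images"]
clipping_speedup = seconds["OSL"] / seconds["Clipping derivatives"]

# Texture lookups, OSL relative to SVM.
lookup_ratio = 109830193 / 101822593

for name, ratio in relative.items():
    print(f"{name}: {seconds[name]}s ({ratio:.2f}x SVM)")
print(f"packed speedup: {packed_speedup:.2f}x, clipping speedup: {clipping_speedup:.2f}x")
print(f"texture lookups, OSL vs SVM: {lookup_ratio:.3f}x")
```

With these inputs, packed images come out at roughly 1.4x the SVM time, clipping derivatives gains only a few percent over plain OSL, and OSL performs about 8% more texture lookups than SVM.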

From all this I'm concluding that it's not something we're doing totally wrong on our side; some of the libraries involved just need optimization (like building them with vectorization support and such). Considering it a TODO for now.

Sergey Sharybin (sergey) changed Type from Bug to To Do. Jan 4 2016, 12:12 PM

Well, I did my test on Windows 7, and the current builds don't use AVX properly (you can see it by activating the v120_xp option in VS: it produces a build for XP which is as fast on XP 64 as the buildbot build on Windows 7). Only Linux has proper support of AVX; the same may also happen in the OSL/OIIO code.
But I trust your results, and a 38% slowdown is really acceptable and comparable to other packages using OSL. Let's hope that Martijn with VS 2015 will resolve this AVX problem and bring performance to the same level as the MinGW and Linux builds.

"Only Linux has proper support of AVX". That's new to me... Source?

We don't compile OIIO/OSL with AVX, that's for sure. Also, the desktop I was running the benchmarks on only supports SSE4.2, so the difference in stats is not caused by the AVX instruction set.