just activating osl, without using it, makes any render 3 to 4 times slower #46975


mathieu menuet · 2015-12-15T10:33:48+01:00

mathieu menuet commented

2015-12-15 10:33:48 +01:00

System Information
Win7 x64

Blender Version
Broken: 2.76b

Short description of error
Take any production render on CPU and just activate OSL in the render tab: it will render 3 to 4 times slower, even though OSL isn't used in the scene. Compared to GPU rendering, which is already 5 times faster than CPU rendering, that makes it 15 to 20 times slower. OSL is unusable with this bug. Note that other renderers with OSL don't have this speed hit.

Exact steps for others to reproduce the error
With CPU and 16x16 tiles, render this scene: http://download.blender.org/demo/test/pabellon_barcelona_v1.scene_.zip
Then just activate OSL, render again and compare times. Although OSL isn't used anywhere in the scene, render times slow to a crawl.
(You can also lower the samples to 10 to make both renders faster.)
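
The reproduction steps above could be scripted roughly as follows. This is only a sketch, not part of the original report: `time_render`, `compare_backends` and `blender_hooks` are hypothetical helper names, and the Blender part assumes a 2.7x build with Cycles selected as the render engine, where `scene.cycles.shading_system` is the "Open Shading Language" checkbox and `scene.render.tile_x` / `tile_y` set the tile size.

```python
import time


def time_render(render, set_osl, use_osl):
    """Switch the shading backend, run one render, return elapsed seconds."""
    set_osl(use_osl)
    start = time.perf_counter()
    render()
    return time.perf_counter() - start


def compare_backends(render, set_osl):
    """Render once with SVM, once with OSL; return (svm_s, osl_s, ratio)."""
    svm_seconds = time_render(render, set_osl, use_osl=False)
    osl_seconds = time_render(render, set_osl, use_osl=True)
    return svm_seconds, osl_seconds, osl_seconds / svm_seconds


def blender_hooks(samples=10, tile_size=16):
    """Build (render, set_osl) callables for the currently open scene.

    Assumption: this runs inside Blender 2.7x with Cycles as the engine.
    """
    import bpy  # only importable inside Blender

    scene = bpy.context.scene
    scene.cycles.device = 'CPU'        # the report is about CPU rendering
    scene.cycles.samples = samples     # lowered to keep both renders short
    scene.render.tile_x = tile_size    # 16x16 tiles, as in the report
    scene.render.tile_y = tile_size

    def set_osl(enabled):
        # Toggles the "Open Shading Language" option in the render tab.
        scene.cycles.shading_system = enabled

    def render():
        bpy.ops.render.render()        # blocks until the render is done

    return render, set_osl
```

With the test scene open, running `print(compare_backends(*blender_hooks()))` from Blender's Python console should give the two render times and their ratio. The first render may also pay for initial texture loading, so repeating the comparison is probably wise.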


mathieu menuet commented

2015-12-15 10:33:48 +01:00

Changed status to: 'Open'

mathieu menuet commented

2015-12-15 10:33:48 +01:00

Added subscriber: @bliblubli

Sergey Sharybin commented

2015-12-15 12:24:07 +01:00

Added subscriber: @Sergey

Sergey Sharybin self-assigned this 2015-12-15 12:24:07 +01:00

Sergey Sharybin commented

2015-12-15 12:24:07 +01:00

OSL is expected to be slower, but 3-4 times seems far too much. I'm currently looking into another bug and will investigate this one later.

mathieu menuet commented

2015-12-15 16:22:47 +01:00

Note that this bug doesn't trigger on simple scenes like the default cube, and I couldn't find a simple test case for now. Even when using a simple OSL shader instead of the default diffuse on the cube, render times are comparable in my test (maybe 2% slower, but that's within measurement precision).
The strange part is that merely activating the feature, without any OSL node in the scene, is enough to make it 3-4 times slower, so it's maybe not even an OSL bug but something else in Cycles.


Sergey Sharybin commented

2015-12-15 16:43:22 +01:00

Once you activate OSL, all the nodes are taken from OSL shaders, no matter whether you have an OSL Script node or not; it's simply impossible to mix the SVM and OSL shading backends.

There are various known sources of slowdown, which aren't considered a bug for now:

- Converting closures from Cycles to OSL
- Mipmapping which happens in OSL

One important question though: is such a difference in render times between SVM and OSL on CPU consistent across all previous Blender versions?


mathieu menuet commented

2015-12-15 17:39:00 +01:00

At least the few times I tested OSL in the past, I didn't notice those slowdowns. But as I said, only certain scenes have this huge slowdown, so maybe I just didn't trigger the bug in past versions. I'm not at home at the moment, but I can at least say that other renderers have no problem with OSL and complex scenes, so even if this nasty bug has been in Cycles for longer, I don't see a reason to keep it. A 4x slowdown makes it unusable even for still rendering.

mathieu menuet commented

2015-12-15 18:39:17 +01:00

After searching different mailing lists and this bug tracker, I found this: http://lists.blender.org/pipermail/bf-cycles/2013-July/001500.html
So it's a long-standing bug. The profiling there shows that OSL should only bring a 25% slowdown, and the filtering a bit more, but nothing like a 400% slowdown. It looks like OIIO is responsible for it? Hope it helps.


Sergey Sharybin commented

2015-12-18 17:57:35 +01:00

This is a bug tracker where we address issues in the code; we don't speculate on whether something has a point or not. Just for the record: OSL/OIIO do much more in terms of image filtering compared to what SVM does. Some degree of slowdown is inevitable.

In the particular example with the Barcelona file:

- There's some slowdown caused by the closure copy code.
- Autotiles and such are generated on initial texture load. Using .tx files avoids this initial hiccup on texture load but doesn't make much difference to overall render time.
- A huge amount of slowdown is caused by OIIO texture sampling. Simply enforcing "builtin" images (which behave similarly to SVM) speeds up the render by a factor of 2 (making OSL only about 20% slower than SVM).

There's still some investigation to do before reaching a verdict here.


mathieu menuet commented

2015-12-22 07:37:04 +01:00

Your first results look nice, Sergey. Would it be possible to have an option to enforce "builtin" images with OSL?
Solidangle seems to have a cache to solve the problem: https://support.solidangle.com/display/AFMUG/Image+Based+Lighting
No idea if it's related, but this guy also proposes a solution for a 3x speedup: https://github.com/OpenImageIO/oiio/issues/969
I'm doing my best to help, but this is most certainly beyond my coding skills and available free time.


mathieu menuet commented

2015-12-31 18:40:25 +01:00

To get some recent values, I tested with the latest Git build:

With SVM: 29 sec
![not packed svm 29sec.png](https://archive.blender.org/developer/F271150/not_packed_svm_29sec.png)

With OSL: 102 sec (3.5x slower). Noise is different, but not better to my eyes.
![osl 1 42.png](https://archive.blender.org/developer/F271153/osl_1_42.png)

With OSL and packed textures: 71 sec (2.5x slower)
![packed osl 1min11.png](https://archive.blender.org/developer/F271152/packed_osl_1min11.png)

The slowdown with packed images is still very significant (a movie needing 4 weeks to render would need more than 2 months with OSL), but it's already a very good improvement.

Sergey Sharybin commented

2016-01-04 12:12:08 +01:00

You probably wouldn't see a difference between the SVM and OSL backends on such an image. It's only possible to have some slight lighting differences caused by different alpha transparency and ray termination due to different filtering. You would see a difference in scenes like the one reported in #43495 (that's a bit of an extreme case, but it shows the idea).

Now, from the investigation and benchmarks:

Timing:

- SVM: 0:53
- OSL: 2:20
- Clipping derivatives: 2:15
- Packed images: 1:13

Number of texture lookups:

- SVM: 101822593
- OSL: 109830193

So as you can see, using packed images gives about a 2x speedup here, which is what I would expect, and is the part I'm not sure about in your timings.

Clipping derivatives might work for the guy you linked here, but for us it gives nowhere near the claimed 3x speedup. That's probably simply because his derivatives might be at the wrong scale (much bigger than ours; which one is right is hard to tell at the moment, and that's another story). Such clipping would also affect cases like the one reported in #43495.

The number of texture lookups is roughly the same for both shading backends, so the slowdown is unlikely to be caused by a different number of bounces (which is also easy to confirm by comparing heat maps of the ray traversal steps debug pass).

From all this I conclude that it's not something we're doing totally wrong on our side; some optimization of specific libraries is needed (like building them with vectorization support and such). Considering it a TODO for now.
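
For reference, the ratios implied by the benchmark figures above can be worked out with a few lines of Python. This is only an illustrative calculation on the numbers quoted in this comment; the helper `to_seconds` and the variable names are made up for the example.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '2:20' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Render times quoted in the benchmark above.
timings = {
    "SVM": "0:53",
    "OSL": "2:20",
    "Clipping derivatives": "2:15",
    "Packed images": "1:13",
}

baseline = to_seconds(timings["SVM"])

# Slowdown of each configuration relative to SVM.
slowdown = {name: to_seconds(t) / baseline for name, t in timings.items()}

# Speedup from packing images, relative to plain OSL.
packed_speedup = to_seconds(timings["OSL"]) / to_seconds(timings["Packed images"])

# Texture lookup counts for both backends.
lookup_ratio = 109830193 / 101822593

for name, factor in slowdown.items():
    print(f"{name}: {factor:.2f}x the SVM render time")
print(f"Packed images vs plain OSL: {packed_speedup:.2f}x faster")
print(f"OSL performs {100 * (lookup_ratio - 1):.1f}% more texture lookups")
```

Plain OSL comes out at roughly 2.6x the SVM time, packed images at roughly 1.4x, and OSL performs only about 8% more texture lookups than SVM, which is consistent with the lookup count not explaining the gap.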


mathieu menuet commented

2016-01-04 17:43:55 +01:00

Well, I did my test on Windows 7, and the current builds don't use AVX properly (you can see this by activating the v120_xp option in VS: it produces a build for XP that is as fast on XP 64 as the buildbot build on Windows 7). Only Linux has proper AVX support; the same may also happen in the OSL/OIIO code.
But I trust your results, and a 38% slowdown is really acceptable and comparable to other packages using OSL. Let's hope that Martijn with VS 2015 will resolve this AVX problem and bring performance to the same level as the MinGW and Linux builds.


Thomas Dinges commented

2016-01-04 20:14:52 +01:00

Added subscriber: @ThomasDinges

Thomas Dinges commented

2016-01-04 20:14:52 +01:00

"Only Linux has proper support of AVX". That's new to me...Source?

Sergey Sharybin commented

2016-01-04 20:22:36 +01:00

We don't compile OIIO/OSL with AVX, that's for sure. Also, the desktop on which I was doing benchmarks only supports SSE4.2, so the difference in stats is not caused by the AVX instruction set.

Sergey Sharybin removed their assignment 2020-03-13 16:19:50 +01:00

Sergey Sharybin commented

2020-03-13 16:19:50 +01:00

It is unlikely I'll be working on this issue in the near future.

There is ongoing development to support mipmap filtering in the non-OSL code path, which could then be used for the OSL case as well, avoiding such a performance gap between shading backends.


Garry R. Osgood commented

2020-04-27 16:04:06 +02:00

Added subscriber: @grosgood

Brecht Van Lommel added this to the Render & Cycles project 2023-02-07 19:07:34 +01:00

Philipp Oeser removed the
