just activating osl, without using it, makes any render 3 to 4 times slower #46975

Open
opened 2015-12-15 10:33:48 +01:00 by mathieu menuet · 19 comments

System Information
Win7 x64

Blender Version
Broken: 2.76b

Short description of error
Take any production render on the CPU and just activate OSL in the render tab: it will render 3 to 4 times slower, even though OSL isn't used in the scene. Compared to a GPU render, which is already 5 times faster than a CPU render, that makes it 15 to 20 times slower. OSL is unusable with this bug. Note that other renderers with OSL don't have this speed hit.

Exact steps for others to reproduce the error
With CPU and 16x16 tiles, render this scene: http://download.blender.org/demo/test/pabellon_barcelona_v1.scene_.zip
Then just activate OSL, render again and compare times. Although OSL isn't used anywhere in the scene, render times slow to a crawl.
(You can also lower the samples to 10 to make both renders faster.)
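
The comparison above could be scripted roughly as follows. This is only a sketch: the helper names (`time_render`, `compare_backends`, `run_in_blender`) are made up for illustration, and the property names assume the Cycles Python API of the 2.7x series (`scene.cycles.shading_system` for the OSL checkbox, `scene.render.tile_x`/`tile_y` for the tile size). Check them against your Blender version before relying on it.

```python
import time


def time_render(render, set_osl, use_osl):
    """Return wall-clock seconds for one render with OSL switched on or off."""
    set_osl(use_osl)
    start = time.perf_counter()
    render()
    return time.perf_counter() - start


def compare_backends(render, set_osl):
    """Render once with SVM, once with OSL; return (svm_s, osl_s, osl/svm)."""
    svm = time_render(render, set_osl, False)
    osl = time_render(render, set_osl, True)
    return svm, osl, osl / svm


def run_in_blender():
    """Hypothetical driver; must be run from Blender's Python console or as a script."""
    import bpy  # only available inside Blender

    scene = bpy.context.scene
    scene.cycles.device = 'CPU'   # assumed property name (2.7x)
    scene.render.tile_x = 16      # assumed property name (2.7x)
    scene.render.tile_y = 16
    scene.cycles.samples = 10     # path tracing integrator; lowers render time

    def set_osl(enabled):
        # Assumed to correspond to the "Open Shading Language" checkbox.
        scene.cycles.shading_system = enabled

    def render():
        bpy.ops.render.render()

    svm, osl, ratio = compare_backends(render, set_osl)
    print("SVM: %.1fs, OSL: %.1fs, OSL/SVM: %.2fx" % (svm, osl, ratio))
```

With the test scene open, calling `run_in_blender()` prints both times and their ratio. The timing includes scene synchronization and texture loading, so it measures the whole render, not shading alone.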

Author

Changed status to: 'Open'

Author

Added subscriber: @bliblubli


Added subscriber: @Sergey

Sergey Sharybin self-assigned this 2015-12-15 12:24:07 +01:00

OSL is expected to be slower, but 3-4 times seems far too much. Currently looking into another bug; will investigate this one later.

Author

Note that this bug doesn't trigger on simple scenes like the default cube, and I couldn't find a simple test case so far. Even when using a simple OSL shader instead of the default diffuse on the cube, render times are comparable in my test (maybe 2% slower, but that's within measurement error).
The strange part is that merely activating the feature, without any OSL node in the scene, is enough to make it 3-4 times slower, so maybe it's not even an OSL bug but something else in Cycles.


Once you activate OSL, all the nodes are taken from OSL shaders, whether or not you have an OSL Script node; it's simply impossible to mix the SVM and OSL shading backends.

There are various known sources of slowdown, which aren't considered a bug for now:

  • Converting closures from Cycles to OSL
  • Mipmapping which happens in OSL

One important question though: is such a difference in render times between SVM and OSL on CPU consistent across all previous Blender versions?

Author

At least the few times I tested OSL in the past, I didn't notice these slowdowns. But as I said, only certain scenes have this huge slowdown, so maybe I just didn't trigger the bug in past versions. I'm not at home atm, but at least I can say that other renderers have no problem with OSL and complex scenes, so even if this nasty bug has been in Cycles for longer, I don't see a reason to keep it. A 4x slowdown just makes it unusable, even for still rendering.

Author

After searching various mailing lists and this bug tracker, I found this: http://lists.blender.org/pipermail/bf-cycles/2013-July/001500.html
So it's a long-standing bug. The profiling there shows that OSL should only bring a 25% slowdown, and the filtering a bit more, but nothing like a 400% slowdown. It looks like OIIO is responsible for it? Hope it helps.


This is a bug tracker where we address issues in the code; we don't speculate on whether something has a point to have or not. Just for the record: OSL/OIIO do much more in terms of image filtering compared to what SVM does. Some degree of slowdown is inevitable.

In the particular example with the Barcelona file:

  • There's some slowdown caused by the closure copy code.
  • Autotiles and such are generated on initial texture load. Using .tx files avoids this initial hiccup on texture load but doesn't make much difference to overall render time.
  • A huge amount of the slowdown is caused by OIIO texture sampling. Simply enforcing "builtin" images (which behave similarly to SVM) speeds up the render by a factor of 2 (making OSL only about 20% slower than SVM).

There's still some investigation to do before reaching a verdict here.

Author

Your first results look nice, Sergey. Would it be possible to have an option to enforce "builtin" images with OSL?
Solid Angle seems to have a cache to solve the problem: https://support.solidangle.com/display/AFMUG/Image+Based+Lighting
No idea if it's related, but this guy also proposes a solution for a 3x speedup: https://github.com/OpenImageIO/oiio/issues/969.
I'm doing my best to help, but this is most certainly beyond my coding skills and available free time.

Author

To get some recent values, I tested with the latest Git build:
With SVM: 29 sec
![not packed svm 29sec.png](https://archive.blender.org/developer/F271150/not_packed_svm_29sec.png)
With OSL: 102 sec (3.5x slower). The noise is different, but not better to my eyes.
![osl 1 42.png](https://archive.blender.org/developer/F271153/osl_1_42.png)
With OSL and packed textures: 71 sec (2.5x slower)
![packed osl 1min11.png](https://archive.blender.org/developer/F271152/packed_osl_1min11.png)
The slowdown with packed images is still very significant (a movie needing 4 weeks to render would need more than 2 months with OSL), but it's already a very good improvement.
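
For reference, the slowdown factors and the movie estimate quoted above follow directly from the three timings. The snippet below is just that arithmetic; the only inputs are the times reported in this comment, and it assumes render time scales linearly with the slowdown factor.

```python
# Render times in seconds, as reported above.
svm_seconds = 29.0
osl_timings = {
    "OSL": 102.0,
    "OSL + packed textures": 71.0,
}

# Slowdown factor relative to the SVM render.
factors = {name: t / svm_seconds for name, t in osl_timings.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x slower than SVM")

# Assumption: a 4-week SVM job scales by the same factor.
svm_weeks = 4
packed_weeks = svm_weeks * factors["OSL + packed textures"]
print(f"4 weeks with SVM -> {packed_weeks:.1f} weeks with OSL + packed textures")
```

This gives roughly 3.52x and 2.45x, and about 9.8 weeks for the packed-texture case.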

You probably wouldn't see a difference between the SVM and OSL backends on such an image. The only possible differences are slight lighting changes caused by different alpha transparency and ray termination due to different filtering. You would see a difference in scenes like the one reported in #43495 (that's a bit of an extreme case, but it shows the idea).

Now, from the investigation and benchmarks:

Timing:

  • SVM: 0:53
  • OSL: 2:20
  • Clipping derivatives: 2:15
  • Packed images: 1:13

Number of texture lookups:

  • SVM: 101822593
  • OSL: 109830193

So as you can see, using packed images gives about a 2x speedup here, which is what I would expect, and which makes me unsure about your timings.

Clipping derivatives might work for the guy you linked here, but the gain is nowhere near the claimed 3x speedup. That's probably simply because his derivatives might be at the wrong scale (much bigger than ours; where the truth lies is hard to tell atm, and that's another story). Such clipping would also affect cases like the one reported in #43495.

The number of texture lookups is nearly the same for both shading backends, so the slowdown is unlikely to be caused by a different number of bounces (which is also easy to confirm by comparing the heat map of the ray traversal steps debug pass).

From all this I conclude it's not something we're doing totally wrong on our side; some optimization of specific libraries is needed (like building them with vectorization support and such). Considering it a TODO for now.
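
As a quick check of the figures in this comment, the overheads relative to SVM and the difference in texture lookups can be computed from the numbers listed above. The helper `to_seconds` is only an illustrative name; all inputs are copied from the benchmark lists.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' timing string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


timings = {
    "SVM": "0:53",
    "OSL": "2:20",
    "Clipping derivatives": "2:15",
    "Packed images": "1:13",
}
seconds = {name: to_seconds(t) for name, t in timings.items()}
baseline = seconds["SVM"]

# Overhead of each configuration relative to SVM, in percent.
overhead = {name: (s / baseline - 1.0) * 100.0 for name, s in seconds.items()}
for name, pct in overhead.items():
    print(f"{name}: {seconds[name]}s, {pct:+.1f}% vs SVM")

# Speedup from packing images, relative to plain OSL.
packed_speedup = seconds["OSL"] / seconds["Packed images"]
print(f"Packed images vs OSL: {packed_speedup:.2f}x faster")

# Texture lookup counts for both backends.
lookups_svm = 101822593
lookups_osl = 109830193
extra_lookups = (lookups_osl / lookups_svm - 1.0) * 100.0
print(f"OSL does {extra_lookups:.1f}% more texture lookups than SVM")
```

Packed images come out at about 38% slower than SVM and about 1.9x faster than plain OSL, while OSL performs only about 8% more texture lookups.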

Author

Well, I did my test on Windows 7, and the current builds don't use AVX properly (you can see it by activating the v120_xp option in VS: it will produce a build for XP which is as fast on XP 64 as the buildbot build on Windows 7). Only Linux has proper AVX support; this may also affect the OSL/OIIO code.
But I trust your results, and a 38% slowdown is really acceptable and comparable to other packages using OSL. Let's hope that Martijn, with VS 2015, will resolve this AVX problem and bring performance to the same level as the MinGW and Linux builds.


Added subscriber: @ThomasDinges


"Only Linux has proper support of AVX". That's new to me...Source?

We don't compile OIIO/OSL with AVX, that's for sure. Also, the desktop on which I was doing the benchmarks only supports SSE4.2, so the difference in stats is definitely not caused by the AVX instruction set.

Sergey Sharybin removed their assignment 2020-03-13 16:19:50 +01:00

It is unlikely I'll be working on this issue in the near future.

There is ongoing development to support mipmap filtering in the non-OSL code path, which could then be used for the OSL case as well, avoiding such a performance gap between shading backends.


Added subscriber: @grosgood

Brecht Van Lommel added this to the Render & Cycles project 2023-02-07 19:07:34 +01:00
Philipp Oeser removed the Interest: Render & Cycles label 2023-02-09 13:56:49 +01:00

Version 4.0 also has the same issue: about 2-3 times slower. The noise pattern is also very different, and worse in my opinion.

Reference: blender/blender#46975