Cycles GPU Performance Regression on Titan #38679

Closed
opened 2014-02-17 14:26:26 +01:00 by Gabriel Caraballo · 46 comments

System Information
Ubuntu 12.04, kernel 3.2.0-58-generic
Intel® Xeon(R) CPU E5620 @ 2.40GHz × 16
7,7 GiB RAM
GeForce GT 430
NVIDIA Corporation GK110 [GeForce GTX Titan] (rev a1)

NVIDIA driver version: 331.20
X.Org: 1.11.3

Blender Version
Broken: fdcdd5e
Worked: 2.69

Short description of error
Rendering a project on the GeForce GTX Titan (only) takes twice as long in Blender fdcdd5e as in Blender 2.69.

Exact steps for others to reproduce the error

  • Open Blender 2.69

  • Change Render Engine to Cycles

  • Set Render Samples to 100

  • Render time: 2 seconds

  • Open Blender fdcdd5e

  • Change Render Engine to Cycles

  • Set Render Samples to 100

  • Render time: 4 seconds

Changed status to: 'Open'

Added subscriber: @GabrielCaraballo

On GPU.

The name of the bug should be: Cycles GPU Performance Regression (I can't change it now).
The regression occurs only when rendering on the GPU.

Brecht Van Lommel changed title from Cycles Performance Regression to Cycles GPU Performance Regression 2014-02-17 14:34:46 +01:00

Added subscriber: @MartijnBerger

Added subscriber: @brecht

There was a change to avoid CPU load at 100% during GPU render, which may have a big impact on very simple renders like the default cube. However, for more complex scenes the impact should be a lot less.

Can you test what kind of impact there is on more complex scenes?

Sure Brecht, a Kiribati frame takes 5 minutes with fdcdd5e, and 2:30 minutes on 2.69

Thanks, that definitely needs to be fixed then.

Added subscriber: @Luarvik

@GabrielCaraballo
How about a more complex scene with a longer render time? Is the render time still doubled? I have a GTX 560 Ti 2 GB. Comparing Blender 2.69.0 with today's buildbot version, I got "only" a 10% regression.

For me the regression depends very much on the OS and/or drivers and on the average time a tile takes.

The time we lose should be per tile. We could enqueue the next tile in the single-GPU case to take away most of the pain.
But properly making device_cuda asynchronous takes non-trivial effort, and it is on my to-do list.

@brecht I'll be on IRC tonight if you want to talk about this, but we can always revert to busy waiting or make it user-definable.
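
To illustrate why the loss would be per tile, here is a minimal sketch of a toy model. It is not the Cycles implementation: it assumes, hypothetically, that the host sleeps for a fixed interval between completion checks instead of busy waiting, so a finished tile is noticed only at the next check. The function names and the 1000 µs interval are made up for the example.

```python
def detection_delay_us(tile_time_us: int, poll_interval_us: int) -> int:
    """Time between a tile finishing and the host noticing it.

    Toy model (assumption): the host checks only at multiples of
    poll_interval_us. An interval of 0 stands for busy waiting.
    """
    if poll_interval_us <= 0:
        return 0  # busy waiting notices completion immediately
    checks = -(-tile_time_us // poll_interval_us)  # ceiling division
    return checks * poll_interval_us - tile_time_us


def frame_overhead_us(tile_times_us, poll_interval_us: int) -> int:
    """Total delay for one frame: the per-tile delays add up."""
    return sum(detection_delay_us(t, poll_interval_us) for t in tile_times_us)


# 100 tiles of 2.5 ms each, checked every 1 ms (hypothetical numbers)
print(frame_overhead_us([2500] * 100, 1000))
```

In this model the overhead grows with the number of tiles, not with the total render time, which fits the observation that small tiles suffer most.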

If we can't find a fix quickly we should indeed revert this change and move it to the next release.

This is a possible fix

This patch seems to work well for me on OS X. I tried rendering the default cube with 100 samples and the default (small) tile size, and it does make a big performance difference there. For more complex scenes I didn't notice that much difference but it worked fine.

Viewport rendering seems to be stable here as well, so it might be good to commit this and get feedback from more people.

This fix adds some extra pedantic sync() for the case where we cancel the task.

@GabrielCaraballo do you have an option to compile with a patch, or should I make a build for you to test with?

If you can make a build, that would be great, @juicyfruit.

@GabrielCaraballo after consulting with @brecht I committed a possible fix, f1aeb2ccf4.

Tomorrow's http://builder.blender.org/ builds should be good for testing this.

The latest 64 bit linux build here now has the fix:
http://builder.blender.org/download/

Thanks @Brecht. New Render Times:

| Blender | 2k half | 2k |
| -- | -- | -- |
| f1aeb2c | 2:31 | 9:25 |
| 2.69 | 1:29 | 5:35 |

Tile Size: 256x215

Two Render Layers, Foreground and Background. Some compositing for color correction. (Compositing time is the same in both versions, so it doesn't affect the comparison.)

Wow, that is bad...

My problem is that I cannot reproduce a case where I get such bad numbers.

The closest I have to your hardware is a GTX 780 (the plain one), and it is on Windows.

But there, in BMW M1 and Barcelona, I have almost no regression even without the proposed fix.
I do use larger tiles, ~90k pixels per tile; the tiles are all the same size (h/4, w/4).

Added subscriber: @ThomasDinges

That is weird, both the Titan and the GTX 780 are sm_35 and should therefore use the same code...

Windows/Linux issue? We should be able to find other Titan users, who can confirm/disconfirm the issue.

After the fix it should be at most a few ms per tile. Say there are 50 tiles per frame and 2 layers; then 1 sec is the absolute maximum regression there should be.

I would want to work towards a scene / case where I can reproduce and fix the problem. If sharing the problem scene is not possible maybe @GabrielCaraballo can also try BMW or Barcelona so we can bisect the issue some more.
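
The bound above can be checked with a little arithmetic. This is a minimal sketch: the 10 ms per-tile overhead is an assumed upper reading of "a few ms", and the function name is made up for the example. The 230 s figure is the difference between the 9:25 and 5:35 render times reported for the 2k frame earlier in this thread.

```python
def max_regression_seconds(tiles_per_frame: int, layers: int,
                           per_tile_overhead_ms: float) -> float:
    """Upper bound on added render time if the only cost is a fixed delay per tile."""
    return tiles_per_frame * layers * per_tile_overhead_ms / 1000.0


# 50 tiles, 2 render layers, assumed 10 ms lost per tile
bound = max_regression_seconds(50, 2, 10)
print(bound)

# Reported 2k regression: 9:25 - 5:35 = 230 seconds
observed = (9 * 60 + 25) - (5 * 60 + 35)
print(observed, observed > bound)
```

Under these assumptions the reported slowdown is far larger than a per-tile delay of a few milliseconds could explain.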

Added subscriber: @mib2berlin

Hi, I also can't reproduce the regression with the BMW and Barcelona files.

Opensuse 13.1/64
Intel i5 3770K
GTX 760
GTX 560Ti 448 Cores
Driver 331.20

BMW, set tiles to 256x256:

http://www.blenderartists.org/forum/showthread.php?239480-2-6x-Cycles-render-benchmark

Barcelona is not available from the original BA thread, so I uploaded it to pasteall.
Please switch scene to Benchmark1 GPU.

http://www.pasteall.org/blend/26908

@GabrielCaraballo, hi, I assume it is not possible for you to upload the sample file; maybe a simplified one?

Cheers, mib.

System: Windows 7 x64.
GTX 780 (latest NVIDIA driver; forgot to check the exact version)

Default cube 1920x1080 * 1/4
500 samples tiles 16x16 -> 510 tiles

| 2.69 release | trunk without fix | trunk + fix |
| -- | -- | -- |
| 2:28:90 | 3:13:71 | 2:04:07 |
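
The tile count quoted above can be reproduced with a minimal sketch, assuming the image is split into a regular grid and partial tiles at the edges count as full tiles. The helper name is made up for the example and is not a Cycles function.

```python
def tile_count(width: int, height: int, tile_w: int, tile_h: int) -> int:
    """Number of tiles in a regular grid, rounding up in each direction."""
    cols = -(-width // tile_w)   # ceiling division
    rows = -(-height // tile_h)
    return cols * rows


# 1920x1080 at 1/4 resolution is 480x270; 16x16 tiles
print(tile_count(1920 // 4, 1080 // 4, 16, 16))

# The 2048x858 frame from this thread with 256x215 tiles, and as one tile
print(tile_count(2048, 858, 256, 215))
print(tile_count(2048, 858, 2048, 858))
```

With many small tiles, any fixed cost per tile is paid hundreds of times per frame, which is why this test case is sensitive to the change.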

Added subscriber: @jpbouza-4

@jpbouza-4, can you reproduce this issue?

@brecht I am beginning to suspect that it might not be the changes to device_cuda.

Those are not the only things that changed. I don't have an Ubuntu 12.04 machine to make a few builds on, but it would be good to bisect this.
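
For readers unfamiliar with bisecting, the idea is a binary search over the commit history between a known-good and a known-bad build. The following is a minimal sketch with made-up commit names and a stand-in `is_bad` check; in practice `git bisect` automates this, and the check would be a timed render on the affected machine.

```python
def first_bad(commits, is_bad):
    """Return the first bad commit.

    commits is ordered oldest to newest; commits[0] is known good
    and commits[-1] is known bad.
    """
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid   # regression is at mid or earlier
        else:
            lo = mid   # regression is after mid
    return commits[hi]


# Hypothetical history c0..c9 where the slowdown starts at c6
history = [f"c{i}" for i in range(10)]
print(first_bad(history, lambda c: int(c[1:]) >= 6))
```

Each step halves the range, so even a few hundred commits need fewer than ten test builds.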

OK

System Specs:

i7 3930k
32G RAM
Geforce TITAN
Kubuntu 13.04

Cube with 100 samples and just 1 tile (1920x1080)

2.69 ----- 2.73 secs
f91368d ------ 3.47 secs
today's build 688d098 ----- 2.51

So, apparently the fix was merged into trunk and now the performance is better than in 2.69.

The only peculiar thing I can mention is that performance seems to be the opposite with 64x64 tiles: the latest build takes 7 sec, while the other build and 2.69 take 5...

@jpbouza-4 thx for these numbers.

@GabrielCaraballo it would be good to know what it is in your scene that causes this, or what change to Cycles causes this for you. Can you share any details on the scene, shaders, anything?

Does it also happen if you override your materials with a diffuse 80% grey shader, for example?

So, it seems that at the moment we have no way to reproduce the problem, with different scenes on different graphics cards.

@GabrielCaraballo, would it be possible to send me a .blend file with this problem privately to brecht@blender.org?

This comment was removed by @GabrielCaraballo

This issue was referenced by blender/blender-addons-contrib@6b1a4fc66e

This issue was referenced by 6b1a4fc66e

I got a production file now from @GabrielCaraballo to test this, but couldn't confirm any major impact testing with 460 GTX and 650M GT on Ubuntu Linux.

It may be a different issue, but given that #38712 (Cycles render is not updating) is also a problem, I reverted these busywait changes for now. I'd rather postpone it than try to find fixes at the last moment. If you test the next build, then we will at least know if this is the cause of the problem.

More times with 688d098:

Resolution: 2048x858 100%
Tile Size: 256x215

  • 688d098= 5:07 minutes
  • 2.69= 4:03 minutes

Resolution: 2048x858 100%
Tile Size: 2048x858

  • 688d098= 4:07 minutes
  • 2.69= 4:21 minutes

Same times for me and @jpbouza-4 .

jpbouza will try 6b1a4fc66e .

Is the regression fixed for you guys?

Render times with e7f3424 :

Resolution: 2048x858 100%
Tile Size: 256x215

  • e7f3424 = 5:22 minutes
  • 2.69 = 4:03 minutes

Resolution: 2048x858 100%
Tile Size: 2048x858

  • e7f3424 = 5:45 minutes
  • 2.69 = 4:21 minutes

Not fixed :(

Then I've run out of ideas here.
We need someone else who can confirm the issue (maybe it's a Titan-only issue).

@GabrielCaraballo You mention that you have a GeForce 430 as well; can you run some tests there? This way we would know if it's a Titan-only issue or maybe something special on your system.

@ThomasDinges , with the GeForce 430:

Dimensions: 2048x858, Tile size: 256x215

  • e7f3424 = 13:44 minutes
  • 2.69 = 13:25 minutes

Apparently it is a TITAN-specific issue.
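
To put the two cards side by side, here is a minimal sketch that converts the m:ss times reported in this thread (same 2048x858 frame, 256x215 tiles, e7f3424 versus 2.69) into a relative slowdown. The helper names are made up for the example.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' render time to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown_percent(new: str, old: str) -> float:
    """How much longer the new build takes, as a percentage of the old time."""
    old_s = to_seconds(old)
    return (to_seconds(new) - old_s) * 100.0 / old_s


titan = slowdown_percent("5:22", "4:03")    # GTX Titan
gt430 = slowdown_percent("13:44", "13:25")  # GeForce GT 430
print(f"Titan: {titan:.1f}%  GT 430: {gt430:.1f}%")
```

The Titan is roughly a third slower with the newer build, while the GT 430 differs by only a couple of percent.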

Brecht Van Lommel changed title from Cycles GPU Performance Regression to Cycles GPU Performance Regression on Titan 2014-03-03 18:32:28 +01:00

Maybe this is the same issue as here? https://developer.blender.org/T39089

Do you have any nodes connected to the Volume Output in those files? If so, remove the link.

Brecht Van Lommel was assigned by Sergey Sharybin 2014-04-10 18:43:14 +02:00

Added subscriber: @Sergey

@brecht, I would ask you to keep track of this report. Currently it looks abandoned.

Added subscriber: @dragostanasie

Reference: blender/blender#38679