Cycles: CUDA renderings are approximately 30% slower in 2.71-testbuild1 #40363

Closed
opened 2014-05-26 10:04:56 +02:00 by Carlo Andreacchio · 45 comments

System Information
Windows 7, x64, gtx 580, nvidia drivers 332.21

Blender Version
Broken: 2.71-testbuild1
Worked: 2.70a

Short description of error
Cycles is approximately 30% slower when rendering in 2.71 compared to 2.70

I used the Mike Pan benchmark (modified; the version I used is attached), rendered from the command line with the command blender -b BMW-MikePan.blend -o C:\tmp\ -f 1

2.70a rendered in 55 seconds
2.71 rendered in 73 seconds

This is consistent with real world scenes.
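
As a quick check of the "approximately 30%" figure, the two timings above can be turned into a relative slowdown. This is only a minimal sketch; the function name `slowdown_percent` is made up for illustration and is not part of Blender.

```python
def slowdown_percent(baseline_s: float, new_s: float) -> float:
    """Return how much slower new_s is than baseline_s, in percent."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return (new_s - baseline_s) / baseline_s * 100.0


# Timings reported above: 2.70a took 55 s, 2.71-testbuild1 took 73 s.
print(f"{slowdown_percent(55, 73):.1f}% slower")
```

With 55 s and 73 s this works out to roughly a 33% slowdown, in line with the figure in the title.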

I am unable to pinpoint exactly where it happened, but it was somewhere between revision 25ad7b4 and revision cc439f6
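
Narrowing a regression down between a known-fast and a known-slow revision is usually done by bisection. The sketch below shows the idea under stated assumptions: only the two endpoint hashes come from this report, the intermediate revision names and the `is_slow` check are hypothetical stand-ins for building and timing each revision.

```python
from typing import Callable, Sequence


def first_slow_revision(revisions: Sequence[str],
                        is_slow: Callable[[str], bool]) -> str:
    """Binary-search for the first revision where is_slow() is true.

    Assumes revisions are ordered oldest to newest, the first one is
    known to be fast, the last one is known to be slow, and that once a
    revision is slow every later one is slow too.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known fast, hi: known slow
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


# Placeholder history: only the two endpoints come from the report,
# the revisions in between are invented names.
history = ["25ad7b4", "rev-a", "rev-b", "rev-c", "cc439f6"]
slow = {"rev-c", "cc439f6"}  # pretend the regression landed in rev-c
print(first_slow_revision(history, lambda rev: rev in slow))
```

In practice `is_slow` would build the revision, run the benchmark render and compare the time against a threshold; each step halves the number of candidate revisions.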

[BMW1M-MikePan.blend](https://archive.blender.org/developer/F91068/BMW1M-MikePan.blend)

Changed status to: 'Open'

Added subscriber: @candreacchio

#40385 was marked as duplicate of this issue

Brecht Van Lommel was assigned by Sergey Sharybin 2014-05-26 10:49:55 +02:00

Added subscribers: @brecht, @Sergey

@brecht, mind looking into the report?

Added subscriber: @ThomasDinges

Added subscriber: @ditos

Confirmed this slowdown

WIN 7 64Bit
GTX 650
BMW1M-MikePan-Scene
Tiles@256x256

blender-2.70-f93bc76-win64-vc12 (10.04.2014)
2min 7sec

blender-2.70-fce731a-win64 (26.05.2014)
2min 56sec

I can confirm a similar slowdown on Windows with a 650M GT, but not on OS X. I'm not sure what is going on yet, don't have a setup on Windows where I can build with CUDA 6.0 at the moment.

I did try building with the CUDA 6.5 early access release, and it seems to render a few percent faster than 2.70. So if we can't find a solution for 6.0, at least we can hope the next CUDA toolkit release solves it.

Hi brecht, many thanks for your (very) fast feedback. Here are others with the same problem. (Maybe it will help) http://blenderartists.org/forum/showthread.php?216113-Brecht-s-easter-egg-surprise-Modernizing-shading-and-rendering&s=6e4dda0620643f338c8d2bad0b50ce84&p=2650177&viewfull=1#post2650177

Added subscriber: @MartijnBerger

@brecht: on Windows, for 2.71, I have no problem building sm_20, sm_21, sm_30 and sm_35 with CUDA 5.0 and sm_50 with CUDA 6.0.

I think this could save us a lot of reports.

Ok, found some further clues on OS X. The official testbuild also shows the slowdown, but my own cmake build does not, while a scons build does. So this may just be a cmake/scons difference, committed a fix in {b33d83bf51e8}.

Please test own builds or tomorrow's builder.blender.org builds to see if this is fixed now.

bmw.blend, 100 samples, Windows 7 x64, Geforce 540M (scons).

Blender 2.70a: 99s
b33d83b: 115s

That's still a 16% slowdown, but I think that's what we have to expect with CUDA Toolkit 6.0.

Added subscriber: @RolfJawarsch

Windows 7 64bit BMW1M-MikePan.blend @ 200 samples, 256x256 tilesize
Commandline blender -b -o -f 1

With my old and fastest Blender Build 2.69.11 for sm_50 / GTX750ti (maxregcount 40)
00:50.98

With Blender 2.70 since __launch_bounds__
01:14.05

Blender 2.70.5 b33d83b
00:56.98

Blender 2.70.5 b33d83b with MAX_REGISTERS 40 for sm_50
00:52.70
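
To make these four timings easier to compare, they can be converted to seconds and expressed relative to the fastest build. This is a minimal sketch: the helper name `parse_time` is an assumption made for illustration, and the labels are shortened from the results above.

```python
def parse_time(text: str) -> float:
    """Convert a 'MM:SS.ff' render time into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


# Labels and times copied from the results above.
results = {
    "2.69.11, sm_50, maxregcount 40": "00:50.98",
    "2.70 (launch bounds)": "01:14.05",
    "2.70.5 b33d83b": "00:56.98",
    "2.70.5 b33d83b, MAX_REGISTERS 40": "00:52.70",
}

baseline = parse_time(results["2.69.11, sm_50, maxregcount 40"])
for label, text in results.items():
    seconds = parse_time(text)
    # Ratio > 1.00 means slower than the 2.69.11 baseline build.
    print(f"{label:34s} {seconds:6.2f} s  {seconds / baseline:.2f}x")
```

Using the old 2.69.11 build as the baseline keeps the comparison consistent with how the results are presented above.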

I have tested it on a current project...

2.70a : 2:17.82
b33d83b : 2:21.92

I think this sort of slowdown is acceptable...

The only problem now is excessive RAM usage: 2.70a was using 632 MB of RAM, compared to 1359 MB for b33d83b. I will file another bug report when I find the cause of this increase.

Thanks for quick patch!

@candreacchio Is this what Blender reports as RAM usage, or what GPU-Z or some other utility reports?
If it is the latter, you could render a default cube in both releases, and you may find much higher RAM usage even for simple scenes.

If this is the case, it ultimately comes from the complexity of Cycles and the general properties of GPU architectures. Increasing the max registers will most likely lower this number, but it also carries a performance penalty in most cases.

@brecht I think a new bug report for this issue is best?

@MartijnBerger Already created a new bugreport with a test case, tested via GPU-Z

@candreacchio
OK, in that case I think we (you) should open a separate issue. This issue is indirectly related to a possible slowdown if we change the max registers, but it would be a different slowdown than the one this report is about (a combination of the new CUDA toolkit and a build system bug).

@MartijnBerger it has been up for about 12 hours already -- https://developer.blender.org/T40379

Added subscriber: @kopias

GTX660M BMW1M-MikePan.blend
before this patch 3:39:11
after this patch 2:58:02
official 270a 2:36:56

blender-2.70-f93bc76-win64-vc12 (10.04.2014)
2min 7sec

blender-2.70-06a05e4-win64 (27.05.2014)
2min 22sec

Added subscriber: @AlexanderWeide

◀ Merged tasks: #40385.

Changed status from 'Open' to: 'Resolved'

Marking this resolved now, this is about as good as I can do as I tried a lot of things to get this faster with CUDA 6 (using cmake). Now that scons gives the same results as cmake the slowdown matches the performance based on which we made the decision to switch to CUDA 6. I think CUDA 6.5 will help further but that's not something we can use for this release yet.

Hi, is it possible to download the latest version somewhere else?

You can find the latest build here: http://builder.blender.org/download/

Thanks, I read all the posts. I have to say I don't understand why the CUDA builds of all versions after 2.70a are so slow. I have a GeForce 750 Ti OC; my test shot needed only 1 min 41 sec with my old version, while even the latest build took more than 2 minutes. What happened? The speed improvement of my 750 compared to my old 550 Ti dropped by nearly one third, which is not good. If this is not fixed I will use two different versions, one for GPU and one for smoke renderings. Is it not possible to update the old, faster CUDA 5.0 kernel?

A lot of features were added since 2.70, mainly deformation motion blur, fire and smoke, but also baking. Additional code makes our CUDA kernel slower. And we switched to the CUDA Toolkit 6.0, which comes with a slowdown unfortunately.

There is nothing we can do about this, at least not for Blender 2.71. We also don't have 10 different GPUs at hand, so we cannot test with every card. As Brecht said, switching to the upcoming CUDA Toolkit 6.5 might improve the performance again, but for now we cannot fix this.

My kernel_sm_50.cubin is from Thursday, 20 March 2014, 12:30:25, file size 1.186 kb ... it's a lot faster.

Yes, that was before all the big changes.

2.71 is the first release that officially supports sm_50 cards. So this is just the way it is for now. We can try to improve the performance later..

Thanks for the fast reply and for your work. You have done an amazing job with Cycles. But I have to say I will not switch to version 2.71 until this is fixed, because using it feels like I wasted my money on a new card and lost a lot of calculation speed.

But maybe there is a problem with the deformation motion blur. I don't use motion blur in my scene settings, yet if I turn off the deformation blur options it renders a bit faster, even though motion blur is already turned off in the global settings. Maybe that is one of the problem areas. So thanks for your work, and hopefully this gets solved in later builds, maybe in 2.72.

I absolutely understand the problem, and we will try to solve this later.

But we are close to the 2.71 release and cannot do risky changes anymore. As said, the upcoming CUDA Toolkit 6.5 from nvidia will hopefully improve the performance again. But we have to wait until that gets released. :)

Added subscriber: @alex-217

@AlexanderWeide: Brecht committed an improvement; please check a new build from the buildbot tomorrow (we only build once a day). https://developer.blender.org/rB55e4454db8edac58c3d64271d84263e5bb5e9c29

Removed subscriber: @ThomasDinges

Added subscriber: @ThomasDinges
Removed subscriber: @alex-217

Thanks, I will do ;)

Just tested it on a real world scene... lots of trees and foliage... Following results from a GTX 580

Blender 2.70a -- 2:30.40 render time -- 1351 MB RAM usage
Blender 2.71-testbuild2 -- 2:10.55 render time -- 1431 MB RAM usage

Speed is faster now thanks to the optimizations throughout this release.

Thanks guys!

Thanks, testbuild 2 is acceptable. Nice work! My test scene only took 2 sec more, that is OK ;)

Just a quick note.
I am not a native speaker, but I think we should leave the question of acceptability to the people who have to accept the change in the end.
It might stem from a mistranslation or misalignment with English, but to me sentences like "this is totally unacceptable" sound aggressive, as if the person saying it is the one who has the power to accept or reject a change in Blender.

Any and all regressions are bad, and as far as I know all developers hate performance regressions with a passion, but sometimes decisions need to be made that negatively impact performance. This is not fun and probably never will be. But I think in the end it will have to be up to the people doing the actual work to decide if it is "acceptable".

Just my 2 cents, and sorry if it is off topic.

Sorry for the off topic, but I have to say something. Sorry for my words, but in my view Blender will not make the breakthrough into the industry with its own tools alone; it can only conquer the market with Cycles. The key is to be faster and of higher quality than other render tools. For a lot of people I know, and for myself, that is the only reason to switch to Blender. So in my view it is not good to drop the performance of some of the tools, because a true artist doesn't need a new special button for creating a very good new physics simulation. Every studio works with different tools to get the best out of each piece of software. Blender's Cycles engine is the fastest VFX render engine I have known in my entire life, and I have been doing this for more than 17 years now. I know exactly what I am talking about. The key is speed and quality inside Cycles. That is the most important point for a lot of people to make the switch...

And the Cycles development is awesome. Thanks for every single line of code from every developer, and special thanks to Brecht.

....

Sorry for my words, but that is my opinion. Seven artists I know who work in studios told me that Cycles is the only reason to switch right now. Sorry to say it, but that is true, and Cycles with Blender is very good in that respect.
