GPU rendering taking 4x longer than CPU with smoke effects
Closed, Archived (Public)

Description

System Information
Mac OS X 10.10.4 Yosemite
2013 MacPro Dual D700s, 8-Core 3.0 GHz 64GB RAM
Titan X 12GB VRAM

Blender Version
2.77 RC2
Worked: (optional)

Short description of error
The frames I'm rendering take 90 mins on an i7 quad-core CPU with 16GB RAM. On the 2013 MacPro with the Titan card those same frames are taking 2 hrs 50 mins. Something is incredibly wrong. The scene is very simple but does contain smoke/fire effects, which I know were added in 2.77 for GPU rendering.

Details

Type
To Do

CPU uses more powerful Volume sampling code, that might explain why it takes longer on GPU. Or this is related to this discussion here: https://developer.blender.org/T45093

"CPU uses more powerful Volume sampling code, that might explain why it takes longer on GPU."

Yes, the GPU does take longer to render in 2.77, but irrespective of the CPU, the GPU is still taking 7-8x longer to render than the exact same block of my scene in 2.76. My render in 2.77 rc2 takes 8 mins and 12 secs, whereas in 2.76 it takes 1 min and 41 secs. This is a big concern as we just spent a large amount of money on GPUs for rendering, and in 2.77 all the speed advantage we gained by rendering with the GPU in 2.76 is not only lost; the GPU is now even slower than the CPU.

The CPU render time in 2.76 was 4:25, and the CPU render time in 2.77 rc2 was 4:13. A slight improvement, but again, this is irrespective of the drastic loss in GPU speed. This is concerning as we just spent $2000 on GPUs to reduce render times, and in 2.77 all that is lost. I'd stick with 2.76, but quite a few of our scenes have smoke, which was just enabled for GPU rendering in 2.77.

Hi, please attach a .blend that shows the slowdown; you could append the smoke objects into a new file.
Then developers and users can check it on other OS/GPU/CPU combinations.

Mib
EDIT: Eh, there was no volume rendering on GPU in 2.76b!

By the way, 90 mins versus 170 mins is not 4 times slower but only about 1.9 times slower.
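The ratio can be checked directly from the two render times given in the report. A minimal sketch in Python; the helper name `slowdown` is only illustrative and not part of any tool mentioned in this thread:

```python
def slowdown(baseline_minutes, observed_minutes):
    """Return how many times longer the observed render took than the baseline."""
    return observed_minutes / baseline_minutes

cpu_minutes = 90            # i7 quad-core, as stated in the report
gpu_minutes = 2 * 60 + 50   # 2 hrs 50 mins on the MacPro with the Titan card

ratio = slowdown(cpu_minutes, gpu_minutes)
print(f"GPU render took {ratio:.2f}x as long as the CPU render")
```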

Just chiming in here. Have a TitanX and 780ti running Windows 10 with latest Nvidia drivers, and GPU rendering for smoke is incredibly fast compared to CPU (the Intel 3770K, an admittedly older processor). Doing some tests on a tornado scene in 360 equirectangular and it works quite well. Highly possible it's an OSX problem, as when we switched from OSX to Windows/Linux we saw a massive performance boost, not to mention that the CUDA drivers were far more reliable and up to date.

For stats:

2K render, 30 samples (it's a test), two smoke emitters, giant domain, equirectangular renders:

GPU (TitanX + 780ti) = 4m 49s
CPU (Intel 3770K)= 27m 31s

EDIT: Be advised... I did have to reduce the render tile size from 256 or 512 down to 128. That allowed the cards to crunch through the numbers faster, as the timeout delay in Windows was struggling under the larger tile size.
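For anyone who wants to script that tile-size change, here is a rough sketch. It assumes the Blender 2.7x Python API, where the scene's render settings expose `tile_x` and `tile_y`; the helper `set_tile_size` is hypothetical, and a stand-in object is used so the snippet also runs outside Blender:

```python
from types import SimpleNamespace

def set_tile_size(render_settings, size):
    """Set square render tiles on any object exposing tile_x and tile_y."""
    render_settings.tile_x = size
    render_settings.tile_y = size
    return render_settings

# Inside Blender 2.7x this would be called as (assumption, not run here):
#     import bpy
#     set_tile_size(bpy.context.scene.render, 128)
# Outside Blender, a stand-in object shows the effect.
demo = set_tile_size(SimpleNamespace(tile_x=256, tile_y=256), 128)
print(demo.tile_x, demo.tile_y)
```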

ok, so you meant to say "GPU uses more powerful Volume sampling code, that might explain why it takes longer on GPU?"

Thanks for the help! I did what you suggested and will upload my .blend from 2.76 (without the smoke domain). And I will upload a separate .blend (created with 2.77 rc2) that has only the smoke domain in it.

I noticed something interesting with GPU rendering:

  1. When I open the .blend file (from 2.76) in 2.77 rc2 and append the smoke domain, it renders slowly (as I've experienced).
  2. But when I created a new .blend in 2.77 and then imported the smoke domain, it rendered fast! So then I went a step further and imported my meshes of the altar into the new .blend, and it still renders fast.
  3. Then I imported the "sun" lamp and now it's rendering slowly again. So I don't know what's going on.

Here are the .blend files. Hopefully you can figure something out.


Joel, do you have a .blend file for this that I could try on OSX to compare render times?

Considering ours is a pretty complicated setup from a tornado scene it's probably not the best thing to send over. If you'd like, you (or I) can create a small test scene and we can compare render times on that!

If you download the "Smoke Domain.blend" above you could try frame #600 of that, and let me know what time you get on the GPU in 2.77. Then I can compare it to mine.

Copy that! I'll download it and give it a go.

Hmmm. I'm getting some interesting results. Especially when compared to the 780ti.

Before I go into more detail, Mike, could you download this file and render it on your TitanX under 2.77 and let me know how long your render time is? Maybe even try 2.76 as well.

Yeah, I'll give it a try.

Render Results for Test.blend:

2.77:
GPU = 8.9 seconds
CPU = 21.95 seconds

2.76:
GPU = 8.8 seconds
CPU = 23.8 seconds

There is no smoke in this scene, so I'm not sure what you're testing here? It's the smoke effects that are slowing down my renders on the Titan.

Thanks Mike!

So I'm getting some differing results here on our systems between the different versions AND different GPUs. Obviously we can't test Smoke on GPU on 2.76, which is why I was just doing a non-smoke test on your system. Interestingly enough, your results are nearly identical regardless of the version. That's not what we're getting here. Anyway...

Test breakdown:

On your smoke file, 2.77 frame 600, 850 samples:
TitanX - 12m 12s
780ti - 1m 29s

On my Test.blend (no smoke), 50 samples:
TitanX - version 2.77 - 0m 20s
780ti - version 2.77 - 0m 8s
TitanX - version 2.76b - 0m 11s
780ti - version 2.76b - 0m 8s

To be noted, the 2.77 build used was the latest 64bit build available for Windows on builder.blender.org dated March 8th (for some reason quite behind the others)

Conclusions?
So between the two builds, something has happened that has slowed down the TitanX by a considerable amount, especially when compared to the 780ti, which it should trounce soundly. ESPECIALLY on your smoke FX file. I also tried the Gooseberry build of 2.76b and it displayed the same 20s render time on the TitanX, so I'm concluding that somewhere between the Gooseberry build of 2.76 and the 2.77 builds, something was changed that caused a huge performance loss on the TitanX. In the case of Windows, this is showing up not only in Smoke on GPU but also in standard Cycles rendering.

Is it possible there is something that could be slowing down the TitanX in Cycles due to the large amount of vRAM the TitanX has? The 780ti only has 3GB whereas the TitanX has 12GB. Any devs know if there was something changed in the Cycles kernel that would be detrimental to higher performance cards?

Thanks for letting me chime in, Mike. :) My apologies if these issues are not related.

Apologies for the additional comment: Just to verify, ran these same tests on an AMD workstation also running a TitanX (but no 780ti) and the render times were in line (though a little bit longer, cause it's AMD) with the above, so it's not isolated to just this machine but seems to be in line with people who are running the TitanX.

Would be curious to see if anyone with a 980 or 980ti with the larger Vram than the 780ti is having these same problems.

Thanks Joel. Very interesting that the 780ti is screaming faster than the TitanX by a long shot -- and even to some degree without smoke. We have another 2013 MacPro that is running the 980 Ti, so I will test that this afternoon. We just sank a bunch of money into these GPU cards, so it's painful that 2.77 voids all the extra horsepower we just bought.

Agreed--we did the same thing to handle the VR loads that we're undertaking since we're constantly pushing 6+GB of vRAM on the equirectangular renders. Do try it on the 980ti and let me know if that solves an issue, otherwise, devs, take a look!

Ok here is a comparison. We rendered the "Altar of Sacrifice without smoke.blend" but added the smoke domain into it.

These renders were at 10 samples:

(minutes:seconds)

2013 MacPro #1 w Titan X GPU = 2:28
2013 MacPro #1 CPU = 1:00
2013 MacPro #2 w 980 Ti = 2:21
2013 MacPro #2 CPU = 1:00
2013 MacPro #1 CPU (in v2.76) = 1:04

All this seems to indicate is that the 980 Ti and the Titan X are both affected similarly.

Interesting. So the 980ti might be included in the cards affected as well.

I just did a test by downgrading the Nvidia drivers and CUDA versions a few steps back, just in case something Nvidia did was causing the issues, but the render times were identical. I also ran a few benchmark tests on the GPUs to make sure that it wasn't a hardware issue causing the TitanX to underperform, but in the benchmarks the TitanX beats the 780ti by a decent margin--as expected--until it comes down to rendering in Cycles.

Sounds like we got a bug!

Thanks for the help, Joel. I'm out of ideas, but I know this will affect a ton of users, as the Titan and 980 Ti cards are among the top cards on the market right now.

I only have access to a 960M, but I think this has the same cause as T45093.

Yep, looking through the thread, looks like it's the same thing. @Mike (LMProductions) may be worth you or I posting over there with information here.

Let me clarify some things:

  • Slower rendering of volumes which are using 3D textures (smoke, fire, etc) on GPU in comparison with CPU is not considered a bug. This is a totally new feature for GPU which needs polish, benchmarks and fine tuning.
  • However, if some file which doesn't have smoke/fire textures has a significant slowdown in 2.77 in comparison with 2.76, that is something we should look into.

In order to troubleshoot this we need a .blend file which renders fast in 2.76 but renders slowly in the 2.77 release. So far the benchmarks are really confusing: they compare different releases with different cards on different files. This isn't helpful at all.

So please provide the simplest possible file which clearly demonstrates the speed regression.

Sergey Sharybin (sergey) triaged this task as "Incomplete" priority. Mar 22 2016, 2:22 PM

Thanks for clarifying. The "Smoke Domain.blend" is as simple as I could get it. It's just a fire. No mesh objects or anything else. Joel confirmed "between the two builds, something has happened that has slowed down the TitanX by a considerable amount, especially when compared to the 780ti of which it should trounce soundly."

So this same .blend file is showing significant slowdowns for whatever reason on at least two different systems using the same card, which, as Joel said, makes it seem specific to the Titan (and the 980 Ti cards with the same chipset, as mentioned in the other case).

I understand the GPU rendering with smoke is a new feature (which is GREAT!!), but with the renders being many times slower than the CPU it negates the usability of this new feature entirely :( I hope you guys can resolve this! Let me know however I can help!

@Mike (LMProductions), I don't really get that :(

Just a fire would not work in Blender 2.76, so comparing the latest builds with 2.76 is not fair. Surely enough, 2.76 will be faster because the smoke will just be an empty texture without scattering and absorption happening.

Or did I miss something in the conversation, and the speed regression happened after smoke was added to Blender?

Are there non-smoke/fire scenes with huge performance difference between 2.76 and 2.77/latest master?

So far as I've noticed, there seems to be no difference in rendering speeds between 2.76b and 2.77, except when it comes to rendering the smoke stuff on the GPU.

@Sergey Sharybin (sergey) hopefully I can chime in and explain the situation more clearly! :)

There are TWO issues going on here. @Mike (LMProductions) originally posted about slow smoke performance on GPU. I came in to run some tests using my TitanX and 780ti as a comparison. Obviously, there was no Smoke on GPU before 2.77, so you are right that the comparisons from version to version are not fair. However, performance from GPU to GPU seemed to be the most helpful comparison tool. It was while running tests between my TitanX and 780ti that I noticed a large discrepancy between card performance, which then caused me to do non-smoke tests to verify that there was something up with Cycles between the two GPU generations.

That is where I discovered that somewhere along the line Cycles performance on the new Maxwell cards seemed to be severely hindered when compared to previous versions of Blender.

So to recap, in this thread we have two related issues:

  1. Smoke on GPU performance is heavily reduced with Maxwell line GPUs (Titan X, 980, 980ti, etc) in 2.77
  2. Traditional Cycles (non-smoke) performance is reduced on Maxwell line GPUs in 2.77

Benchmarks regarding issue #1: Smoke on GPU comparisons

Using Mike's smoke file (posted above), 2.77 frame 600, 850 samples:
TitanX - 12m 12s
780ti - 1m 29s

NEW benchmarks regarding issue #2, Cycles performance between previous versions and 2.77 on Maxwell GPUs
BMW27 test using latest BMW scene from https://www.blender.org/download/demo-files/

Blender 2.75 gooseberry branch
TITAN X: 1:31.11
780ti: 1:06.80

Blender 2.76b
TITAN X: 1:32.96
780ti: 1:05.13

Blender 2.77 stable release
TITAN X: 2:19.70
780ti: 1:05.85
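The timings posted above can be turned into ratios to make the comparison easier to read. This is only a sketch working from the numbers in this thread; the helper names `to_seconds` and `time_ratio` are illustrative:

```python
def to_seconds(stamp):
    """Convert a 'minutes:seconds' string such as '1:32.96' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)

def time_ratio(slow, fast):
    """Return how many times longer the 'slow' render took than the 'fast' one."""
    return to_seconds(slow) / to_seconds(fast)

# BMW27 (no smoke), TITAN X: Blender 2.77 versus 2.76b.
titan_bmw = time_ratio("2:19.70", "1:32.96")
# BMW27 (no smoke), 780ti: Blender 2.77 versus 2.76b.
gtx780_bmw = time_ratio("1:05.85", "1:05.13")
# Smoke file, frame 600, Blender 2.77: TitanX (12m 12s) versus 780ti (1m 29s).
smoke_titan_vs_780 = time_ratio("12:12", "1:29")

print(f"TITAN X, BMW27, 2.77 vs 2.76b: {titan_bmw:.2f}x the render time")
print(f"780ti,   BMW27, 2.77 vs 2.76b: {gtx780_bmw:.2f}x the render time")
print(f"Smoke frame, TitanX vs 780ti:  {smoke_titan_vs_780:.2f}x the render time")
```

On these numbers the TITAN X takes roughly 1.5 times as long on BMW27 in 2.77 as in 2.76b, the 780ti is essentially unchanged, and the smoke frame takes a little over 8 times as long on the TitanX as on the 780ti.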

CONCLUSIONS:
As you can see from the previous smoke benchmarks and the BMW benchmarks I ran this morning, somewhere between 2.76b and 2.77 the TitanX took a big hit on performance (render time up by about 50% on the non-smoke BMW scene, and the smoke frame taking roughly 8 times as long as on the 780ti).

@Sergey Sharybin (sergey) we leave this in your capable hands. :)

Well stated Joel. Thanks for the help and the testing!

The Titan X issue is known; I will merge this task into the relevant one. ;)

@Bastien Montagne (mont29) brilliant mate, thanks so much! Glad to know you guys are on it.

@Mike (LMProductions), I'm going to continue the discussion from T45093#376557 here because not everyone subscribed to that other thread needs to get emailed about this.

Please confirm that the latest build from builder.blender.org still has the issue. If the issue was already solved then there is no point reopening this report. The render times and tests from March are outdated now after all the changes that have been done since then.

Yes. I confirmed in my first response to you in the other thread that the latest build, released today, does not fix this problem. If there is anything else I can help do or test, let me know.

Brecht Van Lommel (brecht) reopened this task as "Open". Jun 1 2016, 2:39 AM
Brecht Van Lommel (brecht) changed Type from Bug to To Do.

Ok, I've reopened the report now. It wasn't clear to me that you actually tested a build from http://builder.blender.org/download, since you mentioned "Blender 2.77a" which is terminology that we would use to refer to a specific release from http://www.blender.org/download/, not nightly builds.

This task is approaching 90 days since it was opened. Is there any progress or work being conducted on this yet on the Mac side of things?

As far as I know, no. None of the developers have a Mac with that graphics card to investigate this and we don't have a good guess for why this is slow with this particular OS / GPU combo.

I too am having the same issue... I have 3 GeForce GTX 960 2GB SSC Edition GPUs in my computer... I am trying to render an object using the GPUs and it takes about 28 seconds for 1/2 of the image to render, then it tells me it is out of memory on the card(s), even though during rendering it is showing a peak VRAM usage of only 55 MB. I am using a render tile size of 256x256 for the GPU.

I then switch over to my Intel I-7 3940k, which typically renders at about the same speed as ONE of my graphics cards... But with the tiles set to 16x16 it renders the entire scene in 20 seconds, which is 8 seconds quicker than all 3 of my GPUs, and without any errors.

I have only ever run into this problem with the newest build being 2.77.

Potentially Important Computer specs are as follows:

-64GB DDR3 1600Mhz RAM (Ripjaws)
-3x Geforce GTX 960 SSC Edition Graphics Cards (2GB)
-Intel I-7 3940k Hex Core Processor over clocked from 3.4GHz to 4.3Ghz (Water cooled.)
-256Gb Samsung 850 Pro SSD (I run blender off of)
-2x 1TB HDD and 1 4TB HDD (Which have some of the files my scenes use stored on them, typically fluid sims.)

Hmm... Simply loading another file, rendering that file using the GPU, and then loading and rendering the file I was having issues with fixed the problem...

Render times dropped from 20 Seconds CPU down to 14 Seconds... And 28 Seconds GPU(S) down to 4 Seconds... Weird...

Bastien Montagne (mont29) raised the priority of this task from "Incomplete" to "Normal". Jul 15 2016, 4:12 PM

Hi, is this issue maybe resolved with Render 2.78?

I am starting out with Blender rendering, and since I have a MacPro I am thinking about getting a Titan X. But if it still has problems on Mac El Capitan, then I will wait.

Yes, it's still an issue. To my knowledge nobody has even worked on fixing this issue. Furthermore, Blender 2.78 renders the light intensity differently on the GPU than it does on the CPU. We have a renderfarm with some machines rendering on the CPU and some on the GPU where available, and in our animations the fire was flickering because the light was different. So we went back to 2.77.

Aaron Carlisle (Blendify) lowered the priority of this task from "Normal" to "Incomplete". Apr 14 2017, 9:34 PM

Can people still reproduce?

Aaron Carlisle (Blendify) closed this task as "Archived". May 16 2017, 8:52 PM
Aaron Carlisle (Blendify) claimed this task.

It has been 7 or more days since we last asked for information. Due to the policy of our bug tracker, we will have to archive the report until the requested information is given.