Cycles GPU Performance Regression on Titan
Closed, Archived (Public)

Description

System Information
Ubuntu 12.04, kernel 3.2.0-58-generic
Intel® Xeon(R) CPU E5620 @ 2.40GHz × 16
7,7 GiB RAM
GeForce GT 430
NVIDIA Corporation GK110 [GeForce GTX Titan] (rev a1)

NVIDIA driver version: 331.20
X.Org: 1.11.3

Blender Version
Broken: fdcdd5e
Worked: 2.69

Short description of error
Rendering a project with the GeForce GTX Titan (only) on Blender fdcdd5e takes twice as long as on Blender 2.69.

Exact steps for others to reproduce the error

  • Open Blender 2.69
  • Change Render Engine to Cycles
  • Set Render Samples to 100
  • Render time: 2 seconds
  • Open Blender fdcdd5e
  • Change Render Engine to Cycles
  • Set Render Samples to 100
  • Render time: 4 seconds

Event Timeline

Gabriel Caraballo (eibriel) created this task.
Gabriel Caraballo (eibriel) raised the priority of this task from to Needs Triage by Developer.

On GPU.

The name of the bug should be: Cycles GPU Performance Regression (I can't change it now)
The regression only occurs when rendering on GPU.

Brecht Van Lommel (brecht) renamed this task from Cycles Performance Regression to Cycles GPU Performance Regression.

There was a change to avoid CPU load at 100% during GPU render, which may have a big impact on very simple renders like the default cube. However, for more complex scenes the impact should be a lot less.

Can you test what kind of impact there is on more complex scenes?

Brecht Van Lommel (brecht) triaged this task as Needs Information from User priority. Feb 17 2014, 2:37 PM

Sure Brecht, a Kiribati frame takes 5 minutes with fdcdd5e, and 2:30 minutes with 2.69.

Brecht Van Lommel (brecht) raised the priority of this task from Needs Information from User to Normal.

Thanks, that definitely needs to be fixed then.

@Gabriel Caraballo (eibriel)
How about a more complex scene with a longer render time? Is the render time still doubled? I have a GTX 560Ti 2GB. Comparing Blender 2.69.0 and today's buildbot version, I got "only" a 10% regression.

For me the regression depends very much on the OS and/or drivers and the average time a tile takes.

The time we lose should be per tile. We could enqueue the next tile in the single GPU case to take away most of the pain.
But properly making device_cuda asynchronous takes a non-trivial effort, and it is on my todo list.
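A rough way to picture why enqueueing the next tile would help is a simple timing model. This is only a sketch under stated assumptions, not how device_cuda works: it assumes each tile costs a fixed render time plus a fixed host wake-up latency, and that the latency is shorter than the render time so it can be hidden behind the next queued tile. The function names and the example numbers are hypothetical.

```python
def serial_time(n_tiles, render_s, wake_latency_s):
    """Total time when the host waits for each tile before submitting the next.

    Every tile pays the wake-up latency (assumed model, not measured).
    """
    return n_tiles * (render_s + wake_latency_s)


def pipelined_time(n_tiles, render_s, wake_latency_s):
    """Total time when the next tile is already queued while the host wakes up.

    Assumes wake_latency_s <= render_s, so only the last tile's latency
    is exposed.
    """
    if n_tiles == 0:
        return 0.0
    return n_tiles * render_s + wake_latency_s


# Hypothetical numbers: 100 tiles, 50 ms per tile, 5 ms wake-up latency.
print(serial_time(100, 0.05, 0.005))
print(pipelined_time(100, 0.05, 0.005))
```

In this model the lost time grows with the number of tiles in the serial case, but stays constant in the pipelined case, which matches the remark that the loss is per tile.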

@Brecht Van Lommel (brecht) I'll be on IRC tonight if you want to talk about this, but we can always revert to busy waiting or make it user-definable.

If we can't find a fix quickly we should indeed revert this change and move it to the next release.

This patch seems to work well for me on OS X. I tried rendering the default cube with 100 samples and the default (small) tile size, and it does make a big performance difference there. For more complex scenes I didn't notice that much difference but it worked fine.

Viewport rendering seems to be stable here as well, so it might be good to commit this and get feedback from more people.

This fix adds some extra pedantic sync() for the case where we cancel the task.

@Gabriel Caraballo (eibriel) do you have an option to compile with a patch, or should I make a build for you to test with?

@Gabriel Caraballo (eibriel) after consulting with @Brecht Van Lommel (brecht) I committed a possible fix: rBf1aeb2ccf4

Tomorrow's http://builder.blender.org/ builds should be good for testing this.

The latest 64 bit linux build here now has the fix:
http://builder.blender.org/download/

Thanks @Brecht Van Lommel (brecht). New Render Times:

Blender    2k half    2k
f1aeb2c    2:31       9:25
2.69       1.29       5:35

Tile Size: 256x215

Two Render Layers, Foreground and Background. Some compositing for color correction. (Compositing time is the same in both versions, so it doesn't affect the comparison.)

Wow, that is bad...

My problem is that I cannot reproduce a case where I get such bad numbers.

The closest I have to your hardware is a GTX 780 (the plain one), and it is on Windows.

But there, in BMW M1 and Barcelona, I have almost no regression even without the proposed fix.
I do use larger tiles: ~90k pixels per tile, and the tiles are all the same size (h/4, w/4).

That is weird, both the Titan and the GTX 780 are sm_35 and should therefore use the same code...

Windows/Linux issue? We should be able to find other Titan users, who can confirm/disconfirm the issue.

After the fix it should be at most a few ms per tile. Say there are 50 tiles per frame and 2 layers; then 1 sec is the absolute maximum regression there should be.
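The bound above can be written out as a small calculation. This is a sketch of the arithmetic only; the 10 ms figure is an assumed upper end for "a few ms", chosen because it is what makes 50 tiles and 2 layers add up to 1 second. The function name is hypothetical.

```python
def max_tile_overhead_s(tiles_per_frame, layers, per_tile_overhead_ms):
    """Worst-case extra time per frame if every tile render pays a fixed overhead."""
    return tiles_per_frame * layers * per_tile_overhead_ms / 1000.0


# 50 tiles per frame, 2 render layers, assumed 10 ms overhead per tile.
print(max_tile_overhead_s(50, 2, 10))  # 1.0 (seconds)
```

A regression measured in minutes per frame, as reported earlier in this thread, is far outside what this per-tile overhead alone could explain.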

I would want to work towards a scene / case where I can reproduce and fix the problem. If sharing the problem scene is not possible maybe @Gabriel Caraballo (eibriel) can also try BMW or Barcelona so we can bisect the issue some more.

Hi, I also can't reproduce the regression with the BMW and Barcelona files.

Opensuse 13.1/64
Intel i5 3770K
GTX 760
GTX 560Ti 448 Cores
Driver 331.20

BMW, set tiles to 256x256:

http://www.blenderartists.org/forum/showthread.php?239480-2-6x-Cycles-render-benchmark

Barcelona is not available from the original BA thread, so I uploaded it to pasteall.
Please switch scene to Benchmark1 GPU.

http://www.pasteall.org/blend/26908

@Gabriel Caraballo (eibriel), hi, I assume it is not possible for you to upload a sample file. Maybe a simplified one?

Cheers, mib.

System: Windows 7 x64.
GTX 780 (latest NVIDIA driver, forgot to check the exact number)

Default cube 1920x1080 * 1/4
500 samples tiles 16x16 -> 510 tiles

2.69 release    trunk without fix    trunk + fix
2:28:90         3:13:71              2:04:07
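The tile count quoted for this test (510) can be checked with a short calculation. This sketch assumes that "1920x1080 * 1/4" means each dimension is scaled to a quarter (480x270), and that partial tiles at the image edge count as full tiles; both assumptions are consistent with the 510 figure. The function name is hypothetical.

```python
import math


def tile_count(width, height, tile_w, tile_h):
    """Number of tiles needed to cover the image, counting partial edge tiles."""
    return math.ceil(width / tile_w) * math.ceil(height / tile_h)


# 1920x1080 scaled to a quarter per dimension (assumed) gives 480x270.
width, height = 1920 // 4, 1080 // 4
print(tile_count(width, height, 16, 16))  # 30 columns * 17 rows = 510
```

With this many small tiles, any fixed per-tile cost is multiplied several hundred times, which is why this setup is sensitive to the change under discussion.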

@Brecht Van Lommel (brecht) I am beginning to suspect that it might not be the changes to device_cuda.

Those are not the only things that changed. I don't have an Ubuntu 12.04 machine to make a few builds on, but it would be good to bisect this.

OK

System Specs:

i7 3930k
32G RAM
Geforce TITAN
Kubuntu 13.04

Cube with 100 samples and just 1 tile (1920x1080)

2.69 ----- 2.73 secs
f91368d ------ 3.47 secs
today's build 688d098 ----- 2.51

So, apparently the fix was merged into trunk and now the performance is better than in 2.69.

The only peculiar thing I can mention is that performance seems to be the opposite with 64x64 tiles: in the latest build it takes 7 sec, while in the other build and in 2.69 it takes 5...

@Juan Pablo Bouza (jpbouza) thx for these numbers.

@Gabriel Caraballo (eibriel) it would be good to know what it is in your scene that causes this, or what change to Cycles causes this for you. Can you share any details on the scene, the shaders, anything?

Does it also happen if you override your materials with a diffuse 80% grey shader, for example?

So, it seems that at the moment we have no way to reproduce the problem, with different scenes on different graphics cards.

@Gabriel Caraballo (eibriel), would it be possible to send me a .blend file with this problem privately to brecht@blender.org?

I now got a production file from @Gabriel Caraballo (eibriel) to test this, but couldn't confirm any major impact when testing with a 460 GTX and a 650M GT on Ubuntu Linux.

It may be a different issue, but given that T38712: Cycles render is not updating is also a problem, I reverted these busywait changes for now. I would rather postpone it than try to find fixes at the last moment. If you test the next build, then we will at least know if this is the cause of the problem.

More times with 688d098:

Resolution: 2048x858 100%
Tile Size: 256x215

  • 688d098= 5:07 minutes
  • 2.69= 4:03 minutes

Resolution: 2048x858 100%
Tile Size: 2048x858

  • 688d098= 4:07 minutes
  • 2.69= 4:21 minutes

Same times for me and @Juan Pablo Bouza (jpbouza).

jpbouza will try rB6b1a4fc66ef4.

Is the regression fixed for you guys?

Render times with e7f3424:

Resolution: 2048x858 100%
Tile Size: 256x215

  • e7f3424 = 5:22 minutes
  • 2.69 = 4:03 minutes

Resolution: 2048x858 100%
Tile Size: 2048x858

  • e7f3424 = 5:45 minutes
  • 2.69 = 4:21 minutes

Not fixed :(

Then I have run out of ideas here.
We need someone else who can confirm the issue (maybe it's a Titan-only issue).

@Gabriel Caraballo (eibriel) You mention that you have a GeForce 430 as well. Can you run some tests there? This way we would know if it's a Titan-only issue or maybe something special on your system.

@Thomas Dinges (dingto), with the GeForce 430:

Dimensions: 2048x858, Tile size: 256x215

  • e7f3424 = 13:44 minutes
  • 2.69 = 13:25 minutes

Apparently it is a Titan-specific issue.
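To put the two cards side by side, the reported times for e7f3424 and 2.69 at 2048x858 with 256x215 tiles can be converted to a relative slowdown. This is only a sketch using the times posted in this thread; the helper names are hypothetical.

```python
def to_seconds(mss):
    """Convert a 'minutes:seconds' string such as '5:22' to seconds."""
    minutes, seconds = mss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown_percent(new, old):
    """Relative increase in render time of `new` over `old`, in percent."""
    return (to_seconds(new) - to_seconds(old)) / to_seconds(old) * 100.0


# Times reported in this thread for e7f3424 vs 2.69, tile size 256x215.
titan = slowdown_percent("5:22", "4:03")
gt430 = slowdown_percent("13:44", "13:25")
print(f"Titan: {titan:.1f}% slower, GT 430: {gt430:.1f}% slower")
# prints "Titan: 32.5% slower, GT 430: 2.4% slower"
```

The GT 430 difference is small enough that it could be noise, while the Titan difference is not, which supports the conclusion that the regression is specific to the Titan on this system.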

Brecht Van Lommel (brecht) renamed this task from Cycles GPU Performance Regression to Cycles GPU Performance Regression on Titan.

Maybe this is the same issue as here? https://developer.blender.org/T39089

Do you have any nodes connected to the Volume Output in those files? If so, remove the link.

@Brecht Van Lommel (brecht), I would ask you to keep track of this report. It currently looks abandoned.