
Massive loss of speed in new builds with Ambient Occlusion and Branched Path Tracing.
Closed, Resolved (Public)

Description

Win 7 64
GTX 750TI

Old Build:
Hash:5af103f
14.01.2016
Branched Path Tracing: 01:13:90

New Build:
Hash: f77110e
08.02.2016
Branched Path Tracing: 01:41:69

Testfile BMW1M-MikePan_AO
http://www.pasteall.org/blend/40515
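
To put a number on the regression, the two render times above can be compared directly. This is only a rough sketch: it assumes the timings are written as minutes:seconds:hundredths (so 01:13:90 means 1 min 13.90 s), and the helper names `parse_render_time` and `slowdown_percent` are made up for illustration, not part of Blender.

```python
def parse_render_time(text: str) -> float:
    """Convert 'MM:SS:CC' to seconds.

    Assumption: the third field is hundredths of a second,
    as in the timings quoted in this report.
    """
    minutes, seconds, hundredths = (int(part) for part in text.strip().split(":"))
    return minutes * 60 + seconds + hundredths / 100


def slowdown_percent(old: str, new: str) -> float:
    """Return how much longer the new render takes, in percent of the old time."""
    old_s = parse_render_time(old)
    new_s = parse_render_time(new)
    return (new_s - old_s) / old_s * 100


# Timings from the report (Branched Path Tracing, GTX 750 Ti).
old_build = "01:13:90"  # hash 5af103f
new_build = "01:41:69"  # hash f77110e

print(f"old: {parse_render_time(old_build):.2f} s")
print(f"new: {parse_render_time(new_build):.2f} s")
print(f"slowdown: {slowdown_percent(old_build, new_build):.1f} %")
```

Under that reading of the time format, the new build takes roughly 37.6% longer than the old one on this test file.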

Details

Type
Bug

Event Timeline

Hans Nolte (ditos) updated the task description.
Hans Nolte (ditos) raised the priority of this task from to Needs Triage by Developer.
Hans Nolte (ditos) set Type to Bug.
Sergey Sharybin (sergey) triaged this task as Needs Information from User priority. Feb 8 2016, 1:55 PM

Please always follow the bug report guidelines and provide all the information requested there, plus anything else needed to understand what you're doing, which exact settings you're using, etc. We can't always deduce such things, and providing them in the original report will save time ;)

For example here the biggest question is whether it's CPU or GPU rendering?

Hi Sergey,

many thanks for the fast reply.

It's GPU-Rendering.

All settings can be found in the Testfile.

Ok, will do investigation later.

Sergey Sharybin (sergey) raised the priority of this task from Needs Information from User to Normal. Feb 8 2016, 2:19 PM

Likely the Experimental kernel removal.

@Thomas Dinges (dingto), yes, hopefully just a matter of tweaking some inline policies again for the branched path tracer (similar to what we did for the regular one) to get rid of extra spill loads.

For what it's worth, I can see the speed loss with the official builds. But using CUDA toolkit 7.5 to build 2.76 and latest master, render time for both is roughly the same as the official 2.76 release.

@Brecht Van Lommel (brecht), I was looking into the issue and it's indeed just much, much higher register spilling in branched path tracing. I had a patch which brings spilling down a bit, but didn't find a chance to test it because we don't have an sm_50 card here. Will try to reproduce the issue on sm_52 though (the time difference on my sm_20 card is quite low, a few percent only).

Are you suggesting switching to CUDA toolkit 7.5? In my experience it was still noticeably slower on the barcelona and classroom scenes.

Addition: measured the barcelona file again, it's 5% slower (comparing the 2.76b official release with current master + toolkit 7.5).

@Brecht Van Lommel (brecht), on the other hand, a 5% speed loss on an older card is better than a 30% loss on brand new cards. So perhaps it's easier to just update the toolkit.

Sergey Sharybin (sergey) closed this task as Resolved. Feb 17 2016, 3:42 PM

Ok, we've been doing benchmarks here in the studio, and also had help from the community on IRC. We didn't see a speed loss of more than 5% with the new toolkit, which is about what we can accept, so we're going with toolkit 7.5 for the official builds now. Dealing with that speed regression is easier than with the major one reported in this task anyway. Don't think we'll be able to address this issue before 2.77 though.

Linux and Windows builds will be ready soon. OSX will be done later.