
Cycles crashes on bigger blends using an Nvidia Titan
Closed, Resolved · Public

Description

i7 3930, 3.2 GHz
16 GB RAM
120 GB HDD

1x Nvidia 580
1x Nvidia Titan
Driver 320.18

Blender Version

9855_blender-2.67---57284---unified-contrib-add-ons---fastest---cuda

I am using the Titan for Cycles, excluding the 580 I have from use by Cycles.
This seems to work fine for small blends.
But when I try to render blends that use more than 1.5 GB of VRAM, Cycles shows an error message and a black screen with no result.
The error message is only shown very briefly, too short to read.

It looks like Cycles is having problems addressing the extra memory on the Titan.
But I cannot find any extra info to substantiate that.

Is it possible to find the error messages Cycles shows in a log, which might shed some more light on things?

Also, the error message "display driver stopped responding" shows up a lot.







--- Steps for others to reproduce the error (preferably based on attached .blend file) ---


Event Timeline

Hi,
first of all, please note that we do not officially support the Titan yet. For the main releases we still use CUDA Toolkit 4.2, which has no Titan support.
There is no 1.5 GB restriction in Cycles; many people use Cycles with GTX 580 cards that have 3 GB of RAM.

You could try to increase the display driver timeout: http://artificialflight.org/blog/2013/cycles-crash-cuda-tdr-error/

On second thought, this could also be related to report #35665.

Jacques, please check the console (Window -> Toggle System Console) and copy the error message from there.

Jacques, please try a build newer than 57317.

Hi Thomas,

Indeed I am getting an error message...


<!> event has invalid window
write exr tmp file, 1280x720, C:\Users\jacques\AppData\Local\Temp\dedam4.blend_Scene_de dam.exr
read exr tmp file: C:\Users\jacques\AppData\Local\Temp\dedam4.blend_Scene_de dam.exr
write exr tmp file, 1280x720, C:\Users\jacques\AppData\Local\Temp\dedam4.blend_Scene_de dam.exr
CUDA error: Invalid value in cuTexRefSetAddress(NULL, texref, cuda_device_ptr(mem.device_pointer), size)

Refer to the Cycles GPU rendering documentation for possible solutions:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/GPU_Rendering

read exr tmp file: C:\Users\jacques\AppData\Local\Temp\dedam4.blend_Scene_de dam.exr

I am using the latest build I can find with CUDA 3.5 support.

I don't have a Titan to test, but from the error log I think we're running into 1D texture limits. The memory is getting bigger but the texture limits seem to have stayed the same. Probably that means we have to switch to using regular arrays to take full advantage of 6GB of memory, but this will require major kernel refactoring.
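For anyone who wants to verify this locally, the limit in question can be queried directly from the driver. A minimal standalone sketch using the CUDA driver API (illustrative only, not Cycles code):

```
/* Illustrative check: query the maximum width of a 1D linear texture,
 * which caps how much data can be bound through cuTexRefSetAddress,
 * regardless of how much VRAM the card has. */
#include <cuda.h>
#include <stdio.h>

int main(void)
{
    CUdevice dev;
    int max_width = 0;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuDeviceGetAttribute(&max_width,
                         CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_LINEAR_WIDTH,
                         dev);
    /* On Fermi and Kepler this typically reports 2^27 elements, so a
     * float4 texture tops out at 2^27 * 16 bytes = 2 GB, independent
     * of total VRAM. */
    printf("Max 1D linear texture width: %d elements\n", max_width);
    return 0;
}
```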

Hi Brecht,

Can we check this some other way, like removing textures or something?

I could set up a TeamViewer session, so you can have a look and test things on the Titan machine yourself.

PS:

Why does the same thing not go wrong with the Tesla cards the Blender Foundation uses?
They have a lot more RAM as well.

I would suggest first defining a reference test .blend and adding it here to the report.
That way people with various configurations can test and give feedback.

It's also best to exclude all other features, like compositing, EXR buffer saving, 'free images' on render, etc.
And think of a way to do test renders that render fine (using 1.5 GB apparently) and a way to make them fail (like revealing another layer in the scene).

Make it as simple as possible!

I made a simple blend with arrays of Suzannes.
As I increase the number of Suzannes, the memory use goes up.
When using the GTX 580, Cycles cuts out at 1100 MB.
When using the Titan, Cycles cuts out at 1421 MB.

So it is using more RAM, but nowhere near as much as is present on the Titan.

Are we sure the amount of RAM indicated by Cycles is correct?
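One way to cross-check those numbers against what the driver itself reports is cuMemGetInfo; a small illustrative sketch (not something Cycles runs):

```
/* Illustrative cross-check of reported memory against the driver's own
 * numbers, using the CUDA driver API. */
#include <cuda.h>
#include <stdio.h>

int main(void)
{
    CUdevice dev;
    CUcontext ctx;
    size_t free_mem = 0, total_mem = 0;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);
    cuMemGetInfo(&free_mem, &total_mem);
    printf("free: %zu MB / total: %zu MB\n",
           free_mem / (1024 * 1024), total_mem / (1024 * 1024));
    cuCtxDestroy(ctx);
    return 0;
}
```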

When Cycles cuts out I get this message:

Refer to the Cycles GPU rendering documentation for possible solutions:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/GPU_Rendering

<!> event has invalid window
CUDA error: Out of memory in cuMemAlloc(&device_pointer, size)

Refer to the Cycles GPU rendering documentation for possible solutions:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/GPU_Rendering

CUDA error: Invalid value in cuTexRefSetAddress(NULL, texref, cuda_device_ptr(mem.device_pointer), size)
CUDA error: Out of memory in cuMemAlloc(&device_pointer, size)

Refer to the Cycles GPU rendering documentation for possible solutions:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/GPU_Rendering

CUDA error: Out of memory in cuMemAlloc(&device_pointer, size)
CUDA error: Out of memory in cuMemAlloc(&device_pointer, size)
CUDA error: Out of memory in cuLaunchGrid(cuPathTrace, xblocks, yblocks)

Refer to the Cycles GPU rendering documentation for possible solutions:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/GPU_Rendering

<!> event has invalid window
CUDA error: Invalid value in cuTexRefSetAddress(NULL, texref, cuda_device_ptr(mem.device_pointer), size)

Refer to the Cycles GPU rendering documentation for possible solutions:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/GPU_Rendering

This is the last setting that works with the Titan; any more Suzannes and it fails.

Dear all, thank you for your efforts so far!

I can confirm that there are problems with a Titan card. I first thought that it had something to do with textures, because I got the error message "Invalid value in cuTexRefSetAddress(NULL, texref, cuda_device_ptr(mem.device_pointer), size)". That happened with a bigger scene, and the used memory was ~3.45 GB at the time the error occurred, according to GPU-Z.

To make sure there is no error with the card's RAM, I ran plenty of denoising jobs from the CUDA sample browser that comes with the CUDA Toolkit 5 SDK in parallel. I was able to fill the VRAM up to 5.8 GB.

To further investigate this, I created a simple scene with nothing but planes. I didn't use any textures and slowly increased the vertex count of the scene by adding ocean modifiers. I changed the resolution and gradually added more and more ocean tiles. I wasn't able to use more than 2800 MB of VRAM before Cycles quit.

I repeated this last test today with r57553M from builder.blender.org, with the same results. (I had to rename kernel_sm_30.cubin to kernel_sm_35.cubin to be able to render at all.) Today I wasn't able to get above ~2000 MB (!?). Perhaps something else was occupying other parts of the VRAM last time I did the test.

What is interesting to me is that there is no texture in the whole scene, and nevertheless I get the above-mentioned error. According to the NVIDIA CUDA Toolkit Reference Manual (version 4.2), the purpose of that method is: "Binds a linear address range to the texture reference hTexRef." But without a texture?

I uploaded my little test scene to pasteall ( http://www.pasteall.org/blend/22199 ); it's 516 KB.

If I can assist with any additional tests please let me know - and thank you for your efforts and your formidable work on Blender!

We use CUDA textures not only for image textures, but also for storing other data like geometry, so that explains why it can give this error.

I'm convinced this is a texture limit we're running into. We will investigate avoiding them and see what the performance is like. But it's too big a change to solve for the next release, with too much risk of breaking things, so a fix will have to wait a few weeks.
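To sketch what that refactor amounts to (illustrative only, not the actual Cycles kernel code): the same lookup expressed through a bound texture reference versus a plain global memory array.

```
/* Texture reference path, as in the CUDA runtime of that era. The data
 * is bound (e.g. via cuTexRefSetAddress) and is subject to the 1D
 * texture size limit, but goes through the texture cache. */
texture<float4, 1> tri_verts_tex;

__device__ float4 fetch_vertex_tex(int i)
{
    return tex1Dfetch(tri_verts_tex, i);  /* capped by the 1D texture limit */
}

/* Global memory array path: a raw device pointer, limited only by what
 * cuMemAlloc can allocate, but without the texture cache. */
__device__ float4 fetch_vertex_global(const float4 *tri_verts, int i)
{
    return tri_verts[i];  /* capped only by available VRAM */
}
```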

Thanks for clarifying this, Brecht.

If you use the textures also for storing other data, then it is indeed most probably a texture limit.

I'm as far from being an expert in this as one can be. But the Titan has the Kepler architecture, and I read that certain limitations don't exist any more with this architecture. One of the more important new features is perhaps bindless textures, which give more freedom than on Fermi cards. Are there plans to differentiate between Fermi and Kepler cards in Cycles?

We might take advantage of new features on Kepler; bindless textures could be useful to avoid the limit on the number of image textures, but they wouldn't solve this particular issue with the maximum texture size.
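For reference, a minimal sketch of a Kepler "bindless" texture object with the CUDA runtime API (sm_30+); note the resource is still a 1D linear range with the same maximum width, which is why this alone wouldn't fix the report:

```
#include <cuda_runtime.h>

/* Texture objects are passed around by handle instead of being bound to
 * a fixed, statically declared texture reference, so they lift the limit
 * on the *number* of textures, not on their maximum *size*. */
cudaTextureObject_t make_float4_tex(float4 *d_data, size_t count)
{
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_data;
    res.res.linear.desc = cudaCreateChannelDesc<float4>();
    res.res.linear.sizeInBytes = count * sizeof(float4);

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, NULL);
    return tex;
}
```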

Duplicate report:
http://projects.blender.org/tracker/index.php?func=detail&aid=35985&group_id=9&atid=498

Should we test for this bug in the new version 2.68, or is the situation unchanged?

Hi!

I'm testing the Titan card with a scene that takes 2929.4 MB of memory, without any problem.

Is there any scene to test this bug?

I'm using this build of DingTo's branch: http://www.graphicall.org/1049
GSoC-2013-DingTo_Win7.x64-58259

I have experimental changes in my branch, which allow more texture usage, but I am not sure if this is related.

I tried build 58259; it seems to be better even on a 580, but when you increase the memory it still cuts out at 1.5 GB or so.

So it seems build 58259 allows you to use a bit more VRAM, but not a lot.

For the technically inclined, I've attached a patch which might solve the issue on NVidia 6xx cards and up. However, it also comes with a performance penalty: at least on the NVidia 650M card I'm using here, some scenes render 2%-15% slower with it, so that still needs to be solved.
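The general shape of such a change is a compile-time switch in the kernel; the names below are made up for illustration and are not from the actual patch:

```
/* Hypothetical sketch: on sm_30 and newer, kernel data lookups fall back
 * to plain global memory arrays instead of texture references. */
#if __CUDA_ARCH__ >= 300
    /* Kepler and up: raw device pointer, no texture size limit. */
    #define KERNEL_FETCH(name, index) (name##_data[(index)])
#else
    /* Fermi and older: texture reference fetch, cached but size-limited. */
    #define KERNEL_FETCH(name, index) tex1Dfetch(name##_tex, (index))
#endif
```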

Hi Brecht,

Thanks for the patch.

I am assuming this C++ file needs to replace a file in the current source code?

The speed of the 6xx cards in Blender is not as straightforward as one would think.
I also tested the 680 card but found it to be slower than the 580 card, and settled for two 580 cards.
Two 580 cards are in fact not twice as fast as one, but more like 40% faster.
Which seems strange too.
In Blender the Titan seems to be about 5-10% faster than a single 580.
So basically it takes 5 7xx CUDA cores to do the same work as one 5xx CUDA core.

I can't wait to find out what this version does; it might well be faster.

I could try to benchmark both the Titan and the 580 using Octane; that might provide some comparison between the two architectures.
One would expect to see the same differences there.

Is someone going to compile this patch into a Windows 32/64 executable?

I'm not talking about the performance difference between graphics cards; that's a different issue and not something I'm investigating now. What matters here is the difference between Cycles with and without this patch on the same card.

I know; once I can get hold of a Windows build that includes the patch, I will give it a go.

All things considered, I think it will take me a while to set up Visual C++ and the Blender code and compile my own version of Blender.


Is this patch included in the builds on graphicall.org?
If so, as of which build number?

I don't think there's any build with this patch on graphicall.

I did some more testing, but so far couldn't improve the performance of the patch. I also found out that textures are apparently faster than global memory for BVH traversal on this architecture, which is unfortunate, because we might then have to choose between performance and memory.
https://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus-kepler-and-fermi-addendum
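One possible middle ground worth noting (a sketch, not from the patch): on sm_35 the __ldg() intrinsic loads plain global memory through the same read-only/texture cache that makes texture fetches fast.

```
__device__ float4 fetch_vertex_ldg(const float4 *__restrict__ tri_verts, int i)
{
#if __CUDA_ARCH__ >= 350
    return __ldg(&tri_verts[i]);  /* read-only data cache load on sm_35 */
#else
    return tri_verts[i];
#endif
}
```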

Hi Brecht,

I tried the Titan with Octane Render too; all memory is usable in Octane.
Too bad you cannot check with them how they did that.

I will try to set up the Blender source code so I can compile your patch.

Here is a test build for Windows x64:
http://temp.dingto.org/trunk_r60396_win64_titan_patch.zip

Please test two things:
1) Does it enable the use of the full 6 GB?
2) Is there a performance difference compared to a regular build?

I forgot to mention that this build is compiled with Brecht's patch. ^^

Hi Thomas,

I am giving it a go right now; it looks good so far.

Hello Thomas,

This version works on the Titan; I can use up to 5750 MB of RAM.

Fix committed to svn now. From reports on IRC, this patch actually seems to work and render slightly faster than before on Titan. Thanks all for testing.

Brecht Van Lommel (brecht) changed the task status from Unknown Status to Resolved.Sep 27 2013, 9:37 PM

Hi

The speed on the Titan seems to be the same as in the stable 2.68a.
The GTX 580 seems to be a bit faster in this version.

Works for me. I was able to render 5120 MB and didn't try any further. Also, Mike Pan's benchmark went down from 24 seconds to 22 at 512x256 tile size. Amazing, Brecht! Thanks!

Hi,

I did some more testing and found that when you use more memory, the actual speed difference with, for instance, the CPU goes down.
I have a scene, using 4820 MB, which I already rendered on the CPU; on the CPU it took 16 minutes.
Using this patch on one Titan it takes 6 minutes.

Basically 3 times faster, but on small files the difference is more like 12 times.
Seems odd.

Massive improvement though.

Is this patch going to be included in 2.69?

Awesome work, Brecht, thanks!

And thanks to Thomas Dinges for compiling the Win64 exe ;-)

There's various reasons why small and big files may be different, but it's pretty much expected and in line with what I've seen. Warp divergence on the GPU and larger cache size per thread on the CPU means that the CPU tends to do comparatively better on more complex scenes.

Also, yes this patch will be in 2.69.

I've tested this build (http://temp.dingto.org/trunk_r60396_win64_titan_patch.zip) with an Nvidia 770 4GB.

And... it works! I can render a 3900 MB scene.

Thanks a lot.

Hi, Brecht! What is the path and name of the file into which this program code, F24231: cuda_titan_use_global_memory_arrays.patch, should be inserted?

@Eugen (djessclub): No need for that; the patch has been committed and is in the 2.69 release and newer.

My video card is a Titan and I need to start compiling; what is the best way to do it?

Again, you do *not* need to compile anything; just get Blender 2.69 or the 2.70 RC from blender.org...

Hi,

I was just wondering if this patch also works for the new Titan-Z that has 12 GB of memory?

Kind regards

Jacques van Dijk