
GPU memory explodes since version 2.73
Closed, InvalidPublic

Description

System Information
Windows 7 x64
i7 860, 16 GB RAM
2x GeForce GTX 670, 4 GB VRAM
Driver 344.11
SLI not enabled

Blender Version
Broken: Blender 2.73, hash: b4d8fb5
Worked: (optional)

Short description of error
Since the 2.73 RC and in the current build, GPU memory grows really fast. For 142,368 faces with a diffuse material and a grey world color, memory usage grows to 1.225 GB of VRAM. The same scene in the Octane render Blender version (based on 2.72, latest version) only takes 120 MB of VRAM. I know Octane has a good geometry compression process, but here it's 10.2 times bigger than Octane; before, the ratio between the two render engines was only 1.25-1.65. Also, scenes built with previous versions seem not to be affected by this problem, I don't know why.

Exact steps for others to reproduce the error
Render a scene in GPU mode under 2.73.

Event Timeline

matthieu barbie (MattRM) raised the priority of this task from to 90.
matthieu barbie (MattRM) updated the task description. (Show Details)
matthieu barbie (MattRM) edited a custom field.

@matthieu barbie (MattRM) are you talking about Cycles GPU rendering or the viewport? Since you talk about Octane it's a little confusing...

@Kévin Dietrich (kevindietrich) I'm talking about Cycles GPU both in the viewport and in GPU batch render. Octane was just a point of comparison; sorry for the confusion. I don't know whether it's due to CUDA vs. the NVIDIA drivers, a memory leak in VRAM, or something else.

Matt

Sergey Sharybin (sergey) lowered the priority of this task from 90 to 30.Jan 19 2015, 8:30 AM

Please always attach files which demonstrate the issue; we're asking for it in the template for a reason.

So far I can not confirm the issue here.

Hi,

The file is attached. Please do not publish it anywhere; it's part of a commercial project. Thanks.

Matt

Few things:

  • This is a public tracker, so uploading commercial files might not be the best idea
  • The file is missing textures, which affects memory consumption
  • I can not see any difference between 2.72, 2.73 and the current buildbot in terms of memory usage (it's all at 27 MB for me)

Hence the questions:

  • Are you talking about memory usage reported by Blender itself (which shows the amount of memory it requested from the driver) or from driver tools (which give you the amount of allocated memory, which might be much higher because of fragmentation, memory used by the kernel, etc.)?
  • Can you somehow provide a file with all the textures and linked data in it for tests?
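For anyone wanting to reproduce the driver-side measurements discussed in this thread (NVIDIA Inspector, GPU-Z), `nvidia-smi` reports the same driver-level figure. A minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names are ours, not Blender's or NVIDIA's:

```python
# Query the per-GPU "used memory" figure as seen by the NVIDIA driver.
# This is the same kind of number NVIDIA Inspector / GPU-Z display, which
# includes fragmentation, kernel stacks, display memory, etc., so it is
# normally higher than the figure Blender itself prints.
import subprocess

def parse_used_mb(csv_text: str) -> list:
    """Parse 'nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits'
    output: one integer (MB) per line, one line per GPU."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]

def query_driver_used_mb() -> list:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return parse_used_mb(out)
```

Note that this driver-level number will normally sit well above Blender's own report, which only counts the memory Cycles explicitly requested.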

Hi,

Attached are the files with textures.

I'm talking about the memory used on the card. I used NVIDIA Inspector to benchmark the memory, because it returns the real memory used on the card (Octane reported the same memory usage to me, so Octane matches this tool). Of course, when NVIDIA Inspector hits 4 GB of VRAM, Blender gives me an out-of-memory error, even though the memory usage reported in Blender is much lower than 4 GB.

At start, my card uses 430 MB of VRAM by default, with Windows plus the current scene open. Using layers it hits 1092 MB; without layers (all objects on one layer) it hits 1225 MB. So in reality it uses 1092 - 430 = 662 MB of VRAM. Octane, with the same scene in Blender for Octane, uses only 86.2 MB.

Maybe it's CUDA that does this, I don't know. I know that more and more features will have some impact, which is normal, but I talked with another Blender user who saw lower performance than before on GPU and higher memory usage. Also, main memory explodes too during the BVH build and related processes before data is sent to the GPU. For a big scene I'd understand, but for just this scene it's quite strange. I know it's normal for main memory to grow before data is sent to the GPU.

It's just quite strange, and I only want to push Blender higher. GPU is really great for short deadlines ;)

Best Regards,

Matt

The file : https://drive.google.com/file/d/0B2ylYG5fzOLLZy0yZnZPd0tBNkU/view?usp=sharing

What is the memory consumption reported by Blender? That's the only figure which actually matters, since we currently don't spend time trying to optimize inefficiencies of memory utilization caused by high memory fragmentation and the megakernel (those things are not directly under our control and totally depend on the driver/CUDA toolkit used to compile the kernel/etc.).

In a fresh Blender the memory peak is 443.56 MB after rendering.

If you want more info about my driver version or anything else, just ask me. OK, I understand about CUDA and why memory grows faster on GPU.

Matt

For me it is 218 MB in all the builds (2.72, 2.73, latest buildbot).

The question is: do you see *different* memory usage reported by *Blender* in the builds mentioned above?

Yes,

In Blender 2.72 and lower: 204.60 MB, peak 217.88 MB.

In Blender buildbot blender-2.73-ffe5653-win64, peak: 443.56 MB
In Blender buildbot Gooseberry gooseberry-blender-2.73-6e5b21f-win64, peak: 443.56 MB

I use a regular Windows 7 x64 (with the latest updates, no cracks at all) and the latest driver for my GPU.
That's why I find this strange ;)

Matt

This is strange indeed. Some extra questions:

  • Does it also happen with CPU rendering?
  • Does it happen after loading factory settings (could some addons be interfering)?
  • Any chance you can test another machine, to see if you can reproduce the issue with the exact same file you've shared here in a different environment?

I tested the file under Ubuntu 14.04 64bits with a NVIDIA Titan Black using NVIDIA Driver 340.65.

Result with 2.73 hash 4c74fb2
204.60M, peak 218.63M

With my CPU i7 4770K
205.50M, peak 235.90M

@ronan ducluzeau (zeauro) : I run Blender under Windows x64, so between systems we can get different results, as Linux does not manage memory (including GPU memory) in the same way. In CPU mode on our production machines under Linux, mental ray runs 1.35x faster than on Windows and uses 12% less memory. It's important to run the test under a Windows environment to really compare the benchmarks.

Also, the build is not the same at the C++ level (GCC vs. Microsoft compilers).

@Sergey Sharybin (sergey) : So I tested; here are the results:

Blender 2.73 CPU: mem peak 228.31 MB
Blender 2.72 CPU: mem peak 228.31 MB

So same result: the difference is only on GPU on my config, and the memory allocated for CPU matches 100% between Blender and the system (using Task Manager).

Tried on another computer in GPU mode (one GPU only), and here are the results:
Blender 2.72 GPU: mem peak 229.06 MB
Blender 2.73 GPU: mem peak 229.06 MB

So I tried using only one card on my computer, and the result with one card only was 229.06 MB too. So the problem occurs when you use two cards at the same time, like this

instead of this

The problem appears only in multi-GPU mode, as if the data were duplicated on the primary card on behalf of the secondary card, instead of the second card's data being loaded into the second card's memory. So a system using 4 cards (for example) would quadruple the memory used, dividing the memory available for the scene by 4.
Would it not be possible to instance the data from one card to the second card to avoid this behavior?

Screenshot for the CPU tests here:

@Sergey Sharybin (sergey) could this be caused by the patch that I believe @Bastien Montagne (mont29) made concerning multiple GPU rendering?

@matthieu barbie (MattRM), Ah, I didn't pay much attention to the fact that 2 GPUs are being used. So the counter in Blender seems to be misleading in this case and should be fixed (it counts total memory consumption across all the cards, and since the scene is allocated on both GPUs in this case it gives 2x the per-card memory consumption). This is to be fixed.

@ronan ducluzeau (zeauro), in any case, that's how Blender has always reported memory usage. And it still remains a mystery whether the memory consumption reported by Blender changes dramatically (on your desktop with Windows and the GTX 670) from 2.72 to the latest buildbot with the file you've attached here.

Just to be clear: if it's just that memory usage is higher in Blender than in other renderers, that's not a bug; but if I understand you correctly and 2.72 indeed used close to an order of magnitude less memory than the current buildbot, then it's something to investigate.
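The counter behavior described in this exchange can be sketched as a toy model (the function names are illustrative, not Cycles internals): the scene is duplicated on every CUDA device, and the report sums the per-device copies.

```python
# Toy model of Blender's multi-GPU memory counter as described above:
# each device holds its own full copy of the scene, and the counter sums
# the per-device allocations, so dual GPUs report 2x the per-card use.

def reported_total_mb(scene_mb: float, num_devices: int) -> float:
    # Each device gets its own copy of the scene buffers.
    return scene_mb * num_devices

def per_card_mb(reported_mb: float, num_devices: int) -> float:
    # The interim advice given in the thread: divide the report
    # by the number of GPUs in use.
    return reported_mb / num_devices
```

Under this model, the 443.56 MB peak reported on dual GPUs corresponds to roughly 221.78 MB per card, in line with the single-GPU figures quoted earlier in the thread.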

Addition:

@Aaron Carlisle (Blendify), It's weird to try making blind guesses here; someone first needs to be able to see the difference in memory consumption reported by Blender.

@Sergey Sharybin (sergey) : OK, but in NVIDIA Inspector only one card gets loaded; the other remains at zero in multi-GPU mode. In mono-GPU mode it's always the primary card that gets the data loaded: if I select the primary card as the compute device, the memory is loaded on that card, but if I select the second card, it's still the primary card that gets loaded. So no data goes to the secondary card, even when it's the one that should be loaded.

So what do you think about this behavior? If you need more data, I'm still here ;) I'm quite sure there's still something strange, or a misunderstanding on my side.

Matt

@matthieu barbie (MattRM), Again, the memory report in Cycles has yet to become aware of how exactly the cards are situated (whether it's two dedicated cards, two GPUs on the same board, etc.).

What really matters is whether the behavior was different in the past. The report claims it was different in 2.72. That is what I want someone to be able to reproduce and confirm. If it's confirmed, then we'll look into what exactly caused the change and try to solve it. If the behavior never changed, the report goes to the archive and improvements happen as part of regular development.

It is always possible to find things to improve, but please stay on topic of the report and don't mix all sorts of things in here.

My report is about getting back the 2.72 memory/performance in multi-GPU mode, only that. With the same system, I see differences in the real hardware memory used by Cycles between 2.72 and 2.73: more memory used in 2.73, including in the Blender stats, in multi-GPU mode. That's the goal of this topic: to find out why, and it seems to happen in multi-GPU mode only.

I haven't mixed in any other stuff here; I just reported that in 2.72 memory usage in multi-GPU mode was lower than in 2.73. To help you, I also benchmarked my hardware to see how the cards work. During this benchmark I saw that only one card is used for memory allocation, which is part of the problem in multi-GPU mode, because it doubles the data and so halves the maximum data you can fit on the GPU. And I think using only the primary card's memory can be a problem if the secondary card has more memory.

Now I think I've given you all the tests I can do for this problem: the memory report from Blender, the report from the hardware, and a test on another computer. If you need more you can ask me, but if multi-GPU is not a priority I can understand that too. So the one thing that is certain is that in 2.72 in multi-GPU mode the memory used was 2 times lower. As confirmation, earlier in the topic screenshots show the 2.72 performance with a peak at 217.88 MB, while in 2.73 it is 443.56 MB in multi-GPU mode.

Waiting for confirmation from other people would be great, to be sure.

Best Regards,

Matt

PS: more info: my two cards are identical, 2x GeForce GTX 670 4 GB VRAM from Zotac; maybe that has an impact.

Ok, I see now. It's quite challenging to get the needed info from all the individual comments. So let's assume Blender's memory report is correct (I'll fix it a bit later; for now, for more accurate info just divide it by half when using dual GPU) and let's stick to the hardware memory difference you've got.

The most confusing thing here is that 2.72 and 2.73 (and the latest buildbot) should give approximately the same memory reports in Blender. The code there didn't change for ages and I can not actually confirm different behavior here. So, stupid question: are you sure you're rendering on both GPUs in 2.72?

As for your second paragraph about primary card memory: this I can not see here (the scene is copied to both of the cards I've got), and I'm not even sure how your observation is possible. Without an SLI bridge, cards can not access each other's memory, and Cycles allocates the same memory on each of the devices. I'm using nvidia-settings here; not sure it exists on Windows. It might be worth testing whether GPU-Z gives the same observation about memory being used on one GPU only.

Hi,

so here are my answers:

The most confusing thing here is that 2.72 and 2.73 (and the latest buildbot) should give approximately the same memory reports in Blender. The code there didn't change for ages and I can not actually confirm different behavior here. So, stupid question: are you sure you're rendering on both GPUs in 2.72?

Yes, two buckets working, and NVIDIA Inspector shows me two GPUs working in 2.72.

As for your second paragraph about primary card memory: this I can not see here (the scene is copied to both of the cards I've got), and I'm not even sure how your observation is possible. Without an SLI bridge, cards can not access each other's memory, and Cycles allocates the same memory on each of the devices. I'm using nvidia-settings here; not sure it exists on Windows. It might be worth testing whether GPU-Z gives the same observation about memory being used on one GPU only.

So, I have a SLI bridge (the hardware part in my desktop), but in the control panel SLI is disabled. I tried without the SLI bridge (hardware); nothing changed. With SLI enabled, only one card gets loaded, as before, but the memory is quadrupled in hardware (using GPU-Z and NVIDIA Inspector; same result from both).

I compared with Octane for Blender to get the memory usage figures (this version is also based on Blender 2.72), and it returns peak mem: 352.14 MB (hardware reports 89 MB used) with two GPUs in use.
With only one GPU, the peak mem in Blender is 273.16 MB (hardware reports 89 MB used in this case too).

My comparison with Octane is just to benchmark the memory, not performance (they are not the same processes: one is bucketed, the other not). So in this case you're right: Blender doesn't return the real memory used during rendering, as you said before. So something in my config under Windows x64 causes this memory behavior when using two cards. The real memory used by Blender 2.73 is higher than 2.72 according to the hardware data, and the same is true in the Blender stats.

I can try to run a live openSUSE x64 or Ubuntu session and run Blender from a USB key, to see if it's a CUDA behavior under Windows. It looks more like that now.

What do you think about this ?

Matt

Please leave comparisons with Octane aside; it just adds unnecessary complexity to linking all the pieces of the puzzle together.

Thoughts are:

  • I'll fix Blender's memory report in next few days, so it'll report meaningful values when using multiple GPUs.
  • Actual hardware memory usage might indeed be higher in 2.73 because of the new CUDA toolkit, but it should be a few percent or a couple of dozen percent, not several times more memory. The missing puzzle piece here is a hardware memory comparison of 2.72 and the buildbot when using a *single* non-display GPU (crucial words: single and non-display).
  • Someone needs to test whether they can reproduce the weirdness of the memory report between Blender 2.72 and the current buildbot. The figures for GPU should be roughly the same there. @Thomas Dinges (dingto), @Martijn Berger (juicyfruit), do you have a chance to test that?

@Sergey Sharybin (sergey) Also, as Cycles grows in complexity, and depending on the exact device, we incur an overhead of sizeof(stackframe) * max_stackframe_len * launched_number_of_threads.

With the experimental kernel I have had this static cost be over 1100 MB... and that is for the default cube: a BVH of size ~25 KB and one default material...
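The fixed-cost formula quoted above can be made concrete with back-of-envelope arithmetic; the concrete numbers below are illustrative, not measured Cycles values.

```python
# Back-of-envelope version of the fixed GPU kernel cost described above:
#   overhead = sizeof(stackframe) * max_stackframe_len * launched_threads
# The stack must be pre-allocated for every resident thread regardless of
# scene size, which is why even a default cube can cost over a gigabyte.

def kernel_stack_overhead_mb(stackframe_bytes: int,
                             max_frames: int,
                             threads: int) -> float:
    return stackframe_bytes * max_frames * threads / (1024 * 1024)

# E.g. a hypothetical 512-byte frame, 8 frames deep, across ~300k
# resident GPU threads already costs ~1.1 GB before any scene data:
cost_mb = kernel_stack_overhead_mb(512, 8, 300_000)
```

This is why the cost is "static": it depends on the kernel's feature set and the device's thread count, not on the scene being rendered.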

@Martijn Berger (juicyfruit), yes, that's true. But it's still not clear to me why the memory report made by Blender would be different between 2.73 and the latest git. If some developer could test that, it would be really handy.

Aaron Carlisle (Blendify) raised the priority of this task from 30 to Normal.Jan 24 2015, 3:33 PM

This should be kept as Normal so that it does not get lost among the incomplete bugs.

I have the same problem.
Win7, GTX 580 3 GB VRAM, Blender 2.73a

Just default startup scene.

GPU-Z gives me a memory load of less than 300 MB.

Press render (or use viewport rendering) and the memory load grows to 1200 MB, while the Blender viewport reports 32 MB.

The discrepancy by itself would not be a problem, but when GPU-Z goes over 3 GB, Blender fails to render, so it seems that the number provided by GPU-Z (or any other GPU memory inspector) is correct.

Hi all,

Ok, I know it's been a long time. I just tested the latest build (with stereoscopic support, just amazing) and found out when the memory explodes a lot: in Experimental mode it simply doubles the memory. In normal mode, GPU memory is normal, matching what Blender reports (just a small difference, like 120 MB). I know it's experimental mode (like the support for SSS), but maybe it's in the code for SSS or something like that.

So it only explodes, well, doubles, in "experimental mode". I'll upgrade my computer soon to a 6-core i7, but finding a way to reduce this "double memory" problem would be great. Fast SSS rendering or hair is still a great option on GPU ;)

Matt

Sergey Sharybin (sergey) changed the task status from Unknown Status to Unknown Status.Apr 14 2015, 3:02 PM
Sergey Sharybin (sergey) claimed this task.

There are some features that have been enabled on the GPU since 2.70 (like volumes, SSS in the experimental kernel and so on). Those features are not free to enable, because they affect the stack size, register pressure and so on. There were also some optimizations which reduced the number of texture lookups (which are relatively expensive) but also increased the required stack size.

In order to solve the memory usage we need to either disable those features again (which is not a really great idea) or finish the kernel split patch and make it work for CUDA as well (see D1200), but that doesn't quite belong on the bug tracker.

So thanks for the report; we're well aware of such memory changes and are moving towards solving them, but it's not considered a bug.

Thomas Dinges (dingto) changed the task status from Unknown Status to Invalid.Apr 14 2015, 3:04 PM

The GPU experimental kernel uses more memory due to SSS and CMJ being enabled for it. This is to be expected and not considered a bug.

Should I open a new bug report if this also happens with the supported kernel when rendering the Mike Pan BMW scene?
It renders on the GPU in 2.73 but fails in 2.74 (out of memory) with an NVIDIA GTX 560 (1 GB).

My GTX 580 has 3 GB, and just one cube allocates 890 MB of VRAM...

There is a suggestion that the new options in Cycles consume more memory.
Then explain to me why the same scene on the same computer with 2 GPUs allocates 4 times more MB on the GTX 580 than on the second, newer card.

This is surely related to Fermi or some other hardware thing. Unfortunately my thread was merged with this one... but to me it sounds like quite a different problem. Mine is related to the GTX 5xx series...

As Sergey said, the only real solution is the kernel split patch (adapted for CUDA), so this is a ToDo item which we'd like to address. No need to submit a new bug now; we know about this problem. We will see when this gets fixed.

Hello.
I use Linux 64-bit, Blender 2.75a, GTX 960 4 GB.
With the BMW27.blend scene opened in Blender, nvidia-settings reports 551 MB of VRAM usage. When rendering starts, Blender shows Mem/Peak = 139.68M, but nvidia-settings shows 1369 MB. From what I've read above from Sergey: (those things are not directly under our control and totally depend on the driver/CUDA toolkit used to compile the kernel/etc.)

I don't understand whether this is a Blender problem or an NVIDIA problem (or both). Should this problem be reported to NVIDIA?

NVIDIA is aware of this and provides a way to query this RAM, but only for pro-level cards (read: Quadro).

We cannot really get much better at this from the Blender side. And besides Blender, there are many other possible users of your GPU's memory, like your operating system if you have a monitor connected to the card.

In my case the minimal allocation seems to be 6 MB with no monitor attached, and around 600 MB with a monitor driving a 1920x1200 screen.
If you have a 1.5 GB or 2 GB card, that does not leave much to work with (just starting Cycles can take another 400-800 MB depending on the card and other things).

I've done multiple render tests with the basic cube scene from as far back as v2.66 up to the latest v2.76, and noticed with GPU-Z that with each new release Cycles uses more and more VRAM than what it actually reports inside Blender while rendering.

Test Setup:

  • Default Blender cube scene, 960x540 resolution, 1000 samples, 512x512 tiles
  • Primary Display GPU: Quadro K2200 (1x24" LCD)
  • Rendering GPU: GTX 780 Ti (no monitor attached)
  • Windows 10 Pro, 16GB DDR3
  • GPU-Z to monitor memory usage (dedicated)

VRAM Usage Results:

Blender v2.76 - v2.72:

GTX 780 Ti: 1,150 MB
Quadro K2200: 378 MB
CPU: 3 MB (Task Manager)

Blender v2.71:

GTX 780 Ti: 590 MB (50% reduction)
Quadro K2200: 200 MB

Comparison:

Blender Octane Edition v2.23.2 DEMO (Blender v2.75)
GTX 780 Ti: DL 248 MB, PT 263 MB, PMC 311 MB

Conclusion:

Cycles in Blender v2.72 to v2.76 uses twice as much VRAM as v2.71 and older versions. Just going from v2.71 to v2.72 you lose close to 50% of your VRAM. For comparison, when rendering with Octane Render it only used around 250-300 MB; that's 4-5x less than Cycles in 2.72-2.76 and 2x less than Cycles in 2.71. When rendering on the CPU, memory usage for the Blender process in Task Manager went up by only 3 MB. Where is the problem? There is definitely something wrong in the Cycles/Blender code.

The interesting thing is that the Quadro K2200 with a 24" LCD attached only used 378 MB of VRAM, compared to 1150 MB for the GTX 780 Ti. So my guess is there is something wrong with the CUDA kernel built for the older compute capability: the Quadro K2200's compute capability is 5.0, whereas the GTX 780 Ti's is 3.5. I hope this post helps the developers address this issue in a future build.

Cheers!

The memory increase in 2.72 happened because that release added volume rendering on the GPU.

Regarding the difference between the Quadro and the GTX: memory usage scales with the number of cores, so that is as expected, since your GTX card has about 4x more cores than your Quadro card.

So by adding volume rendering on the GPU it uses twice as much VRAM? Still, a simple cube like this should not take up a third of the VRAM. CPU rendering takes less than 3 MB of system RAM to render the same scene, but 1,150 MB of VRAM on the GTX 780 Ti seems a bit extreme, don't you agree? Octane uses under 300 MB for comparison, 4-5x less than Cycles in Blender v2.72-v2.76 for the same scene. Even the Quadro K2200's 378 MB is high. It seems like a lot of VRAM is being wasted somewhere and the code still has room for optimization.

We all agree the memory usage is not optimal, I'm just pointing out that the causes are known, and as explained above the solutions are known too, they just take a lot of time to implement.

You can't really compare it to a CPU; the whole issue here is the per-core memory usage, and a CPU might have e.g. 8 cores, while your GPU has 2880 cores.
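The per-core scaling argument above can be illustrated with a hypothetical fixed per-core state; the 400 KB figure below is chosen only to make the arithmetic concrete, not a measured Cycles value.

```python
# Sketch of the per-core scaling point: the same fixed per-core working
# state that is negligible on an 8-core CPU becomes large on a
# 2880-core GPU. PER_CORE_STATE_KB is a hypothetical illustrative value.

PER_CORE_STATE_KB = 400

def total_state_mb(num_cores: int) -> float:
    return num_cores * PER_CORE_STATE_KB / 1024

cpu_mb = total_state_mb(8)     # a few MB on an 8-core CPU
gpu_mb = total_state_mb(2880)  # over a gigabyte on a GTX 780 Ti class GPU
```

With these assumed numbers the CPU total stays in the single-digit megabytes while the GPU total lands above a gigabyte, which is the shape of the discrepancy reported in this thread.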

I just rendered the benchmark scene BMW27.blend (http://blenderartists.org/forum/showthread.php?239480-2-7x-Cycles-benchmark-%28Updated-BMW%29&highlight=2.7+benchmark) on v2.71 and v2.76 and got identical times (1:27) in both versions, but v2.76 used 1300 MB of VRAM while v2.71 used 800 MB. You said this is because volume rendering was added in v2.72 and up, but both images look identical, so why is it eating up 500 MB more VRAM for a feature that I am not even using?

Because for technical reasons it is currently not possible to create separate CUDA kernels for every combination of features that you might use in a scene, and even if you do not use a feature there is still a cost, as GPUs require the stack size to be fixed per kernel. The split kernel mentioned above would address this.

Is there a working beta of the split kernel? ETA?

Work on that has not started yet and there's no ETA currently.