GPU memory explode since version 2.73 #43310
Reference: blender/blender#43310
System Information
Windows 7 x64
i7 860, 16 GB RAM
2x GeForce GTX 670, 4 GB VRAM each
Driver 344.11
SLI disabled
Blender Version
Broken: Blender 2.73, hash b4d8fb5
Worked: (optional)
Short description of error
Since the 2.73 RC, and still in the current build, GPU memory grows very fast. For 142,368 faces, with a diffuse material and a grey world color, memory grows to 1.225 GB of VRAM. The same scene in the Octane Render Blender edition (based on 2.72, latest version) takes only 120 MB of VRAM. I know Octane has a good geometry-compression process, but here the usage is 10.2x bigger than Octane's; before, the ratio between the two render engines was around 1.25-1.65. Also, scenes built with previous versions seem not to be affected by this problem; I don't know why.
Exact steps for others to reproduce the error
Render a scene in GPU mode under 2.73.
Changed status to: 'Open'
Added subscriber: @matthieubarbie
#44139 was marked as duplicate of this issue
Added subscriber: @Blendify
Added subscriber: @kevindietrich
@matthieubarbie are you talking about Cycles GPU rendering or the viewport? Since you talk about Octane it's a little confusing...
@kevindietrich I'm talking about Cycles GPU both in the viewport and in GPU batch render. Octane was just a point of comparison; sorry for the confusion. I don't know if it's due to CUDA vs. the NVIDIA drivers, a memory leak in VRAM, or something else.
Matt
Added subscriber: @Sergey
Please always attach files which demonstrate the issue; we ask for it in the template for a reason.
So far I cannot confirm the issue here.
Hi,
The file is attached. Please do not publish it anywhere; it's part of a commercial project. Thanks.
Matt
SEQ01_SH003-v002.blend
Few things:
Hence the questions:
Hi,
Attached are the files with textures.
I'm talking about the memory used on the card. I used NVIDIA Inspector to benchmark the memory, because it returns the real memory used on the card (Octane reports the same memory usage, so Octane matches this tool). Of course, when NVIDIA Inspector hits 4 GB of VRAM, Blender gives me an out-of-memory error, even though the memory reported inside Blender is much lower than 4 GB.
At startup my card uses 430 MB of VRAM by default, with Windows running and the current scene open. Using layers it hits 1092 MB; without layers (all objects in one layer) it hits 1225 MB. So it really uses 1092 - 430 = 662 MB of VRAM. Octane, with the same scene in Blender for Octane, uses only 86.2 MB.
Maybe CUDA is doing this, I don't know. I know that more and more features will have some impact, which is normal, but I talked with another Blender user who saw lower GPU performance and higher memory use than before. Also, main memory spikes during the BVH build and related processes before data is sent to the GPU. For a big scene I'd understand, but for just this scene it's quite strange. (I know it's normal for main memory to grow before data is sent to the GPU.)
It's just quite strange, and I only want to push Blender higher. GPU is really great for short deadlines ;)
Best Regards,
Matt
The file : https://drive.google.com/file/d/0B2ylYG5fzOLLZy0yZnZPd0tBNkU/view?usp=sharing
What is the memory consumption reported by Blender? That's the only figure which actually matters, since we currently don't spend time trying to optimize memory-utilization inefficiency caused by high memory fragmentation and the megakernel (those things are not directly under our control and totally depend on the driver / CUDA toolkit used to compile the kernel, etc.).
In a fresh Blender, the memory peak is 443.56 MB after rendering.
If you want more info about my driver version or anything else, just ask. OK, I understand about CUDA and why memory grows faster on the GPU.
Matt
For me it is 218 MB in all the builds (2.72, 2.73, latest buildbot).
The question is: do you have different memory usage reported by blender in the builds mentioned above?
Yes,
In Blender 2.72 (lower): 204.60 MB, peak 217.88 MB.
In Blender buildbot blender-2.73-ffe5653-win64: peak 443.56 MB.
In Blender buildbot Gooseberry gooseberry-blender-2.73-6e5b21f-win64: peak 443.56 MB.
I use a regular Windows 7 x64 (with the latest updates, no cracks at all) and the latest driver for my GPU.
That's why I find this strange ;)
Matt
This is strange indeed. Some extra questions:
Added subscriber: @zeauro
I tested the file under Ubuntu 14.04 64-bit with an NVIDIA Titan Black using NVIDIA driver 340.65.
Result with 2.73, hash 4c74fb2: 204.60M, peak 218.63M.
With my CPU (i7 4770K): 205.50M, peak 235.90M.
@zeauro: I run Blender under Windows x64, so between systems we can get different results, since Linux doesn't manage memory (or the GPU) the same way. In CPU mode on our production machines under Linux, mental ray runs 1.35x faster than on Windows and uses 12% less memory. It's important to run the test under a Windows environment to really compare the benchmark.
Also, the build is not the same at the C++ level (GCC vs. Microsoft compilers).
@Sergey : So I test, here the results :
Blender 2.73 CPU: mem peak 228.31 MB
Blender 2.72 CPU: mem peak 228.31 MB
So same result: the only difference is on GPU on my config, and the CPU memory allocation reported by Blender matches the system's (Task Manager) at 100%.
I tried on another computer on GPU (one GPU only), and here is the result:
Blender 2.72 GPU: mem peak 229.06 MB
Blender 2.73 GPU: mem peak 229.06 MB
I also tried using only one card on my computer, and the result with one card was 229.06 MB too. So the problem appears when you use two cards at the same time.
The problem appears only in multi-GPU mode, as if the data were duplicated in the primary card's memory on behalf of the secondary card, instead of being loaded into the second card's own memory. So a system using 4 cards, for example, would quadruple the memory used, i.e. divide the memory available for the scene by 4.
Would it not be possible to instance the data from one card to the second card to avoid this behavior?
Screenshots of the CPU tests here:
Added subscriber: @mont29
@Sergey could this be caused by the patch that I believe @mont29 made concerning multiple-GPU rendering?
@matthieubarbie, Ah, I didn't pay much attention that it's 2 GPUs being used. So the counter in Blender seems to be misleading in this case and should be fixed: it counts total memory consumption across all the cards, and since the scene is allocated on both GPUs in this case, it gives 2x the memory consumption reported by Blender. This is to be fixed.
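The miscounting Sergey describes can be sketched numerically. The scene size below is hypothetical, picked only to mirror the ~222 MB / ~444 MB figures reported in this thread:

```python
# Sketch of the reporting issue: Cycles allocates the same scene on every
# device, and the UI counter sums the per-device allocations instead of
# reporting the per-device peak. The scene size is hypothetical.

scene_bytes = 222 * 1024 * 1024   # scene data uploaded to each GPU
num_gpus = 2

# What the counter did: sum over all devices -> looks like 2x the scene.
reported_total = scene_bytes * num_gpus

# What a user actually cares about: the footprint on each single card.
per_device_peak = max(scene_bytes for _ in range(num_gpus))

print(reported_total // (1024 * 1024))   # 444 (MB), the "doubled" figure
print(per_device_peak // (1024 * 1024))  # 222 (MB), real per-card usage
```

This is why Sergey later suggests simply dividing the reported figure by the number of GPUs until the counter is fixed.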
@zeauro, in any case, that's how Blender has always reported memory usage. And it still remains a mystery whether the memory consumption reported by Blender changes dramatically (on your desktop with Windows and the GTX 670) from 2.72 to the latest buildbot with the file you've attached here.
Just to be clear: if it's only that memory usage is higher in Blender than in other renderers, that's not a bug; but if I understand you correctly and 2.72 indeed used close to an order of magnitude less memory than the current buildbot, then it's something to investigate.
Addition:
@Blendify, it's weird to make blind guesses here; someone needs to be able to reproduce the difference in memory consumption reported by Blender.
@Sergey: OK, but in NVIDIA Inspector only one card is loaded; the other remains at zero in multi-GPU mode. In single-GPU mode it's always the primary card that receives the data: if I select the primary card as the compute device, the memory is allocated on that card, but if I select the second card, it's still the primary card that gets loaded. So no data goes to the secondary card, even when it's the one that should be loaded.
So what do you think about this behavior? If you need more data, I'm still here ;) I'm quite sure there's still something strange or misunderstood on my side.
Matt
@matthieubarbie, Again, the memory report in Cycles needs to be made aware of how exactly the cards are situated (whether it's two dedicated cards, two GPUs on the same board, etc.).
What really matters is whether the behavior was different in the past. The report claims it was different in 2.72. That is what I want someone to be able to reproduce and confirm. If it's confirmed, we'll look into what exactly caused the change and try to solve it. If the behavior never changed, the report goes to the archive and the improvements happen as part of regular development.
It is always possible to find things to improve, but please stay on the topic of the report and don't mix all sorts of things in here.
My report is about getting the 2.72 memory/performance back in multi-GPU mode, only that. With the same system, I see differences in the real hardware memory used by Cycles between 2.72 and 2.73: more memory used in 2.73, including in the Blender stats, in multi-GPU mode. That's the goal of this topic, finding out why, and it seems to be in multi-GPU mode only.
I didn't mix any other stuff in here; I simply reported that in 2.72 memory was lower in multi-GPU mode than in 2.73. To help, I also benchmarked my hardware to see how the cards work. During that benchmark I saw that only one card is used for memory allocation, and that's part of the problem in multi-GPU mode, because it doubles the data and so halves the maximum data you can fit on the GPU. And I think using only the primary card's memory could be a problem if the secondary card has more memory.
I think I've now given you all the tests I can do for this problem: memory reports from Blender, reports from the hardware, and a test on another computer. If you need more you can ask me, but if multi-GPU is not the priority I can understand that too. The one thing that is certain is that in 2.72, in multi-GPU mode, the memory used was 2x lower. For confirmation, the screenshots earlier in the topic show the 2.72 peak at 217.88 MB, while in 2.73 it's 443.56 MB in multi-GPU mode.
Now, confirmation from other people would be great, to be sure.
Best Regards,
Matt
PS: more info. My two cards are identical, 2x GeForce GTX 670 4 GB VRAM from Zotac; maybe that matters.
OK, I see now. It's quite challenging to pull the needed info out of all the individual comments. So let's assume Blender's memory report is correct (I'll fix it a bit later; for now, for more accurate info, just divide it by half when using dual GPU) and stick to the hardware memory difference you've got.
The most confusing thing here is that 2.72 and 2.73 (and the latest buildbot) should give approximately the same memory reports in Blender. The code there hasn't changed for ages, and I cannot actually confirm different behavior here. So, stupid question: are you sure you're rendering on both GPUs in 2.72?
As for your second paragraph about primary-card memory: I cannot see that here (the scene is copied to both of the cards I've got), and I'm not even sure how your observation is possible. Without an SLI bridge, cards cannot access each other's memory, and Cycles allocates the same memory on each of the devices. I'm using nvidia-settings here; not sure it exists on Windows. It might be worth testing whether GPU-Z gives the same observation about memory being used on one GPU only.
Hi,
so here my answers :
The most confusing thing here is that 2.72 and 2.73 (and the latest buildbot) should give approximately the same memory reports in Blender. The code there hasn't changed for ages, and I cannot actually confirm different behavior here. So, stupid question: are you sure you're rendering on both GPUs in 2.72?
Yes: two buckets working, and NVIDIA Inspector shows me two GPUs working in 2.72.
As for your second paragraph about primary-card memory: I cannot see that here (the scene is copied to both of the cards I've got), and I'm not even sure how your observation is possible. Without an SLI bridge, cards cannot access each other's memory, and Cycles allocates the same memory on each of the devices. I'm using nvidia-settings here; not sure it exists on Windows. It might be worth testing whether GPU-Z gives the same observation about memory being used on one GPU only.
So, I have an SLI bridge (the hardware part in my desktop), but in the config panel SLI is disabled. I tried without the SLI bridge (hardware); nothing changed. With SLI enabled, only one card is loaded, as before, but the memory is quadrupled on the hardware side (GPU-Z and NVIDIA Inspector give the same result).
I compared with Octane for Blender to get a reading of the memory used (that version is also based on Blender 2.72), and it returns peak mem: 352.14 MB (hardware reports 89 MB used) with two GPUs in use.
With only one GPU, the peak mem in Blender is 273.16 MB (hardware reports 89 MB used in this case too).
My comparison with Octane is just to benchmark the memory, not performance (not the same process; one is bucketed, the other isn't). So in this case you're right: Blender doesn't return the real memory used during rendering, as you said before. So something in my config under Windows x64 causes this memory behavior with two cards. The real memory used by Blender 2.73, from the hardware readings, is higher than in 2.72, and the same goes for the Blender stats.
I can try running a live openSUSE x64 or Ubuntu session and run Blender from a USB key, to see whether it's a CUDA behavior under Windows. It looks more like that now.
What do you think?
Matt
Added subscribers: @MartijnBerger, @ThomasDinges
Please leave comparisons with Octane aside; it just adds unnecessary complexity to linking all the pieces of the puzzle together.
Thoughts are:
@Sergey Also, as Cycles grows in complexity, and depending on the exact device, we incur an overhead of sizeof(stackframe) * max_stackframe_len * launched_number_of_threads.
With the experimental kernel I have seen this static cost exceed 1100 MB: that's with the default cube, a BVH of size ~25 KB, and just the default material...
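The formula above can be made concrete with a quick back-of-the-envelope calculation. All numbers below are hypothetical, chosen only to show how a four-figure megabyte cost arises regardless of scene size; they are not measured Cycles values:

```python
# Rough illustration of the fixed GPU-side cost Martijn describes: the
# CUDA megakernel reserves its worst-case stack for every launched
# thread, independent of the scene. All three inputs are hypothetical.

stackframe_bytes = 512        # sizeof(stackframe): per-frame state
max_stackframe_len = 40       # worst-case kernel stack depth
launched_threads = 57344      # resident threads across all SMs

static_cost = stackframe_bytes * max_stackframe_len * launched_threads
print(static_cost / 2**20)    # 1120.0 MB, even for a default cube
```

Because the cost scales with the thread count, not the scene, a trivial scene and a heavy one pay the same fixed overhead on the same card.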
@MartijnBerger, yes, that's true. But it's still not clear to me why the memory report made by Blender would differ between 2.73 and the latest git. If some developer can test that, it would be really handy.
This should be kept at Normal priority so that it does not get lost among the incomplete bugs.
Added subscriber: @MikeErwin
Added subscriber: @bardo28
I have the same problem.
Win7, GTX 580 3 GB VRAM, Blender 2.73a.
Just the default startup scene.
GPU-Z shows a memory load of a bit less than 300 MB.
Press render (or start viewport rendering) and the memory load grows to 1200 MB, while the Blender viewport reports 32 MB.
The discrepancy alone wouldn't be a problem, but when GPU-Z goes over 3 GB, Blender fails to render, so it seems the number provided by GPU-Z (or any other GPU memory inspector) is the correct one.
Hi all,
OK, I know it's been a long time. I just tested the latest build (with stereoscopic support; just amazing) and found where the memory explodes: in Experimental mode it simply doubles. In Supported mode, GPU memory is normal, matching what Blender reports (just a small difference, around 120 MB). I know it's the experimental mode (with support for SSS and the like), but maybe it's in the code for SSS or something similar.
So it only explodes (well, doubles) in Experimental mode. I'll upgrade my computer soon to a 6-core i7 too, but finding a way to reduce this double-memory problem would be great. Fast SSS or hair rendering is still a great option on the GPU ;)
Matt
Added subscribers: @MateuszMielnicki, @JulianEisel, @SenHaerens, @DenisBelov
Changed status from 'Open' to: 'Archived'
There are some features enabled on the GPU since 2.70 (volumes, SSS in the experimental kernel, and so on). Those features are not free to enable, because they affect the stack size, register pressure, and so on. There were also some optimizations which reduced the number of texture lookups (which are relatively expensive) but also increased the required stack size.
In order to solve the memory usage we need either to disable features again (which is not really a great idea) or to finish the kernel split patch and make it work for CUDA as well (see D1200), but that doesn't quite belong on the bug tracker.
So thanks for the report; we're well aware of such memory changes and are moving towards solving them, but it's not considered a bug.
Changed status from 'Archived' to: 'Archived'
The GPU experimental kernel uses more memory because SSS and CMJ are enabled for it. This is to be expected and not considered a bug.
Should I open a new bug report if this also happens with the supported kernel when rendering the Mike Pan BMW scene?
It renders on the GPU in 2.73 but fails in 2.74 (out of memory) with an NVIDIA GTX 560 (1 GB).
My GTX 580 has 3 GB, and just one cube allocates 890 MB of VRAM...
There is a suggestion that new options in Cycles consume more memory.
Then explain to me why the same scene on the same computer with 2 GPUs allocates 4 times more MB on the GTX 580 than on the second, newer card.
This is for sure related to Fermi or some other hardware detail. Unfortunately my thread was merged with this one... but to me it sounds like quite a different problem; mine is related to the GTX 5xx series...
As Sergey said, the only real solution is the kernel split patch (adapted for CUDA), so this is a To Do item which we'd like to address. No need to submit a new bug now; we know about this problem. We will see when it gets fixed.
Added subscriber: @YAFU
Hello.
I use Linux 64-bit, Blender 2.75a, and a GTX 960 4 GB.
With the BMW27.blend scene open in Blender, nvidia-settings reports 551 MB of VRAM in use. When rendering starts, Blender shows Mem/Peak = 139.68M, but nvidia-settings shows 1369 MB. From what I've read above from Sergey: "those things are not directly under our control and totally depend on the driver/CUDA toolkit used to compile the kernel, etc."
I don't understand if this is a Blender problem or an NVIDIA problem (or both). Should it be reported to NVIDIA?
NVIDIA is aware of this and provides a way to query this RAM, but only for pro-level cards (read: Quadro).
We cannot really get much better at this from the Blender side. And besides Blender, there are many other possible users of your GPU's memory, like your operating system if you have a monitor connected to the card.
In my case the minimal allocation seems to be 6 MB with no monitor attached, and around 600 MB with a monitor driving a 1920x1200 screen.
If you have a 1.5 GB or 2 GB card, that doesn't leave much to work with (just starting Cycles can take another 400-800 MB depending on the card and other things).
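Putting Martijn's figures together, a rough budget for a 2 GB card looks like this (the display and startup costs are the ranges he quotes, taken at the pessimistic end; this is illustrative only):

```python
# Rough VRAM budget for a 2 GB card, using the figures quoted above:
# ~600 MB for a monitor on the card, 400-800 MB for Cycles kernel
# startup. Illustrative arithmetic, not measured values.

card_mb = 2048
display_mb = 600          # monitor driving a 1920x1200 screen
cycles_startup_mb = 800   # pessimistic end of the 400-800 MB range

scene_budget_mb = card_mb - display_mb - cycles_startup_mb
print(scene_budget_mb)    # 648: what is actually left for scene data
```

On such a card, well under half the advertised VRAM is available for textures and geometry before a single object is loaded.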
Added subscriber: @kOsMos
I've done multiple render tests with the basic cube scene, going back as far as v2.66 and up to the latest v2.76, and noticed with GPU-Z that with each new release Cycles uses more and more VRAM than what it actually reports inside Blender while rendering.
Test Setup:
VRAM Usage Results:
Blender v2.76 - v2.72:
GTX 780 Ti: 1,150 MB
Quadro K2200: 378 MB
CPU: 3 MB (Task Manager)
Blender v2.71:
GTX 780 Ti: 590 MB (50% reduction)
Quadro K2200: 200 MB
Comparison:
Blender Octane Edition v2.23.2 DEMO (Blender v2.75)
GTX 780 Ti: DL 248 MB, PT 263 MB, PMC 311 MB
Conclusion:
Cycles in Blender v2.72 through v2.76 uses twice as much VRAM as v2.71 and older versions. Just going from v2.71 to v2.72 you lose close to 50% of your VRAM. For comparison, rendering with Octane Render only used around 250-300 MB; that's 4-5x less than Cycles 2.72-2.76 and 2x less than Cycles 2.71. When rendering on the CPU, memory usage for the Blender process in Task Manager went up by only 3 MB. Where is the problem? There is definitely something wrong in the Cycles/Blender code.
The interesting thing is that the Quadro K2200 with a 24" LCD attached only used 378 MB of VRAM, compared to 1150 MB for the GTX 780 Ti. So my guess is there is something wrong with the CUDA kernel built for the older compute capability: the Quadro K2200 is compute capability 5.0, whereas the GTX 780 Ti is 3.5. I hope this post helps the developers address the issue in a future build.
Cheers!
Added subscriber: @brecht
The memory increase in 2.72 happened because that release added volume rendering on the GPU.
Regarding the difference between the Quadro and the GTX, memory usage varies with the number of cores, so that is as expected, since your GTX card has about 4x more cores than your Quadro card.
So by adding volume rendering on the GPU it uses twice as much VRAM? Still, a simple cube like this should not take up a third of the card's VRAM. CPU rendering takes less than 3 MB of system RAM for the same scene, but 1,150 MB of VRAM on the GTX 780 Ti seems a bit extreme, don't you agree? Octane uses under 300 MB for comparison, 4x less than Cycles in Blender v2.72-v2.76 for the same scene. Even the Quadro K2200's 378 MB is high. It seems like a lot of VRAM is being wasted somewhere, and the code still has room for optimization.
We all agree the memory usage is not optimal, I'm just pointing out that the causes are known, and as explained above the solutions are known too, they just take a lot of time to implement.
You can't really compare it to a CPU; the whole issue here is per-core memory usage, and a CPU might have e.g. 8 cores, while your GPU has 2880 cores.
I just rendered the BMW27.blend benchmark scene (http://blenderartists.org/forum/showthread.php?239480-2-7x-Cycles-benchmark-%28Updated-BMW%29&highlight=2.7+benchmark) on v2.71 and v2.76 and got identical times (1:27) in both versions, but v2.76 used 1300 MB of VRAM and v2.71 used 800 MB. You said this is because volume rendering was added in v2.72 and up, but both images look identical, so why is it eating up 500 MB more VRAM for a feature I'm not even using?
Because for technical reasons it is currently not possible to build separate CUDA kernels for every combination of features you might use in a scene, and even if you do not use a feature there is still a cost, as GPUs require the stack size to be fixed per kernel. The split kernel mentioned above would address this.
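Brecht's point about per-core cost can be shown with a toy calculation. The per-thread figure below is hypothetical; only the core counts (8 CPU cores vs. 2880 CUDA cores) come from the thread above:

```python
# Why GPU memory dwarfs CPU memory for the same scene: the megakernel's
# fixed per-thread working state is multiplied by a handful of CPU
# threads but by thousands of GPU threads. Per-thread size is a
# hypothetical illustration, not a measured Cycles value.

per_thread_state = 256 * 1024   # worst-case kernel state per thread

cpu_threads = 8                  # typical CPU
gpu_threads = 2880               # one thread per CUDA core, simplified

print(per_thread_state * cpu_threads // 2**20)   # 2 MB on the CPU
print(per_thread_state * gpu_threads // 2**20)   # 720 MB on the GPU
```

The same fixed cost that is invisible on a CPU thus becomes hundreds of megabytes on a GPU, which is what the split kernel aims to reduce.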
Is there a working beta of the split kernel? ETA?
Work on that has not started yet and there's no ETA currently.