Very high memory usage for Cycles baking compared to ordinary renders
Closed, ResolvedPublic

Description

I've been playing around with Cycles baking and running into a few limitations here and there. I've put together a pretty simple scene with an HDR image light, some reasonably simple bricks, a mortar mesh, and a bake target. I haven't tried the other passes, but both Combined and AO seem to have very high memory usage given what is being baked. Blender shoots up to 2.87 GB of memory usage on this fairly simple scene, while it uses about 133 MB (about the same as normal) when I just render. I did all my tests with the CPU only. If Dalai could run a debugger, try baking the scene for himself, and see if he can reproduce the rather preposterous memory usage, I would really appreciate it. I also tried baking with instanced meshes and it didn't seem to make a difference for memory usage.

To get all the relevant meshes into the scene for baking, just toggle layers 2 and 3 to reveal the source meshes.

Download the .blend here.

Using BF's official 2.71 build on Windows 7 64-bit. Processor is AMD FX-8320, GPU is a GTX 780 with 3 GB of VRAM. I have 8 GB of ordinary RAM.

Spencer Brown (sbrown) updated the task description.
Spencer Brown (sbrown) raised the priority of this task from to Needs Triage.
Spencer Brown (sbrown) added a project: Cycles.
Spencer Brown (sbrown) set Type to Bug.
Bastien Montagne (mont29) triaged this task as Normal priority. Jul 16 2014, 8:10 AM

@Spencer Brown (sbrown) that doesn't really surprise me, to be honest. You have 96 highpoly objects. The amount of CPU memory needed for the highpoly[i].pixel_array is:

sizeof(BakePixel) * (width * height) (per image) * highpoly_objects
In your case: 28 * 1024 * 1024 * 93 = 2.73 GB
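
For reference, a quick back-of-the-envelope version of that formula in Python (a sketch only; it assumes sizeof(BakePixel) is 28 bytes, as in the numbers above):

# Rough estimate of the CPU memory taken by the per-object pixel arrays.
# Illustrative only; assumes sizeof(BakePixel) == 28 bytes as stated above.
def bake_pixel_memory_bytes(width, height, num_highpoly, bake_pixel_size=28):
    return bake_pixel_size * width * height * num_highpoly

print(bake_pixel_memory_bytes(1024, 1024, 93) / 1e9)  # ~2.73 GB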

Try joining your highpoly objects together and let me know the baking memory footprint.

@paul geraskin (mifth) can you try the same in your scene (joining the highpoly objects) to see if it prevents crashing? I think your issue may be different, but it's still a valid test (not sure if your computer will be able to handle it).

I don't get it. What does the highpoly[i].pixel_array represent? Why is there an image being stored for each hipoly object? That doesn't seem to make much sense. Rendering from the camera doesn't do that.

This is worrying because 96 hipoly objects aren't at all uncommon. For one project I was doing a while ago I had some 1000-1500 rivet objects (linked duplicates). xNormal had no trouble with that, but if I wanted to bake that in Cycles at 4k, no amount of RAM would be sufficient - because of rivets. I can easily imagine object counts going into the tens of thousands with things like bricks and dupligroups. It's also very unintuitive - we're used to the triangle count being what matters, not the object count.

Isn't it possible to automatically join the objects in memory before baking, or not store an image per object?

My memory usage dropped to 486 MB when I joined all the bricks together, baking a 1024x1024 image (I opted not to join the plane in as well, since it doesn't have the same modifier stack). That's a lot better; the memory usage is nearly on par with xNormal (343 MB for similar baking parameters). Unfortunately, Blender ended up crashing at the end of the bake. Perhaps I will try building from git and see if things have improved in that regard, or if I can get a backtrace.

@Spencer Brown (sbrown) that would help. Do you have a crash with GPU, CPU or both?

@Piotr Adamowicz (madminstrel) if you want to understand the logic behind that, take a look at object_bake_api.c. Basically the pixel_array data I was referring to is not an image, but more like a lookup map defining which pixels to bake from that object into the final image. They are calculated together to make sure a ray from the lowpoly mesh finds only one of them (see the function RE_bake_pixels_populate_from_objects in bake_api.c).
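
As a rough analogy in Python (not the actual C structs, just to illustrate the memory behaviour being described): each highpoly object currently carries a full-resolution array of per-pixel lookup records, even though only a fraction of those pixels actually map to it, which is why memory scales with the object count.

from collections import namedtuple

# Analogy only - not Blender's real BakePixel layout. Each record says which
# primitive/UV of a highpoly object a ray from the lowpoly hits for one pixel.
BakeRecord = namedtuple("BakeRecord", "primitive_id u v")

width, height = 8, 8      # tiny image, just for illustration
num_highpoly = 3

# Current scheme: one full width*height array per highpoly object, mostly
# empty (None) because each pixel is "owned" by only one object.
per_object_arrays = [[None] * (width * height) for _ in range(num_highpoly)]
per_object_arrays[0][10] = BakeRecord(primitive_id=42, u=0.25, v=0.75)

total_slots = sum(len(a) for a in per_object_arrays)
print(total_slots)  # 192 slots allocated for only 64 pixels of useful data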

@Dalai Felinto (dfelinto) ok, I tested my file from here again. I merged all the highpoly objects, but I got the same issues.

Memory peaks during baking (Normals):
BI baker - 2.6 GB of RAM
Cycles baker - 3.7 GB of RAM + 1.5 GB of physical memory.

Cycles baker really kills memory.

@Dalai Felinto (dfelinto): I think your recent fix should fix that?

Please everyone, test a new build from builder.blender.org.

@Thomas Dinges (dingto): tested - sbrown's file still takes 2.87GB with 4cf531f.

@Thomas Dinges (dingto) it is not related; the commit was for CUDA, while this is a CPU report/issue.

@Thomas Dinges (dingto): I did a couple of tests baking normals in both Blender Internal and Cycles with sbrown's file linked above, without joining the highpoly bricks together. After raising the BakeTarget plane up one unit on the Z axis and setting the bake Distance to 0 and Bias to 0.2 for the BI test, I saw practically no relevant memory usage and the bake was lightning fast (4-5 seconds) with Blender Internal.
Then, with Cycles (same setup for the BakeTarget plane) and the default Bake values, memory usage increased by about 2.6 GB during baking and the bake time was less than one minute.

Joining all the highpoly bricks together in the Cycles case reduced the baking time to 1 or 2 seconds, and memory usage increased by only 140 MB during baking. The advantage of joining the highpoly bricks into a single mesh becomes obvious when you do a Complete bake in Cycles.

Testing platform: Mobile Intel Core i7 720QM (1.66GHz), Win7 Pro 64bit, 8GB RAM, NVIDIA GTS 360M with 1GB VRAM; Blender 2.71.3 64bit (Hash: a90e49e) (blender-2.71-a90e49e-win64.zip)

Hope it helps.

@Dalai Felinto (dfelinto)

I hope this will be fixed; we can't expect people to join all their objects. Not only is it bad workflow, the bigger problem is that it won't even be possible with most shaders and pipelines. It would break most of them. It's just a no-go for production.

As n-pigeon said, asking users to join meshes is nonsense! I don't know what your knowledge of creating shaders in Cycles is, but you seem to be forgetting one of the most important features in Cycles, which is TEXTURE MAPPING. By joining meshes you are simply asking users to give up on this primary function; after merging, all Generated and Object coordinates will be destroyed!

You said that those (mostly empty) textures are needed to store the coordinates of objects. Why do you need to bloat gigabytes of RAM with empty data instead of writing the coordinates of only the needed pixels at their X/Y locations for each object? I'm not a programmer, but to me this sounds like hundreds of times less data than storing empty textures. Can you elaborate on this?

It's hard to explain how bad this situation is. I hope you refactor all of this quickly and properly. Can you tell us the plans and ETA for this?
BTW - asking a regular user to read the source code to learn how things work in your design is a bad joke.

Guys, let's relax and drop this "demanding" attitude. I am sure Dalai has not forgotten this, but it's just one of many ongoing projects.

+1 to @Przemyslaw Golab (n-pigeon) and @Bartosz Moniewski (monio).
The Cycles baker is not usable for complex models.
Even with joined meshes, the Cycles baker takes much more memory than the BI baker, as in this example file.

@Thomas Dinges (dingto) sure. We can wait another 3-5 years while you (devs) fix other unfinished things. :)

I hope this will be fixed too. For example, a shader using "Object Info > Random" can't be baked from joined objects - in my case they have to stay as a hundred and twenty thousand separate objects (linked duplicates; 300 meters of a nineteenth-century street).

http://postimg.org/image/9qhecvm73/

@Thomas Dinges (dingto)

Sorry if that sounds unpleasant, but I don't see it as a demanding attitude; it's more of a frustration.

What is the problem:
The feature is there, but we can't make our job easier - or possible - with it, because it's not working.

What creates the frustration:
Someone points out the problem and is told that this is expected or normal, when it isn't.
Additionally, a solution is brought up which is no solution at all.
It creates insecurity in users, which leads to frustration.

We don't demand:
We don't demand it be done asap. We don't expect wonders.
Our work as artists is very similar: we have to work many hours or days on something, design it, produce it, and evaluate it through many iterations, and after it's finished someone tells us it's not good enough. We understand what devs feel; we are much alike.

So I can understand artists who feel insecure when such a serious problem is taken so lightly.
We also invest in Blender - our time, energy and money - and that produces strong feelings too.

So, with all my respect for the developers' work, communication is important.
We also don't feel comfortable when we try to tell you that something needs work; for many of us it's awkward and unpleasant, and we are afraid of being seen as demanding.

Kind regards.

Simmer down guys. It will be fixed when it's fixed. Rome wasn't built in a day. It's a pretty serious problem, yes, but I don't think anybody's taking it lightly.

Surely there's a way to generate that data stored in the pixel_array on the fly?

Sometimes it's just impossible to give you a plan or an ETA; software development is not easy or predictable. :)
My point is, we have about 170 open bugs atm, and this one is just one of them. You feel strongly about it, I understand this. But that feeling alone will not make this bug go away faster. It's as simple as that.

Dalai Felinto (dfelinto) closed this task as Archived. Aug 25 2014, 7:26 PM

Moving this to the TODO list.

Some development notes on how to tackle this:

  1. Expand baking API to pass the object id for each pixel
  2. If object id is not reliable, pass a lookup array with object pointers/names

That will increase the memory footprint a bit when "selected to active" is not being used (given that the data struct we will be passing will be a size_t larger).
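
Continuing the earlier Python analogy (again, not the actual implementation, just a sketch of the idea in point 1): collapsing the per-object arrays into a single array whose records also carry an object id means memory scales with the image size instead of the highpoly object count.

from collections import namedtuple

# Analogy only - not the real data structures. One combined array covers the
# whole bake image; each record also stores which highpoly object was hit,
# so memory no longer multiplies with the number of highpoly objects.
BakeRecord = namedtuple("BakeRecord", "object_id primitive_id u v")

width, height = 8, 8            # tiny image, just for illustration
combined_array = [None] * (width * height)
combined_array[10] = BakeRecord(object_id=0, primitive_id=42, u=0.25, v=0.75)

print(len(combined_array))  # 64 slots, regardless of how many highpoly objects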

Can anyone try P132 and see if it works, whether it is much slower than master for single-object baking, and whether it allows you to bake 100 objects?

I want to test it but I cannot build blender. Can someone build this?

Ok, I made some tests. Generally this works far better for CPU, but I've found some other issues.
Latest Blender version from buildbot (MASTER) vs the patched build by Tungerz (PATCH).

My test file:

CPU TESTS:
Test 1 / 201 Hipoly instances / Idle: 130 MB RAM / Normalmap Bake
MASTER - 6005 MB RAM / 2:14 min
PATCH - 270 MB RAM / 2:04 min

Test 2 / 201 Unique Hipolys / Idle: 140 MB RAM / Normalmap Bake
MASTER - 6045 MB RAM / 2:15 min
PATCH - 333 MB RAM / 2:09 min

Test 3 / 501 Hipoly instances / Idle: 145 MB RAM / Normalmap Bake
MASTER - App crash
PATCH - 363 MB RAM / 10:09 min

Test 4 / 201 Hipoly instances / Idle: 130 MB RAM / Emit Bake
MASTER - 6020 MB RAM / 2:21 min
PATCH - 267 MB RAM / 2:14 min

GPU TEST:
Test 1 / 201 Hipoly instances / VRAM Idle: 280 MB / Normalmap Bake
MASTER - 486 MB VRAM / MAX GPU LOAD 13% / 2:58 min
PATCH - 505 MB VRAM / MAX GPU LOAD 11% / 2:54 min

A strange thing happened in both builds. While running a normal render in Cycles my computer uses approximately 100% of the CPU or GPU, but after running a bake my machine uses only 60% of the CPU (all cores running) and only 13% of the GPU! On the GPU, in my monitoring program, the load values constantly jump from 1% to 13% and then fall back to 1%, roughly once per second. No such thing happens in a normal Cycles render (95-99% load all the time).

Were those changes applied to GPU baking earlier? I did some new tests and both the master and patched versions bake in similar time and use the same amount of VRAM.

@Bartosz Moniewski (monio) don't mix up RAM with VRAM. Even when using the GPU, the RAM (CPU memory) will be much higher when using this patch than if not.

I've been setting up some test scenes that have moderate to heavy loads but it's going to take a day or two for the one that has 100 models and 300 textures.

One thing about benchmarking something like this is the need for a timer. Realistic testing scenarios can take hours per test, even for a medium-scale one.

Accurate results require many repeat tests. If the tester forgets to watch the progress bar for a few minutes and the bake completes then the test has to be redone because there is no way to know how long the bake for that model actually took.

Maybe you can help with this by splicing in a log file generator. A text file written into the project's directory showing the start and finish time of each bake would be a huge help - even just the start and finish from the time 'Bake' is pressed to the time the scene finishes.

Anyways, so far all I can say is that D772 appears to have a memory leak but I'm going to have to verify that. It also appears to use a lot more RAM than trunk does. Not sure about render times yet. I'll get back to you on this.

Hello.

I did some tests too, and the memory issue has become worse. It takes much more than it did before.

I tested this file from here again: https://developer.blender.org/T41126

I got 3.6 GB of RAM + 3.2 GB of virtual memory.
I tried baking Cycles CPU normals.

The baking has become unusable for hipoly models. Sorry.

Strange. Did you guys test the Tungerz build with the D722 patch, or something else? I made a simple scene setup to test it, and in my case the patched version saved a bunch of gigabytes of RAM.

I still have not tested RAM usage with GPU baking. I thought those arrays were stored in VRAM and didn't even look at the main RAM usage monitor.

I tested the daily build that was downloaded 5 hours ago.

There is something unstable, but I'm not having an easy time reproducing the problem. Yesterday when I started the test scene, I ran it several times and the memory usage often seemed to climb but never return to the original level. Now it's looking more stable; maybe running it over and over was having a cumulative effect. I've tried to reproduce the same conditions, but no luck. It seems pretty much the same as master for the two setups I tried: low and medium loads.

When I have the test scene with 100 models/materials (SubSurf L2) and 300 textures ready, I'll upload it so you can use it to test whatever changes you're working on.

@paul geraskin (mifth) - The status of D722 is "Needs Review", so it is not in master yet. You probably tested a build with the same old code.
If you are on Win7 64-bit, get the patched Tungerz build HERE and test it out. Ask someone for builds for other systems if you use them.
I will do tests on your file from T41126 this Saturday. My test file might be too simple for proper tests.

@marc dion (marcclintdion) - Same question: patched build? Bake on CPU or GPU? It is strange that your scene takes more RAM with this patch. All this patch should do is combine the multiple pixel arrays, which previously ate all the memory, into one pixel array.

Ok let's be clear here:
(1) master has different code from 2.71, where the render tile size influences the baking overhead. Set it high (e.g., 2k x 2k) if you have enough RAM - see the snippet after this list. The change allows big bakes on the GPU (as long as you control the tile size).

(2) to test the patch you need to compare a build before and after the patch.
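
For reference, the tile size can also be set from Python before baking - a minimal sketch, using the 2k x 2k example values from point (1):

import bpy

# Larger render tiles reduce the per-tile baking overhead mentioned in (1),
# at the cost of more RAM; 2048 is just the example value from the comment.
render = bpy.context.scene.render
render.tile_x = 2048
render.tile_y = 2048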

That was CPU rendering. I built 2 versions: first I updated from git, then I built one from master, then I applied the patch, built again, and named it D722. As I said, it may not even be the patch that's the problem. Whatever was happening yesterday did not happen today, when I actually had a screen recording going so I could log the very noticeable RAM climbing.

Also, for that test I was using 100 Suzanne models with L2 SubSurf. Each one had two 2048x2048 textures as input (only the first 10 were unique; I still have not created and assigned the other 90 yet), and the first 10 were also each baking to a 2048x2048 texture.
I was doing a batch bake with the 10 monkey heads that already had unique textures, baking at 128 samples with 3 lights.

It's when you do tests that are complicated with higher quality settings that instabilities in code start to show themselves.
A simple test that takes only 30 seconds is not going to reveal much because the machine/software is not under heavy load.

The problem is that I'm no longer willing to do tests like this, which actually reflect a real stress-test condition, because they require that I shut down all background programs and stare at a clock for 30 minutes at a time for each test. Then again for the next test, and so on...

Anyways, I can't concern myself with testing anymore. I've done too much of that already. When Cycles Bake was new, I spent hours nearly every day for months changing settings, writing them all down, and staring at the clock the whole time so I could record the timing of the tests properly.

I don't mind pressing 'Bake' and then going out for a while, but there's no way I'm staring at that clock anymore just so I can write down how long another bake took.

This is a job for a log file printout. This kind of testing also needs a consistent test scene that everyone should use so the results can be compared against similar testing.
I'm still going to post a fairly large test scene that's going to be around the size of a CD so other people can use it for testing.

Here's an image that shows the basic setup for a benchmark test I've set up. The image only shows 4 monkeys, but the benchmark has 100 of them. Each monkey head has a Sub-D modifier at 2 iterations. There are 300 textures; 200 of them are 2048x2048 input textures. The other 100 are 1024x1024, and these are the textures being baked to. All of the bake-to textures should already be selected for every material.

The point of the test is to create a fairly heavy load using both images and geometry.
Even though all the input textures are copies with just different names, Blender does not know that, so it has to load all of them, which stresses the system more for a better test.

Here's a link to the .blend, which is compressed with 7zip. The archive is about 100 MB, and uncompressed it's about 1 GB. I kept the file size down a bit by using grey-scale images for the Diffuse Color input.

https://app.box.com/s/03tiau4t2308whic3dt6

If you want to save out the images after baking, you can use Save All Images in Texture Paint mode. In recent builds it can be found under the Slots tab.

@marc dion (marcclintdion): you don't need to sit in front of a clock; use Python instead:

import bpy, time

# record the time just before the bake starts
start_time = time.time()

# baking (the operator blocks until the bake finishes)
bpy.ops.object.bake(type='COMBINED', use_selected_to_active=True)

end_time = time.time()
print(end_time - start_time)  # elapsed seconds, printed to the system console

Though for RAM tests you can bake one of the data pass types like DIFFUSE_COLOR.
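
And if a persistent record is preferred (as requested earlier in the thread), a small variation could append the start and finish times to a text file next to the .blend - a sketch only, assuming the .blend has been saved; the "bake_times.log" filename is arbitrary:

import bpy, os, time

start_time = time.time()
bpy.ops.object.bake(type='NORMAL', use_selected_to_active=True)
end_time = time.time()

# Append the timings to a log file in the .blend's directory
# (assumes the file has been saved; the log name is arbitrary).
log_path = os.path.join(os.path.dirname(bpy.data.filepath), "bake_times.log")
with open(log_path, "a") as log:
    log.write("start:  %s\n" % time.ctime(start_time))
    log.write("finish: %s (%.1f s)\n" % (time.ctime(end_time), end_time - start_time))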

Well... that's weird. The patch you did solves this issue completely - my tests above show that. Why bury this in the TODO list instead of committing the patch to master?

Since we are in bcon2, maybe this is a good time to review and apply this patch (D772)?

There is a new version of the patch there with a different approach; can someone try it? @Bartosz Moniewski (monio) @Spencer Brown (sbrown)

I don't know how to build this, but I can run some tests with a provided Win7 64-bit build. I will ask someone for that.
Finally some love for Cycles baking! :)

I can test it with a Linux build, if you provide it.

Doh... the Windows and Mac build links display: "this goo.gl shortlink has been disabled. It was found to be violating our Terms of Service." Funny, because of GSoC. ;)

i tested "test_bake.blend" file from here again https://developer.blender.org/T41126 on my machine with 4Gb of memory.

Here are my memory results:
Cycles:
2 GB before bake
3.7 GB + 0.5 GB (virtual) during bake

Blender Internal:
2 GB before bake
2.9 GB during bake

It bakes now! Great. Cycles takes less memory now.
But I got different baking results:

As you can see, Cycles did not bake all parts of the hipoly models on my machine. Is this a bug?
Just download my blend file, press Bake in Cycles or Internal, and check it yourself.

Anyway thanks for the development.

The differences you see have nothing to do with the patch (they produce the same result with Blender 2.73). This is a separate issue, but if you want to help get this fixed, it would help to simplify your file so it reproduces the issue with the minimum needed components (and to report it as a new bug).

@Dalai Felinto (dfelinto) ok, I did the report with an optimized file.
https://developer.blender.org/T43628
Could you confirm it?

@Dalai Felinto (dfelinto) what do you think about my test above?

With your patch, Cycles now takes less memory. But it still takes much more memory than the Internal baker - about 2 times more.

Would you try to reduce the Cycles baker's memory further, to match what the Internal bake uses?

The patch is working fine (tested with @paul geraskin (mifth) file). It still has to go through review though.

As for further optimizations there are no plans in the future (for me at least)

@Dalai Felinto (dfelinto) yes, the patch works, but it still takes a lot of memory.

Memory is very important. For example, if a user bakes a character like this one: https://artstation.com/artwork/demon_lich

This character can be 10-15 million polygons, and if baking takes that much more memory it can exhaust the RAM on any machine.

I just want to point out that baking should be optimized as much as possible for professional production.

Made my day. Thank you Dalai, Sergey and others! :)

Oops... I hope I did not change the status of the issue accidentally?