
Current Mantaflow build hideously unstable when baking gas sim with adaptive domain
Closed, Resolved (Public)

Description

System Information
Operating system: Windows-10-10.0.18362-SP0 64 Bits
Graphics card: GeForce GTX 1070 Ti/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 431.86

Blender Version
Broken: version: 2.82 (sub 6), branch: master, commit date: 2020-01-15 21:07, hash: rB689a873029b9
Worked: (optional)

Short description of error
Repeated (but inconsistent) crashes when baking either the sim or, on the occasions that works, the noise, with Adaptive Domain enabled.
This has been my experience since starting to use Mantaflow in 2.82 Beta and 2.83 Alpha (basically since yesterday).

Exact steps for others to reproduce the error
[Bake the attached with and without Adaptive Domain enabled and see what happens.
Here's a (very dull) recording of what I'm seeing-

I've speeded up the really dull bits (all the crash & re-starts & the inevitable 'change the cache folder, so you don't fill your C: drive' bit), so you've only got 5 minutes worth, but the source timecode shows the real-time from the original.
Like I said - it's very inconsistent, it even goes all the way through once- baking both the sim and the noise with Adaptive enabled. (Correction- I've watched the video again and it does not bake both sim & noise all the way through with Adaptive enabled, but I have managed it once or twice, in fact I managed it for my most recent post about T69870)
But basically that's 6 crashes in 6 and a half minutes. I've honestly never experienced Mantaflow to be this flakey in the 6 or 7 months I've been using it.
It does not crash like this on my machines using the 26-11-2019 mantaflow-branch build (AFAIK the most solid build I've had overall).
And my system runs Davinci Studio, Fusion Studio, Agisoft Metashape and most other things without any massive issues.]
[Based on the default startup or an attached .blend file (as simple as possible)
here's the scene file-

]

Event Timeline

I tried again in today's build: not good news I'm afraid...

So, even more crash-prone, but also 4 completed bakes of both sim & noise, so even more inconsistent. Bummer!
The above was using this build-

Germano Cavalcante (mano-wii) changed the task status from Needs Triage to Needs Information from User. Jan 17 2020, 2:18 PM

The three times I baked the simulation, it worked.
Are you sure you are using one of the builds here?
https://builder.blender.org/download/

It is good to keep in mind that the fluid system is being stabilized. Each confirmed bug is being handled. (Bugs)
For a report to be confirmed it must be specific and describe only one bug.

Operating system: Windows-10-10.0.18941 64 Bits
Graphics card: Radeon (TM) RX 480 Graphics ATI Technologies Inc. 4.5.13559 Core Profile Context 26.20.12028.2

Mark Spink (marks) added a comment. Edited Jan 18 2020, 2:40 AM

Both my report and my comment show the version: Broken: version: 2.82 (sub 6), branch: master, commit date: 2020-01-15 21:07, hash: rB689a873029b9, and

,
And yes they are downloaded from blender.org, I stopped using graphicall as soon as they were available (blender today youtube info).

Straight off the bat I can see that your system is different to mine: you're not using Nvidia hardware. And blender's bug report doesn't show if the system is using Intel or AMD architecture- I'm using AMD (threadripper 2950x), what CPU does your system have?
If you're using Intel then you're testing it on an arguably very different system. So quite possibly it DOES work for you.
Is it too much to ask for this to be tested on an AMD/Nvidia system before it's written off?

In fact, and I really should have thought of this before I admit- I just threw the latest 2.82 build on my old HP Z800 (dual Xeon / GTX 1070) & my laptop Gigabyte Aero15 (i7 8750H / GTX 1060) and it baked flawlessly about 30-40 times in a row on the HP Z800, and at least 15 times on my laptop (before I got bored).
Which is how my AMD system behaved before I tried the latest version of Mantaflow in Blender 2.82 (I had previously been using the 'experimental branches' download, before realising it was not being updated).

So now: I'm definitely pointing the finger at an AMD CPU system being the problem.

Work with me here: I've been a Broadcast Engineer for about 30 years, and a Telecoms Engineer for about 12 years before getting into Broadcast (as well as a 'content creator' for about 20 years), so 40 odd years of fault-finding for my job.
If I put new firmware on a satellite receiver at work (read- the new version of Mantaflow in Blender), and it started having issues it had not had previously- black line tearing, frame grabbing, audio hits, inability to lock to a 16APSK modulated signal, whatever... (read- Blender crashing way more than I've EVER seen it do)
I would not be pointing the finger at the receiver hardware (read- my computer, which has always run Mantaflow without a massive number of 'completely-disappearing' type crashes), I would be pointing at the new firmware (read- new version of Mantaflow in Blender), would you not agree?

All my systems have been basically static for at least 3 months (barring windows updates), the Threadripper was rebuilt / re-installed after the motherboard died in September / October, and I put a clean install of Windows on the Z800 soon after, as it is no longer my workstation, basically a render node for Fusion and Blender. The only thing that's not fully up to date is the Nvidia driver, due to the latest Nvidia drivers breaking Blackmagic RAW decoding in Davinci, which I use for money so I need to work properly.

Obviously I'm only doing all this to try to get Mantaflow working properly, and taking up quite a lot of my time doing so.
I am not reporting this to waste your time, and I would never report something that was an occasional / intermittent issue.

And just as obviously, I am very reluctant to start poking about at my Threadripper workstation, which has had no significant hardware or software changes (and still works fine with the 26th November 'experimental branch' release of Mantaflow / Blender) and is very solid for all my paid software (Davinci Studio, Fusion Studio, Syntheyes, Agisoft Metashape), because a single piece of open-source software stops working reliably on it.

and finally-

So that's the exact same system, no changes whatsoever, just using the old 26th November build from the 'experimental branches' page on blender.org.
And obviously the same system has been happily running DaVinci Studio to create these videos & Handbrake to encode them to MP4, with no issues.

So as stated before: I'm definitely pointing at this being an AMD CPU architecture specific issue.

Cheers
Mark

Mark Spink (marks) added a comment. Edited Jan 18 2020, 9:59 AM

Further to my previous comment, I downloaded the latest builds of 2.82Beta and 2.83Alpha (version: 2.82 (sub 6), branch: master, commit date: 2020-01-17 18:59, hash: rB5472ae6fdff6 and version: 2.83 (sub 0), branch: master, commit date: 2020-01-17 19:09, hash: rB1f92e9903fb7), and for want of a better description- it's a shit-load more reliable!
I baked the above scene in 2.82Beta and it baked both sim and noise about 8 or 9 times all the way through before a 'blender disappears up its own arse' crash on the noise bake, then another 5 or 6 times all the way before another complete crash, on the noise bake again. 2.83Alpha managed 5 or so complete bakes before crashing blender on the sim bake and I can't be arsed to keep baking and re-baking any more.

Which begs the next question- is this an intentional fix, or just luck?
If it is an intentional fix then fine, all is OK.
If it's just luck/chance then there's every chance it will happen again. Which is not a great position to be in, is it?
Please can we find out from the developer if this was an intentional fix, for what I'm fairly convinced was an AMD architecture issue, or not...

I've not tried the new builds on anything Intel based yet, frankly I'm bored of sitting down baking & re-baking, and I'll only be using the Intel systems for rendering (unless I'm away from home), so all I really care about is the AMD system working properly for baking most of the time.

Cheers
Mark

Further to my previous comment (again), it does crash on Intel hardware as well: I was baking the scene for my latest (and last ever) bug report T73232: Mantaflow: Adaptive Domain breaks Dissolve, on my old HP Z800, using the SSD stripe set on my Threadripper for the cache files via Gig ethernet, when I realised the Z800 had a spare SSD on it- when I swapped to it for the cache and tried to bake the sim uploaded on T73232 Blender died (completely disappeared type crash as usual when baking) 3 times in a row.
I then started thinking it might be an I/O issue...

Just now, however, I watched Blender crash yet again on my AMD machine, trying to bake the relatively simple smoke sim in the animation I showed in the report for T73232.
It had been baking for about an hour, and managed about 30 or 40 frames (as I had been forced to disable Adaptive Domain to use 'dissolve', and also to increase the domain resolution).

Then I realised that, had this been a paid job, I would have wasted a good couple of days worth of time on this endeavour. As I charge about $1500 AUS a day for graphics, that's a good $2500 conservatively. I could pay for Houdini Indie for about 6-8 years for that.

Then I looked at my profile on here, and saw T57669, which I reported in November 2018, and thought: life's too short!

So don't worry- feel free to close this bug, it's almost certainly not a bug as you believe, you did manage to bake the scene 3 times on one system after all, that means it must be as solid as a rock on any system on the planet!

I won't be bothering to report any bugs with Blender any more. I'm guessing 2.82 will happen soon, and Mantaflow will be in it, with whatever bugs it still has.
Maybe one day dynamic weight paint will have a working preview again, who knows. Maybe even particles will work properly in 2.8x some time in the future...

All I know is I downloaded Houdini's free version today, and shall be putting my future efforts into learning that, rather than trying to get Blender working any better.

One final parting thought: would you get into an aeroplane and consider it safe, if it had been tested 3 times by one person?

Let's not be grumpy! The goal here is to find and fix bugs. But to do that, we need to be able to reproduce the problem. That is not always easy. One thing I have learned doing QA is that "identical systems" are not always identical.

Do you want us to keep this report open?

Mark Spink (marks) added a comment. Edited Jan 19 2020, 7:38 PM

Not grumpy, just tired (05:30 odd here)...
Final straw that broke the camel's back type deal...

I'm fairly sure it is crashing, on 3 different Windows systems, always when baking either the sim or the noise, and inconsistently. As shown in the videos above.

The same 3 systems which run Davinci Studio, Fusion Studio, Syntheyes, Metashape etc. etc. etc. fairly flawlessly.
The build I grabbed yesterday (see system info for T73232) is more solid than the previous 2 builds but still quite flakey in my experience with it.

And I know that it was orders of magnitude less crash-prone with the Nov 26th 'experimental' build I still use. Sure, it crashed occasionally, but never 9 times in six minutes.

If you can't reproduce this issue, then kill this report, whatever.

All I know is I've got to meet a bloke on Tuesday who I'm shooting and posting a couple of music promos for, and then go to a cricket ground somewhere in Sydney next weekend and learn to drive the vision & RF kit on one of NEP's OB trucks.
So sort of got better things to do. It's a shame, I was so keen to get a decent smoke sim in Blender. If nothing else though, Blender has stopped me wasting money on Lightwave (been doing 3D off n on since the 90s on the Amiga for my sins).

Germano Cavalcante (mano-wii) changed the task status from Needs Information from User to Needs Developer to Reproduce. Jan 19 2020, 7:43 PM

Mantaflow bugs are being fixed (See some examples: Resolved bugs)
We need to prioritize those that we can reproduce.

When confirming a bug, I kind of send the report to the responsible developer.
But if I confirm a report like this (basically if I confirm all reports), I’ll just be getting in the way and taking the developer’s time.

It is quite possible that this problem is already covered by one of these other reports: Mantaflow confirmed bugs.
I could confirm this if I could reproduce it. But as I can't, this report will remain open until another triager can reproduce the problem.

Just tested in this version: 2.82 (sub 6), branch: master, commit date: 2020-01-20 22:06, hash: rB902209eda527, Odd!
I was trying to bake the file used in T69870 to test it in the new version, and got a crash straight away in the sim bake, re-ran blender and it baked OK.
then, ran Blender after a reboot-
Baked sim and noise successfully once, then crashed on the sim bake after freeing data (in the sim).
Re-ran Blender, baked both successfully once, then baked sim OK and crashed on the Noise bake.
Re-ran Blender again, and it must have baked about 20-25 times flawlessly, then finally crashed on the Noise bake.
This bug is more like an angry, hormonal teenager: very unpredictable mood wise.

Mark Spink (marks) added a comment. Edited Jan 22 2020, 1:25 AM

Further to my last: I think the current 'xxxeda527' build is LOADS more solid than the previous 2.82Beta builds. And I'm starting to think that the bake - bake - free-bake, bake - bake - free-bake, bake - bake - free-bake, etc. thing I'm doing with a 'simple' scene is not really the best test for this. As in: I've been using this build for some fairly 'heavy' sims this last day, and it's at least as solid as the Nov26-2019-Experimental build which I keep comparing the newer builds to.
But it still does crash randomly when repeatedly baking and re-baking the 'simple' scene, so it's not 100% solid, but then again what is?! (certainly not the software on the Boeing 737 Max!)

And Mr Cavalcante et al: sorry for being a whinging prick the other day.
I am in Australia and they do call my-lot Whinging Pomms over here (so one just has to uphold the stereotype sometimes)...
Must have been my time-of-the-month.
If you're based in Amsterdam I'd like the chance to buy you a beer or 3 later in the year to say sorry properly, when I'm in Harlem for a few months making sure the world gets to see England play football badly again on TV...

@Mark Spink (marks) It could be that T72894 was causing this rather random behavior and the crashes. The issue from that task is fixed now, so it will be interesting to see if it also makes your bakes more stable.

If you can, wait for the fix to propagate into the daily builds and then try again stress-testing your scenes (repeated bake - free bake is great!).
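For anyone who wants to automate that kind of repeated stress test instead of clicking bake / free by hand, here is a minimal sketch. It only assumes that one bake cycle can be started as a separate process (for example Blender run in background mode with a script); the command line, `scene.blend` and `bake_once.py` mentioned in the comments are hypothetical placeholders, not something taken from this report. Running each cycle in its own process means a hard crash shows up as a non-zero exit code instead of taking the test loop down with it.

```python
import subprocess
import sys
from collections import Counter


def run_repeatedly(command, runs):
    """Run `command` `runs` times, each in a fresh process, and return the exit codes."""
    exit_codes = []
    for _ in range(runs):
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        exit_codes.append(completed.returncode)
    return exit_codes


def summarise(exit_codes):
    """Count clean runs (exit code 0) versus everything else (crashes, errors)."""
    counts = Counter(exit_codes)
    clean = counts.get(0, 0)
    return {
        "runs": len(exit_codes),
        "clean": clean,
        "failed": len(exit_codes) - clean,
        "by_exit_code": dict(counts),
    }


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. For a real test, swap in
    # something like (hypothetical file names):
    #   ["blender", "--background", "scene.blend", "--python", "bake_once.py"]
    # where bake_once.py would free and re-bake the domain once.
    command = [sys.executable, "-c", "pass"]
    print(summarise(run_repeatedly(command, runs=5)))
```

The summary makes it easy to report "N crashes in M bakes" for a given build hash, which is more useful to a developer than a general impression of flakiness.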

@Sebastián Barschkis (sebbas) Let me know when it's in the blender.org 2.82Beta download & I'll happily (not-really) bake it to death (till I get bored again anyway).

@Mark Spink (marks) You could now try to reproduce the crash. The current build 517870a4a11f from the buildbot has the commits.

You sir; have absolutely nailed the bastard! Blender did bake-bake-freedata 40+ times in a row without a twitch (poor choice of words maybe, I'm just about to try T69870 with this build).
No video this time, just this-


I'd say it's fixed definitely.

Philipp Oeser (lichtwerk) closed this task as Resolved. Jan 24 2020, 9:49 AM
Philipp Oeser (lichtwerk) claimed this task.

This sounds like good news :)

Will close as resolved then.
Please keep us up-to-date on T69870 as well!
Thx again for putting all this energy into it! (Hope you stay onboard...)