Blender froze my Manjaro Linux system, which corrupted my root Btrfs filesystem, rendering the machine unable to boot #81547

Closed
opened 2020-10-08 17:35:19 +02:00 by Tobiasz Karoń · 19 comments

System Information
Operating system: Manjaro Linux
Graphics card: Radeon RX 5500/5500M / Pro 5500M

Blender Version
Broken: Blender 2.90.0, 2.90.1, 2.91.0 Alpha
Worked: none yet

Short description of error
When doing a very specific thing in the Shader editor with Eevee in Rendered view, my system froze. I rebooted and tried again. It froze again, only this time when I rebooted I was greeted by an emergency shell. My root Btrfs filesystem is unmountable. I'm currently imaging the disk, and will attempt to fix the filesystem or restore the system from backup.

Here's the screen that I saw after rebooting my laptop after the second lock-up:
![f3fb1797-440e-4a7c-b33b-f19af8b479ce.jpg](https://archive.blender.org/developer/F8968198/f3fb1797-440e-4a7c-b33b-f19af8b479ce.jpg)

Exact steps for others to reproduce the error

I have a specific .blend file that freezes the system when loaded on Manjaro Linux. I tried again after booting a fresh Manjaro 20.1.1. However, it's a confidential client project and I cannot share it publicly.

I wasn't able to reproduce the problem on Windows 10 running on the same laptop.

I suspect the problem could be in the AMD GPU driver: of everything involved, it's the only thing that I believe has enough system access to mess up a root filesystem. And it's clear the problem was triggered by a certain material in Blender, while rendering with Eevee.

Note that the system won't freeze until I turn on Rendered view. Using Material Preview is fine.
The first two times it happened, it was immediately after I had added a mix node to add another texture to an existing image path in an Eevee material.
I thought that maybe texture sizes could be the problem, as I've been using three 8K textures and maybe the laptop's GPU can't handle that, but filesystem corruption should never be the result of that.

Author

Added subscriber: @TobiaszunfaKaron

Member

Added subscriber: @lichtwerk

Member

Changed status from 'Needs Triage' to: 'Needs User Info'

Member

Hi there!

That's bad to hear.

Do you recall which driver version you were using?

> However it's a confidential client project and I cannot share it publicly.

Well, maybe you could still try to share just that material in isolation, with the textures replaced by same-size (but shareable) ones?

P.S.: been following some of your Ardour stuff, thx for this!

Author

I'm trying to prepare a minimal reproduction project. It hasn't frozen the system yet, but at the exact moment I enabled Material Preview, dmesg printed this:

[Fri Oct  9 08:54:57 2020] ACPI Error: No handler for Region [VRTC] (0000000000eede7c) [SystemCMOS] (20200528/evregion-127)
[Fri Oct  9 08:54:57 2020] ACPI Error: Region SystemCMOS (ID=5) has no handler (20200528/exfldio-261)
[Fri Oct  9 08:54:57 2020] ACPI Error: Aborting method \_SB.PCI0.SBRG.EC._Q9A due to previous error (AE_NOT_EXIST) (20200528/psparse-529)
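For anyone sifting through the longer dmesg attachments in this thread, a small filter can pull out lines like the ones above. This is only a sketch: the helper name `filter_dmesg` and the keyword list are hypothetical, and the `amdgpu` and `BTRFS` keywords are assumed search terms, not strings quoted from the attached logs. The sample text is the three ACPI lines from this comment.

```python
import re

# Assumed search terms; only "ACPI Error" is taken from the lines quoted above.
KEYWORDS = ("ACPI Error", "amdgpu", "BTRFS")

# Matches dmesg lines of the form "[<timestamp>] <message>".
LINE_RE = re.compile(r"^\[(?P<stamp>[^\]]+)\]\s+(?P<message>.*)$")


def filter_dmesg(text, keywords=KEYWORDS):
    """Return (timestamp, message) pairs for lines that mention any keyword."""
    hits = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a timestamped dmesg line
        message = match.group("message")
        if any(keyword.lower() in message.lower() for keyword in keywords):
            hits.append((match.group("stamp"), message))
    return hits


SAMPLE = """\
[Fri Oct  9 08:54:57 2020] ACPI Error: No handler for Region [VRTC] (0000000000eede7c) [SystemCMOS] (20200528/evregion-127)
[Fri Oct  9 08:54:57 2020] ACPI Error: Region SystemCMOS (ID=5) has no handler (20200528/exfldio-261)
[Fri Oct  9 08:54:57 2020] ACPI Error: Aborting method \\_SB.PCI0.SBRG.EC._Q9A due to previous error (AE_NOT_EXIST) (20200528/psparse-529)
"""

for stamp, message in filter_dmesg(SAMPLE):
    print(stamp, "|", message)
```

Reading a saved log is then a matter of passing the file contents, for example `filter_dmesg(open("dmesg.txt").read())`.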
Member

Changed status from 'Needs User Info' to: 'Needs Developer To Reproduce'

Member

Added subscribers: @fclem, @Jeroen-Bakker

Member

> In #81547#1030997, @TobiaszunfaKaron wrote:
> I'm trying to prepare a minimal reproduction project, it didn't freeze the system yet, but exactly when I enabled Material Preview dmesg said this:
>
> [Fri Oct  9 08:54:57 2020] ACPI Error: No handler for Region [VRTC] (0000000000eede7c) [SystemCMOS] (20200528/evregion-127)
> [Fri Oct  9 08:54:57 2020] ACPI Error: Region SystemCMOS (ID=5) has no handler (20200528/exfldio-261)
> [Fri Oct  9 08:54:57 2020] ACPI Error: Aborting method \_SB.PCI0.SBRG.EC._Q9A due to previous error (AE_NOT_EXIST) (20200528/psparse-529)

I am probably the wrong person to ask. @fclem / @Jeroen-Bakker: does this ring a bell?

Author

I loaded the full project and deliberately froze the system on a clean Manjaro Linux KDE 20.1.1, with an SSH session from my phone capturing the log. I got some juicy stuff here:

![Screenshot_20201009-105920.jpg](https://archive.blender.org/developer/F8970710/Screenshot_20201009-105920.jpg)

I pulled the USB drive out to see if the kernel would notice, and then powered off the machine.

Author

Here's the text version of this dmesg log:
[dmesg.txt](https://archive.blender.org/developer/F8970718/dmesg.txt)

Author

I thought I couldn't reproduce this in Blender 2.90.1, but no, it's still there.

Only it took about 30 seconds for the freeze to happen this time.

Author

Here is the complete log from the whole SSH session, containing dmesg output from boot until the freeze:

[dmesg2.txt](https://archive.blender.org/developer/F8970739/dmesg2.txt)


Are you using the proprietary driver or Mesa implementation?

In any case, could you test with the other one? That would make us certain it's a GPU driver issue.

Author

I had a hard time reproducing this in Blender 2.91.0 Alpha, but it happened.

It wasn't as bad though, as my laptop's built-in display didn't go black, and my mouse was still responsive (even though X.org had frozen).
I was able to switch to a TTY and log into a shell there.

I used the Ctrl+Alt+Backspace shortcut to kill the X.org session and was able to recover without rebooting, which is better.

Now, a small hint I have is that the problem only occurred once I set SSS samples to 12. I had been playing with various rendering settings for about 15 minutes to try and trigger the problem, and it only happened when I raised SSS samples to 12. I do have materials using Eevee SSS in the scene, and they were being rendered when I changed this.

Here's the complete dmesg output from my remote SSH session:
[dmesg3.txt](https://archive.blender.org/developer/F8971084/dmesg3.txt)

Also, here's my hardware information:

$ inxi -b
System:    Host: manjaro Kernel: 5.8.11-1-MANJARO x86_64 bits: 64 Desktop: KDE Plasma 5.19.5 Distro: Manjaro Linux 
Machine:   Type: Laptop System: Micro-Star product: Alpha 15 A3DD v: REV:1.0 serial: <superuser/root required> 
           Mobo: Micro-Star model: MS-16U6 v: REV:1.0 serial: <superuser/root required> UEFI: American Megatrends 
           v: E16U6AMS.10F date: 03/04/2020 
Battery:   ID-1: BAT1 charge: 49.2 Wh condition: 49.8/53.4 Wh (93%) 
CPU:       Quad Core: AMD Ryzen 7 3750H with Radeon Vega Mobile Gfx type: MT MCP speed: 1723 MHz min/max: 1400/2300 MHz 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Navi 14 [Radeon RX 5500/5500M / Pro 5500M] driver: amdgpu v: kernel 
           Device-2: Advanced Micro Devices [AMD/ATI] Picasso driver: amdgpu v: kernel 
           Device-3: Acer HD Webcam type: USB driver: uvcvideo 
           Display: x11 server: X.Org 1.20.9 driver: amdgpu FAILED: ati unloaded: modesetting,radeon resolution: 
           1: 1920x1080~120Hz 2: 1920x1080~60Hz 
           OpenGL: renderer: AMD RAVEN (DRM 3.38.0 5.8.11-1-MANJARO LLVM 10.0.1) v: 4.6 Mesa 20.1.8 
Network:   Device-1: Realtek RTL8822CE 802.11ac PCIe Wireless Network Adapter driver: rtw_8822ce 
           Device-2: Realtek driver: r8169 
Drives:    Local Storage: total: 2.31 TiB used: 1016.76 GiB (42.9%) 
Info:      Processes: 344 Uptime: 1h 14m Memory: 29.37 GiB used: 9.01 GiB (30.7%) Shell: Bash inxi: 3.1.05 
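The OpenGL line in the inxi output above bears on the earlier question about Mesa versus the proprietary driver. As a rough sketch (the helper `describe_opengl` is hypothetical, and treating the word "Mesa" in the version string as the marker of the Mesa implementation is an assumption based on how Mesa usually reports itself), the relevant fields can be extracted like this:

```python
import re

# Matches the "OpenGL: renderer: <...> v: <...>" line printed by `inxi -b`.
OPENGL_RE = re.compile(
    r"OpenGL:\s+renderer:\s+(?P<renderer>.+?)\s+v:\s+(?P<version>.+)$"
)


def describe_opengl(inxi_text):
    """Return renderer, version and a Mesa flag from inxi output, or None."""
    for line in inxi_text.splitlines():
        match = OPENGL_RE.search(line.strip())
        if match:
            version = match.group("version").strip()
            return {
                "renderer": match.group("renderer"),
                "version": version,
                # Assumption: Mesa identifies itself in the version string.
                "is_mesa": "Mesa" in version,
            }
    return None


# The OpenGL line exactly as it appears in the output above.
LINE = "           OpenGL: renderer: AMD RAVEN (DRM 3.38.0 5.8.11-1-MANJARO LLVM 10.0.1) v: 4.6 Mesa 20.1.8 "

print(describe_opengl(LINE))
```

For the line above this reports a Mesa version string. The renderer field names RAVEN rather than Navi 14, which may mean the session was rendering on the integrated GPU when inxi ran; that is only a reading of the output, not something the thread confirms.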
Author

Hmm. I was able to fully freeze my system again in 2.91.0 Alpha :(

I tried reproducing this again, and this time I was only able to do it when Viewport Denoising was on AND the SSS samples were high.

Here's the extended dmesg output. It contains the dmesg3 file plus everything that was printed after the last reproduction:

[dmesg4.txt](https://archive.blender.org/developer/F8971122/dmesg4.txt)
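Since the confidential file cannot be shared, the two settings named in this comment could be applied to a shareable test scene from a script. The following is a sketch under assumptions: the property names `sss_samples` and `use_taa_reprojection` (the latter being what the "Viewport Denoising" checkbox is believed to map to) are recalled from the Blender 2.9x Python API and should be checked against the API docs, the value 12 is carried over from the earlier comment because "high" is not quantified here, and `apply_repro_settings` is a hypothetical helper.

```python
from types import SimpleNamespace


def apply_repro_settings(eevee, sss_samples=12, viewport_denoising=True):
    """Set the two Eevee options this thread associates with the freeze.

    `eevee` is expected to behave like `bpy.context.scene.eevee`; the
    attribute names below are assumptions, not confirmed by the thread.
    """
    eevee.sss_samples = sss_samples
    eevee.use_taa_reprojection = viewport_denoising
    return eevee


# Outside Blender, a stand-in object shows what the helper changes.
settings = apply_repro_settings(SimpleNamespace())
print(settings.sss_samples, settings.use_taa_reprojection)

# Inside Blender's Python console the call would instead be:
#   import bpy
#   apply_repro_settings(bpy.context.scene.eevee)
```

Whether these two settings alone are enough to trigger the freeze is not established in this thread; the script only makes the reported configuration repeatable.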


You can also start Blender with `--debug-gpu-force-workarounds` to see if that makes any difference.
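A minimal sketch of passing that flag when launching Blender from a script follows. The executable name `blender` and the file name `repro.blend` are placeholders, and the helper `blender_debug_command` is hypothetical. Only the flag itself comes from the comment above.

```python
import shlex


def blender_debug_command(
    blender="blender",
    blend_file=None,
    extra_flags=("--debug-gpu-force-workarounds",),
):
    """Build an argv list that starts Blender with the given debug flags."""
    argv = [blender, *extra_flags]
    if blend_file is not None:
        argv.append(blend_file)  # placeholder path to the test scene
    return argv


cmd = blender_debug_command(blend_file="repro.blend")
print(shlex.join(cmd))

# To actually launch Blender (requires it on PATH):
#   import subprocess
#   subprocess.run(cmd, check=False)
```

Building the argument list separately from running it makes it easy to log the exact command alongside the dmesg capture.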

Author

This is what I think must have happened:

  • Blender triggered a bug in the AMD driver by issuing some specific instruction to the GPU
  • The AMD driver (being in kernel space) overwrote some random memory address
  • This happened to cause Btrfs to write garbled data to disk
  • Which left the filesystem unmountable and the machine unable to boot
Author

I've sent a message to AMD tech support; hopefully they'll pick this up.

Philipp Oeser removed the Interest: EEVEE & Viewport label 2023-02-09 15:14:27 +01:00

Bugs in the AMD driver are outside our scope, and quite some time has passed. Therefore I am closing this one.

Blender Bot added Status: Archived and removed Status: Needs Info from Developers labels 2024-02-28 15:47:33 +01:00
Reference: blender/blender#81547