Cycles HIP error with image textures on Linux and RDNA1 #97591

Closed
opened 2022-04-24 17:26:55 +02:00 by Willie Sippel · 30 comments

System Information
Operating system: Arch Linux
Graphics card: AMD 5700XT

Blender Version
Broken: Blender 3.2 alpha c486da0238
Worked: never

Short description of error
Trying to render a scene with HIP enabled crashes Blender during the "Updating images" stage. Scenes without image textures (e.g. the default cube, but also elaborate scenes with complex procedural shaders) render perfectly fine. Tested with the open source ROCm stack from rocm-arch and the official binaries provided by AMD, on kernels 5.15.35-lts, 5.17.4-zen1 and 5.17.4-xanmod1. Also tested with the first HIP-enabled nightly build, with the same result. As a user on the Blender forum reported success on Xorg, I tested on both GNOME/Xorg and GNOME/Wayland. Looking through the thread again, it seems all successful reports come from users with RDNA2 GPUs, while the only other failure was reported by another 5700XT owner. The last few lines from the console log:

:3:hip_texture.cpp          :1453: 15928128518 us: 33981: [tid:0x7f0decdff640] hipTexObjectCreate ( 0x7f0,deb,8ed,2e8, 0x7f0,dec,db9,140, 0x7f0,dec,db9,0d0, char array:<null> )
:4:rocdevice.cpp            :2034: 15928128655 us: 33981: [tid:0x7f0decdff640] Allocate hsa device memory 0x7f0c9ca00000, size 0x169000
:3:rocdevice.cpp            :2073: 15928128661 us: 33981: [tid:0x7f0decdff640] device=0x7f0df5092000, freeMem_ = 0xfed20000
:4:rocdevice.cpp            :1899: 15928128724 us: 33981: [tid:0x7f0decdff640] Allocate hsa host memory 0x7f0e10c1e000, size 0x110
Writing: /tmp/bmw27_gpu.crash.txt
fish: Job 1, 'AMD_LOG_LEVEL=4 ./blender' terminated by signal SIGSEGV (Address boundary error)

And the crash log:

# Blender 3.2.0, Commit date: 2022-04-23 13:09, Hash c486da0238bd

# backtrace
./blender(BLI_system_backtrace+0x20) [0xb605b00]
./blender() [0x117952a]
/usr/lib/libc.so.6(+0x42560) [0x7f0e24064560]
/opt/rocm/hip/lib/libamdhip64.so(+0x1e8b3d) [0x7f0dda9e8b3d]
/opt/rocm/hip/lib/libamdhip64.so(hipTexObjectCreate+0x8e1) [0x7f0dda9f1901]
./blender(_ZN3ccl9HIPDevice9tex_allocERNS_14device_textureE+0x6b5) [0x2d4d235]
./blender(_ZN3ccl12ImageManager17device_load_imageEPNS_6DeviceEPNS_5SceneEiPNS_8ProgressE+0x392) [0x3464122]
./blender() [0x84a86e5]
./blender() [0x16759b5]
./blender() [0x1675c6b]
./blender() [0x1662987]
./blender() [0x166f6a0]
./blender() [0x16716dc]
./blender() [0x16718d9]
/usr/lib/libc.so.6(+0x8d5c2) [0x7f0e240af5c2]
/usr/lib/libc.so.6(clone+0x44) [0x7f0e24134584]

# Python backtrace

Exact steps for others to reproduce the error
Render anything with an image texture with HIP and GPU compute enabled. I used the BMW benchmark to create the logs.

Author

Added subscriber: @wsippel

#100711 was marked as duplicate of this issue

#98900 was marked as duplicate of this issue

#98859 was marked as duplicate of this issue

Added subscribers: @JacquesLucke, @Jeroen-Bakker, @iss

@Jeroen-Bakker, @JacquesLucke Can you reproduce? According to HW list you have access to this GPU.

Added subscriber: @Luciddream

Added subscriber: @tschipie

Member

Closed as duplicate of #97997

Changed status from 'Duplicate' to: 'Confirmed'

Added subscribers: @BrianSavery, @Sayak-Biswas
Removed subscriber: @JacquesLucke

Brecht Van Lommel changed title from "Address boundary error rendering with HIP on Linux, maybe RDNA1 specific" to "Cycles HIP error with image textures on Linux and RDNA1" 2022-05-23 16:46:53 +02:00

Added subscriber: @brecht

This report was merged into a bug report about Windows, where a driver update fixed the issue. However, the Linux driver is quite different, and I have not yet seen confirmation that it's fixed on Linux.

Marking this as a high priority issue since we really should try to fix this for 3.2.

It happens on blender-3.2.0-beta+v32.84e55e3dc251-linux.x86_64-release on Ubuntu 20.04 with ROCm 5.1.3 and a 5600XT (RDNA1):

:3:hip_texture.cpp          :1453: 1395192224 us: 6547 : [tid:0x7f3489bfe700] hipTexObjectCreate ( 0x7f3,488,fbc,2a8, 0x7f3,489,bb8,1e0, 0x7f3,489,bb8,170, char array:<null> )
:4:rocdevice.cpp            :2035: 1395192555 us: 6547 : [tid:0x7f3489bfe700] Allocate hsa device memory 0x7f330a000000, size 0x1e4000
:3:rocdevice.cpp            :2074: 1395192567 us: 6547 : [tid:0x7f3489bfe700] device=0x7f348d81d000, freeMem_ = 0x7a15c4a8
:4:rocdevice.cpp            :1900: 1395192689 us: 6547 : [tid:0x7f3489bfe700] Allocate hsa host memory 0x7f3490628000, size 0x110
Writing: /tmp/classroom.crash.txt
Segmentation fault (core dumped)

# Blender 3.2.0, Commit date: 2022-05-22 19:10, Hash 84e55e3dc251
Read library:  '/home/andreas/Downloads/classroom/assets/lamps/lamps.blend', '//assets/lamps/lamps.blend', parent '<direct>'  # Info
Read library:  '/home/andreas/Downloads/classroom/assets/chairs/chairs.blend', '//assets/chairs/chairs.blend', parent '<direct>'  # Info
Read library:  '/home/andreas/Downloads/classroom/assets/coatStand/coatStand.blend', '//assets/coatStand/coatStand.blend', parent '<direct>'  # Info
Read library:  '/home/andreas/Downloads/classroom/assets/desks/desks.blend', '//assets/desks/desks.blend', parent '<direct>'  # Info
Read library:  '/home/andreas/Downloads/classroom/assets/dustBin/dustBin.blend', '//assets/dustBin/dustBin.blend', parent '<direct>'  # Info
Read library:  '/home/andreas/Downloads/classroom/assets/radiator/radiator.blend', '//assets/radiator/radiator.blend', parent '<direct>'  # Info
Read library:  '/home/andreas/Downloads/classroom/assets/suitcase/suitcase.blend', '//assets/suitcase/suitcase.blend', parent '<direct>'  # Info
Read library:  '/home/andreas/Downloads/classroom/assets/wallClock/wallClock.blend', '//assets/wallClock/wallClock.blend', parent '<direct>'  # Info
Read library:  '/home/andreas/Downloads/classroom/assets/wastes/wastes.blend', '//assets/wastes/wastes.blend', parent '<direct>'  # Info
Read library:  '/home/andreas/Downloads/classroom/assets/books/books.blend', '//assets/books/books.blend', parent '<direct>'  # Info
Read library:  '/home/andreas/Downloads/classroom/assets/officeSupplies/officeSupplies.blend', '//assets/officeSupplies/officeSupplies.blend', parent '<direct>'  # Info
bpy.context.scene.cycles.device = 'GPU'  # Property

# backtrace
./blender(BLI_system_backtrace+0x20) [0xc3727f0]
./blender() [0x11d83da]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f34cb1e6420]
/opt/rocm/hip/lib/libamdhip64.so(+0x1e8bfd) [0x7f3470721bfd]
/opt/rocm/hip/lib/libamdhip64.so(hipTexObjectCreate+0x8e1) [0x7f347072a9c1]
./blender(_ZN3ccl9HIPDevice9tex_allocERNS_14device_textureE+0x6b5) [0x2eb26e5]
./blender(_ZN3ccl12ImageManager17device_load_imageEPNS_6DeviceEPNS_5SceneEiPNS_8ProgressE+0x392) [0x35d21b2]
./blender() [0x88571b5]
./blender() [0x1793f15]
./blender() [0x17941cb]
./blender() [0x1781207]
./blender() [0x178dc00]
./blender() [0x178fc3c]
./blender() [0x178fe39]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f34cb1da609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f34cabb8133]

# Python backtrace

If you need any specific tests, logs or information please let me know.

[edit]
It doesn't crash with all image textures: the splash screen scene from v2.81 (The Junk Shop) renders without a problem, despite having quite a lot of image textures.

Author

Interesting. For me, just adding an image texture node to an active material (with no texture actually loaded), and connecting the color output of the texture node to the Principled BSDF color input, crashes Blender immediately if I use Cycles with GPU compute enabled as my viewport renderer.

Adding something else, a Voronoi for example, as color source works just fine, HIP-accelerated raytracing and all.

Author

Can confirm, the Junk Shop scene works. I even triple checked with logging and radeontop to make sure it's really using HIP - it is. No issues with the hardware accelerated Cycles viewport either.

Member

I was able to reproduce this issue with a 5700XT + Ubuntu 20.04 + ROCm 5.1.2 with bmw and classroom scenes. I'm looking into this.

Author

As far as I can tell, the crash happens if a scene uses textures with a horizontal resolution that isn't a multiple of 128. Any random value and any value below 128 I've tested crashes, but 128, 256, 384, 512, 768, 1024, 1536, 1664, 2048 and 4096 all worked perfectly fine. The vertical resolution doesn't matter.
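For anyone who wants to check their own texture sizes against this pattern, here is a minimal sketch in plain Python. It only encodes the observation above (widths that are not a multiple of 128 crashed in testing); the function name and the sample widths are made up for illustration, and nothing here reflects how the driver itself makes any decision.

```python
# Hypothetical helper: encodes only the empirical observation from this thread,
# not any documented driver rule.
CRASH_FREE_MULTIPLE = 128  # widths that are multiples of this rendered fine


def width_looks_risky(width: int) -> bool:
    """Return True if a texture width matches the crashing pattern."""
    if width <= 0:
        raise ValueError("width must be positive")
    return width % CRASH_FREE_MULTIPLE != 0


if __name__ == "__main__":
    # Sample widths: some reported as working above, plus a few arbitrary ones.
    for w in (64, 128, 1000, 1080, 1664, 4096):
        status = "risky" if width_looks_risky(w) else "ok"
        print(f"{w:>5}: {status}")
```

Only the horizontal resolution is checked, since the vertical resolution did not seem to matter.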

I looked into adding a workaround for 3.2 that would rescale textures automatically, but it's turning out to be rather complicated and a risky change this close to the release. I think it's more likely we'll wait for a driver fix and release with a warning in the release notes, unless @Sayak-Biswas or @BrianSavery think this is going to take a long time to fix in the driver.
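As a rough illustration of the size computation a rescaling workaround might start from, here is a sketch that assumes the multiple-of-128 observation reported earlier in this thread holds. It is not the workaround that was attempted; `next_safe_width` is a hypothetical name, and the difficult part (actually resampling the pixels inside Cycles) is not shown.

```python
def next_safe_width(width: int, multiple: int = 128) -> int:
    """Round a texture width up to the next multiple of `multiple`."""
    if width <= 0 or multiple <= 0:
        raise ValueError("width and multiple must be positive")
    # Integer ceiling division, then scale back up to a pixel count.
    return -(-width // multiple) * multiple
```

Widths that are already a multiple of 128 are returned unchanged, so textures that already render would not need to be touched.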

Added subscriber: @niobium93

Added subscriber: @Inko

Member

Added subscribers: @TeryakiiSauce, @PratikPB2123, @Alaska

Added subscriber: @Takuro-Shoji

Added subscriber: @Caden-Mitchell

Author

This issue doesn't seem to be RDNA1 specific after all: a Vega user on the forum reported the same problem. Junk Shop (which uses power-of-two textures) renders just fine, while other scenes crash in hipTexObjectCreate.

Member

Update on this issue: there is a fix for this in the driver, and it should be available in ROCm 5.3.0.
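A small sketch of how a script could compare an installed ROCm version against that release. It assumes the version is available as a plain dotted numeric string such as "5.1.3"; how to obtain that string on a given distribution is not covered here, and the function name is hypothetical.

```python
def rocm_has_texture_fix(version: str) -> bool:
    """Return True if a dotted ROCm version string is 5.3.0 or newer."""
    # Assumes purely numeric components, e.g. "5.1.3" or "5.3".
    parts = [int(p) for p in version.strip().split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad "5.3" to (5, 3, 0)
    return tuple(parts) >= (5, 3, 0)
```

Comparing tuples of integers avoids the pitfalls of comparing version strings directly, where "5.10" would sort before "5.3".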

Member

Added subscribers: @io7m, @ThomasDinges

Author

I can confirm that the issue is indeed resolved with ROCm 5.3. I'd close the task, but I guess we should wait for confirmation from a Vega user?

Changed status from 'Confirmed' to: 'Resolved'

Brecht Van Lommel self-assigned this 2022-10-04 18:20:39 +02:00

ROCm 5.3 is out now, and it was confirmed this is fixed.

Reference: blender/blender#97591