[Issue]: Atomic optimizer reorder causes memory access fault in Blender #58

Closed
GZGavinZhao opened this issue Apr 3, 2024 · 18 comments
Labels: generic (Build error, or some other issue not caused by an LLVM bug), Under Investigation

Comments

@GZGavinZhao
GZGavinZhao commented Apr 3, 2024

Problem Description

Following https://projects.blender.org/blender/blender/issues/112084, I bisected the rocm-6.0.x branch and found that commit 30a3adf causes any Blender render (using HIP, of course) to crash with a message along the lines of "Memory access fault by GPU node-1 (Agent handle: 0x7f1db8337e00) on address 0x7f1bf177e000. Reason: Page not present or supervisor privilege."
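
For anyone repeating the bisection, a minimal sketch of a log classifier that could be called from a `git bisect run` script. The function name and the `render.log` file are illustrative assumptions; the two patterns are the error strings quoted in this thread, and the return codes follow the git-bisect convention (0 = good, 1 = bad).

```shell
#!/bin/sh
# Classify one Blender render log for use with `git bisect run`.
# Returns 1 (bad commit) if the log contains one of the GPU fault
# messages quoted in this thread, and 0 (good commit) otherwise.
classify_render_log() {
    log_file=$1
    if grep -q -e 'Memory access fault by GPU' \
               -e 'HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION' "$log_file"; then
        return 1    # crash reproduced: mark this commit bad
    fi
    return 0        # no fault message found: mark this commit good
}

# Hypothetical use after rebuilding LLVM and the fatbin at the current commit:
#   blender-4.1.0-linux-x64/blender -b BMW27.blend -f 0 -- --cycles-device HIP >render.log 2>&1
#   classify_render_log render.log
```

A bisect script would still need to rebuild LLVM and the kernel fatbin before each render; that part is omitted here.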

Operating System

Solus 4.5 Resilience

CPU

AMD Ryzen 7 5800H with Radeon Graphics

GPU

AMD Instinct MI250, AMD Radeon VII

ROCm Version

ROCm 6.0.0

ROCm Component

llvm-project

Steps to Reproduce

  1. Build this project at the commit mentioned.

  2. Download the Blender 4.1 release binaries: curl -O https://download.blender.org/release/Blender4.1/blender-4.1.0-linux-x64.tar.xz, tar xf blender-4.1.0-linux-x64.tar.xz. You should now have a folder blender-4.1.0-linux-x64.

  3. Clone Blender. Just cloning the v4.1.0 tag is enough: git clone https://projects.blender.org/blender/blender.git --depth 1 --branch v4.1.0.

  4. In the Blender repo, compile the HIP fatbin used to run Blender render: hipcc --offload-arch=$arch --genco intern/cycles/kernel/device/hip/kernel.cpp -D CCL_NAMESPACE_BEGIN= -D CCL_NAMESPACE_END= -D HIPCC -I intern/cycles/kernel/.. -I intern/cycles/kernel/device/hip -ffast-math -o kernel_$arch.fatbin. Adjust HIP_ROCCLR_HOME, HIP_CLANG_PATH as necessary to point to the Clang you just compiled. Replace $arch with the GPU architecture to run on, e.g. gfx900 or gfx1030. Don't add extra attributes like :xnack-.

    If you want to run on multiple architectures, repeat steps 4 and 5 for each architecture.

  5. Put this file into blender-4.1.0-linux-x64/4.1/scripts/addons/cycles/lib/kernel_$arch.fatbin.

  6. Get the BMW27 Blender demo file. curl -O https://download.blender.org/demo/test/BMW27.blend.zip, unzip BMW27.blend.zip. You should have a file BMW27.blend.

  7. Now run Blender render. blender-4.1.0-linux-x64/blender -b <path-to-BMW27.blend> -f 0 -- --cycles-device HIP. By default it runs on GPU with device ID 0, so adjust HIP_VISIBLE_DEVICES as necessary to run on the desired GPU.

    You should almost immediately see Blender crash with an error message similar to "Memory access fault by GPU node-1 (Agent handle: 0x7f1db8337e00) on address 0x7f1bf177e000. Reason: Page not present or supervisor privilege."

  8. Now, build LLVM at the commit immediately before it, e.g. git switch --detach 30a3adf50e2d49dfc97c1b614d9b93638eba672d~1. Repeat steps 4-7, and Blender should render normally.

All of this is on ROCm 6.0.0. If you get a hang instead of a crash when running Blender (likely because you're on an APU), press Ctrl+C and run again with the environment variable HSA_ENABLE_SDMA=0.
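
Steps 4, 5 and 7 can be sketched as three shell functions. The commands and paths are taken from the steps above; the function names and the `BLENDER_DIR` variable are illustrative assumptions. `build_fatbin` is meant to be run from the root of the Blender v4.1.0 checkout, so `BLENDER_DIR` and the .blend path are best given as absolute paths.

```shell
#!/bin/sh
# Location of the unpacked Blender 4.1 release (step 2); override as needed.
BLENDER_DIR=${BLENDER_DIR:-blender-4.1.0-linux-x64}

# Step 5: where Blender 4.1 looks for a precompiled Cycles HIP kernel.
fatbin_dest() {
    printf '%s/4.1/scripts/addons/cycles/lib/kernel_%s.fatbin\n' "$BLENDER_DIR" "$1"
}

# Step 4: compile the HIP fatbin for one architecture (e.g. gfx900, gfx1030).
# Run from the root of the Blender checkout, with hipcc pointing at the
# LLVM build under test.
build_fatbin() {
    arch=$1
    hipcc --offload-arch="$arch" --genco intern/cycles/kernel/device/hip/kernel.cpp \
        -D CCL_NAMESPACE_BEGIN= -D CCL_NAMESPACE_END= -D HIPCC \
        -I intern/cycles/kernel/.. -I intern/cycles/kernel/device/hip \
        -ffast-math -o "kernel_$arch.fatbin"
}

# Steps 5 and 7: install the fatbin, then render frame 0 on the selected GPU.
render_bmw() {
    arch=$1
    blend=$2
    cp "kernel_$arch.fatbin" "$(fatbin_dest "$arch")" &&
        HIP_VISIBLE_DEVICES=${HIP_VISIBLE_DEVICES:-0} \
            "$BLENDER_DIR/blender" -b "$blend" -f 0 -- --cycles-device HIP
}
```

For example, `build_fatbin gfx1030 && render_bmw gfx1030 /path/to/BMW27.blend` would cover one architecture.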

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

rocminfo --support
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 5800H with Radeon Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 5800H with Radeon Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2200                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    61576816(0x3ab9670) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    61576816(0x3ab9670) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    61576816(0x3ab9670) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1032                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 6600M                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      2048(0x800) KB                     
    L3:                      32768(0x8000) KB                   
  Chip ID:                 29695(0x73ff)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2720                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            28                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 116                                
  SDMA engine uCode::      76                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1032         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 3                  
*******                  
  Name:                    gfx90c                             
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      1024(0x400) KB                     
  Chip ID:                 5688(0x1638)                       
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2000                               
  BDFID:                   2048                               
  Internal Node ID:        2                                  
  Compute Unit:            8                                  
  SIMDs per CU:            4                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 471                                
  SDMA engine uCode::      40                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    4194304(0x400000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    4194304(0x400000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx90c:xnack-   
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***          

Additional Information

This behavior has been reproduced on MI250, RX6600M, Vega 10, and Ryzen 7 5800H (gfx90a, gfx1032, gfx900, and gfx90c, respectively).

Kernel version: 6.6.22-281.current, with torvalds/linux@96c211f reverted (ref: https://lists.freedesktop.org/archives/amd-gfx/2023-October/100298.html and ROCm/ROCm#2596 (comment))

@LAKostis
LAKostis commented Apr 4, 2024

I can confirm that reverting that commit (30a3adf) fixes the crash with other scenes such as classroom, but not with the blender-3.2 scene (https://cloud.blender.org/p/gallery/629f23f908e12d4ff15241d3), which still crashes with a similar error (this happens only with rocm-6.x):

Compiling HIP kernel ...
hipcc -Wno-parentheses-equality -Wno-unused-value --hipcc-func-supp -O3 -ffast-math --amdgpu-target=gfx1031 -I /usr/share/blender/4.1/scripts/addons/cycles/source --genco /usr/share/blender/4.1/scripts/addons/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458"
Warning: The --hipcc-func-supp option has been deprecated and will be removed in the future.
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
Kernel compilation finished in 50.97s.
Read blend: "/home/lakostis/Downloads/Blender 3.blend"
Memory access fault by GPU node-1 (Agent handle: 0x7fd727132600) on address 0x7fd7932ca000. Reason: Page not present or supervisor privilege.
Aborted

@GZGavinZhao
Author

GZGavinZhao commented Apr 4, 2024

@LAKostis this is weird, because the blender-3.2 splashscreen renders fine for me on all the devices I've tested. Are you sure your Blender is not using your cycles cache? Maybe try clearing ~/.cache/cycles before every run and install the fatbin files into /usr/share/blender/4.1/scripts/addons/cycles/lib for consistency?

I also see that you seem to be using the Blender provided by your distro. For consistency, can you download the official binaries from Blender and try testing with that instead?
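
A minimal sketch of the cache-clearing suggestion above, so that Blender cannot silently reuse a previously compiled kernel. The function name is an illustrative assumption; the default path is the `~/.cache/cycles` directory mentioned in this comment.

```shell
#!/bin/sh
# Empty the Cycles kernel cache before a test run.
# With no argument, the per-user cache directory named in this thread is used.
clear_cycles_cache() {
    cache_dir=${1:-$HOME/.cache/cycles}
    [ -d "$cache_dir" ] || return 0              # nothing cached yet
    rm -rf -- "$cache_dir" && mkdir -p -- "$cache_dir"
}

# Hypothetical use before every render:
#   clear_cycles_cache && blender -b scene.blend -f 0 -- --cycles-device HIP
```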

@LAKostis
LAKostis commented Apr 4, 2024

> @LAKostis this is weird, because the blender-3.2 splashscreen renders fine for me on all the devices I've tested. Are you sure your Blender is not using your cycles cache? Maybe try clearing ~/.cache/cycles before every run and install the fatbin files into /usr/share/blender/4.1/scripts/addons/cycles/lib for consistency?
>
> I also see that you seem to be using the Blender provided by your distro. For consistency, can you download the official binaries from Blender and try testing with that instead?

Yes, I specifically tested this before the rocm-6.0.x migration, and this demo started failing only with rocm-6.0.x. Clearing the cache doesn't help. It can be a device-specific issue: this demo crashes on my RX 6700 XT (gfx1031) but works on gfx900 (with rendering artifacts, and only after setting HSA_ENABLE_SDMA=0).

Regarding the Blender build: I'm the Blender package maintainer for this distro, so I know which build options were used there :) For the sake of clarity, official Blender builds crash in exactly the same way. I can provide any additional information or logs if you need them.

@GZGavinZhao
Author

Thanks for the quick response!

> It can be a device-specific issue: this demo crashes on my RX 6700 XT (gfx1031) but works on gfx900 (with rendering artifacts, and only after setting HSA_ENABLE_SDMA=0).

Just curious, does compiling against gfx1030 and running Blender through HSA_OVERRIDE_GFX_VERSION on your RX 6700XT change anything? Also what kernel version are you on?

@LAKostis
LAKostis commented Apr 7, 2024

> HSA_OVERRIDE_GFX_VERSION

No, with HSA_OVERRIDE_GFX_VERSION=gfx1030 it fails to start, with the error "HIP hipInit: Invalid device".
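
One possible factor, not confirmed anywhere in this thread: HSA_OVERRIDE_GFX_VERSION is normally given as a dotted major.minor.stepping string (for example 10.3.0) rather than as a gfx name. A minimal sketch of the conversion, assuming the usual naming scheme in which the last two characters of the gfx name are the hexadecimal minor and stepping digits; the function name is illustrative.

```shell
#!/bin/sh
# Convert a gfx target name into the dotted form used by
# HSA_OVERRIDE_GFX_VERSION, e.g. gfx1030 -> 10.3.0.
# Assumption: name = "gfx" + decimal major + hex minor digit + hex stepping digit.
gfx_to_override() {
    name=${1#gfx}                      # "gfx1030" -> "1030"
    stepping=${name#"${name%?}"}       # last character
    rest=${name%?}                     # everything but the last character
    minor=${rest#"${rest%?}"}          # second-to-last character
    major=${rest%?}                    # remaining leading digits
    printf '%d.%d.%d\n' "$major" "0x$minor" "0x$stepping"
}

# Hypothetical use:
#   HSA_OVERRIDE_GFX_VERSION=$(gfx_to_override gfx1030) blender
```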

Interestingly, it works when the -O level is lowered:

Read prefs: "/home/lakostis/.config/blender/4.1/config/userpref.blend"
Read blend: "/home/lakostis/Downloads/Blender 3.blend"
Compiling HIP kernel ...
hipcc -Wno-parentheses-equality -Wno-unused-value --hipcc-func-supp -O3 -ffast-math --amdgpu-target=gfx1031 -I /usr/share/blender/4.1/scripts/addons/cycles/source --genco /usr/share/blender/4.1/scripts/addons/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458"
Warning: The --hipcc-func-supp option has been deprecated and will be removed in the future.
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
Kernel compilation finished in 50.93s.
Memory access fault by GPU node-1 (Agent handle: 0x7fa997041200) on address 0x7fa9f1b5f000. Reason: Page not present or supervisor privilege.
Aborted
...
❯ blender
Read prefs: "/home/lakostis/.config/blender/4.1/config/userpref.blend"
Read blend: "/home/lakostis/Downloads/Blender 3.blend"
❯ hipcc -Wno-parentheses-equality -Wno-unused-value --hipcc-func-supp -O1 -ffast-math --amdgpu-target=gfx1031 -I /usr/share/blender/4.1/scripts/addons/cycles/source --genco /usr/share/blender/4.1/scripts/addons/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458"
Warning: The --hipcc-func-supp option has been deprecated and will be removed in the future.
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
...

With -O1, rendering works; it starts crashing at -O2 and above. So something is not right with optimization.
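
To narrow this down, the kernel can be rebuilt once per optimization level with every other flag held fixed. A minimal sketch: the flags are copied from the log above (including the two that hipcc reports as deprecated), while the function name and the `HIPCC` and `CYCLES_SRC` variables are illustrative assumptions.

```shell
#!/bin/sh
# Build the Cycles HIP kernel at one optimization level, changing only -O.
# CYCLES_SRC defaults to the distro path seen in the log above.
build_kernel_at() {
    level=$1
    arch=$2
    out=$3
    src=${CYCLES_SRC:-/usr/share/blender/4.1/scripts/addons/cycles/source}
    "${HIPCC:-hipcc}" -Wno-parentheses-equality -Wno-unused-value \
        --hipcc-func-supp "-O$level" -ffast-math --amdgpu-target="$arch" \
        -I "$src" --genco "$src/kernel/device/hip/kernel.cpp" -o "$out"
}

# Hypothetical sweep: one fatbin per level, each then tried in Blender.
#   for level in 0 1 2 3; do
#       build_kernel_at "$level" gfx1031 "kernel_O$level.fatbin"
#   done
```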

> Also what kernel version are you on?

I'm using the 6.6.25 kernel plus patches up to v6.5-2638-gbf901afac5d5f from amd-staging-drm-next.

@LAKostis
LAKostis commented Apr 7, 2024

UPDATE: more odd behavior from the compiler:

hipcc -Wno-parentheses-equality -Wno-unused-value --hipcc-func-supp -O1 -ffast-math --amdgpu-target=gfx1031 -I /usr/share/blender/4.1/scripts/addons/cycles/source --genco /usr/share/blender/4.1/scripts/addons/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458"

This command produces a working kernel despite the warnings about deprecated options. But this command:

hipcc -Wno-parentheses-equality -Wno-unused-value -O1 -ffast-math --offload-arch=gfx1031 -I /usr/share/blender/4.1/scripts/addons/cycles/source --genco /usr/share/blender/4.1/scripts/addons/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458"

produces a kernel that crashes. And the two kernels are not identical:

-rw-r--r-- 1 lakostis lakostis 4286552 Apr  7 13:10 cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458.crash_O1
-rw-r--r-- 1 lakostis lakostis 3213824 Apr  7 13:09 cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458.works_O1
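
The listing above already shows different sizes. For flag combinations where the sizes happen to match, a byte-level comparison settles whether two builds produced the same kernel. A minimal sketch using `cmp`; the function name is illustrative.

```shell
#!/bin/sh
# Succeed (return 0) only if the two kernel binaries are byte-identical.
kernels_identical() {
    cmp -s -- "$1" "$2"
}

# Hypothetical use with the two files listed above:
#   if kernels_identical kernel.works_O1 kernel.crash_O1; then
#       echo "same binary"
#   else
#       echo "binaries differ"
#   fi
```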

vt-alt pushed a commit to altlinux/specs that referenced this issue Apr 7, 2024
- Applied fixes for cycles:
  + cycles/hip: reduce opt level and enable hipcc-func-supp for
    gfx1031 kernel (see ROCm/llvm-project#58)
@LAKostis

UPDATE: I checked with a recent ROCm and Blender 4.2.0.

kernel 6.9.10

❯ rpm -qa|fgrep 6.1.2-alt0
fgrep: warning: fgrep is obsolescent; using grep -F
rocm-comgr-devel-6.1.2-alt0.2.x86_64
llvm-rocm-6.1.2-alt0.2.x86_64
clang-rocm-6.1.2-alt0.2.x86_64
hip-devel-6.1.2-alt0.2.x86_64
librocm-smi1-6.1.2-alt0.2.x86_64
rocminfo-6.1.2-alt0.1.x86_64
clang-rocm-tools-6.1.2-alt0.2.x86_64
rocm-smi-6.1.2-alt0.2.x86_64
rocm-opencl-runtime-6.1.2-alt0.2.x86_64
llvm-rocm-filesystem-6.1.2-alt0.2.x86_64
libhsakmt1-6.1.2-alt0.1.x86_64
clang-rocm-libs-support-6.1.2-alt0.2.x86_64
libhsa-runtime1-6.1.2-alt0.1.x86_64
hip-runtime-amd-6.1.2-alt0.2.x86_64
clang-rocm-libs-6.1.2-alt0.2.x86_64
lld-rocm-6.1.2-alt0.2.x86_64
rocm-device-libs-6.1.2-alt0.2.x86_64
libamd_comgr2-6.1.2-alt0.2.x86_64
hipcc-6.1.2-alt0.2.x86_64

❯ rpm -q blender
blender-4.2.0-alt0.1.x86_64

If I compile for gfx1031 with the previous workaround (--hipcc-func-supp -O1), Blender crashes on every render with this error:

❯ blender                                                                                                                                    
register_class(...):                                                                                                                         
Info: Registering key-config preferences class: 'Prefs', bl_idname 'Blender' has been registered before, unregistering previous              
register_class(...):                                                                                                                         
Info: Registering key-config preferences class: 'Prefs', bl_idname 'Blender' has been registered before, unregistering previous                                                                                                                                                           
Read blend: "/home/lakostis/Downloads/bmw27_gpu.blend"                                                                                                                                                                                                                                    
Warning: region type 4 missing in space type "Info" (id: 7) - removing region                                                                                                                                                                                                             
:0:rocdevice.cpp            :2895: 420951542923 us: [pid:4113852 tid:0x7f5723200000] Callback: Queue 0x7f5600700000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29                   
Aborted

If I compile the kernel with the default upstream options (-O3), it renders most scenes (bmw/classroom) but still crashes on the Blender 3 scene:

Read blend: "/home/lakostis/Downloads/bmw27_gpu.blend"                                                                                       
Warning: region type 4 missing in space type "Info" (id: 7) - removing region                                                                
Compiling HIP kernel ...                                                                                                                     
hipcc -Wno-parentheses-equality -Wno-unused-value -O3 -ffast-math --offload-arch=gfx1031 -I /usr/share/blender/4.2/scripts/addons_core/cycles/source --genco /usr/share/blender/4.2/scripts/addons_core/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_210C856BB7ABA617B857E9D03ED272C1"
Kernel compilation finished in 110.52s.
...
Read blend: "/home/lakostis/Downloads/Blender 3.blend"                                                                                                                                                                                                                                    
Memory access fault by GPU node-1 (Agent handle: 0x7f115fbc1800) on address 0x7f0ffe842000. Reason: Page not present or supervisor privilege.                                                                                                                                             
Aborted

If I compile with -O2, the behavior is the same: the Blender 3 scene crashes.

But everything works with -O1:

❯ hipcc -Wno-parentheses-equality -Wno-unused-value -O1 -ffast-math --offload-arch=gfx1031 -I /usr/share/blender/4.2/scripts/addons_core/cycles/source --genco /usr/share/blender/4.2/scripts/addons_core/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_210C856BB7ABA617B857E9D03ED272C1"                                                                                                                                                                                                                               
                                                                                                                                                                                                                  
❯ blender                                                                                                                                                                                                                                                                                 
Read blend: "/home/lakostis/Downloads/Blender 3.blend"                                                                                       
Saved session recovery to "/tmp/.private/lakostis/quit.blend"                                                                                
Writing userprefs: "/home/lakostis/.config/blender/4.2/config/userpref.blend" ok                                                             
Info: Preferences saved                                                                                                                                                                                                                                                                   
                                                                                                                                             
Blender quit

@GZGavinZhao
Author

Does reverting 30a3adf still help? I'm considering just reverting this patch for Solus's ROCm 6.1.2.

@LAKostis
LAKostis commented Jul 22, 2024 via email

@pravinjagtap

Can you try after applying the following patches:
llvm@c86a1e6
llvm@9ff7181
llvm@56af0e9

@LAKostis

I tested with the commit reverted; I can try again without reverting.

Without reverting that commit, everything works with -O1 but fails at -O2 and above.

@LAKostis

> Can try after applying following patches: llvm@c86a1e6 llvm@9ff7181 llvm@56af0e9

Hey! Those patches are already applied in rocm-llvm somewhere in this bulk commit (1ce2523)

❯ patch -p1 --dry-run < ../c86a1e6903e9935b808c1406f480c769279b69fa.patch
checking file llvm/lib/Transforms/Scalar/GVN.cpp
Hunk #1 succeeded at 487 (offset 14 lines).
Hunk #2 FAILED at 2789.
1 out of 2 hunks FAILED
checking file llvm/lib/Transforms/Scalar/NewGVN.cpp
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored
checking file llvm/test/Transforms/GVN/convergent.ll
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored
checking file llvm/test/Transforms/NewGVN/convergent.ll
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored

❯ patch -p1 --dry-run < ../9ff71814cb5d71e907feaa0b3165e866b882f9aa.patch
checking file llvm/lib/Transforms/Scalar/EarlyCSE.cpp
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
3 out of 3 hunks ignored
checking file llvm/test/Transforms/EarlyCSE/AMDGPU/convergent-call.ll
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored

❯ patch -p1 --dry-run < ../56af0e913ce7ec29690cc7295d75fc5573153bbf.patch
checking file llvm/lib/Transforms/Scalar/EarlyCSE.cpp
Hunk #1 succeeded at 336 with fuzz 2 (offset 18 lines).
Hunk #2 FAILED at 352.
1 out of 2 hunks FAILED
checking file llvm/test/CodeGen/AMDGPU/cse-convergent.ll
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored
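
The dry-run transcripts above can be summarized per file. The following is a minimal sketch, not part of the original report: the helper name `summarize_patch_dry_run` and the status labels are hypothetical, and it assumes the message wording shown in the output above. A "conflict" status only means a hunk failed to apply and the file needs manual inspection; it does not prove the fix is missing.

```python
import re


def summarize_patch_dry_run(output):
    """Map each file named in `patch --dry-run` output to a coarse status.

    Hypothetical helper; it relies on the message wording seen in this thread.
    """
    status = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        match = re.match(r"checking file (.+)$", line)
        if match:
            current = match.group(1)
            status[current] = "applies"  # default until we see otherwise
            continue
        if current is None:
            continue
        if line.startswith("Reversed (or previously applied) patch detected"):
            status[current] = "already applied"
        elif "FAILED" in line and status[current] != "already applied":
            status[current] = "conflict"
    return status


# Excerpt copied from the first dry run above.
sample = """\
checking file llvm/lib/Transforms/Scalar/GVN.cpp
Hunk #1 succeeded at 487 (offset 14 lines).
Hunk #2 FAILED at 2789.
1 out of 2 hunks FAILED
checking file llvm/lib/Transforms/Scalar/NewGVN.cpp
Reversed (or previously applied) patch detected!  Assume -R? [n]
Apply anyway? [n]
Skipping patch.
1 out of 1 hunk ignored
"""

for path, state in summarize_patch_dry_run(sample).items():
    print(f"{state:16} {path}")
```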

@pravinjagtap

ROCm 6.0.0 might have this issue. It looks like those patches (fixes) landed in the very next ROCm release.
AFAIU, this should be fixed in the latest ROCm release. Does this still occur on rocm-6.2?

searlmc1 pushed a commit that referenced this issue Aug 24, 2024
)

Currently, the process of replacing bitwise operations consisting of
`LSR`/`LSL` with `AND` is performed by `DAGCombiner`.

However, in certain cases, the `AND` generated by this process
can be removed.

Consider the following case:
```
        lsr x8, x8, #56
        and x8, x8, #0xfc
        ldr w0, [x2, x8]
        ret
```

In this case, we can remove the `AND` by changing the target of `LDR`
to `[X2, X8, LSL #2]` and changing the right-shift amount from 56 to 58.

After the change:
```
        lsr x8, x8, #58
        ldr w0, [x2, x8, lsl #2]
        ret
```

This patch checks whether the shift + `AND` operation on a load
target can be optimized in this way and, if so, optimizes it.
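
The two instruction sequences in this commit message compute the same load offset. The sketch below is not from the patch itself. It only checks the arithmetic, modelling `x8` as an unsigned 64-bit value; the function names are made up for illustration. Only the top byte of `x8` affects the result, so trying all 256 top-byte values with random lower bits covers every case.

```python
import random

MASK64 = (1 << 64) - 1  # model x8 as an unsigned 64-bit register


def offset_before(x8):
    """lsr x8, x8, #56 ; and x8, x8, #0xfc -> offset used by ldr w0, [x2, x8]"""
    return ((x8 & MASK64) >> 56) & 0xFC


def offset_after(x8):
    """lsr x8, x8, #58 -> offset used by ldr w0, [x2, x8, lsl #2]"""
    return ((x8 & MASK64) >> 58) << 2


rng = random.Random(0)
for top_byte in range(256):
    x8 = (top_byte << 56) | rng.getrandbits(56)
    assert offset_before(x8) == offset_after(x8)

print("both sequences yield the same offset for every top byte")
```

The mask `0xfc` clears exactly the two low bits of the shifted value. Shifting right by two more places and then left by two, via the scaled addressing mode, has the same effect.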
@ppanchad-amd ppanchad-amd added Under Investigation generic Build error, or some other issue not caused by an LLVM bug labels Jan 13, 2025
@sohaibnd

Hi @GZGavinZhao @LAKostis, as @pravinjagtap mentioned, the issue should be fixed now. Can you try installing the latest version of ROCm and confirm on your system so the issue can be closed?

I was able to reproduce the issue on ROCm 6.0 by following the steps @GZGavinZhao mentioned. My system is running Ubuntu 20.04 and has 3 MI210s (gfx90a).

Image

I also tried ROCm 6.3.0 and was able to render successfully:

Image

@GZGavinZhao
Author

gfx1030 is fixed. gfx90c is still broken with a similar error, "Memory access fault by GPU node-2 (Agent handle: 0x7f6cd82d3900) on address 0x90000000000. Reason: Unknown.", but I'm not sure whether that's still caused by this issue.

@sohaibnd

gfx90c is not officially supported on ROCm. Is this issue still present on any of the supported GPUs?

@GZGavinZhao
Author

Not to my knowledge then.

@VencaCZ

VencaCZ commented Mar 11, 2025

Hello, this issue is back with the RX 9070, so it should probably be reopened.

System:
Kernel: 6.8.0-55-generic arch: x86_64 bits: 64 compiler: gcc v: 13.3.0
clocksource: tsc
Desktop: i3 v: 4.23 with: i3bar tools: xss-lock vt: 7 dm: LightDM
v: 1.30.0 Distro: Linux Mint 22.1 Xia base: Ubuntu 24.04 noble
Machine:
Type: Desktop Mobo: Micro-Star model: B550-A PRO (MS-7C56) v: 2.0
serial: uuid: UEFI: American
Megatrends LLC. v: A.80 date: 12/16/2021
CPU:
Info: 12-core model: AMD Ryzen 9 5900X bits: 64 type: MT MCP smt: enabled
arch: Zen 3+ rev: 2 cache: L1: 768 KiB L2: 6 MiB L3: 64 MiB
Speed (MHz): avg: 2287 high: 3700 min/max: 2200/4950 boost: enabled cores:
1: 2200 2: 2200 3: 2200 4: 2200 5: 2200 6: 3700 7: 2200 8: 2200 9: 2200
10: 2200 11: 2200 12: 2200 13: 2200 14: 2200 15: 2200 16: 2200 17: 2200
18: 2200 19: 2200 20: 2200 21: 2200 22: 2200 23: 2200 24: 2800
bogomips: 177608
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3
Graphics:
Device-1: AMD Navi 48 [RX 9070/9070 XT] vendor: Sapphire driver: amdgpu
v: 6.10.5 pcie: speed: 32 GT/s lanes: 16 ports: active: DP-2 empty: DP-1,
HDMI-A-1, HDMI-A-2, Writeback-1 bus-ID: 2d:00.0 chip-ID: 1002:7550
class-ID: 0300
Display: x11 server: X.Org v: 21.1.11 with: Xwayland v: 23.2.6 driver: X:
loaded: amdgpu unloaded: fbdev,modesetting,radeon,vesa dri: radeonsi
gpu: amdgpu display-ID: :0 screens: 1
Screen-1: 0 s-res: 3840x2160 s-dpi: 96 s-size: 1016x571mm (40.00x22.48")
s-diag: 1165mm (45.88")
Monitor-1: DP-2 mapped: DisplayPort-1 model: LG (GoldStar) HDR 4K
serial: res: 3840x2160 hz: 60 dpi: 163
size: 600x340mm (23.62x13.39") diag: 690mm (27.2") modes: max: 3840x2160
min: 640x480
API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
device: 1 drv: swrast gbm: drv: kms_swrast surfaceless: drv: radeonsi x11:
drv: radeonsi inactive: wayland
API: OpenGL v: 4.6 compat-v: 3.3 vendor: amd mesa v: 24.3.0-devel
glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 9070 (radeonsi
gfx1201 LLVM 19.1.2 DRM 3.59 6.8.0-55-generic) device-ID: 1002:7550
Audio:
Device-1: AMD driver: snd_hda_intel v: kernel pcie: speed: 32 GT/s lanes: 16
bus-ID: 2d:00.1 chip-ID: 1002:ab40 class-ID: 0403
Device-2: AMD Starship/Matisse HD Audio vendor: Micro-Star MSI
driver: snd_hda_intel v: kernel pcie: speed: 16 GT/s lanes: 16
bus-ID: 2f:00.4 chip-ID: 1022:1487 class-ID: 0403
API: ALSA v: k6.8.0-55-generic status: kernel-api
Server-1: PipeWire v: 1.0.5 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
Network:
Device-1: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet
vendor: Micro-Star MSI RTL8111/8168/8411 driver: r8169 v: kernel pcie:
speed: 2.5 GT/s lanes: 1 port: f000 bus-ID: 2a:00.0 chip-ID: 10ec:8168
class-ID: 0200
IF: enp42s0 state: up speed: 1000 Mbps duplex: full mac:
Device-2: TP-Link 802.11ac WLAN Adapter driver: N/A type: USB rev: 2.1
speed: 480 Mb/s lanes: 1 bus-ID: 1-2.4:4 chip-ID: 2357:011f class-ID: 0000
serial:
IF-ID-1: docker0 state: up speed: 10000 Mbps duplex: unknown mac:
IF-ID-2: vethea5f7a4 state: up speed: 10000 Mbps duplex: full
mac:
Drives:
Local Storage: total: 5.39 TiB used: 1.74 TiB (32.3%)
ID-1: /dev/nvme0n1 vendor: Crucial model: CT2000P3PSSD8 size: 1.82 TiB
speed: 63.2 Gb/s lanes: 4 tech: SSD serial: fw-rev: P9CR420
temp: 29.9 C scheme: GPT
ID-2: /dev/nvme1n1 vendor: Crucial model: CT2000P3PSSD8 size: 1.82 TiB
speed: 63.2 Gb/s lanes: 4 tech: SSD serial: fw-rev: P9CR420
temp: 31.9 C
ID-3: /dev/sda vendor: Kingston model: SA400S37960G size: 894.25 GiB
speed: 6.0 Gb/s tech: SSD serial: fw-rev: Z1.3
ID-4: /dev/sdb vendor: Kingston model: SA400S37960G size: 894.25 GiB
speed: 6.0 Gb/s tech: SSD serial: fw-rev: Z1.3
Partition:
ID-1: / size: 1.79 TiB used: 227.8 GiB (12.4%) fs: ext4 dev: /dev/nvme0n1p2
ID-2: /boot/efi size: 511 MiB used: 6.1 MiB (1.2%) fs: vfat
dev: /dev/nvme0n1p1
Swap:
ID-1: swap-1 type: file size: 2 GiB used: 0 KiB (0.0%) priority: -2
file: /swapfile
USB:
Hub-1: 1-0:1 info: hi-speed hub with single TT ports: 10 rev: 2.0
speed: 480 Mb/s lanes: 1 chip-ID: 1d6b:0002 class-ID: 0900
Hub-2: 1-2:2 info: Genesys Logic Hub ports: 4 rev: 2.0 speed: 480 Mb/s
lanes: 1 power: 100mA chip-ID: 05e3:0608 class-ID: 0900
Device-1: 1-2.4:4 info: TP-Link 802.11ac WLAN Adapter type: WiFi
driver: N/A interfaces: 1 rev: 2.1 speed: 480 Mb/s lanes: 1 power: 500mA
chip-ID: 2357:011f class-ID: 0000 serial:
Device-2: 1-7:3 info: Micro Star MYSTIC LIGHT type: HID
driver: hid-generic,usbhid interfaces: 1 rev: 1.1 speed: 12 Mb/s lanes: 1
power: 500mA chip-ID: 1462:7c56 class-ID: 0300 serial:
Hub-3: 2-0:1 info: super-speed hub ports: 4 rev: 3.1 speed: 10 Gb/s
lanes: 1 chip-ID: 1d6b:0003 class-ID: 0900
Hub-4: 3-0:1 info: hi-speed hub with single TT ports: 4 rev: 2.0
speed: 480 Mb/s lanes: 1 chip-ID: 1d6b:0002 class-ID: 0900
Hub-5: 4-0:1 info: super-speed hub ports: 4 rev: 3.1 speed: 10 Gb/s
lanes: 1 chip-ID: 1d6b:0003 class-ID: 0900
Sensors:
System Temperatures: cpu: 35.1 C mobo: N/A gpu: amdgpu temp: 30.0 C
mem: 54.0 C
Fan Speeds (rpm): N/A
Repos:
Packages: pm: dpkg pkgs: 2748
No active apt repos in: /etc/apt/sources.list
No active apt repos in: /etc/apt/sources.list.d/amdgpu-proprietary.list
Active apt repos in: /etc/apt/sources.list.d/amdgpu.list
1: deb [arch=amd64,i386 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/6.3.4/ubuntu jammy main
Active apt repos in: /etc/apt/sources.list.d/apt-build.list
1: deb [trusted=yes] file:/var/cache/apt-build/repository apt-build main
Active apt repos in: /etc/apt/sources.list.d/cappelikan-ppa-noble.list
1: deb [signed-by=/etc/apt/keyrings/cappelikan-ppa-noble.gpg] https://ppa.launchpadcontent.net/cappelikan/ppa/ubuntu noble main
Active apt repos in: /etc/apt/sources.list.d/kisak-kisak-mesa-noble.list
1: deb [signed-by=/etc/apt/keyrings/kisak-kisak-mesa-noble.gpg] https://ppa.launchpadcontent.net/kisak/kisak-mesa/ubuntu noble main
Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list
1: deb http://packages.linuxmint.com xia main upstream import backport
2: deb http://archive.ubuntu.com/ubuntu noble main restricted universe multiverse
3: deb http://archive.ubuntu.com/ubuntu noble-updates main restricted universe multiverse
4: deb http://archive.ubuntu.com/ubuntu noble-backports main restricted universe multiverse
5: deb http://security.ubuntu.com/ubuntu/ noble-security main restricted universe multiverse
Active apt repos in: /etc/apt/sources.list.d/rocm.list
1: deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.3.4 jammy main
Info:
Memory: total: 64 GiB note: est. available: 62.72 GiB used: 4.34 GiB (6.9%)
Processes: 732 Power: uptime: 11m states: freeze,mem,disk suspend: deep
wakeups: 0 hibernate: platform Init: systemd v: 255 target: graphical (5)
default: graphical
Compilers: gcc: 13.3.0 alt: 14 Shell: Bash v: 5.2.21
running-in: gnome-terminal inxi: 3.3.34

Memory access fault by GPU node-1 (Agent handle: 0x730378a20c00) on address 0x72fd8d363000. Reason: Page not present or supervisor privilege.

This happens in Blender, but in other apps as well.

I was able to reproduce this with:

  • Latest blender
  • Latest blender (beta)
  • llama.cpp
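
The fault messages quoted in this thread share a fixed shape, so reports from different GPUs and applications can be compared field by field. The following is a rough sketch, assuming the message sits on its own line and keeps the wording seen here; `parse_fault` and the field names are hypothetical, not part of any ROCm tool.

```python
import re

# Pattern derived only from the messages quoted in this thread.
FAULT_RE = re.compile(
    r"Memory access fault by GPU node-(?P<node>\d+) "
    r"\(Agent handle: (?P<agent>0x[0-9a-fA-F]+)\) "
    r"on address (?P<address>0x[0-9a-fA-F]+)\. "
    r"Reason: (?P<reason>.+?)\.?$"
)


def parse_fault(line):
    """Return the fields of a fault message, or None if the line does not match."""
    match = FAULT_RE.search(line.strip())
    if match is None:
        return None
    return {
        "node": int(match.group("node")),
        "agent": int(match.group("agent"), 16),
        "address": int(match.group("address"), 16),
        "reason": match.group("reason"),
    }


message = (
    "Memory access fault by GPU node-1 (Agent handle: 0x730378a20c00) "
    "on address 0x72fd8d363000. Reason: Page not present or supervisor privilege."
)
fault = parse_fault(message)
print(fault["node"], hex(fault["address"]), fault["reason"])
```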
