rocprofv2: aqlprofile API table load failed #262615

Closed
Tungsten842 opened this issue Oct 21, 2023 · 6 comments · Fixed by #262798
Labels
0.kind: bug Something is broken

Comments

@Tungsten842
Member

Describe the bug

When trying to use rocprofv2:

❯ rocprofv2 ./a.out 10
aqlprofile API table load failed: HSA_STATUS_ERROR: A generic error has occurred.
/nix/store/c5iy8c6w34y4jfgqlv8c6scxrilc4djv-rocprofiler-5.7.0/bin/rocprofv2: line 264: 77294 Aborted                 (core dumped) LD_PRELOAD=$LD_PRELOAD:$ROCM_DIR/lib/rocprofiler/librocprofiler_tool.so $*

This might be related to the hard-coded paths in the rocprofv2 wrapper script (note the $ROCM_DIR/lib/rocprofiler/librocprofiler_tool.so preload in the output above).

Steps To Reproduce

Steps to reproduce the behavior:

  1. Compile an executable with hipcc
  2. Run it under rocprofv2 (for example, rocprofv2 ./a.out 10)
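
For reference, the two steps can be sketched as a small shell helper. The source file name `main.hip` is a hypothetical placeholder, and the trailing argument `10` is copied from the invocation shown above. The helper only checks that both tools are on PATH before running them; it does not work around the failure.

```shell
# Reproduction sketch: compile a HIP source file and run it under rocprofv2.
# Assumption: hipcc and rocprofv2 are both expected to be on PATH.
repro() {
  src="$1"   # path to the HIP source file (hypothetical name: main.hip)
  for tool in hipcc rocprofv2; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing from PATH: $tool"
      return 1
    fi
  done
  # hipcc needs no extra flags for the test program in this thread.
  hipcc "$src" -o a.out && rocprofv2 ./a.out 10
}

# Usage:
#   repro main.hip
```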

Notify maintainers

@NixOS/rocm-maintainers

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.5.7, NixOS, 23.11 (Tapir), 23.11.20231016.ca012a0`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.17.0`
 - nixpkgs: `/run/current-system/nixpkgs`
@Tungsten842 Tungsten842 added the 0.kind: bug Something is broken label Oct 21, 2023
@Madouura
Contributor

Madouura commented Oct 22, 2023

I need to do some work with rocprofiler.

  1. I need to make a patch that strips hsa-amd-aqlprofile-bin.
  2. Apparently I need to fix rocprofiler itself.

I wonder if it's because I'm stripping out the test directories?
Also yes, it looks like you're right. You can fix that by prepending ROCM_DIR=/nix/store/rocprofiler for now. Although it seems to need hsa-amd-aqlprofile in ROCM_DIR too, so I'm not entirely sure it'll work.
Try setting ROCM_DIR to rocmPackages.meta.rocm-developer-tools. You may just be able to run it from that as well.
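
A rough way to check whether a candidate ROCM_DIR has the layout the wrapper seems to expect is sketched below. The tool library path is taken from the error output in the report; the aqlprofile library name and location (`lib/libhsa-amd-aqlprofile64.so`) are an assumption and may differ in the Nix packaging.

```shell
# Sketch: report which of the expected libraries exist under a candidate ROCM_DIR.
# lib/rocprofiler/librocprofiler_tool.so comes from the error output in the report.
# lib/libhsa-amd-aqlprofile64.so is an ASSUMED name and location for aqlprofile.
check_rocm_dir() {
  dir="$1"
  rc=0
  for lib in lib/rocprofiler/librocprofiler_tool.so lib/libhsa-amd-aqlprofile64.so; do
    if [ -e "$dir/$lib" ]; then
      echo "found: $lib"
    else
      echo "missing: $lib"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (substitute the real store path for the placeholder):
#   check_rocm_dir /path/to/candidate && ROCM_DIR=/path/to/candidate rocprofv2 ./a.out 10
```

If both files are reported as found, prepending ROCM_DIR as suggested above is at least worth trying; if aqlprofile is missing, the table load would presumably still fail.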

@Tungsten842
Member Author

> I need to do some work with rocprofiler.
>
> 1. I need to make a patch that strips `hsa-amd-aqlprofile-bin`.
> 2. Apparently I need to fix `rocprofiler` itself.
>
> I wonder if it's because I'm stripping out the test directories? Also yes, it looks like you're right. You can fix that by prepending ROCM_DIR=/nix/store/rocprofiler for now. Although it seems to need hsa-amd-aqlprofile in ROCM_DIR too, so I'm not entirely sure it'll work. Try setting ROCM_DIR to rocmPackages.meta.rocm-developer-tools. You may just be able to run it from that as well.

I don't think removing hsa-amd-aqlprofile-bin is a good idea, because without it rocprofiler loses performance counter support. I think there should at least be a rocprof derivation with aqlprofile.

@Madouura
Contributor

My bad, I forgot to add that the patch is optional.

@Madouura
Contributor

Right.
Could you tell me your test environment and the source of the hip executable (or just some stub) so I can replicate this?
I've never actually used rocprofiler.

@Tungsten842
Member Author

Tungsten842 commented Oct 22, 2023

> Right. Could you tell me your test environment and the source of the hip executable (or just some stub) so I can replicate this? I've never actually used rocprofiler.

You can try this: it just prints 10 numbers and compiles with hipcc without any flags. I am using a Radeon RX 6600M (gfx1032).

#include <hip/hip_runtime.h>
#include <stdio.h>

// Each thread writes its global index into the array. The bounds check keeps
// surplus threads in the block from writing past the end of the allocation.
__global__ void gene(int *array, int n) {
  int tid = hipBlockDim_x * hipBlockIdx_x + hipThreadIdx_x;
  if (tid < n) {
    array[tid] = tid;
  }
}

int main() {
  int device;
  hipDeviceProp_t prop;
  hipGetDevice(&device);
  hipGetDeviceProperties(&prop, device);
  printf("DEVICE: %s\n", prop.name);
  int thread_per_block = prop.maxThreadsPerBlock;
  printf("Max thread per block: %d\n", thread_per_block);

  int n_elem = 10;
  int *d_array, *h_array;

  int array_size = n_elem * sizeof(int);

  hipMalloc(&d_array, array_size);

  // Round up so that every element is covered by at least one thread.
  int numblocks = (n_elem + thread_per_block - 1) / thread_per_block;

  gene<<<numblocks, thread_per_block>>>(d_array, n_elem);

  hipDeviceSynchronize();
  hipHostMalloc(&h_array, array_size);
  hipMemcpy(h_array, d_array, array_size, hipMemcpyDeviceToHost);
  hipDeviceSynchronize();

  for (int i = 0; i < n_elem; i++) {
    printf("%d\n", h_array[i]);
  }

  hipFree(d_array);
  hipHostFree(h_array);  // h_array was allocated with hipHostMalloc

  return 0;
}

@Madouura
Contributor

@Tungsten842 see #262798
