Bug for 12.1 #248

Open
Haydnspass opened this issue Jun 22, 2023 · 1 comment

Comments

@Haydnspass

Hey,

since I switched to CUDA 12.1, I get the following error on some runs. It could be unrelated to 12.1, but this is the first time I have noticed it.

Run Jimver/cuda-toolkit@v0.2.10
  with:
    cuda: 12.1.0
    sub-packages: []
    method: local
    linux-local-args: ["--toolkit", "--samples"]
    use-github-cache: true
  env:
    INPUT_RUN_POST: true
    CONDA: /usr/share/miniconda3
    CONDA_PKGS_DIR: /home/runner/conda_pkgs_dir
/usr/bin/tar --posix -cf cache.tzst --exclude cache.tzst -P -C /home/runner/work/SplinePSF/SplinePSF --files-from manifest.txt --use-compress-program zstdmt
Failed to save: Unable to reserve cache with key cuda_installer-linux-5.15.0-1040-azure-12.1.0, another job may be creating this cache. More details: Cache already exists. Scope: refs/heads/dev_fix_hopper_ci, Key: cuda_installer-linux-5.15.0-1040-azure-12.1.0, Version: 4bfd46a3233f39e7afb92a41e0e6a5d43d677cf4e3f9feca811a22308a54230c
/usr/bin/sudo /opt/hostedtoolcache/cuda_installer-linux/12.1.0/x64/cuda_installer-linux-5.15.0-1040-azure_12.1.0.run --silent --toolkit --samples
terminate called after throwing an instance of 'boost::filesystem::filesystem_error'
  what():  boost::filesystem::copy_file: No such file or directory: "./builds/cuda_cupti/extras/CUPTI/doc/Cupti/structCUpti__ActivityMemcpy3.html", "/usr/local/cuda-12.1/extras/CUPTI/doc/Cupti/structCUpti__ActivityMemcpy3.html"
Aborted (core dumped)
Warning: Error during installation: Error: The process '/usr/bin/sudo' failed with exit code 134
Starting artifact upload
For more detailed logs during the artifact upload process, enable step-debugging: https://docs.github.com/actions/monitoring-and-troubleshooting-workflows/enabling-debug-logging#enabling-step-debug-logging
Artifact name is valid!
Container for artifact "install-log" successfully created. Starting upload of file(s)
Total size of all the files uploaded is 10913 bytes
File upload process has finished. Finalizing the artifact upload
Artifact has been finalized. All files have been successfully uploaded!

The raw size of all the files that were specified for upload is 172032 bytes
The size of all the files that were uploaded is 10913 bytes. This takes into account any gzip compression used to reduce the upload size, time and storage

Note: The size of downloaded zips can differ significantly from the reported size. For more information see: https://github.com/actions/upload-artifact#zipped-artifact-downloads 

Error: Error: The process '/usr/bin/sudo' failed with exit code 134

Any idea what to conclude from that? The workflow for this run is here: https://github.com/Haydnspass/SplinePSF/blob/0ab69d9722f0abce3de11947a5bccf4946341ba9/.github/workflows/build_upload_test.yaml

@LLukas22

LLukas22 commented Jul 8, 2023

You can resolve this by using the network method and only installing the packages you actually need. This also makes the installation a lot faster. (For all available packages see e.g. https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/)
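
For illustration, a minimal sketch of what that change could look like in the workflow step. The `cuda`, `method`, and `sub-packages` inputs are the ones shown in the log above; the step name and the particular packages listed (`nvcc`, `cudart`) are assumptions and should be replaced with whatever the build actually needs.

```yaml
# Sketch only: the step name and package selection below are assumptions, not taken from the original workflow.
- name: Install CUDA toolkit (network method)
  uses: Jimver/cuda-toolkit@v0.2.10
  with:
    cuda: '12.1.0'
    # Install from the NVIDIA package repository instead of running the local .run installer
    method: 'network'
    # JSON array of the sub-packages to install; adjust to the project's needs
    sub-packages: '["nvcc", "cudart"]'
```

With this method the `.run` installer that aborted in the `copy_file` step above is not executed, and the `linux-local-args` input from the original configuration can be dropped, since it only applies to the local method.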
