Documentation - how to find more detailed information when local repo downloads are failing #2384

Open
jshawdell opened this issue Nov 25, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@jshawdell

Describe the bug
We ran into a problem when initializing the local repo: 8 packages failed to download (from Docker Hub in our case), but we could not find any additional detail to determine why the failures occurred.

The failures appeared in the Ansible output and in download_package_status.csv, but neither gave a reason for the failure:

a. Failures listed in the Ansible output:

TASK [parse_and_download : Display Failed Packages] *************************************************************************************************************
Tuesday 15 October 2024 23:00:59 +0000 (0:00:00.922) 0:41:14.752 *******
failed: [localhost] (item={'package': 'docker.io/library/nginx:1.25.2-alpine', 'type': 'image', 'status': 'Failed'}) => {"ansible_loop_var": "item", "changed": false, "item": {"package": "docker.io/library/nginx:1.25.2-alpine", "status": "Failed", "type": "image"}, "msg": {"package": "docker.io/library/nginx:1.25.2-alpine", "status": "Failed", "type": "image"}}
failed: [localhost] (item={'package': 'docker.io/rocm/k8s-device-plugin:latest', 'type': 'image', 'status': 'Failed'}) => {"ansible_loop_var": "item", "changed": false, "item": {"package": "docker.io/rocm/k8s-device-plugin:latest", "status": "Failed", "type": "image"}, "msg": {"package": "docker.io/rocm/k8s-device-plugin:latest", "status": "Failed", "type": "image"}}
failed: [localhost] (item={'package': 'docker.io/mpioperator/mpi-operator:master', 'type': 'image', 'status': 'Failed'}) => {"ansible_loop_var": "item", "changed": false, "item": {"package": "docker.io/mpioperator/mpi-operator:master", "status": "Failed", "type": "image"}, "msg": {"package": "docker.io/mpioperator/mpi-operator:master", "status": "Failed", "type": "image"}}
failed: [localhost] (item={'package': 'docker.io/roman8rcm/roce-test:229.2.32.0', 'type': 'image', 'status': 'Failed'}) => {"ansible_loop_var": "item", "changed": false, "item": {"package": "docker.io/roman8rcm/roce-test:229.2.32.0", "status": "Failed", "type": "image"}, "msg": {"package": "docker.io/roman8rcm/roce-test:229.2.32.0", "status": "Failed", "type": "image"}}
failed: [localhost] (item={'package': 'docker.io/traefik:v2.10.5', 'type': 'image', 'status': 'Failed'}) => {"ansible_loop_var": "item", "changed": false, "item": {"package": "docker.io/traefik:v2.10.5", "status": "Failed", "type": "image"}, "msg": {"package": "docker.io/traefik:v2.10.5", "status": "Failed", "type": "image"}}
failed: [localhost] (item={'package': 'docker.io/tensorflow/tensorflow:latest', 'type': 'image', 'status': 'Failed'}) => {"ansible_loop_var": "item", "changed": false, "item": {"package": "docker.io/tensorflow/tensorflow:latest", "status": "Failed", "type": "image"}, "msg": {"package": "docker.io/tensorflow/tensorflow:latest", "status": "Failed", "type": "image"}}
failed: [localhost] (item={'package': 'docker.io/rocm/tensorflow:latest', 'type': 'image', 'status': 'Failed'}) => {"ansible_loop_var": "item", "changed": false, "item": {"package": "docker.io/rocm/tensorflow:latest", "status": "Failed", "type": "image"}, "msg": {"package": "docker.io/rocm/tensorflow:latest", "status": "Failed", "type": "image"}}

b. Failures seen in download_package_status.csv:

root@omnia-cp:/opt/omnia/offline# grep -i failed download_package_status.csv
docker.io/library/nginx:1.25.2-alpine,image,Failed
docker.io/rocm/k8s-device-plugin:latest,image,Failed
docker.io/mpioperator/mpi-operator:master,image,Failed
docker.io/roman8rcm/roce-test:229.2.32.0,image,Failed
docker.io/traefik:v2.10.5,image,Failed
docker.io/tensorflow/tensorflow:latest,image,Failed
docker.io/rocm/tensorflow:latest,image,Failed
root@omnia-cp:/opt/omnia/offline#
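As a possible first triage step, the status file can be summarized by package type instead of grepped by hand. The following is a minimal sketch, assuming the file holds three comma-separated columns (package, type, status) as the grep output above suggests. Whether the file has a header row is not known from this thread, so rows that do not match are simply skipped. The function name `failed_packages` is hypothetical and not part of Omnia.

```python
import csv
from collections import defaultdict


def failed_packages(lines):
    """Group failed entries by type. `lines` is any iterable of CSV lines.

    Assumes three columns: package, type, status (inferred from the grep
    output in this issue, not from Omnia documentation).
    """
    failed = defaultdict(list)
    for row in csv.reader(lines):
        # Skip blank or malformed rows.
        if len(row) != 3:
            continue
        package, pkg_type, status = (field.strip() for field in row)
        # A header row, if present, is ignored because its status is not "Failed".
        if status.lower() == "failed":
            failed[pkg_type].append(package)
    return dict(failed)


# Two rows copied from the grep output above.
sample = [
    "docker.io/library/nginx:1.25.2-alpine,image,Failed",
    "docker.io/traefik:v2.10.5,image,Failed",
]
for pkg_type, packages in failed_packages(sample).items():
    print(f"{pkg_type}: {len(packages)} failed")
    for package in packages:
        print(f"  {package}")
```

An open file handle works in place of the sample list, for example `failed_packages(open("download_package_status.csv", newline=""))`. This only narrows down what failed. It does not reveal why.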

Suggestion: add more detail to the install documentation to help users triage why packages failed to download.

  • Is there a log file or journalctl output that can provide more detail?

@jshawdell jshawdell added the bug Something isn't working label Nov 25, 2024
@abhishek-sa1
Contributor

@jshawdell The local repo failures were most likely caused by either an internet connectivity problem or a docker pull issue. See the local repo FAQ: https://omniahpc.readthedocs.io/en/release_1.7/Troubleshooting/FAQ/Common/LocalRepo.html.

Enhancements to this feature are planned for upcoming Omnia releases.
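Until such an enhancement lands, one way to surface the underlying reason might be to retry a failed image pull by hand and capture what the container tool prints. The sketch below assumes the `docker` CLI is installed on the control plane. That is an assumption, since this thread does not say which tool Omnia uses internally, so the command is a parameter. `retry_pull` is a hypothetical helper, not an Omnia function.

```python
import subprocess


def retry_pull(image, pull_cmd=("docker", "pull")):
    """Run the pull command for one image and return (ok, detail).

    `pull_cmd` defaults to the docker CLI, which is an assumption; pass a
    different command tuple if another container tool is in use.
    """
    try:
        result = subprocess.run(
            [*pull_cmd, image],
            capture_output=True,
            text=True,
            check=False,  # Inspect the return code ourselves instead of raising.
        )
    except FileNotFoundError:
        return False, f"command not found: {pull_cmd[0]}"
    # Error messages usually go to stderr; fall back to stdout if it is empty.
    detail = (result.stderr or result.stdout).strip()
    return result.returncode == 0, detail
```

Calling `retry_pull("docker.io/traefik:v2.10.5")` for each failed image returns a success flag plus the captured message. That message may help tell a connectivity problem from a registry-side refusal.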
