This repository was archived by the owner on Jan 22, 2024. It is now read-only.

Latest nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 cannot apt-get update due to apt index /Packages file corruption #1402

Closed
acarrillo opened this issue Oct 19, 2020 · 14 comments

Comments

@acarrillo

acarrillo commented Oct 19, 2020

1. Issue or feature description

Attempting to apt-get update in the latest nvidia/cuda image for CUDA 10.1 / cudnn7 / ubuntu16.04 produces the following failure:

Reading package lists... Error!
E: Encountered a section with no Package: header
E: Problem with MergeList /var/lib/apt/lists/developer.download.nvidia.com_compute_cuda_repos_ubuntu1604_x86%5f64_Packages.lz4
E: The package lists or status file could not be parsed or opened.

This may be caused by corruption of the NVIDIA apt index itself (for example, its /Packages file), since the index's timestamp shows it was updated very recently (2020-10-19 19:03).

2. Steps to reproduce the issue

# Latest image for me is `sha256:59179dcc823e4dda86b3d780c165ad0eed5559bbcc08950c7973b68775a32ed2`
$ docker pull nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04
$ docker run -it nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 apt-get update

Full output:

acarrillo ~/code/farmwise_main/docs $ docker run -it nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 apt-get update
Ign:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  InRelease
Ign:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  InRelease
Get:3 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:4 http://security.ubuntu.com/ubuntu xenial-security InRelease [109 kB]
Get:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Release [697 B]
Get:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  Release [564 B]
Get:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Release.gpg [836 B]
Get:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  Release.gpg [833 B]
Ign:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Packages                                     
Get:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Packages [382 kB]
Get:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  Packages [97.2 kB]
Get:11 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [1835 kB]                                  
Get:12 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]         
Get:13 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]         
Get:14 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]        
Get:15 http://security.ubuntu.com/ubuntu xenial-security/restricted amd64 Packages [15.9 kB]
Get:16 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [951 kB]
Get:17 http://archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages [14.1 kB]           
Get:18 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages [9827 kB]              
Get:19 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [9249 B] 
Get:20 http://archive.ubuntu.com/ubuntu xenial/multiverse amd64 Packages [176 kB]
Get:21 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [2353 kB]
Get:22 http://archive.ubuntu.com/ubuntu xenial-updates/restricted amd64 Packages [16.4 kB]
Get:23 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [1497 kB]
Get:24 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 Packages [26.7 kB]
Get:25 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages [10.9 kB]
Get:26 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [12.6 kB]
Fetched 19.4 MB in 5s (3713 kB/s)                           
Reading package lists... Error!
E: Encountered a section with no Package: header
E: Problem with MergeList /var/lib/apt/lists/developer.download.nvidia.com_compute_cuda_repos_ubuntu1604_x86%5f64_Packages.lz4
E: The package lists or status file could not be parsed or opened.

3. Information to attach (optional if deemed irrelevant)

@acarrillo acarrillo changed the title Latest nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 cannot apt-get update due to apt-list corruption Latest nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 cannot apt-get update due to apt list corruption Oct 19, 2020
@acarrillo acarrillo changed the title Latest nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 cannot apt-get update due to apt list corruption Latest nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 cannot apt-get update due to apt lists corruption Oct 19, 2020
@mwcondino

I'm also seeing this issue and can reproduce it locally using @acarrillo's steps. Here's my output:

matt@matt-thinkpad:~/farmwise_main$ docker run -it nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 apt-get update
Get:1 http://security.ubuntu.com/ubuntu xenial-security InRelease [109 kB]
Ign:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  InRelease
Ign:3 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  InRelease
Get:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Release [697 B]
Get:5 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  Release [564 B]
Get:6 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Release.gpg [836 B]
Get:7 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]                          
Get:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  Release.gpg [833 B]
Ign:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Packages                                     
Get:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Packages [382 kB]
Get:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  Packages [97.2 kB]
Get:11 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [1835 kB]                                
Get:12 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]          
Get:13 http://security.ubuntu.com/ubuntu xenial-security/restricted amd64 Packages [15.9 kB]  
Get:14 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [951 kB]
Get:15 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [9249 B]
Get:16 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]     
Get:17 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
Get:18 http://archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages [14.1 kB]
Get:19 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages [9827 kB]
Get:20 http://archive.ubuntu.com/ubuntu xenial/multiverse amd64 Packages [176 kB]
Get:21 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [2353 kB]
Get:22 http://archive.ubuntu.com/ubuntu xenial-updates/restricted amd64 Packages [16.4 kB]
Get:23 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [1497 kB]
Get:24 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 Packages [26.7 kB]
Get:25 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages [10.9 kB]
Get:26 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [12.6 kB]
Fetched 19.4 MB in 3s (5100 kB/s)                             
Reading package lists... Error!
E: Encountered a section with no Package: header
E: Problem with MergeList /var/lib/apt/lists/developer.download.nvidia.com_compute_cuda_repos_ubuntu1604_x86%5f64_Packages.lz4
E: The package lists or status file could not be parsed or opened.

@jmchuster

If I download the /Packages file and search for \n\n[^P], it looks like the last package definitions have been split by an extra blank line after Installed-Size:

Package: datacenter-gpu-manager
Priority: optional
Provides: datacenter-gpu-manager
Replaces: datacenter-gpu-manager, datacenter-gpu-manager-fabricmanager (<<2.0), datacenter-gpu-manager-dcp-nda-only, datacenter-gpu-manager-collectd, datacenter-gpu-manager-wsgi, datacenter-gpu-manager-fabricmanager-internal-api-header
Section: devel
Version: 1:2.0.13
Installed-Size: 370887

Filename: ./datacenter-gpu-manager_2.0.13_amd64.deb
Size: 184134550
MD5sum: 74c67c4a8477bcf808508fbb13fafea3
SHA1: 0590a85fbe357c434eb903e59cce0b3db2903620
SHA256: 74add19a8b6e3bd612c04690c366eff7a0eb3271e021a663939f3ff683aa2705
SHA512: d54529d72223544eba8bf2d05b18aac39ae741d66d8dfbb26093a61e0485229519577b1de37b55d0c7759612aef19acabd7622956bb0284531924695d6491677
Description: NVIDIA® Datacenter GPU Management Tools
 The Datacenter GPU Manager package contains tools for managing NVIDIA® GPUs in
 high performance and cluster computing environments.
 .
 This package also contains the DCGM GPU Diagnostic. DCGM GPU Diagnostic is the system
 administrator and cluster manager's tool for detecting and troubleshooting
 common problems affecting NVIDIA® Tesla GPUs.

Package: datacenter-gpu-manager
Priority: optional
Provides: datacenter-gpu-manager
Replaces: datacenter-gpu-manager, datacenter-gpu-manager-fabricmanager (<<2.0), datacenter-gpu-manager-dcp-nda-only, datacenter-gpu-manager-collectd, datacenter-gpu-manager-wsgi, datacenter-gpu-manager-fabricmanager-internal-api-header
Section: devel
Version: 1:2.0.13
Installed-Size: 370887

Filename: ./datacenter-gpu-manager_2.0.13_amd64.deb
Size: 184134550
MD5sum: 74c67c4a8477bcf808508fbb13fafea3
SHA1: 0590a85fbe357c434eb903e59cce0b3db2903620
SHA256: 74add19a8b6e3bd612c04690c366eff7a0eb3271e021a663939f3ff683aa2705
SHA512: d54529d72223544eba8bf2d05b18aac39ae741d66d8dfbb26093a61e0485229519577b1de37b55d0c7759612aef19acabd7622956bb0284531924695d6491677
Description: NVIDIA® Datacenter GPU Management Tools
 The Datacenter GPU Manager package contains tools for managing NVIDIA® GPUs in
 high performance and cluster computing environments.
 .
 This package also contains the DCGM GPU Diagnostic. DCGM GPU Diagnostic is the system
 administrator and cluster manager's tool for detecting and troubleshooting
 common problems affecting NVIDIA® Tesla GPUs.
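To check a downloaded Packages file for this kind of split without eyeballing regex matches, here is a minimal Python sketch. It assumes the index has already been downloaded and decompressed to a local file (the file name and the function name stanzas_missing_package are hypothetical), and it only approximates apt's parser: it treats blank lines as stanza separators and flags every stanza with no Package: field, which is the condition apt's "Encountered a section with no Package: header" message describes.

```python
from __future__ import annotations

import re
import sys


def stanzas_missing_package(text: str) -> list[tuple[int, str]]:
    """Return (stanza_index, first_line) for each stanza lacking a Package: field.

    Hypothetical helper: stanzas are separated by one or more blank lines, as in
    a Debian Packages index. This is a rough approximation of apt's parser.
    """
    problems = []
    # Split on runs of blank (or whitespace-only) lines between stanzas.
    stanzas = [s for s in re.split(r"\n\s*\n", text.strip()) if s.strip()]
    for index, stanza in enumerate(stanzas):
        lines = stanza.splitlines()
        # Field names come from non-continuation lines of the form "Name: value";
        # lines starting with whitespace continue the previous field (e.g. Description).
        fields = {
            line.split(":", 1)[0].strip().lower()
            for line in lines
            if line and not line[0].isspace() and ":" in line
        }
        if "package" not in fields:
            problems.append((index, lines[0]))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed): python check_packages.py Packages
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for index, first_line in stanzas_missing_package(f.read()):
            print(f"stanza {index}: no Package: field (starts with {first_line!r})")
```

Run against the stanzas quoted above, each Filename: block that follows the stray blank line would be reported as a stanza with no Package: field.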

@acarrillo acarrillo changed the title Latest nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 cannot apt-get update due to apt lists corruption Latest nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 cannot apt-get update due to apt index /Packages file corruption Oct 19, 2020
@tgaddair

@cliffwoolley, can someone from the NVIDIA side take a look? This is blocking Horovod CI.

@klueska
Contributor

klueska commented Oct 19, 2020

This is not an issue for nvidia-docker itself.

All issues related to images running on top of nvidia-docker should be directed here:
https://forums.developer.nvidia.com/c/accelerated-computing/nvidia-gpu-cloud-ngc-users/25

@klueska klueska closed this as completed Oct 19, 2020
@acarrillo
Author

acarrillo commented Oct 19, 2020

Ah, I think the community did not know that nvidia-docker and the nvidia apt maintainers do not talk to each other 😬

@tgaddair

@klueska, is there an issue tracker for the correct team? A forum does not seem like the correct place to track breakages of this sort.

@acarrillo
Copy link
Author

Completely +1 to @tgaddair -- does nvidia have an issue tracking process for their package releases?

@klueska
Contributor

klueska commented Oct 19, 2020

Let me double check if there is something better than the forum I linked nowadays.
Last time I checked, this was the recommended place to file issues for these images.

@dualvtable
Contributor

We're working on the package repository and the issue is being addressed.

@klueska
Contributor

klueska commented Oct 19, 2020

@acarrillo, @tgaddair

I got word that this is actually the better place for issues of this type (and it is well monitored):
https://forums.developer.nvidia.com/c/accelerated-computing/cuda/cuda-setup-and-installation/8

In fact, you can see this exact issue being discussed here:
https://forums.developer.nvidia.com/t/apt-update-failing-on-ubuntu-cuda-repo/140815/5

@dualvtable
Copy link
Contributor

dualvtable commented Oct 20, 2020

@acarrillo @tgaddair

The repository metadata has been fixed. However, if a bare-metal system picked up the corrupt metadata, you need to purge it manually. This is not necessary in containers, since the metadata is not stored there.

$ sudo rm -v /var/lib/apt/lists/developer.download.nvidia.com_compute_cuda_repos*
$ sudo apt-get update

@acarrillo
Author

Noted, thank you for all of the information! I will route future concerns to that forum when they strictly pertain to nvidia/cuda core libs :)

@tgaddair

Thanks @dualvtable, seems to be working now.

@blair2020

Hello, I've run into exactly the same problem. Could you please share your solution? Thanks in advance!

7 participants