This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

fix: automatically get updated apt keys via CSE #2022

Closed
wants to merge 4 commits into from


jackfrancis
Member

Reason for Change:

Running apt-key adv --keyserver keyserver.ubuntu.com --recv-keys <key ID> can fix many of the NO_PUBKEY errors that prevent apt operations from running.

This PR applies that fix programmatically via CSE (Custom Script Extension) during cluster bootstrapping.

Issue Fixed:

Requirements:

Notes:

@acs-bot acs-bot added the size/M label Sep 24, 2019
@jackfrancis
Member Author

Tested this manually:

root@k8s-agent1-42830320-vmss000001:/home/azureuser# source test.sh
root@k8s-agent1-42830320-vmss000001:/home/azureuser# apt_fix_keys
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: Failed to fetch https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64/InRelease  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: Failed to fetch https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64/InRelease  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: Failed to fetch https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/InRelease  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: Some index files failed to download. They have been ignored, or old ones used instead.
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: Failed to fetch https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64/InRelease  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: Failed to fetch https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64/InRelease  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: Failed to fetch https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/InRelease  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6ED91CA3AC1160CD
W: Some index files failed to download. They have been ignored, or old ones used instead.
Executing: /tmp/tmp.MF4jaKvpog/gpg.1.sh --keyserver
keyserver.ubuntu.com
--recv-keys
6ED91CA3AC1160CD
gpg: requesting key AC1160CD from hkp server keyserver.ubuntu.com
gpg: key F796ECB0: "NVIDIA CORPORATION (Open Source Projects) <cudatools@nvidia.com>" 1 new signature
gpg: key F796ECB0: "NVIDIA CORPORATION (Open Source Projects) <cudatools@nvidia.com>" 1 new subkey
gpg: Total number processed: 1
gpg:            new subkeys: 1
gpg:         new signatures: 1
Hit:1 http://azure.archive.ubuntu.com/ubuntu xenial InRelease
Hit:2 http://azure.archive.ubuntu.com/ubuntu xenial-updates InRelease
Hit:3 http://azure.archive.ubuntu.com/ubuntu xenial-backports InRelease
Hit:7 https://packages.microsoft.com/ubuntu/16.04/prod xenial InRelease
Hit:8 http://security.ubuntu.com/ubuntu xenial-security InRelease
Reading package lists...
Executed apt-get update NO_PUBKEY fix 2 times
root@k8s-agent1-42830320-vmss000001:/home/azureuser# apt-get update
Hit:1 http://azure.archive.ubuntu.com/ubuntu xenial InRelease
Hit:2 http://azure.archive.ubuntu.com/ubuntu xenial-updates InRelease
Hit:3 http://azure.archive.ubuntu.com/ubuntu xenial-backports InRelease
Hit:4 https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64  InRelease 
Hit:5 https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64  InRelease
Hit:6 https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  InRelease       
Hit:7 https://packages.microsoft.com/ubuntu/16.04/prod xenial InRelease         
Hit:8 http://security.ubuntu.com/ubuntu xenial-security InRelease
Reading package lists... Done

@acs-bot

acs-bot commented Sep 24, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

output=/tmp/apt-fix-keys.out
for i in $(seq 1 $retries); do
wait_for_apt_locks
! (apt-get update | tee $output | grep NO_PUBKEY) && \
Contributor

@CecileRobertMichon commented Sep 24, 2019

you could do

for K in $(apt-key list | grep expired | cut -d'/' -f2 | cut -d' ' -f1); do sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com $K; done
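The parsing half of this one-liner depends on the legacy GnuPG 1.x apt-key list layout, assumed here to look like `pub   4096R/F796ECB0 2017-09-28 [expired: 2019-09-23]` (an illustrative line, not captured output). Wrapping that parsing in a function makes it easy to check against such a fixture before wiring in the network call; list_expired_key_ids is a hypothetical name, not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the parsing part of the suggested one-liner.
# Assumes the GnuPG 1.x `apt-key list` layout, e.g.
#   pub   4096R/F796ECB0 2017-09-28 [expired: 2019-09-23]
list_expired_key_ids() {
  # Keep only lines flagged as expired ("|| true" avoids a pipefail error when
  # nothing matches), take the text after the "/", then keep the first
  # space-separated token: the short key ID.
  { grep expired || true; } | cut -d'/' -f2 | cut -d' ' -f1
}

# Usage on a node (needs root and outbound access to keyserver.ubuntu.com):
#   for K in $(apt-key list | list_expired_key_ids); do
#     apt-key adv --recv-keys --keyserver keyserver.ubuntu.com "$K"
#   done
```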

Contributor

running two extra apt-get update operations in every single CSE (even with VHD) would be a de-optimization

Member Author

The current nvidia scenario does not show up in apt-key list | grep expired. It's only when you run apt-get update that you are able to derive the key that needs fixing.
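Deriving the key from apt-get update output can be done with a small filter. A minimal sketch, assuming the `NO_PUBKEY <ID>` message format seen in the manual test log in this thread; extract_missing_keys is a hypothetical name. Unlike the PR's `grep NO_PUBKEY -m 1`, it returns every distinct ID, and the usage line merges stderr, where apt writes its W: lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read apt-get update output on stdin and print each
# distinct key ID that follows a NO_PUBKEY marker, one per line.
extract_missing_keys() {
  # grep -o keeps only the "NO_PUBKEY <hex ID>" fragment; "|| true" keeps the
  # pipeline from failing under pipefail when there is no match.
  # awk prints the ID field and sort -u collapses the repeats that apt emits
  # once per affected repository.
  { grep -o 'NO_PUBKEY [0-9A-F]*' || true; } | awk '{print $2}' | sort -u
}

# Usage on a node:
#   apt-get update 2>&1 | extract_missing_keys
```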

Contributor

In that case, should we only do the update on nvidia nodes? I think running an update on every node will take a hit on provisioning latency.
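One way to act on this suggestion, sketched under assumptions: check whether any apt source file references nvidia.github.io, and only then pay for the extra apt-get update. The helper name has_nvidia_apt_source and the default directory /etc/apt/sources.list.d are assumptions for illustration; the PR itself contains no such check, and this sketch ignores /etc/apt/sources.list.

```shell
#!/usr/bin/env bash
# Hypothetical gate (not in the PR): succeed only if a *.list file under the
# given directory references the NVIDIA package repositories.
has_nvidia_apt_source() {
  local dir="${1:-/etc/apt/sources.list.d}"
  # -r: recurse; -q: exit status only; -F: fixed string, so "." is literal.
  # A missing directory makes grep fail, which is treated as "no source".
  grep -rqF --include='*.list' 'nvidia.github.io' "$dir" 2>/dev/null
}

# Usage (apt_fix_keys is the function added by this PR):
#   if has_nvidia_apt_source; then apt_fix_keys & fi
```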

Member Author

it's attempting the apt-get update in the background (and it's best-effort, it won't error out), so it won't affect provisioning latency
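A minimal sketch of the backgrounded, best-effort pattern, assuming a script that runs under set -e (the thread does not say which shell options CSE uses). flaky_step is a hypothetical stand-in; in the PR the call is simply apt_fix_keys &. A background job's failure never stops the parent script on its own, but waiting on it would surface the failure, which is why the exit status is discarded inside the subshell.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for a best-effort step such as apt_fix_keys.
flaky_step() {
  echo "attempting key remediation"
  return 1  # simulate a failure, e.g. the keyserver is unreachable
}

log=$(mktemp)
# Run the step in a background subshell. "|| true" discards its exit status,
# so even a later "wait" on this PID cannot trip set -e.
( flaky_step || true ) >"$log" 2>&1 &
bg_pid=$!

echo "provisioning continues while the step runs"

# Optional: reap the background job before the script itself exits.
wait "$bg_pid"
reached_end=1
```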

@@ -18,6 +18,8 @@ done
sed -i "/#HELPERSEOF/d" $script_lib
source $script_lib

apt_fix_keys
Contributor

we need to make sure this is intentionally best-effort to allow for no outbound scenarios

Member Author

I sent it to the background, which should be 100% sane, as other apt operations will wait for locks that this func holds, and retry if they run during key remediation.
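The "retry if they run during key remediation" behaviour relies on callers wrapping their apt commands in a retry loop. A generic sketch, assuming each attempt is a plain command; retry_cmd and its argument order are hypothetical, and AKS Engine's CSE has its own retry helpers whose exact signatures are not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry_cmd <attempts> <sleep_seconds> <command> [args...]
# Re-runs the command until it exits 0 or the attempts are used up.
retry_cmd() {
  local attempts=$1 wait_sleep=$2 i
  shift 2
  for i in $(seq 1 "$attempts"); do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # Back off before the next attempt, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$wait_sleep"
    fi
  done
  return 1
}

# Usage (for example, an install racing with the background key fix):
#   retry_cmd 10 5 apt-get install -y <package>
```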

In terms of no outbound scenarios, should we simply skip this operation for those scenarios? Does CSE know when we're in a no outbound context?

Contributor

For AKS Engine, yes (via the no outbound feature flag, which we can pass into CSE if we aren't already doing so). For AKS, I don't think keyserver.ubuntu.com is among the required addresses for limited egress: https://docs.microsoft.com/en-us/azure/aks/limit-egress-traffic#required-ports-and-addresses-for-aks-clusters
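Skipping remediation in no-outbound clusters could look like the following sketch. The variable name BLOCK_OUTBOUND_INTERNET is an assumption standing in for whatever the no outbound feature flag is called once it is passed into CSE, and apt_fix_keys is stubbed so the snippet runs on its own.

```shell
#!/usr/bin/env bash
# Stub for illustration only; the real apt_fix_keys is defined in this PR.
apt_fix_keys() { echo "apt_fix_keys would run here"; }

# BLOCK_OUTBOUND_INTERNET is a hypothetical flag name:
# "true" means the node has no outbound internet access.
maybe_fix_keys() {
  if [[ "${BLOCK_OUTBOUND_INTERNET:-false}" == "true" ]]; then
    echo "skipping apt key remediation: no outbound access"
    return 0
  fi
  apt_fix_keys
}

# Usage in CSE, keeping the call backgrounded as in the PR:
#   maybe_fix_keys &
```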

@codecov

codecov bot commented Sep 24, 2019

Codecov Report

Merging #2022 into master will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #2022   +/-   ##
======================================
  Coverage    76.7%   76.7%           
======================================
  Files         135     135           
  Lines       20547   20547           
======================================
  Hits        15761   15761           
  Misses       3871    3871           
  Partials      915     915

wait_for_apt_locks
! (apt-get update | tee $output | grep NO_PUBKEY) && \
cat $output && break || \
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys $(apt-get update | grep NO_PUBKEY -m 1 | awk -F "NO_PUBKEY" '{print $2}')
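The loop above runs apt-get update a second time just to pick out the key ID, and handles only the first ID per pass (grep -m 1). A restructured sketch of the same idea reuses the captured output and fetches every distinct missing key in one pass. The hook functions run_apt_update and recv_key, the default of 5 retries, and the name apt_fix_keys_sketch are assumptions for illustration; wait_for_apt_locks from the original loop is omitted so the snippet is self-contained.

```shell
#!/usr/bin/env bash
# Hooks (hypothetical) so the loop can be exercised without root or network.
run_apt_update() { apt-get update 2>&1; }
recv_key() { apt-key adv --keyserver keyserver.ubuntu.com --recv-keys "$1"; }

# Sketch of the apt_fix_keys retry loop: run apt-get update once per pass and,
# if it reports NO_PUBKEY errors, fetch each missing key before trying again.
apt_fix_keys_sketch() {
  local retries=${1:-5} output keys key i
  output=$(mktemp)
  for i in $(seq 1 "$retries"); do
    run_apt_update > "$output" || true
    # Distinct key IDs mentioned in this pass (empty if none).
    keys=$({ grep -o 'NO_PUBKEY [0-9A-F]*' "$output" || true; } | awk '{print $2}' | sort -u)
    if [ -z "$keys" ]; then
      echo "apt-get update clean after $i attempt(s)"
      rm -f "$output"
      return 0
    fi
    for key in $keys; do
      recv_key "$key" || true   # best-effort: keep going if one fetch fails
    done
  done
  rm -f "$output"
  return 1
}
```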
Member Author

@palma21 Can we add keyserver.ubuntu.com to the "no egress" whitelist? This URL is used to remotely retrieve expired/missing keys in the apt repo configuration.
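Where it is unclear whether keyserver.ubuntu.com is allowed, a cheap reachability probe before attempting remediation is one option. A sketch under assumptions: can_reach is hypothetical, and it probes TCP port 11371, the standard HKP port (the gpg output in the manual test earlier in this thread shows an hkp request). A successful TCP connect does not guarantee the key fetch itself will succeed.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: succeed if a TCP connection to host:port opens
# within the timeout. Uses bash's /dev/tcp redirection and coreutils timeout.
can_reach() {
  local host=$1 port=${2:-11371} secs=${3:-5}
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage: only attempt key remediation when the keyserver is reachable.
#   if can_reach keyserver.ubuntu.com 11371; then apt_fix_keys & fi
```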

Contributor

If a cluster has its egress locked down, I assume it doesn't need apt operations, so why do we need this in the whitelist?

Member Author

My understanding is there are apt sources in the whitelist to allow no egress clusters to update themselves (kernel, azsecpack, etc).

If not, then AKS Engine is definitely delivering an apt configuration that doesn't make sense in the no egress scenario.

@@ -57,6 +57,8 @@ if [[ "${GPU_NODE}" != "true" ]]; then
cleanUpGPUDrivers
fi

apt_fix_keys &
Contributor

@yangl900 my assumption is that in the expired GPU key context, this would only get called for GPU nodes since cleanUpGPUDrivers would have already taken care of removing the nvidia apt source by the time we get to apt_fix_keys.

@jackfrancis were you able to verify that that's true?

Member Author

correct

@stale

stale bot commented Oct 24, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 24, 2019
@stale stale bot closed this Oct 31, 2019