-
Notifications
You must be signed in to change notification settings - Fork 519
fix: automatically get updated apt keys via CSE #2022
Conversation
Tested this manually:
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jackfrancis The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
output=/tmp/apt-fix-keys.out | ||
for i in $(seq 1 $retries); do | ||
wait_for_apt_locks | ||
! (apt-get update | tee $output | grep NO_PUBKEY) && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could do
for K in $(apt-key list | grep expired | cut -d'/' -f2 | cut -d' ' -f1); do sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com $K; done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
running two extra apt-get update
operations in every single CSE (even with VHD) would be a de-optimization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current nvidia scenario does not show up in apt-key list | grep expired
. It's only when you run apt-get update that you are able to derive the key that needs fixing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case should we only do the update in nvidia node? I think doing an update for everyone will take a hit on provisioning latency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's attempting the apt-get update in the background (and it's best-effort, it won't error out), so it won't affect provisioning latency
@@ -18,6 +18,8 @@ done | |||
sed -i "/#HELPERSEOF/d" $script_lib | |||
source $script_lib | |||
|
|||
apt_fix_keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to make sure this is intentionally best-effort to allow for no outbound scenarios
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sent it to the background, which should be 100% sane, as other apt operations will wait for locks that this func holds, and retry if they run during key remediation.
In terms of no outbound scenarios, should we simply skip this operation for those scenarios? Does CSE know when we're in a no outbound context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For AKS Engine, yes (via the no outbound feature flag, which we can pass into cse if not already doing so). For AKS, I don't think this is part of the limited egress required addresses https://docs.microsoft.com/en-us/azure/aks/limit-egress-traffic#required-ports-and-addresses-for-aks-clusters
Codecov Report
@@ Coverage Diff @@
## master #2022 +/- ##
======================================
Coverage 76.7% 76.7%
======================================
Files 135 135
Lines 20547 20547
======================================
Hits 15761 15761
Misses 3871 3871
Partials 915 915 |
wait_for_apt_locks | ||
! (apt-get update | tee $output | grep NO_PUBKEY) && \ | ||
cat $output && break || \ | ||
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys $(apt-get update | grep NO_PUBKEY -m 1 | awk -F "NO_PUBKEY" '{print $2}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@palma21 Can we add keyserver.ubuntu.com
to the "no egress" whitelist? This URL is used to remotely retrieve expired/missing keys in the apt repo configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we have a cluster with no egress lock down, I assume it doesn't need apt operation, why we need this whitelist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is there are apt sources in the whitelist to allow no egress clusters to update themselves (kernel, azsecpack, etc).
If not, then AKS Engine is definitely delivering an apt configuration that doesn't make sense in the no egress scenario.
@@ -57,6 +57,8 @@ if [[ "${GPU_NODE}" != "true" ]]; then | |||
cleanUpGPUDrivers | |||
fi | |||
|
|||
apt_fix_keys & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yangl900 my assumption is that in the expired GPU key context, this would only get called for GPU nodes since cleanUpGPUDrivers
would have already taken care of removing the nvidia apt source by the time we get to apt_fix_keys
.
@jackfrancis were you able to verify that that's true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Reason for Change:
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys
can fix manyNO_PUBKEY
errors preventing apt operations from running.This PR delivers such a programmatic fix via CSE during cluster bootstrapping.
Issue Fixed:
Requirements:
Notes: