Improve the retry attempts, error conditions, and IPv6 attempts for fetching ec2 metadata from IMDS #314427

Open
wants to merge 3 commits into
base: master
117 changes: 97 additions & 20 deletions nixos/modules/virtualisation/ec2-metadata-fetcher.sh
@@ -2,34 +2,91 @@ metaDir=/etc/ec2-metadata
 mkdir -m 0755 -p "$metaDir"
 rm -f "$metaDir/*"

+# See: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-metadata-v2-how-it-works.html
+imds_ipv4_addr="169.254.169.254"
+imds_ipv6_addr="[fd00:ec2::254]"
+
+# modified by get_imds_token()
+IMDS_USE_IPV6=false
+
+# formats the URL for the IMDSv2 service
+# based on connectivity to ipv4 vs ipv6
+imds_url() {
+  # replace path's leading slash with nothing
+  path=$(echo "$@" | sed 's|^/||')
+
+  if [ "$IMDS_USE_IPV6" = true ]; then
+    echo "http://${imds_ipv6_addr}/${path}"
+  else
+    echo "http://${imds_ipv4_addr}/${path}"
+  fi
+}
+
+# tests for both ipv4 and ipv6 connectivity to IMDSv2
+# then retrieves a token and configures the rest of
+# the script to use whichever address was available
 get_imds_token() {
-  # retry-delay of 1 selected to give the system a second to get going,
-  # but not add a lot to the bootup time
-  curl \
-    --silent \
-    --show-error \
-    --retry 3 \
-    --retry-delay 1 \
-    --fail \
-    -X PUT \
-    --connect-timeout 1 \
-    -H "X-aws-ec2-metadata-token-ttl-seconds: 600" \
-    http://169.254.169.254/latest/api/token
+  # use a longer retry time to test for both ipv4 and ipv6 connectivity
+  token=""
+
+  # first test ipv4
+  token=$(
Member

This will add a 10 second delay to booting machines in an IPv6 subnet. I don't think that is desirable.

I think we need to rethink this script instead of patching this in. It is getting very complex.

Author

Agreed. While implementing this, I came to think there should be a separate trigger or method for identifying which connectivity is preferred, and the script should then use that connectivity explicitly.

I'm curious whether there's a way we could query the attached interfaces and see if any IPv6 address is available; if so, we try that first.

But I still think it's odd that dhcp grants ipv6 before ipv4...

What ideas do you have for this script? I'm happy to help implement them.
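
The interface query floated above could look roughly like the following. This is only a sketch: `has_usable_ipv6` is a hypothetical helper that is not part of the PR, and it assumes that a non-tentative address with global scope in the output of `ip -6 -o addr show` is a good enough signal that the IPv6 IMDS endpoint is worth trying first. The helper takes the command output as an argument so it can be checked without touching the network.

```shell
# Hypothetical helper, not part of the PR.
# $1: output of `ip -6 -o addr show` (one address per line).
# Succeeds when at least one line describes a global-scope IPv6 address
# that is not still marked "tentative" (duplicate address detection pending).
has_usable_ipv6() {
  printf '%s\n' "$1" | awk '
    $3 == "inet6" && $5 == "scope" && $6 == "global" && !/tentative/ { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Possible use early in the fetcher (assumption: iproute2 is available at boot):
#   if has_usable_ipv6 "$(ip -6 -o addr show 2>/dev/null)"; then
#     IMDS_USE_IPV6=true
#   fi
```

A link-local address (`fe80::/10`, scope `link`) is deliberately ignored here, since every IPv6-enabled interface has one regardless of whether the subnet offers IPv6.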

Member

> But I still think it's odd that dhcp grants ipv6 before ipv4...

It's just a race condition. Sometimes you get the IPv4 lease first, sometimes the IPv6 lease.

Author

The script could alternate between IPv4 and IPv6 until one of them returns a working token, rather than exhausting every retry on one address before trying the other. That is, we move the retry logic out of curl's options and into the bash function.
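
A minimal sketch of that alternating approach, under stated assumptions: `fetch_token`, `get_imds_token_alternating`, and `IMDS_RETRY_DELAY` are hypothetical names that do not appear in the PR, and the curl flags are the ones already used in the diff minus the `--retry*` options. The function assigns `IMDS_TOKEN` and `IMDS_USE_IPV6` directly instead of echoing the token, because a variable assigned inside a function that is invoked through `$(...)` is set in a subshell and does not reach the caller.

```shell
imds_ipv4_addr="169.254.169.254"
imds_ipv6_addr="[fd00:ec2::254]"

# One short attempt against a single address, with no curl-level retries.
# Hypothetical helper, not part of the PR.
fetch_token() {
  curl \
    --silent \
    --globoff \
    --show-error \
    --fail \
    -X PUT \
    --connect-timeout 1 \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 600" \
    "http://$1/latest/api/token"
}

# Alternates IPv4 and IPv6 until one returns a non-empty token or the
# rounds run out. $1: maximum number of rounds (default 10).
# On success sets IMDS_TOKEN and IMDS_USE_IPV6 in the calling shell.
get_imds_token_alternating() {
  max_rounds=${1:-10}
  round=1
  while [ "$round" -le "$max_rounds" ]; do
    for family in ipv4 ipv6; do
      if [ "$family" = ipv6 ]; then
        addr=$imds_ipv6_addr
      else
        addr=$imds_ipv4_addr
      fi
      if IMDS_TOKEN=$(fetch_token "$addr") && [ -n "$IMDS_TOKEN" ]; then
        if [ "$family" = ipv6 ]; then
          IMDS_USE_IPV6=true
        else
          IMDS_USE_IPV6=false
        fi
        return 0
      fi
    done
    round=$((round + 1))
    # pause between rounds; overridable so the loop can be exercised quickly
    sleep "${IMDS_RETRY_DELAY:-1}"
  done
  return 1
}
```

With a 1 second connect timeout per attempt, a host that only has IPv6 connectivity would spend about one second on the failed IPv4 attempt before IPv6 is tried, instead of waiting for all IPv4 retries to finish.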

+    curl \
+      --silent \
+      --globoff \
+      --show-error \
+      --retry 10 \
+      --retry-delay 1 \
+      --retry-connrefused \
+      --fail \
+      -X PUT \
+      --connect-timeout 1 \
+      -H "X-aws-ec2-metadata-token-ttl-seconds: 600" \
+      http://$imds_ipv4_addr/latest/api/token
+  )
+
+  if [ "x$token" == "x" ]; then
+    # ipv4 failed, try ipv6
+    IMDS_USE_IPV6=true
+    token=$(
+      curl \
+        --silent \
+        --globoff \
+        --show-error \
+        --retry 10 \
+        --retry-delay 1 \
+        --retry-connrefused \
+        --fail \
+        -X PUT \
+        --connect-timeout 1 \
+        -H "X-aws-ec2-metadata-token-ttl-seconds: 600" \
+        http://$imds_ipv6_addr/latest/api/token
+    )
+  fi
+
+  if [ "x$token" == "x" ]; then
+    # indicate failure
+    return 1
+  fi
+
+  echo "$token"
 }

 preflight_imds_token() {
   # retry-delay of 1 selected to give the system a second to get going,
   # but not add a lot to the bootup time
   curl \
     --silent \
+    --globoff \
     --show-error \
-    --retry 3 \
+    --retry 5 \
     --retry-delay 1 \
+    --retry-connrefused \
     --fail \
     --connect-timeout 1 \
     -H "X-aws-ec2-metadata-token: $IMDS_TOKEN" \
     -o /dev/null \
-    http://169.254.169.254/1.0/meta-data/instance-id
+    $(imds_url /latest/meta-data/instance-id)
 }

 try=1
@@ -42,25 +99,45 @@ done

 if [ "x$IMDS_TOKEN" == "x" ]; then
   echo "failed to fetch an IMDS2v token."
+  exit 1
 fi

 try=1
+last_exit_code=""
 while [ $try -le 10 ]; do
   echo "(attempt $try/10) validating the EC2 instance metadata service v2 token..."
-  preflight_imds_token && break
+  preflight_imds_token
+  last_exit_code="$?"
+  if [ "$last_exit_code" -eq 0 ]; then
+    break
+  fi
   try=$((try + 1))
   sleep 1
 done

+if [ "$last_exit_code" -ne 0 ]; then
+  echo "failed to validate the IMDS2v token."
+  exit 1
+fi
+
 echo "getting EC2 instance metadata..."

 get_imds() {
   # --fail to avoid populating missing files with 404 HTML response body
   # || true to allow the script to continue even when encountering a 404
-  curl --silent --show-error --fail --header "X-aws-ec2-metadata-token: $IMDS_TOKEN" "$@" || true
+  curl \
+    --silent \
+    --globoff \
+    --show-error \
+    --retry 3 \
+    --retry-delay 1 \
+    --retry-connrefused \
+    --fail \
+    --header "X-aws-ec2-metadata-token: $IMDS_TOKEN" \
+    "$@" || true
 }

-get_imds -o "$metaDir/ami-manifest-path" http://169.254.169.254/1.0/meta-data/ami-manifest-path
-(umask 077 && get_imds -o "$metaDir/user-data" http://169.254.169.254/1.0/user-data)
-get_imds -o "$metaDir/hostname" http://169.254.169.254/1.0/meta-data/hostname
-get_imds -o "$metaDir/public-keys-0-openssh-key" http://169.254.169.254/1.0/meta-data/public-keys/0/openssh-key
+get_imds -o "$metaDir/ami-manifest-path" $(imds_url /latest/meta-data/ami-manifest-path)
+(umask 077 && get_imds -o "$metaDir/user-data" $(imds_url /latest/user-data))
+get_imds -o "$metaDir/hostname" $(imds_url /latest/meta-data/hostname)
+get_imds -o "$metaDir/public-keys-0-openssh-key" $(imds_url /latest/meta-data/public-keys/0/openssh-key)
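
The `imds_url` helper from this diff can be exercised without network access. The variant below is a sketch rather than the PR's code: it swaps the `echo | sed` pipeline for the `${1#/}` parameter expansion, which removes one leading slash without spawning a process, and it assumes the helper is always called with a single path argument. The variable names are taken from the diff.

```shell
imds_ipv4_addr="169.254.169.254"
imds_ipv6_addr="[fd00:ec2::254]"
IMDS_USE_IPV6=false

# Builds the IMDS URL for a path, choosing the address family
# from IMDS_USE_IPV6. $1: path, with or without a leading slash.
imds_url() {
  path=${1#/}
  if [ "$IMDS_USE_IPV6" = true ]; then
    printf 'http://%s/%s\n' "$imds_ipv6_addr" "$path"
  else
    printf 'http://%s/%s\n' "$imds_ipv4_addr" "$path"
  fi
}

# Suggested call shape (get_imds and metaDir come from the script above):
#   get_imds -o "$metaDir/hostname" "$(imds_url /latest/meta-data/hostname)"
```

Quoting the command substitution at the call site matters more once the IPv6 address is in play: an unquoted `http://[fd00:ec2::254]/...` word contains a bracket expression, so the shell treats it as a pathname pattern and only passes it through literally when nothing matches.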