1.21.0 Status 500 when PATCHing /settings?tx=bottlerocket-launch #4135

EthanKane-FD · 2024-08-09T14:46:22Z

Hey there, we noticed an issue today with the latest version of Bottlerocket. Any help would be greatly appreciated. Our new builds picked up the latest version and our nodes are failing to boot.
Image I'm using:
Bottlerocket OS 1.21.0

What I expected to happen:

         Starting Bottlerocket userdata configuration system...

[  OK  ] Finished Bottlerocket userdata configuration system.

What actually happened:
The Bottlerocket AMI updated last night to Bottlerocket OS 1.21.0 (aws-k8s-1.30), and the Bottlerocket userdata configuration is now failing.

We are seeing the following in the system logs:

         Starting Bottlerocket userdata configuration system...

[    3.428743] early-boot-config[1329]: Error PATCHing '/settings?tx=bottlerocket-launch': Status 500 when PATCHing /settings?tx=bottlerocket-launch: Error serializing Settings: 'unit' not allowed by Serializer
[FAILED] Failed to start Bottlerocket userdata configuration system.

See 'systemctl status early-boot-config.service' for details.

[DEPEND] Dependency failed for Bottlerocket initial configuration complete.

[DEPEND] Dependency failed for Isolates configured.target.

[DEPEND] Dependency failed for Applies settings to create config files.

[DEPEND] Dependency failed for Send signal to CloudFormation Stack.

[DEPEND] Dependency failed for Sets the hostname.

[DEPEND] Dependency failed for User-specified setting generators.

[DEPEND] Dependency failed for Generate additional settings for Kubernetes.

How to reproduce the problem:

Upgrade from 1.20.5


patkinson01 · 2024-08-09T14:57:43Z

We've just seen issues on some of our clusters trying to update to 1.21.0 too - seems a similar issue so pasting here - but if not let me know and I'll raise a separate ticket:

    Starting Generate additional settings for Kubernetes...

[ 7.882539] pluto[1498]: Unable to retrieve cluster name and AWS region from Bottlerocket API: Deserialization of configuration file failed: invalid type: sequence, expected a string at line 16 column 18
[FAILED] Failed to start Generate additional settings for Kubernetes.

See 'systemctl status pluto.service' for details.

[DEPEND] Dependency failed for Applies settings to create config files.

[DEPEND] Dependency failed for Sets the hostname.

[DEPEND] Dependency failed for Send signal to CloudFormation Stack.

[DEPEND] Dependency failed for Bottlerocket initial configuration complete.

[DEPEND] Dependency failed for Isolates configured.target.

ramseymcgrathfd · 2024-08-09T15:01:02Z

Example launch template to reproduce

"image-gc-high-threshold-percent" = "${config.image_gc_high_threshold_percent}"
"image-gc-low-threshold-percent"  = "${config.image_gc_low_threshold_percent}"
"eviction-max-pod-grace-period"   = "${config.max_pod_grace_period}"

[settings.kubernetes.node-labels]
%{ for label_key, label_value in config.labels }
"${label_key}" = "${label_value}"
%{ endfor ~}

[settings.kubernetes.node-taints]
%{ for taint_key, taint_value in config.taints }
"${taint_key}" = "${taint_value}"
%{ endfor ~}

[settings.kubernetes.credential-providers.ecr-credential-provider]
enabled = true
cache-duration = "30m"
image-patterns = [
  "*.dkr.ecr.*.amazonaws.com"
]

[settings.kubernetes.eviction-hard]
%{ for key, value in config.eviction_hard_values }
"${key}" = "${value}"
%{ endfor ~}

[settings.kubernetes.eviction-soft]
%{ for key, value in config.eviction_soft_values }
"${key}" = "${value}"
%{ endfor ~}

[settings.kubernetes.eviction-soft-grace-period]
%{ for key, value in config.soft_grace_period_values }
"${key}" = "${value}"
%{ endfor ~}

[settings.kubernetes.system-reserved]
cpu = "${config.system_reserved_cpu}"
memory = "${config.system_reserved_memory}"
ephemeral-storage = "${config.system_reserved_ephemeral}"

[settings.metrics]
# whether or not health metrics will be sent. set to false to opt-out
send-metrics = false

# Use local aws time server
[settings.ntp]
time-servers = ["169.254.169.123"]

# The admin host container provides SSH access and runs with "superpowers".
# It is disabled by default, but can be disabled explicitly.
[settings.host-containers.admin]
enabled = false

# The control host container provides out-of-band access via SSM.
# It is enabled by default, and can be disabled if you do not
# expect to use SSM. This could leave you with no way to access
# the API and change settings on an existing node!
[settings.host-containers.control]
enabled = true

yeazelm · 2024-08-09T15:09:33Z

Thank you @EthanKane-FD, @ramseymcgrathfd, and @patkinson01 for reporting this! We are looking at this now and will provide an update as soon as possible.

yeazelm · 2024-08-09T15:29:44Z

For folks who have seen this issue: if you can include the userdata needed to reproduce it, similar to @ramseymcgrathfd, that would help a ton. If you don't want to post it to GitHub, you can open an AWS Support case and provide it there instead; that would help too.

EthanKane-FD · 2024-08-09T15:32:40Z

Hey @yeazelm, thanks for checking. @ramseymcgrathfd and I are on the same team, so that's our userdata config.

patkinson01 · 2024-08-09T15:47:19Z

Hi @yeazelm, please find below our userdata:

[settings.network]
no-proxy = ${no_proxy}
https-proxy = "${http_proxy}" # Squid Proxy with access to only specific approved domains

[[settings.container-registry.credentials]]
registry = "${repo_url}" # Internal repo where we pull all images from (except for some managed addons which need to come from AWS ECR repos)
username = "${repo_username}"
password = "${repo_api_key}"

[settings.kernel.sysctl]
"user.max_user_namespaces" = "0"
"vm.max_map_count" = "262144"
"net.ipv4.conf.all.send_redirects" = "0" #cis hardening 3.1.1
"net.ipv4.conf.default.send_redirects" = "0" #cis hardening 3.1.1
"net.ipv4.conf.all.accept_redirects" = "0" #cis hardening 3.2.2
"net.ipv4.conf.default.accept_redirects" = "0" #cis hardening 3.2.2
"net.ipv6.conf.all.accept_redirects" = "0" #cis hardening 3.2.2
"net.ipv6.conf.default.accept_redirects" = "0" #cis hardening 3.2.2
"net.ipv4.conf.all.secure_redirects" = "0" #cis hardening 3.2.3
"net.ipv4.conf.default.secure_redirects" = "0" #cis hardening 3.2.3
"net.ipv4.conf.all.log_martians" = "1" #cis hardening 3.2.4
"net.ipv4.conf.default.log_martians" = "1" #cis hardening 3.2.4

[settings.kubernetes.node-labels]
"bottlerocket.aws/updater-interface-version" = "2.0.0" # Configure the node-labels Bottlerocket setting to enable BruPop updates

[settings.bootstrap-containers.bottle]
source = "${repo_url}/${bottle_rocket_repo_name}/${bottle_rocket_image_name}:${bottle_rocket_image_version}"
mode = "once"
user-data = "${user_data}" #base64 encoded set of values used in our bottlerocket bootstrap image to configure Vault access and proxy
essential = true

[settings.updates]
ignore-waves = ${bottle_rocket_update_immediately}
seed = ${bottle_rocket_seed}

[settings.kubernetes]
api-server = "${cluster_endpoint}"
cluster-certificate = "${cluster_ca_base64}"
cluster-name = "${eks_cluster_id}"

ytsssun · 2024-08-09T16:07:47Z

@ramseymcgrathfd Do you by any chance have the rendered userdata? I tried applying some values to the template but failed to reproduce the issue. Here is my userdata:

[settings.kubernetes]
"image-gc-high-threshold-percent" = 90
"image-gc-low-threshold-percent"  = 80
"eviction-max-pod-grace-period"   = 40

[settings.kubernetes.node-labels]
"name" = "my-node"


[settings.kubernetes.node-taints]
special = ["true:NoSchedule"]

[settings.kubernetes.credential-providers.ecr-credential-provider]
enabled = true
cache-duration = "30m"
image-patterns = [
  "*.dkr.ecr.*.amazonaws.com"
]

[settings.kubernetes.eviction-hard]
"memory.available" = "15%"

[settings.kubernetes.eviction-soft]
"memory.available" = "12%"

[settings.kubernetes.eviction-soft-grace-period]
"memory.available" = "30s"

[settings.kubernetes.system-reserved]
cpu = "10m"
ephemeral-storage = "1Gi"
memory = "100Mi"

[settings.metrics]
# whether or not health metrics will be sent. set to false to opt-out
send-metrics = false

# Use local aws time server
[settings.ntp]
time-servers = ["169.254.169.123"]

# The admin host container provides SSH access and runs with "superpowers".
# It is disabled by default, but can be disabled explicitly.
[settings.host-containers.admin]
enabled = false

# The control host container provides out-of-band access via SSM.
# It is enabled by default, and can be disabled if you do not
# expect to use SSM. This could leave you with no way to access
# the API and change settings on an existing node!
[settings.host-containers.control]
enabled = true

I was able to upgrade from v1.20.0 to v1.21.0, starting from variant bottlerocket-aws-k8s-1.30-x86_64-v1.20.0.

[ssm-user@control]$ apiclient get os
{
  "os": {
    "arch": "x86_64",
    "build_id": "4d43022e",
    "pretty_name": "Bottlerocket OS 1.21.0 (aws-k8s-1.30)",
    "variant_id": "aws-k8s-1.30",
    "version_id": "1.21.0"
  }
}

ytsssun · 2024-08-09T16:46:21Z

I was able to reproduce the issue mentioned in #4135 (comment).

My userdata

[settings.network]
no-proxy = ["localhost", "127.0.0.1"]

[settings.kernel.sysctl]
"user.max_user_namespaces" = "0"
"vm.max_map_count" = "262144"
"net.ipv4.conf.all.send_redirects" = "0" #cis hardening 3.1.1
"net.ipv4.conf.default.send_redirects" = "0" #cis hardening 3.1.1
"net.ipv4.conf.all.accept_redirects" = "0" #cis hardening 3.2.2
"net.ipv4.conf.default.accept_redirects" = "0" #cis hardening 3.2.2
"net.ipv6.conf.all.accept_redirects" = "0" #cis hardening 3.2.2
"net.ipv6.conf.default.accept_redirects" = "0" #cis hardening 3.2.2
"net.ipv4.conf.all.secure_redirects" = "0" #cis hardening 3.2.3
"net.ipv4.conf.default.secure_redirects" = "0" #cis hardening 3.2.3
"net.ipv4.conf.all.log_martians" = "1" #cis hardening 3.2.4
"net.ipv4.conf.default.log_martians" = "1" #cis hardening 3.2.4

[settings.kubernetes.node-labels]
"bottlerocket.aws/updater-interface-version" = "2.0.0" # Configure the node-labels Bottlerocket setting to enable BruPop updates

[settings.updates]
ignore-waves = true

The failure

[    3.741549] pluto[1484]: Unable to retrieve cluster name and AWS region from Bottlerocket API: Deserialization of configuration file failed: invalid type: sequence, expected a string at line 15 column 18
[FAILED] Failed to start Generate additional settings for Kubernetes.

bcressey · 2024-08-09T16:47:50Z

> [ 7.882539] pluto[1498]: Unable to retrieve cluster name and AWS region from Bottlerocket API: Deserialization of configuration file failed: invalid type: sequence, expected a string at line 16 column 18

This is happening because pluto only expects a String for no-proxy, when it should take a list.
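
The type mismatch described here can be sketched in a few lines. This is an illustrative Python analogy, not pluto's actual Rust code: both reader functions and the JSON layout are hypothetical, and only the `no-proxy` key and its list value come from the userdata in this thread.

```python
import json

# Hypothetical settings payload; the list value mirrors the userdata above.
SETTINGS_JSON = '{"network": {"no-proxy": ["localhost", "127.0.0.1"]}}'


def read_no_proxy_as_string(settings: dict) -> str:
    """Mimic a model that only accepts a single string (the buggy expectation)."""
    value = settings["network"]["no-proxy"]
    if not isinstance(value, str):
        kind = "sequence" if isinstance(value, list) else type(value).__name__
        raise TypeError(f"invalid type: {kind}, expected a string")
    return value


def read_no_proxy_as_list(settings: dict) -> list:
    """Mimic a model that accepts a list of strings (the intended shape)."""
    value = settings["network"]["no-proxy"]
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise TypeError("expected a list of strings")
    return value


settings = json.loads(SETTINGS_JSON)

# The string-only reader rejects the list, much like the log line above.
try:
    read_no_proxy_as_string(settings)
    strict_error = None
except TypeError as err:
    strict_error = str(err)

# The list-aware reader accepts the same payload.
hosts = read_no_proxy_as_list(settings)
print(strict_error)
print(hosts)
```

The same payload fails or succeeds depending only on which shape the consumer expects, which is why unchanged userdata can start failing after an upgrade.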

patkinson01 · 2024-08-09T17:07:11Z

> [ 7.882539] pluto[1498]: Unable to retrieve cluster name and AWS region from Bottlerocket API: Deserialization of configuration file failed: invalid type: sequence, expected a string at line 16 column 18
> This is happening because pluto only expects a String for no-proxy, when it should take a list.

Hi @bcressey, we've seen the error during a BRUPOP-initiated update and haven't made any changes to our userdata or no_proxy value, which is a string. Presumably this is something that has changed in this latest AMI, then?

bcressey · 2024-08-09T17:59:22Z

> Hi @bcressey, we've seen the error during a BRUPOP-initiated update and haven't made any changes to our userdata or no_proxy value, which is a string. Presumably this is something that has changed in this latest AMI, then?

The bug is in the newer version of pluto in 1.21.0. If you have settings.network.no-proxy defined in your settings (it's not defined by default) then it would trigger this issue on upgrade. If you don't have that setting defined then there may be another pluto bug.

bcressey · 2024-08-09T18:18:27Z

> [    3.428743] early-boot-config[1329]: Error PATCHing '/settings?tx=bottlerocket-launch': Status 500 when PATCHing /settings?tx=bottlerocket-launch: Error serializing Settings: 'unit' not allowed by Serializer

@sam-berning tracked this down to an issue with optional fields in the CredentialProvider structure. Omitting a field marked as optional will cause it to serialize to "null" which is then rejected by the datastore serializer.

bash-5.1# cat <<EOF > /local/user-data-defaults.toml
> [settings.kubernetes.credential-providers.ecr-credential-provider]
> enabled = true
> cache-duration = "30m"
> image-patterns = [
>   "*.dkr.ecr.*.amazonaws.com"
> ]
> EOF

bash-5.1# early-boot-config
[2024-08-09T17:52:21Z INFO  early_boot_config] early-boot-config started
[2024-08-09T17:52:21Z INFO  early_boot_config] Gathering user data providers
[2024-08-09T17:52:21Z INFO  early_boot_config] Provider '10-local-defaults': [2024-08-09T17:52:21Z INFO  early_boot_config_provider::provider] '/local/user-data-defaults.toml' exists, using it
[2024-08-09T17:52:21Z INFO  early_boot_config] Found user data via user data from /local/user-data-defaults.toml, sending to API
Error PATCHing '/settings?tx=bottlerocket-launch': Status 500 when PATCHing /settings?tx=bottlerocket-launch: Error serializing Settings: 'unit' not allowed by Serializer

Fully specifying the user data for the credential provider, by passing in a no-op environment variable, would avoid the issue:

[settings.kubernetes.credential-providers.ecr-credential-provider]
enabled = true
cache-duration = "30m"
image-patterns = [
  "*.dkr.ecr.*.amazonaws.com"
]
environment.foo = "bar"
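
To illustrate why an omitted optional field trips the serializer while the fully specified form passes, here is a small Python analogy. It is not the Bottlerocket datastore code: `serialize_for_datastore` is a hypothetical stand-in for a serializer that refuses null values, and `None` plays the role of the 'unit' value in the error message.

```python
def serialize_for_datastore(prefix: str, value) -> dict:
    """Flatten nested settings into dotted key/value pairs, rejecting nulls.

    Hypothetical stand-in for a strict serializer, for illustration only.
    """
    if value is None:
        raise ValueError(f"null not allowed by serializer at '{prefix}'")
    if isinstance(value, dict):
        flat = {}
        for key, child in value.items():
            flat.update(serialize_for_datastore(f"{prefix}.{key}", child))
        return flat
    return {prefix: value}


# Optional field left unset: the model still carries it, but as None.
partial = {"enabled": True, "cache-duration": "30m", "environment": None}
# Workaround from the comment above: give the optional field a value.
full = {"enabled": True, "cache-duration": "30m", "environment": {"foo": "bar"}}

try:
    serialize_for_datastore("ecr-credential-provider", partial)
    partial_error = None
except ValueError as err:
    partial_error = str(err)

flat_full = serialize_for_datastore("ecr-credential-provider", full)
print(partial_error)
print(flat_full)
```

In this sketch the failure comes from the unset field being passed along as an explicit null rather than being left out, which is the behavior the workaround sidesteps.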

ramseymcgrathfd · 2024-08-13T17:11:05Z

@bcressey yeah good catch, it does

reckon it'll need

    #[serde(skip_serializing_if = "Option::is_none")]

sam-berning · 2024-08-13T22:11:21Z

> reckon it'll need
>
>     #[serde(skip_serializing_if = "Option::is_none")]

Yup, that's indeed the right fix. Should be addressed as of bottlerocket-os/bottlerocket-settings-sdk#51. We've also updated the datastore serializer to handle null values correctly in bottlerocket-os/bottlerocket-core-kit#80, which should protect against this sort of bug moving forward.
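
The effect of skipping `None` fields at serialization time can be approximated outside Rust. The Python sketch below is only an analogy for `skip_serializing_if = "Option::is_none"`: `drop_none` is a hypothetical helper, and the field names are taken from the credential-provider userdata earlier in the thread.

```python
import json


def drop_none(value):
    """Recursively omit keys whose value is None, akin to skipping
    serialization of an optional field that was never set."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value


provider = {
    "enabled": True,
    "cache-duration": "30m",
    "image-patterns": ["*.dkr.ecr.*.amazonaws.com"],
    "environment": None,  # optional field the user did not set
}

# Without the fix, the unset field is emitted as an explicit null.
with_null = json.dumps(provider, sort_keys=True)
# With the fix, the unset field is simply absent from the output.
without_null = json.dumps(drop_none(provider), sort_keys=True)
print(with_null)
print(without_null)
```

Omitting the key keeps the null from ever reaching a downstream consumer, whereas teaching the consumer to accept nulls (the second change mentioned above) guards against other models that still emit them.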

yeazelm · 2024-08-27T00:37:44Z

We have released 1.21.1, which should allow a clean upgrade from 1.20.5. Please let us know whether it solves your problem!

patkinson01 · 2024-08-27T13:58:11Z

All good, thanks for a quick turnaround!!

EthanKane-FD · 2024-08-28T09:50:29Z

Hey, thanks @yeazelm, we have rolled this out on a few lab clusters and everything seems to be in order. Thanks again.

EthanKane-FD added the status/needs-triage (Pending triage or re-evaluation) and type/bug (Something isn't working) labels Aug 9, 2024

gthao313 mentioned this issue Aug 9, 2024

Upgrade failures on Bottlerocket 1.21.0 #4136

Open

This was referenced Aug 9, 2024

models: skip serializing credential provider fields if they are None bottlerocket-os/bottlerocket-settings-sdk#51

Merged

Update datastore serializer to expect JSON and correctly handle null values bottlerocket-os/bottlerocket-core-kit#80

Merged

cbgbt mentioned this issue Aug 13, 2024

Pluto settings models bottlerocket-os/bottlerocket-core-kit#89

Merged

6 tasks

ytsssun mentioned this issue Aug 15, 2024

sources: update to bottlerocket-settings-models v0.3.0 #4145

Merged

2 tasks

ginglis13 mentioned this issue Aug 16, 2024

v1.21.1 🪁 Tracking Issue #4148

Closed

4 tasks

EthanKane-FD closed this as completed Aug 28, 2024
