Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move etcd snapshot management CLI to request/response #9816

Merged
merged 1 commit into from
Apr 9, 2024

Conversation

brandond
Copy link
Contributor

@brandond brandond commented Mar 28, 2024

Proposed Changes

Moves etcd-snapshot commands over to a request/response process, similar to how secrets-encrypt and certificate rotate-ca work. Benefits of this include:

  • No longer having to recreate the whole server control config in the CLI just to take a snapshot.
    We were having issues with config passed as server args not being visible to the etcd-snapshot cli, for example cluster/service cidr, node name, and so on.
  • Ensures that the process for on-demand snapshots is consistent with that of scheduled snapshots.
  • Ensures that multiple snapshots are not taken concurrently.

It does slightly reduce the feedback available via the CLI when taking a snapshot, as the actual operation is now completed server-side, the client is just sent back a list of snapshots or an error.

  • etcd-snapshot save/prune/delete now only output either a short error message, or the name of the created/deleted snapshots
  • The json and yaml format output of etcd-snapshot list has changed
  • The text format output of etcd-snapshot list is identical

Rancher CAPR only uses the text format output of etcd-snapshot save so this should be fine:

Example CLI output:

root@k3s-server-1:~# k3s etcd-snapshot save
INFO[0000] Snapshot on-demand-k3s-server-1-1712336224 saved.

root@k3s-server-1:~# k3s etcd-snapshot save --etcd-s3
FATA[0000] see server log for details: s3 bucket name was not set

root@k3s-server-1:~# k3s etcd-snapshot save --etcd-s3 --etcd-s3-bucket foo
FATA[0001] see server log for details: failed to test for existence of bucket foo: 401 Unauthorized

root@k3s-server-1:~# k3s etcd-snapshot save --etcd-snapshot-dir /foo
FATA[0000] see server log for details: Internal error occurred: etcd-snapshot error ID 18952

root@k3s-server-1:~# k3s etcd-snapshot list
Name                                  Location                                                                              Size   Created
etcd-snapshot-k3s-server-1-1712336221 file:///var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-k3s-server-1-1712336221 634912 2024-04-05T16:57:01Z
on-demand-k3s-server-1-1712336224     file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-1-1712336224     643104 2024-04-05T16:57:04Z

Example server logs when a generic error with ID is sent:

ERRO[0193] etcd-snapshot error ID 18952: failed to get etcd-snapshot-dir: stat /foo: no such file or directory
ERRO[0193] Sending HTTP 500 response to 127.0.0.1:36124: etcd-snapshot error ID 18952

Types of Changes

bugfix/enhancement

Verification

See linked issue

Testing

Linked Issues

User-Facing Change

The `k3s etcd-snapshot` command has been reworked for improved consistency. All snapshots operations are now performed by the server process, with the CLI acting as a client to initiate and report results. As a side effect, the CLI is now less noisy when managing snapshots.

Further Comments

Copy link

codecov bot commented Mar 28, 2024

Codecov Report

Attention: Patch coverage is 24.47917% with 290 lines in your changes are missing coverage. Please review.

Project coverage is 42.94%. Comparing base (7f65975) to head (3056aa2).
Report is 4 commits behind head on master.

Files Patch % Lines
pkg/etcd/snapshot_handler.go 0.78% 125 Missing and 1 partial ⚠️
pkg/etcd/snapshot.go 0.00% 79 Missing ⚠️
pkg/cli/etcdsnapshot/etcd_snapshot.go 61.32% 24 Missing and 17 partials ⚠️
pkg/clientaccess/token.go 53.84% 9 Missing and 9 partials ⚠️
pkg/etcd/s3.go 0.00% 10 Missing ⚠️
pkg/server/secrets-encrypt.go 0.00% 7 Missing ⚠️
pkg/util/apierrors.go 0.00% 4 Missing ⚠️
pkg/etcd/etcd.go 40.00% 3 Missing ⚠️
pkg/server/cert.go 0.00% 1 Missing ⚠️
pkg/server/token.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9816      +/-   ##
==========================================
- Coverage   52.67%   42.94%   -9.73%     
==========================================
  Files         157      158       +1     
  Lines       13822    13991     +169     
==========================================
- Hits         7281     6009    -1272     
- Misses       5155     6842    +1687     
+ Partials     1386     1140     -246     
Flag Coverage Δ
e2etests 35.99% <18.75%> (-13.27%) ⬇️
inttests 19.72% <20.05%> (-2.41%) ⬇️
unittests 16.57% <4.52%> (-0.26%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@brandond brandond force-pushed the etcd-snapshot-save-refactor branch 6 times, most recently from 8b9ad2b to 763b691 Compare April 4, 2024 08:03
@brandond brandond changed the title [WIP] Move etcd snapshot management CLI to request/response Move etcd snapshot management CLI to request/response Apr 4, 2024
@brandond brandond marked this pull request as ready for review April 4, 2024 08:08
@brandond brandond requested a review from a team as a code owner April 4, 2024 08:08
@brandond brandond force-pushed the etcd-snapshot-save-refactor branch from 763b691 to 745b1d0 Compare April 4, 2024 16:43
dereknola
dereknola previously approved these changes Apr 4, 2024
Copy link
Contributor

@dereknola dereknola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, some random questions on my end.

pkg/etcd/snapshot.go Outdated Show resolved Hide resolved
}
}

func sendSnapshotResponse(rw http.ResponseWriter, req *http.Request, sr *managed.SnapshotResult) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we are sending the full error back to the client on failure, we don't try and hide it like secrets-encryption?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many errors are probably just returned as-is at the moment. Let me take a look at whether or not that exposes anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be fine, but if you want to obfuscate you may find https://github.com/k3s-io/k3s/blob/master/pkg/server/secrets-encrypt.go#L471 helpful as it makes grepping logs alot easier for the user.

Copy link
Contributor Author

@brandond brandond Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored this a bit. Most that indicate problems with the input from the user are still passed through with minimal information, while others just send a more generic message with error ID. For example:

root@k3s-server-1:~# k3s etcd-snapshot save
INFO[0000] Snapshot on-demand-k3s-server-1-1712288593 saved.

root@k3s-server-1:~# k3s etcd-snapshot save --etcd-s3
FATA[0000] see server log for details: s3 bucket name was not set

root@k3s-server-1:~# k3s etcd-snapshot save --etcd-s3 --etcd-s3-bucket foo
FATA[0001] see server log for details: failed to test for existence of bucket foo: 401 Unauthorized

root@k3s-server-1:~# k3s etcd-snapshot save --etcd-snapshot-dir /foo
FATA[0000] see server log for details: Internal error occurred: etcd-snapshot error ID 18952
ERRO[0193] etcd-snapshot error ID 18952: failed to get etcd-snapshot-dir: stat /foo: no such file or directory
ERRO[0193] Sending HTTP 500 response to 127.0.0.1:36124: etcd-snapshot error ID 18952

@brandond brandond force-pushed the etcd-snapshot-save-refactor branch 4 times, most recently from fde406e to e731900 Compare April 5, 2024 03:32
@brandond brandond requested a review from dereknola April 5, 2024 03:45
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
@brandond brandond force-pushed the etcd-snapshot-save-refactor branch from e731900 to 3056aa2 Compare April 5, 2024 06:28
@brandond brandond requested a review from a team April 6, 2024 03:43
@brandond
Copy link
Contributor Author

brandond commented Apr 9, 2024

Confirmation that this works on an ipv6-only node:

root@systemd-node-1:/# ip addr show eth0
1880: eth0@if1881: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:08 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fd7c:53a5:aef5::242:ac11:8/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe11:8/64 scope link
       valid_lft forever preferred_lft forever

root@systemd-node-1:/# kubectl get node -o wide
NAME             STATUS   ROLES                       AGE    VERSION                INTERNAL-IP                  EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
systemd-node-1   Ready    control-plane,etcd,master   109m   v1.29.3+k3s-1a9615a7   fd7c:53a5:aef5::242:ac11:8   <none>        openSUSE Leap 15.4   6.6.0-1001-aws   containerd://1.7.11-k3s2

root@systemd-node-1:/# k3s etcd-snapshot save
WARN[0000] Unknown flag --node-ip found in config.yaml, skipping
WARN[0000] Unknown flag --cluster-cidr found in config.yaml, skipping
WARN[0000] Unknown flag --service-cidr found in config.yaml, skipping
WARN[0000] Unknown flag --system-default-registry found in config.yaml, skipping
WARN[0000] Unknown flag --disable found in config.yaml, skipping
WARN[0000] Unknown flag --cluster-init found in config.yaml, skipping
INFO[0000] Snapshot on-demand-systemd-node-1-1712700964 saved.

@brandond brandond merged commit fe465cc into k3s-io:master Apr 9, 2024
27 checks passed
votdev added a commit to votdev/openmediavault that referenced this pull request May 11, 2024
The output of `k3s etcd-snapshot ls` has been changed with k3s-io/k3s#9816.

Signed-off-by: Volker Theile <votdev@gmx.de>
votdev added a commit to votdev/openmediavault that referenced this pull request May 13, 2024
The output of `k3s etcd-snapshot ls` has been changed with k3s-io/k3s#9816.

To avoid incompatibilities with older installations, these will be upgraded to K3S v1.29.4+k3s1. New installations will use this version as default. This version is now kept as the default as long as no upgrade to an existing version is necessary.

Users can adjust this as required and under their own responsibility via the environment variable `OMV_K8S_K3S_VERSION`.

Signed-off-by: Volker Theile <votdev@gmx.de>
votdev added a commit to votdev/openmediavault that referenced this pull request May 13, 2024
The output of `k3s etcd-snapshot ls` has been changed with k3s-io/k3s#9816.

To avoid incompatibilities with older installations, these will be upgraded to K3S v1.29.4+k3s1. New installations will use this version as default. This version is now kept as the default as long as no upgrade to an existing version is necessary.

Users can adjust this as required and under their own responsibility via the environment variable `OMV_K8S_K3S_VERSION`.

Signed-off-by: Volker Theile <votdev@gmx.de>
votdev added a commit to openmediavault/openmediavault that referenced this pull request May 13, 2024
The output of `k3s etcd-snapshot ls` has been changed with k3s-io/k3s#9816.

To avoid incompatibilities with older installations, these will be upgraded to K3S v1.29.4+k3s1. New installations will use this version as default. This version is now kept as the default as long as no upgrade to an existing version is necessary.

Users can adjust this as required and under their own responsibility via the environment variable `OMV_K8S_K3S_VERSION`.

Signed-off-by: Volker Theile <votdev@gmx.de>
@brandond brandond deleted the etcd-snapshot-save-refactor branch June 6, 2024 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants