Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue that allowed multiple simultaneous snapshots to be allowed #10372

Merged
merged 1 commit into from
Jun 19, 2024

Conversation

brandond
Copy link
Member

@brandond brandond commented Jun 18, 2024

Proposed Changes

Replace 1-weight semaphore on snapshots with simple mutex

Fixes an issue where the semaphore wasn't permanently initialized until a scheduled snapshot was taken, allowing multiple on-demand snapshots to be taken until the first scheduled snapshot was triggered.

Types of Changes

bugfix

Verification

When fixed, you should see:

root@k3s-server-1:~# k3s etcd-snapshot save & k3s etcd-snapshot save; sleep 5
FATA[0000] see server log for details: Internal error occurred: etcd-snapshot error ID 40905
INFO[0000] Snapshot on-demand-k3s-server-1-1718755362 saved.
[1]+  Done(1)                    k3s etcd-snapshot save

server log:

INFO[0055] Saving etcd snapshot to /var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-1-1718755362
{"level":"info","ts":"2024-06-19T00:02:41.544644Z","logger":"etcd-client","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-1-1718755362.part"}
{"level":"info","ts":"2024-06-19T00:02:41.546515Z","logger":"etcd-client.client","caller":"v3@v3.5.13-k3s1/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2024-06-19T00:02:41.546559Z","logger":"etcd-client","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2024-06-19T00:02:41.546709Z","caller":"v3rpc/maintenance.go:126","msg":"sending database snapshot to client","total-bytes":2859008,"size":"2.9 MB"}
ERRO[0055] etcd-snapshot error ID 40905: snapshot save already in progress
ERRO[0055] Sending HTTP 500 response to 127.0.0.1:35044: etcd-snapshot error ID 40905
{"level":"info","ts":"2024-06-19T00:02:41.5617Z","caller":"v3rpc/maintenance.go:166","msg":"sending database sha256 checksum to client","total-bytes":2859008,"checksum-size":32}
{"level":"info","ts":"2024-06-19T00:02:41.561741Z","caller":"v3rpc/maintenance.go:175","msg":"successfully sent database snapshot to client","total-bytes":2859008,"size":"2.9 MB","took":"now"}
{"level":"info","ts":"2024-06-19T00:02:41.561852Z","logger":"etcd-client.client","caller":"v3@v3.5.13-k3s1/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2024-06-19T00:02:41.571993Z","logger":"etcd-client","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"2.9 MB","took":"now"}
{"level":"info","ts":"2024-06-19T00:02:41.572068Z","logger":"etcd-client","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-1-1718755362"}
INFO[0055] Reconciling ETCDSnapshotFile resources
I0619 00:02:41.576558      54 event.go:389] "Event occurred" object="local-on-demand-k3s-server-1-1718755362-e95d32" fieldPath="" kind="ETCDSnapshotFile" apiVersion="k3s.cattle.io/v1" type="Normal" reason="ETCDSnapshotCreated" message="Snapshot on-demand-k3s-server-1-1718755362 saved on k3s-server-1"
INFO[0055] Reconciliation of ETCDSnapshotFile resources complete

Testing

Linked Issues

User-Facing Change

Further Comments

@brandond brandond requested a review from a team as a code owner June 18, 2024 23:51
Fixes an issue where the semaphore wasn't permanently initialized
until a scheduled snapshot was taken, allowing multiple on-demand
snapshots to be taken until the first scheduled snapshot was triggered.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Copy link

codecov bot commented Jun 19, 2024

Codecov Report

Attention: Patch coverage is 20.00000% with 8 lines in your changes missing coverage. Please review.

Project coverage is 41.62%. Comparing base (b4d4ed8) to head (0a64ef9).

Files Patch % Lines
pkg/etcd/snapshot_handler.go 0.00% 5 Missing ⚠️
pkg/etcd/snapshot.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10372      +/-   ##
==========================================
- Coverage   47.97%   41.62%   -6.36%     
==========================================
  Files         177      177              
  Lines       14807    14802       -5     
==========================================
- Hits         7104     6161     -943     
- Misses       6363     7463    +1100     
+ Partials     1340     1178     -162     
Flag Coverage Δ
e2etests 36.48% <20.00%> (-10.06%) ⬇️
inttests 19.96% <20.00%> (-17.10%) ⬇️
unittests 11.36% <20.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@brandond brandond changed the title Fix error that allowed multiple simultaneous snapshots to be allowed Fix issue that allowed multiple simultaneous snapshots to be allowed Jun 19, 2024
@brandond brandond merged commit aa4794b into k3s-io:master Jun 19, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants