[BUG] Snapshotter fails to apply watch when the revision is already compacted. #599

Closed
ishan16696 opened this issue Mar 7, 2023 · 1 comment · Fixed by #600
Labels
kind/bug Bug priority/1 Priority (lower number equals higher priority) status/closed Issue is closed (either delivered or triaged)

Comments

@ishan16696
Member

Describe the bug:

When the snapshotter closes due to some error, backup-restore tries to restart it by applying a watch on etcd. Because of etcd's auto-compaction, however, etcd might already have compacted away some revision X. When backup-restore then tries to apply the watch from a revision <= X (i.e. a revision that has already been compacted), the watch connection fails, the watch channel is closed, and backup-restore is never able to restart the snapshotter.

Expected behavior:
If the snapshotter fails to apply the watch because the revision has already been compacted, it should take a full snapshot to recover from this situation.
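The expected behavior can be sketched as a fallback in the restart path: if starting the watch fails with a compaction error, take a full snapshot at the current revision and resume watching from the revision right after it. This is a minimal sketch under assumed names (`etcdClient`, `fakeEtcd`, `restartWatch`, all hypothetical), not the actual fix in #600. It assumes only that a full snapshot captures the whole keyspace at some revision R, so delta snapshots can safely continue from R+1.

```go
package main

import (
	"errors"
	"fmt"
)

var errCompacted = errors.New("etcdserver: mvcc: required revision has been compacted")

// etcdClient is a hypothetical interface over the two operations the restart path needs.
type etcdClient interface {
	watch(fromRev int64) error            // start a watch at fromRev
	fullSnapshot() (rev int64, err error) // snapshot the whole keyspace, returning its revision
}

// fakeEtcd simulates a server whose history below compactRev is gone.
type fakeEtcd struct {
	currentRev, compactRev int64
	snapshots              int // number of full snapshots taken
}

func (f *fakeEtcd) watch(fromRev int64) error {
	if fromRev < f.compactRev {
		return errCompacted
	}
	return nil
}

func (f *fakeEtcd) fullSnapshot() (int64, error) {
	f.snapshots++
	return f.currentRev, nil
}

// restartWatch tries to resume from lastRev+1. If that revision was
// compacted, it falls back to a full snapshot and watches from after it.
// It returns the revision the watch was finally started from.
func restartWatch(c etcdClient, lastRev int64) (int64, error) {
	from := lastRev + 1
	err := c.watch(from)
	if err == nil {
		return from, nil
	}
	if !errors.Is(err, errCompacted) {
		return 0, err // unrelated failure: surface it unchanged
	}
	snapRev, err := c.fullSnapshot()
	if err != nil {
		return 0, fmt.Errorf("full snapshot after compaction: %w", err)
	}
	from = snapRev + 1
	if err := c.watch(from); err != nil {
		return 0, err
	}
	return from, nil
}

func main() {
	etcd := &fakeEtcd{currentRev: 6, compactRev: 6}
	from, err := restartWatch(etcd, 1) // last full snapshot was at revision 1
	fmt.Println(from, err, etcd.snapshots) // 7 <nil> 1
}
```

Checking the error with `errors.Is` keeps the fallback narrow: only a compaction triggers the extra full snapshot, while other watch failures are still reported as before.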

How To Reproduce (as minimally and precisely as possible):

  1. Start etcd and backup-restore.
  2. Let backup-restore take one full snapshot and apply a watch from revision 2.
  3. Stop backup-restore.
  4. Put some dummy data into etcd.
  5. Run compaction on etcd using etcdctl compact <Revision no>
  6. Start backup-restore again:
INFO[0019] Applied watch on etcd from revision: 2        actor=snapshotter
WARN[0019] Failed to collect events for first delta snapshot(s): etcdserver: mvcc: required revision has been compacted  actor=backup-restore-server
INFO[0019] Starting the garbage collector...             actor=backup-restore-server
INFO[0019] Starting snapshotter...                       actor=backup-restore-server
INFO[0019] Will take next full snapshot at time: 2023-03-07 20:16:00 +0530 IST  actor=snapshotter
INFO[0019] Starting the Snapshot EventHandler.           actor=snapshotter
INFO[0019] Closing the Snapshotter...                    actor=snapshotter
ERRO[0019] Snapshotter failed with error: watch channel closed  actor=backup-restore-server
INFO[0019] Snapshotter stopped.                          actor=backup-restore-server

Screenshots (if applicable):

Environment (please complete the following information):

  • Etcd version/commit ID:
  • Etcd-backup-restore version/commit ID: v0.22.0
  • Cloud Provider [All/AWS/GCS/ABS/Swift/OSS]: All

Anything else we need to know?:

ishan16696 added the kind/bug label on Mar 7, 2023
@ishan16696
Member Author

/assign

ishan16696 added the priority/1 label (lower number equals higher priority) on Mar 7, 2023
gardener-robot added the status/closed label (issue is closed, either delivered or triaged) on Mar 7, 2023
2 participants