Can't switch on restore_from_backup on existing shard with no tables #4896

Closed
enisoc opened this issue May 31, 2019 · 1 comment

enisoc commented May 31, 2019

Overview of the Issue

If you initialize a shard with restore_from_backup disabled and later restart one or more tablets with restore_from_backup enabled, each restarted tablet will get stuck and fail to replicate if the shard has no tables in it.

This happens because checkNoDB reports that the tablet is "empty" (doesn't have any data on it) if there are no tables in the keyspace's main DB, even if the DB itself exists. After that, the tablet has its replication position reset (because it's presumed to be empty), which causes it to attempt to replay old transactions from the master's binlog and become unable to replicate.
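
To make the distinction concrete, here is a minimal Go sketch of the two ways to decide whether a tablet is "empty". It is not the actual checkNoDB code: the dbState type, both function names, and the vt_main database name are hypothetical, chosen only for illustration. The first check models the behavior described above (no tables means empty); the second shows one possible alternative that treats the existence of the database as evidence that the tablet was already initialized, which is not necessarily how any fix is implemented.

```go
package main

import "fmt"

// dbState is a hypothetical summary of what a tablet finds in mysqld.
type dbState struct {
	databases []string            // database names present on the server
	tables    map[string][]string // table names per database
}

// emptyByTables models the behavior described in this issue: the tablet
// counts as empty whenever the main DB has no tables, even if the DB exists.
func emptyByTables(s dbState, dbName string) bool {
	return len(s.tables[dbName]) == 0
}

// emptyByDatabase is an illustrative alternative: the tablet counts as
// empty only if the main DB itself is missing.
func emptyByDatabase(s dbState, dbName string) bool {
	for _, name := range s.databases {
		if name == dbName {
			return false
		}
	}
	return true
}

func main() {
	// An initialized shard with no tables yet (assumed DB name "vt_main").
	s := dbState{
		databases: []string{"mysql", "vt_main"},
		tables:    map[string][]string{"vt_main": {}},
	}
	fmt.Println(emptyByTables(s, "vt_main"))   // true: wrongly presumed empty
	fmt.Println(emptyByDatabase(s, "vt_main")) // false: recognized as initialized
}
```

With the table-based check, the initialized-but-empty shard is indistinguishable from a fresh mysqld, which is what leads to the replication position being reset.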

This is a corner case that probably only comes up when testing, but I wanted to document it as a caveat in case anyone else hits it. If you create at least one table in the shard before trying to turn on restore_from_backup, it should be ok.

This bug is probably due to the fact that I assumed no one would ever transition a live shard between restore_from_backup=false and restore_from_backup=true because the flag doesn't really mean "restore from backup," but rather, "I am not YouTube." See #3753.

Reproduction Steps

  1. Start up a shard with restore_from_backup off.
  2. Run InitShardMaster.
  3. Restart one of the replica tablets with restore_from_backup on.
  4. Observe that upon restarting, the replica's SQL thread stops due to data that has diverged from the master.

Binary version

Example:

$ vttablet --version
Version: eda3e15b2 (Git branch 'master') built on Fri Apr 26 22:00:23 UTC 2019 by root@d873e9ef91dc using go1.11.8 linux/amd64

Operating system and Environment details

Log Fragments

vttablet log on a tablet that already had been InitShardMaster'ed, but had no tables yet, and then was restarted to turn on restore_from_backup:

I0531 19:47:25.770005       1 backup.go:231] Restore: checking no existing data is present
I0531 19:47:25.773387       1 backup.go:246] Restore: looking for a suitable backup to restore
E0531 19:47:25.953529       1 backup.go:260] no backup to restore on BackupStorage for directory main/-80. Starting up empty.

Instead of starting up empty, it should have said:

Auto-restore is enabled, but mysqld already contains data. Assuming vttablet was just restarted.

enisoc commented Sep 15, 2020

This should be fixed by #6695.

enisoc closed this as completed Sep 15, 2020
askdba added this to the v8.0 milestone Oct 6, 2020
systay pushed a commit that referenced this issue Jul 22, 2024