Overview of the Issue
If you initialize a shard with restore_from_backup disabled and later restart one or more tablets with restore_from_backup enabled, each restarted tablet gets stuck and fails to replicate if the shard had no tables in it.
This happens because checkNoDB reports that the tablet is "empty" (doesn't have any data on it) if there are no tables in the keyspace's main DB, even if the DB itself exists. After that, the tablet has its replication position reset (because it's presumed to be empty), which causes it to attempt to replay old transactions from the master's binlog and become unable to replicate.
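To make the distinction concrete, here is a minimal Go sketch of the two notions of "empty" involved. This is not the actual checkNoDB code; the schemaState type and both function names are hypothetical, and emptyByDB is only one possible stricter check, not necessarily how this should be fixed.

```go
package main

import "fmt"

// schemaState is a hypothetical snapshot of what mysqld holds for a keyspace.
type schemaState struct {
	DBExists bool     // the keyspace's main database exists
	Tables   []string // tables found inside that database
}

// emptyByTables mirrors the behavior described above: the tablet counts as
// empty whenever the main DB contains no tables, even if the DB itself exists.
func emptyByTables(s schemaState) bool {
	return len(s.Tables) == 0
}

// emptyByDB is one possible stricter check (an assumption, not the Vitess
// fix): the tablet counts as empty only if the main DB does not exist at all.
func emptyByDB(s schemaState) bool {
	return !s.DBExists
}

func main() {
	// A shard that went through InitShardMaster but has no tables yet.
	initialized := schemaState{DBExists: true}
	fmt.Println(emptyByTables(initialized)) // true: wrongly treated as empty
	fmt.Println(emptyByDB(initialized))     // false: recognized as initialized
}
```

With a table-based check, an initialized shard with zero tables is indistinguishable from a brand new mysqld, which is what leads to the replication position being reset.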
This is a corner case that probably only comes up when testing, but I wanted to document it as a caveat in case anyone else hits it. If you create at least one table in the shard before trying to turn on restore_from_backup, it should be ok.
This bug is probably due to the fact that I assumed no one would ever transition a live shard between restore_from_backup=false and restore_from_backup=true because the flag doesn't really mean "restore from backup," but rather, "I am not YouTube." See #3753.
Reproduction Steps
1. Start up a shard with restore_from_backup off.
2. Run InitShardMaster.
3. Restart one of the replica tablets with restore_from_backup on.
4. Observe that upon restarting, the replica's SQL thread stops due to data that has diverged from the master.
Binary version
Example:
$ vttablet --version
Version: eda3e15b2 (Git branch 'master') built on Fri Apr 26 22:00:23 UTC 2019 by root@d873e9ef91dc using go1.11.8 linux/amd64
Operating system and Environment details
Log Fragments
vttablet log on a tablet that had already been through InitShardMaster, but had no tables yet, and was then restarted to turn on restore_from_backup:
I0531 19:47:25.770005 1 backup.go:231] Restore: checking no existing data is present
I0531 19:47:25.773387 1 backup.go:246] Restore: looking for a suitable backup to restore
E0531 19:47:25.953529 1 backup.go:260] no backup to restore on BackupStorage for directory main/-80. Starting up empty.
Instead of starting up empty, it should have said:
Auto-restore is enabled, but mysqld already contains data. Assuming vttablet was just restarted.
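The startup decision implied by these log lines can be sketched roughly as follows. This is an illustration under assumptions, not the real backup.go logic: restoreAction and decideRestore are hypothetical names, the "restoring from backup" message is a placeholder, and only the other two messages come from the logs in this report (the first one abbreviated).

```go
package main

import "fmt"

// restoreAction is a hypothetical summary of what a tablet does at startup
// when restore_from_backup is on.
type restoreAction struct {
	Message       string
	ResetPosition bool // whether the replication position gets reset
}

// decideRestore picks an action from two inputs: whether mysqld is judged to
// already contain data, and whether a suitable backup was found.
func decideRestore(hasData, backupFound bool) restoreAction {
	switch {
	case hasData:
		// Expected path for a tablet that was merely restarted.
		return restoreAction{Message: "Auto-restore is enabled, but mysqld already contains data. Assuming vttablet was just restarted."}
	case backupFound:
		// Placeholder message; not taken from the real logs.
		return restoreAction{Message: "restoring from backup"}
	default:
		// Path actually taken in this report: presumed empty, position reset.
		return restoreAction{Message: "no backup to restore on BackupStorage. Starting up empty.", ResetPosition: true}
	}
}

func main() {
	// The bug: a shard with no tables is judged to have no data.
	got := decideRestore(false, false)
	fmt.Println(got.Message, got.ResetPosition)
}
```

In this sketch the whole outcome hinges on hasData, so a false "empty" verdict for an initialized shard sends the tablet down the last branch.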