PITR: Mechanism to automatically take snapshots at predefined interval #7126
Labels
area/docdb
YugabyteDB core features
Comments
This was referenced on Feb 5, 2021 (Closed)
spolitov added a commit that referenced this issue on Mar 2, 2021

Summary: This diff adds the ability to create snapshot schedules on the master. These schedules do nothing yet, but will be used as the main configuration for PITR. With these, a user should be able to specify:
- which tables should be snapshotted together (i.e. individual tables, whole keyspaces, or the whole cluster)
- how frequently the snapshots should be taken
- how long the snapshots should be kept
Test Plan: ybd --cxx-test snapshot-schedule-test
Reviewers: oleg, bogdan
Reviewed By: bogdan
Subscribers: amitanand, nicolas, zyu, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10726

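For illustration only, here is a minimal sketch of the kind of configuration such a schedule carries; the type and field names below are hypothetical, not YugabyteDB's actual master-side classes:

```cpp
#include <chrono>
#include <string>
#include <vector>

// Hypothetical sketch of a snapshot schedule configuration; names are
// illustrative and do not match YugabyteDB's real types.
struct SnapshotScheduleConfig {
  // Filter describing what to snapshot together: individual tables,
  // whole keyspaces, or the whole cluster.
  std::vector<std::string> table_filter;
  // How frequently snapshots should be taken.
  std::chrono::minutes interval{60};
  // How long each snapshot should be kept.
  std::chrono::minutes retention{24 * 60};
};
```
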
spolitov added a commit that referenced this issue on Mar 8, 2021

Summary: This diff is a follow-up to D10726 / 0fde8db. With this diff, the interval setting of schedules is respected. We now take snapshots:
- when a schedule is first created
- repeatedly, after the amount of time specified by each schedule's interval
Test Plan: ybd --gtest_filter SnapshotScheduleTest.Snapshot
Reviewers: nicolas, amitanand, bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10781

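A rough sketch of that polling decision, assuming hypothetical names (the actual logic lives in the master's snapshot coordination code):

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper sketching the scheduling decision: take a snapshot
// right after the schedule is created (no previous snapshot yet) and then
// again whenever the schedule's interval has elapsed since the last one.
bool SnapshotDue(const Clock::time_point* last_snapshot_time,
                 Clock::duration interval, Clock::time_point now) {
  if (last_snapshot_time == nullptr) {
    return true;  // Schedule was just created; take the first snapshot now.
  }
  return now - *last_snapshot_time >= interval;
}
```
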
polarweasel pushed a commit to lizayugabyte/yugabyte-db that referenced this issue on Mar 9, 2021

Summary: This diff adds the ability to create snapshot schedules on the master. These schedules do nothing yet, but will be used as the main configuration for PITR. With these, a user should be able to specify:
- which tables should be snapshotted together (i.e. individual tables, whole keyspaces, or the whole cluster)
- how frequently the snapshots should be taken
- how long the snapshots should be kept
Test Plan: ybd --cxx-test snapshot-schedule-test
Reviewers: oleg, bogdan
Reviewed By: bogdan
Subscribers: amitanand, nicolas, zyu, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10726

spolitov added a commit that referenced this issue on Mar 12, 2021

Summary: This diff adds logic to clean up scheduled snapshots that are outside our retention bounds. We leverage the existing cleanup mechanism for snapshots, deleting the oldest snapshot from every schedule in each polling interval.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.GC
Reviewers: amitanand, bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10864

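A simplified sketch of that cleanup pass; the types, and the assumption that a schedule's snapshots are kept oldest-first, are illustrative:

```cpp
#include <chrono>
#include <deque>

using Clock = std::chrono::system_clock;

struct ScheduledSnapshot {
  Clock::time_point taken_at;
};

// Hypothetical sketch of the cleanup pass: on each polling interval, delete a
// schedule's oldest snapshot if it has aged past the schedule's retention.
// Assumes `snapshots` is ordered oldest-first.
void CleanupOldestSnapshot(std::deque<ScheduledSnapshot>* snapshots,
                           Clock::duration retention, Clock::time_point now) {
  if (!snapshots->empty() && now - snapshots->front().taken_at > retention) {
    // The real system issues a delete through the existing snapshot cleanup
    // mechanism; here we simply drop the entry.
    snapshots->pop_front();
  }
}
```
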
spolitov added a commit that referenced this issue on Mar 16, 2021

Summary: This diff adds snapshot schedule loading during tablet bootstrap, so schedules are loaded after a master restart.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.Restart
Reviewers: bogdan, amitanand, oleg
Reviewed By: oleg
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10916

spolitov added a commit that referenced this issue on Mar 17, 2021

Summary: This diff adds a new history retention mechanism for tablets that participate in point-in-time restore. Normally, history retention is controlled by the gflag `timestamp_history_retention_interval_sec`, potentially cleaning up any values older than this interval. After this diff, history for such tablets will only be cleaned up to the point before the latest PITR snapshot. To achieve this, we need to know two things on the TS side:
1) Which schedule a tablet is a part of. The master knows this, based on the filters set on PITR schedules. It can send this schedule ID together with any schedule-related snapshots to any involved TS. We can then persist this information in the tablet metadata. Ideally, the first PITR-related snapshot will do this, but in case of any errors, snapshots are retried automatically by the master, so we have a guarantee that the TS will eventually update this persistent metadata.
2) The history retention requirements for each relevant schedule. We use the TS heartbeat response to flow this information from the master to the TS. Then, on any compaction, we choose the minimum between the existing flag and the retention policies of any of the schedules a tablet is involved in.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.Snapshot
Reviewers: amitanand, mbautin, bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10861

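The retention choice can be sketched as taking the most conservative (oldest) history cutoff between the gflag-based default and each covering schedule's requirement; the names below are hypothetical:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-in for YugabyteDB's hybrid time; treated here as plain microseconds.
using HybridTimeMicros = uint64_t;

// Hypothetical sketch: compute the history cutoff for a tablet as the minimum
// of the flag-based cutoff (derived from
// timestamp_history_retention_interval_sec) and the cutoff required by each
// schedule covering the tablet. History older than the returned value may be
// garbage-collected during compaction.
HybridTimeMicros HistoryCutoff(
    HybridTimeMicros flag_based_cutoff,
    const std::vector<HybridTimeMicros>& schedule_cutoffs) {
  HybridTimeMicros cutoff = flag_based_cutoff;
  for (HybridTimeMicros schedule_cutoff : schedule_cutoffs) {
    // Taking the minimum retains at least as much history as the most
    // demanding schedule requires.
    cutoff = std::min(cutoff, schedule_cutoff);
  }
  return cutoff;
}
```

Taking the minimum is what keeps compactions from discarding history that a restore within a schedule's retention window might still need.
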
spolitov added a commit that referenced this issue on Mar 19, 2021

Summary: This diff adds logic to propagate the correct history retention to newly created tablets that participate in an existing snapshot schedule.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.Index
Reviewers: bogdan, amitanand
Reviewed By: amitanand
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10968

spolitov added a commit that referenced this issue on Mar 19, 2021

Summary: This diff adds logic to take a system catalog snapshot while taking a scheduled snapshot.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.RestoreSchema
Reviewers: amitanand, bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10957

spolitov added a commit that referenced this issue on Apr 1, 2021

Summary: This diff adds logic to restore a table's schema. After this, we should be able to undo an ALTER TABLE operation! There are two important changes as part of this diff:
1) Restoring master-side sys_catalog metadata.
2) Sending the restored version of the schema from the master to the TS, as part of the explicit command to restore the TS.
As part of applying the restore operation on the master, we add new state tracking, which can compute the diff between the current sys_catalog state and the state at the time to which we want to restore. This is done by restoring the corresponding sys_catalog snapshot into a temporary directory, with the HybridTime filter applied for the restore_at time. We then load the relevant TABLE and TABLET data into memory and overwrite the existing RocksDB data directly in memory. This is safe to do because:
- It is done as part of the apply step of a Raft operation, so it is already persisted and will be replayed accordingly at bootstrap, in case of a restart.
- It is done on both leader and follower.
Once the master state is rolled back, we then run the TS side of the restore operation. The master now sends over the restored schema information as part of the Restore request. On the TS side, we update our tablet schema information on disk accordingly.
Note: In between the master state being rolled back and all the TS processing their respective restores, there is a time window in which the master can receive heartbeats from a TS with newer schema information than what the master has persisted. Currently, that only seems to lead to some log spew, but it will be investigated later, as part of fault tolerance testing.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.RestoreSchema
Reviewers: amitanand, bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11013

spolitov added a commit that referenced this issue on Apr 1, 2021

Summary: This diff adds 2 commands to yb-admin:
1) create_snapshot_schedule <snapshot_interval_in_minutes> <snapshot_retention_in_minutes> <table> [<table>]...
   Where snapshot_interval_in_minutes is the snapshot interval specified in minutes and snapshot_retention_in_minutes is the snapshot retention specified in minutes, followed by the list of tables as in other yb-admin commands.
2) list_snapshot_schedules [<schedule_id>]
   Where schedule_id is an optional argument specifying the schedule ID to list; when not specified, all schedules are listed.
Test Plan: ybd --cxx-test yb-admin-test --gtest_filter AdminCliTest.SnapshotSchedule
Reviewers: bogdan, oleg
Reviewed By: oleg
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11077

spolitov added a commit that referenced this issue on Apr 12, 2021

Summary: This diff adds the admin command restore_snapshot_schedule to restore a snapshot schedule to a specified time. It also adds the command list_snapshot_restorations to list snapshot restorations.
Test Plan: ybd --cxx-test yb-admin-test --gtest_filter AdminCliTest.SnapshotSchedule
Reviewers: bogdan, oleg
Reviewed By: oleg
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11177

spolitov added a commit that referenced this issue on Apr 19, 2021

Summary: Tablets or tables may have been created between the restoration point and the current cluster state. In this case, we should remove them during restoration. Since we currently cannot create a filter for not-yet-created tables, the new test only checks that a created index and its tablets are deleted after restore. A test for reverting creation of regular tablets should be added in upcoming diffs, once new filters are supported.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.RemmoveNewTablets
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: jenkins-bot, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11240

spolitov added a commit that referenced this issue on May 5, 2021

Summary: Adds the ability to restore a table that was previously deleted. When a tablet that participates in a snapshot schedule is deleted, it is marked as hidden instead of actually being deleted. Such tablets reject reads and writes, but can be restored to some point in time. Cleanup of such tables should be implemented in follow-up diffs.
Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.SnapshotScheduleUndeleteTable
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: rahuldesirazu, skedia, mbautin, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11389

spolitov added a commit that referenced this issue on May 11, 2021

Summary: If a table participates in a snapshot schedule, its tablets are not deleted immediately when the table is deleted. Because those tablets could be restored by user request, we instead just mark them as hidden. This diff adds logic to clean up such tablets when there is no schedule that could be used to restore them. This covers both cases:
- the tablet is still covered by some schedule's filter, but it is no longer within the retention interval of any of them
- the tablet is not covered by any schedule's filter anymore, because they have all been deleted
Also fixed a bug in `SnapshotState::TryStartDelete` when the snapshot did not have any tablets.
Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.CleanupDeletedTablets
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11489

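In other words, a hidden tablet can be cleaned up only once no schedule could still be used to restore it. A hypothetical sketch of that predicate:

```cpp
#include <vector>

// Hypothetical per-schedule view of a hidden tablet; names are illustrative.
struct ScheduleCoverage {
  bool filter_covers_tablet;     // Is the tablet still matched by the schedule's filter?
  bool within_retention_window;  // Could a retained snapshot still restore it?
};

// A hidden tablet can be cleaned up once no schedule can restore it: for each
// schedule, either its filter no longer covers the tablet, or the tablet has
// fallen out of that schedule's retention interval.
bool CanCleanupHiddenTablet(const std::vector<ScheduleCoverage>& schedules) {
  for (const auto& schedule : schedules) {
    if (schedule.filter_covers_tablet && schedule.within_retention_window) {
      return false;  // At least one schedule could still restore this tablet.
    }
  }
  return true;
}
```
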
spolitov added a commit that referenced this issue on May 14, 2021

Summary: Adds the ability to restore a table that was previously deleted. When a tablet that participates in a snapshot schedule is deleted, it is marked as hidden instead of actually being deleted. Such tablets reject reads and writes, but can be restored to some point in time. Cleanup of such tables should be implemented in follow-up diffs.
Original commit: D11389 / 9fd73c7
Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.SnapshotScheduleUndeleteTable
Jenkins: rebase: 2.6
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: ybase, mbautin, skedia, rahuldesirazu
Differential Revision: https://phabricator.dev.yugabyte.com/D11594

spolitov added a commit that referenced this issue on May 14, 2021

Summary: If a table participates in a snapshot schedule, its tablets are not deleted immediately when the table is deleted. Because those tablets could be restored by user request, we instead just mark them as hidden. This diff adds logic to clean up such tablets when there is no schedule that could be used to restore them. This covers both cases:
- the tablet is still covered by some schedule's filter, but it is no longer within the retention interval of any of them
- the tablet is not covered by any schedule's filter anymore, because they have all been deleted
Also fixed a bug in `SnapshotState::TryStartDelete` when the snapshot did not have any tablets.
Original commit: D11489/4e9665ad7ee022ef0d118940a1086aac5ffd1110
Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.CleanupDeletedTablets
Jenkins: rebase: 2.6
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11598

spolitov added a commit that referenced this issue on May 14, 2021

Summary: This diff adds handling of hidden tablets during master failover. We introduce a new persistent hide_state on the Table objects in the master.
- When deleting a table covered by PITR, we leave it in the RUNNING state, but change the hide_state to HIDING.
- Once all tablets are also hidden, we transition the table's hide_state to HIDDEN.
- Once the table goes out of PITR scope, we then change it from RUNNING to DELETED.
This also buttons up all callsites that use GetTables, to ensure we don't display hidden tables to clients that do not care about them. This is relevant for YCQL system tables, for example. In the master UIs, we can keep displaying hidden tables as well.
Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.UndeleteTableWithRestart
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11563

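The lifecycle described above can be sketched as a small state machine; the names below are illustrative rather than the actual catalog-manager types:

```cpp
// Hypothetical sketch of the table lifecycle described above.
enum class TableState { RUNNING, DELETED };
enum class HideState { VISIBLE, HIDING, HIDDEN };

struct TableLifecycle {
  TableState state = TableState::RUNNING;
  HideState hide_state = HideState::VISIBLE;

  // Delete of a table covered by PITR: stay RUNNING, start hiding.
  void OnDeleteRequested() { hide_state = HideState::HIDING; }

  // All of the table's tablets have been hidden.
  void OnAllTabletsHidden() { hide_state = HideState::HIDDEN; }

  // The table has fallen out of every schedule's retention window.
  void OnOutOfPitrScope() { state = TableState::DELETED; }
};
```
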
spolitov added a commit that referenced this issue on May 22, 2021

Summary: Fixes issues uncovered by the YbAdminSnapshotScheduleTest.UndeleteIndex test.
1) DeleteTableInMemory could be called multiple times in the case of an index table. There is a check that is a no-op when the table was already deleted; adjusted this check to do the same when the table is being hidden.
2) Don't remove the table from the names map during delete when it was previously hidden. Otherwise, it would crash with a FATAL during cleanup.
3) DeleteTabletListAndSendRequests executes the delete on the tablet before committing the tablet info changes. As a result, the tablet could be deleted and the callback called before the info changes in memory, so the table would hang in the delete state, because the callback would think that the tablet is not being deleted.
4) Decreased log flooding when compactions are enabled in RocksDB. When compactions are enabled, we call SetOptions twice for each RocksDB instance, and each call dumps all current option values. Since we have both a regular and an intents DB, we get 4 dumps of all RocksDB options.
Also added debug logging to `RWCLock::WriteLock()`: when it takes too long to acquire this lock, it logs the stack trace of the successful write lock.
Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.UndeleteIndex -n 20
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: amitanand, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11614

spolitov added a commit that referenced this issue on May 25, 2021

Summary: This diff adds handling of hidden tablets during master failover. We introduce a new persistent hide_state on the Table objects in the master.
- When deleting a table covered by PITR, we leave it in the RUNNING state, but change the hide_state to HIDING.
- Once all tablets are also hidden, we transition the table's hide_state to HIDDEN.
- Once the table goes out of PITR scope, we then change it from RUNNING to DELETED.
This also buttons up all callsites that use GetTables, to ensure we don't display hidden tables to clients that do not care about them. This is relevant for YCQL system tables, for example. In the master UIs, we can keep displaying hidden tables as well.
Original commit: D11563 / c221319
Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.UndeleteTableWithRestart
Jenkins: rebase: 2.6
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11697

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: This diff adds logic to clean up scheduled snapshots that are outside our retention bounds. We leverage the existing cleanup mechanism for snapshots, deleting the oldest snapshot from every schedule in each polling interval.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.GC
Reviewers: amitanand, bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10864

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: This diff adds snapshot schedule loading during tablet bootstrap, so schedules are loaded after a master restart.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.Restart
Reviewers: bogdan, amitanand, oleg
Reviewed By: oleg
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10916

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: This diff adds a new history retention mechanism for tablets that participate in point-in-time restore. Normally, history retention is controlled by the gflag `timestamp_history_retention_interval_sec`, potentially cleaning up any values older than this interval. After this diff, history for such tablets will only be cleaned up to the point before the latest PITR snapshot. To achieve this, we need to know two things on the TS side:
1) Which schedule a tablet is a part of. The master knows this, based on the filters set on PITR schedules. It can send this schedule ID together with any schedule-related snapshots to any involved TS. We can then persist this information in the tablet metadata. Ideally, the first PITR-related snapshot will do this, but in case of any errors, snapshots are retried automatically by the master, so we have a guarantee that the TS will eventually update this persistent metadata.
2) The history retention requirements for each relevant schedule. We use the TS heartbeat response to flow this information from the master to the TS. Then, on any compaction, we choose the minimum between the existing flag and the retention policies of any of the schedules a tablet is involved in.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.Snapshot
Reviewers: amitanand, mbautin, bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10861

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: This diff adds logic to propagate the correct history retention to newly created tablets that participate in an existing snapshot schedule.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.Index
Reviewers: bogdan, amitanand
Reviewed By: amitanand
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10968

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: This diff adds logic to take a system catalog snapshot while taking a scheduled snapshot.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.RestoreSchema
Reviewers: amitanand, bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D10957

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: This diff adds logic to restore a table's schema. After this, we should be able to undo an ALTER TABLE operation! There are two important changes as part of this diff:
1) Restoring master-side sys_catalog metadata.
2) Sending the restored version of the schema from the master to the TS, as part of the explicit command to restore the TS.
As part of applying the restore operation on the master, we add new state tracking, which can compute the diff between the current sys_catalog state and the state at the time to which we want to restore. This is done by restoring the corresponding sys_catalog snapshot into a temporary directory, with the HybridTime filter applied for the restore_at time. We then load the relevant TABLE and TABLET data into memory and overwrite the existing RocksDB data directly in memory. This is safe to do because:
- It is done as part of the apply step of a Raft operation, so it is already persisted and will be replayed accordingly at bootstrap, in case of a restart.
- It is done on both leader and follower.
Once the master state is rolled back, we then run the TS side of the restore operation. The master now sends over the restored schema information as part of the Restore request. On the TS side, we update our tablet schema information on disk accordingly.
Note: In between the master state being rolled back and all the TS processing their respective restores, there is a time window in which the master can receive heartbeats from a TS with newer schema information than what the master has persisted. Currently, that only seems to lead to some log spew, but it will be investigated later, as part of fault tolerance testing.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.RestoreSchema
Reviewers: amitanand, bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11013

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: This diff adds 2 commands to yb-admin:
1) create_snapshot_schedule <snapshot_interval_in_minutes> <snapshot_retention_in_minutes> <table> [<table>]...
   Where snapshot_interval_in_minutes is the snapshot interval specified in minutes and snapshot_retention_in_minutes is the snapshot retention specified in minutes, followed by the list of tables as in other yb-admin commands.
2) list_snapshot_schedules [<schedule_id>]
   Where schedule_id is an optional argument specifying the schedule ID to list; when not specified, all schedules are listed.
Test Plan: ybd --cxx-test yb-admin-test --gtest_filter AdminCliTest.SnapshotSchedule
Reviewers: bogdan, oleg
Reviewed By: oleg
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11077

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: This diff adds the admin command restore_snapshot_schedule to restore a snapshot schedule to a specified time. It also adds the command list_snapshot_restorations to list snapshot restorations.
Test Plan: ybd --cxx-test yb-admin-test --gtest_filter AdminCliTest.SnapshotSchedule
Reviewers: bogdan, oleg
Reviewed By: oleg
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11177

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: Tablets or tables may have been created between the restoration point and the current cluster state. In this case, we should remove them during restoration. Since we currently cannot create a filter for not-yet-created tables, the new test only checks that a created index and its tablets are deleted after restore. A test for reverting creation of regular tablets should be added in upcoming diffs, once new filters are supported.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.RemmoveNewTablets
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: jenkins-bot, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11240

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: Adds the ability to restore a table that was previously deleted. When a tablet that participates in a snapshot schedule is deleted, it is marked as hidden instead of actually being deleted. Such tablets reject reads and writes, but can be restored to some point in time. Cleanup of such tables should be implemented in follow-up diffs.
Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.SnapshotScheduleUndeleteTable
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: rahuldesirazu, skedia, mbautin, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11389

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: If a table participates in a snapshot schedule, its tablets are not deleted immediately when the table is deleted. Because those tablets could be restored by user request, we instead just mark them as hidden. This diff adds logic to clean up such tablets when there is no schedule that could be used to restore them. This covers both cases:
- the tablet is still covered by some schedule's filter, but it is no longer within the retention interval of any of them
- the tablet is not covered by any schedule's filter anymore, because they have all been deleted
Also fixed a bug in `SnapshotState::TryStartDelete` when the snapshot did not have any tablets.
Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.CleanupDeletedTablets
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11489

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: This diff adds handling of hidden tablets during master failover. We introduce a new persistent hide_state on the Table objects in the master.
- When deleting a table covered by PITR, we leave it in the RUNNING state, but change the hide_state to HIDING.
- Once all tablets are also hidden, we transition the table's hide_state to HIDDEN.
- Once the table goes out of PITR scope, we then change it from RUNNING to DELETED.
This also buttons up all callsites that use GetTables, to ensure we don't display hidden tables to clients that do not care about them. This is relevant for YCQL system tables, for example. In the master UIs, we can keep displaying hidden tables as well.
Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.UndeleteTableWithRestart
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11563

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue on May 26, 2021

Summary: Fixes issues uncovered by the YbAdminSnapshotScheduleTest.UndeleteIndex test.
1) DeleteTableInMemory could be called multiple times in the case of an index table. There is a check that is a no-op when the table was already deleted; adjusted this check to do the same when the table is being hidden.
2) Don't remove the table from the names map during delete when it was previously hidden. Otherwise, it would crash with a FATAL during cleanup.
3) DeleteTabletListAndSendRequests executes the delete on the tablet before committing the tablet info changes. As a result, the tablet could be deleted and the callback called before the info changes in memory, so the table would hang in the delete state, because the callback would think that the tablet is not being deleted.
4) Decreased log flooding when compactions are enabled in RocksDB. When compactions are enabled, we call SetOptions twice for each RocksDB instance, and each call dumps all current option values. Since we have both a regular and an intents DB, we get 4 dumps of all RocksDB options.
Also added debug logging to `RWCLock::WriteLock()`: when it takes too long to acquire this lock, it logs the stack trace of the successful write lock.
Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.UndeleteIndex -n 20
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: amitanand, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11614

spolitov added a commit that referenced this issue on May 29, 2021

Summary: Sometimes deleting a tablet can take a while, for instance because of remote bootstrap. This can cause the SnapshotScheduleTest.RemoveNewTablets test to fail, because it expects that tablet to be deleted. This diff fixes SnapshotScheduleTest.RemoveNewTablets by adding a WaitFor to the check that all necessary tablets were deleted.
Test Plan: ybd --gtest_filter SnapshotScheduleTest.RemoveNewTablets -n 200
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11756

spolitov added a commit that referenced this issue on Jun 15, 2021

Summary: Fixes issues uncovered by the YbAdminSnapshotScheduleTest.UndeleteIndex test.
1) DeleteTableInMemory could be called multiple times in the case of an index table. There is a check that is a no-op when the table was already deleted; adjusted this check to do the same when the table is being hidden.
2) Don't remove the table from the names map during delete when it was previously hidden. Otherwise, it would crash with a FATAL during cleanup.
3) DeleteTabletListAndSendRequests executes the delete on the tablet before committing the tablet info changes. As a result, the tablet could be deleted and the callback called before the info changes in memory, so the table would hang in the delete state, because the callback would think that the tablet is not being deleted.
4) Decreased log flooding when compactions are enabled in RocksDB. When compactions are enabled, we call SetOptions twice for each RocksDB instance, and each call dumps all current option values. Since we have both a regular and an intents DB, we get 4 dumps of all RocksDB options.
Also added debug logging to `RWCLock::WriteLock()`: when it takes too long to acquire this lock, it logs the stack trace of the successful write lock.
Original diff: D11614/5fc9ce1e301015b563af652868d18b1f5cbf4395
Test Plan: ybd --gtest_filter YbAdminSnapshotScheduleTest.UndeleteIndex -n 20
Jenkins: rebase: 2.6
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: ybase, amitanand
Differential Revision: https://phabricator.dev.yugabyte.com/D11904

spolitov added a commit that referenced this issue on Jun 17, 2021

Summary: Sometimes deleting a tablet can take a while, for instance because of remote bootstrap. This can cause the SnapshotScheduleTest.RemoveNewTablets test to fail, because it expects that tablet to be deleted. This diff fixes SnapshotScheduleTest.RemoveNewTablets by adding a WaitFor to the check that all necessary tablets were deleted.
Original commit: D11756 / 7537427
Test Plan: ybd --gtest_filter SnapshotScheduleTest.RemoveNewTablets -n 200
Jenkins: rebase: 2.6
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D11951

After #7125, we could improve on this by automatically taking snapshots at a predefined interval, so that we don't end up increasing history retention to very large values by relying on users to create snapshots themselves.