feat: migrations for annotations & notebooks resource types on operator token #21840
Conversation
bolt/kv.go
Outdated
```go
// swapAndOpenRestored is used while restoring a database to close the existing
// database, replace the existing database with the restored database at the
// temp path, and then re-open the restored database.
func (s *KVStore) swapAndOpenRestored() error {
```
This function was refactored out of existing code into a separate function to make the mutex locking and unlocking cleaner to manage.
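The refactoring described here can be sketched as follows. This is a minimal illustration, not the PR's actual code: the struct, fields, and bodies are hypothetical stand-ins, showing only how isolating the close/swap/re-open sequence in one helper lets a single `Lock`/`defer Unlock` pair cover the whole critical section on every return path.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for bolt's KVStore.
type store struct {
	mu   sync.Mutex
	path string
	open bool
}

// swapAndOpenRestored sketches the extracted helper: because the whole
// close/swap/re-open sequence lives in one function, defer guarantees the
// mutex is released on every return path, error or not.
func (s *store) swapAndOpenRestored(tempPath string) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	s.open = false    // close the existing database
	s.path = tempPath // swap in the restored file (stand-in for a file rename)
	s.open = true     // re-open the restored database
	return nil
}

func main() {
	s := &store{path: "live.db", open: true}
	if err := s.swapAndOpenRestored("restored.db"); err != nil {
		fmt.Println("restore failed:", err)
		return
	}
	fmt.Println(s.path, s.open)
}
```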
sqlite/sqlite.go
Outdated
```diff
@@ -230,14 +231,38 @@ func (s *SqlStore) RestoreSqlStore(ctx context.Context, r io.Reader) error {
 		return err
 	}
 
-	// Close the current DB.
+	// Close the current DB and run the restore while under lock.
+	s.Mu.Lock()
```
There were no locks here before, but it seems like we really should be taking a lock before doing the sqlite database restoration as well, so I adapted the locking and unlocking logic from the bolt kv restore code.
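The rationale for taking the lock can be sketched with a toy read/write-lock example (all names here are illustrative; only the `Mu` field name comes from the snippet above). Queries take the shared read lock, while the restore takes the exclusive write lock so no reader can observe the database mid-swap.

```go
package main

import (
	"fmt"
	"sync"
)

// sqlStore is a hypothetical stand-in for SqlStore; only the Mu field name
// mirrors the diff above.
type sqlStore struct {
	Mu   sync.RWMutex
	data string
}

// query takes the shared read lock, as concurrent readers would.
func (s *sqlStore) query() string {
	s.Mu.RLock()
	defer s.Mu.RUnlock()
	return s.data
}

// restore takes the exclusive write lock so queries never see the database
// mid-swap; this is the gap the added locking is meant to close.
func (s *sqlStore) restore(restored string) {
	s.Mu.Lock()
	defer s.Mu.Unlock()
	s.data = restored
}

func main() {
	s := &sqlStore{data: "live"}
	s.restore("restored")
	fmt.Println(s.query())
}
```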
sqlite/sqlite.go
Outdated
```go
// runSqlRestore is used while restoring a database to close the existing
// database, restore the backed-up database, and then re-open the restored
// database.
func (s *SqlStore) runSqlRestore(ctx context.Context, tempFileName string) error {
```
As with the bolt kv restore file above, this function was adapted from existing code to make the lock easier to manage.
bolt/kv.go
Outdated
```go
	// Atomically swap temporary file with current DB file.
	if err := fs.RenameFileWithReplacement(s.tempPath(), s.path); err != nil {

	// Now that the DB has been re-opened, release the lock so that the
	// migrations can be run.
```
This feels like a potential opening for race conditions? Unless there's some higher-level lock protecting this whole thing.
If the migrator is currently coded to take a lock, would it be possible/better to refactor it to remove that step?
There's definitely some potential for a race condition here, and no higher-level lock. My initial thought was that it wouldn't be much worse than the potential race conditions between the multiple restore endpoints that have to be used to do a full restore: there's the restore KV endpoint, the restore SQL endpoint, and the endpoints for all of the shards, which the CLI orchestrates without any server-side locks.
I think what might be fairly easy to do would be to run the migrations on the restored database before swapping it in. I'll look into that.
It wasn't too bad to run the migrations on the restored database file(s) prior to swapping out the running database. I think that should take care of any race conditions related to the restore KV / restore SQL endpoints. It might be nice someday to consolidate the restore endpoints under a single lock, similar to how the single metadata backup endpoint works, to prevent anything funny from happening in between CLI calls to the restore endpoints.
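The migrate-before-swap ordering described here can be sketched as below. Everything in this snippet is hypothetical; the point is only the ordering: migrations run against the restored file while it is still private, so the live database keeps serving and no lock has to be released mid-restore.

```go
package main

import "fmt"

// db is an illustrative stand-in for a database file with a schema version.
type db struct {
	file    string
	version int
}

const currentVersion = 2

// migrate upgrades a database to the current schema version, one step at a time.
func migrate(d *db) {
	for d.version < currentVersion {
		d.version++ // apply one migration step
	}
}

// restore migrates the restored copy first, then swaps it in. The restored
// file is private until the swap, so readers never see a half-migrated state.
func restore(live *db, restored *db) *db {
	migrate(restored)
	return restored // swap: from now on, serve the restored database
}

func main() {
	live := &db{file: "live.db", version: currentVersion}
	fromBackup := &db{file: "restored.db", version: 1} // e.g. a 2.0.x backup
	live = restore(live, fromBackup)
	fmt.Println(live.file, live.version)
}
```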
```go
	// operator token for annotations and notebooks would have matches the
	// full list of operator permissions for a 2.1 operator token, we must
	// be dealing with a 2.0.x operator token, so add it to the list.
	if permListsMatch(oprPerms, append(t.Permissions, extraPerms()...)) {
```
Annoying as it is, I think it'd be better to hard-code the expected final list here. Assuming we add more resource types in versions after 2.1, the contents of `oprPerms` will continue to grow as defined here. Once that happens, this conditional will no longer match for any users upgrading from 2.0 -> 2.N, so the notebooks & annotations resources won't get added to their tokens.
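The fix being suggested can be sketched as an order-insensitive comparison against a frozen snapshot of the 2.0.x permission list. The helper name `permListsMatch` comes from the diff above, but this implementation and the permission strings are illustrative guesses, not the PR's real types.

```go
package main

import (
	"fmt"
	"sort"
)

// permListsMatch reports whether two permission lists contain the same
// entries regardless of order; an illustrative version of the helper named
// in the diff above.
func permListsMatch(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	as := append([]string(nil), a...)
	bs := append([]string(nil), b...)
	sort.Strings(as)
	sort.Strings(bs)
	for i := range as {
		if as[i] != bs[i] {
			return false
		}
	}
	return true
}

// oldOprPerms is a hard-coded snapshot of the 2.0.x operator permission list
// (abbreviated, hypothetical entries). Because it is frozen rather than
// derived from the current version's growing list, the check keeps matching
// 2.0.x tokens even after later versions add more resource types.
var oldOprPerms = []string{"read:orgs", "write:orgs", "read:buckets", "write:buckets"}

func main() {
	token := []string{"write:orgs", "read:orgs", "write:buckets", "read:buckets"}
	fmt.Println(permListsMatch(token, oldOprPerms))
}
```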
Good call, I hadn't thought about the case where there'd be more than one generation of additional resource types to migrate into. Made that change!
sqlite/sqlite.go
Outdated
```go
	// Now that the DB has been re-opened, release the lock so that the migrations
	// can be run.
	s.Mu.Unlock()
```
Same question as in the bolt code: are we open to a race condition here? And if so, could we refactor the migrator to avoid it?
Addressed in the same way as in the bolt code: run the migrations on the restored database before switching the active database over to it. Refactoring the sqlite migrator to not use a lock of its own would have been easier than refactoring the bolt one, but this was easier still.
Nice!
Closes #21833

This PR creates a new KV migration that adds read/write operator-level access to the annotations and notebooks resource types on the operator token.

It also adds code to run the migrations on restored bolt & sql databases. This ensures that restored databases are in the correct state for the version they are restored into, making it possible to do a full restore from a `2.0.x` version of the `influxd` server into a `2.1.x` version successfully, with the new resource types added to the restored operator token.

The included unit test covers the migration. I also manually verified the following backup/restore scenarios with the latest `influx-cli` work:

- `2.0.7`, partial restore into `2.1`
- `2.0.7`, full restore into `2.1`
- `2.1`, partial restore into `2.1`
- `2.1`, full restore into `2.1`
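At a high level, the migration described in this PR can be sketched as below. The `permission` type, `extraPerms`, and `migrateOperatorToken` here are hypothetical stand-ins (only `extraPerms` appears by name in the diff above); the sketch just shows the idea of appending any missing annotations/notebooks permissions to an operator token, idempotently.

```go
package main

import "fmt"

// permission is an illustrative stand-in for the real permission type.
type permission struct {
	Action   string // "read" or "write"
	Resource string // e.g. "annotations", "notebooks"
}

// has reports whether p is already present in perms.
func has(perms []permission, p permission) bool {
	for _, q := range perms {
		if q == p {
			return true
		}
	}
	return false
}

// extraPerms returns the permissions the migration adds for the new
// resource types.
func extraPerms() []permission {
	return []permission{
		{"read", "annotations"}, {"write", "annotations"},
		{"read", "notebooks"}, {"write", "notebooks"},
	}
}

// migrateOperatorToken appends any missing new permissions to the token's
// list; running it twice adds nothing the second time.
func migrateOperatorToken(perms []permission) []permission {
	for _, p := range extraPerms() {
		if !has(perms, p) {
			perms = append(perms, p)
		}
	}
	return perms
}

func main() {
	tok := []permission{{"read", "orgs"}, {"write", "orgs"}}
	tok = migrateOperatorToken(tok)
	fmt.Println(len(tok)) // 2 original + 4 added
}
```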