Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: migrations for annotations & notebooks resource types on operator token #21840

Merged
merged 9 commits into from
Jul 14, 2021

Conversation

williamhbaker
Copy link
Contributor

Closes #21833

This PR creates a new KV migration for adding read/write operator-level access to the operator token for the annotations and notebooks resource type.

It also adds code to run the migrations on restored bolt & sql databases. This will ensure that restored databases are in the correct state for the version they are restored into. This makes it possible to do a full restore from a 2.0.x version of the influxd server into a 2.1.x version successfully, and have the resource types added to the restored operator token.

The included unit test tests the migration. I also manually verified the following backup/restore scenarios with the latest influx-cli work:

  • Backup from 2.0.7, partial restore into 2.1
  • Backup from 2.0.7, full restore into 2.1
  • Backup from 2.1, partial restore into 2.1
  • Backup from 2.1, full restore into 2.1

bolt/kv.go Outdated
// swapAndOpenRestored is used while restoring a database to close the existing
// database, replace the existing database with the restored database at the
// temp path, and then re-open the restored database.
func (s *KVStore) swapAndOpenRestored() error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was re-factored from existing code into a separate function to make it cleaner to manage the mutex locking and unlocking.

sqlite/sqlite.go Outdated
@@ -230,14 +231,38 @@ func (s *SqlStore) RestoreSqlStore(ctx context.Context, r io.Reader) error {
return err
}

// Close the current DB.
// Close the current DB and run the restore while under lock.
s.Mu.Lock()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't have locks here before but it seems like we really should be getting a lock prior to doing the sqlite database restoration as well so I adapted similar logic as in the bolt kv restore code for locking and unlocking.

sqlite/sqlite.go Outdated
// swapAndOpenRestored is used while restoring a database to close the existing
// database, restore the backed up database, and then re-open the restored
// database.
func (s *SqlStore) runSqlRestore(ctx context.Context, tempFileName string) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above in the bolt kv restore file, this function was adapted from existing code to make it easier to manage the lock.

@danxmoran danxmoran self-requested a review July 13, 2021 17:57
bolt/kv.go Outdated
// Atomically swap temporary file with current DB file.
if err := fs.RenameFileWithReplacement(s.tempPath(), s.path); err != nil {
// Now that the DB has been re-opened, release the lock so that the
// migrations can be run.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a potential opening for race conditions? Unless there's some higher-level lock protecting this whole thing.

If the migrator is currently coded to take a lock, would it be possible/better to refactor it to remove that step?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's definitely some potential for a race condition here & not a higher level lock. My initial thought was that it wouldn't be much worse than the potential race conditions between the multiple different restore endpoints that have to be used to do a (full) restore - there's the restore KV endpoint, restore SQL endpoint, and endpoints for all of the shards that the CLI orchestrates without any server-side locks.

I think what might be fairly easy to do would be to run the migrations on the restored database before swapping it in. I'll look into that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't too bad to run the migrations on the restored database file(s) prior to swapping out the running database. I think that should take care of any race conditions related to the restore KV/restore SQL endpoint. It might be nice someday to consolidate the restore endpoints similar to how the single metadata backup endpoint works under a single lock to prevent anything funny from happening in between CLI calls to the restore endpoints.

// operator token for annotations and notebooks would have matches the
// full list of operator permissions for a 2.1 operator token, we must
// be dealing with a 2.0.x operator token, so add it to the list.
if permListsMatch(oprPerms, append(t.Permissions, extraPerms()...)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annoying as it is, I think it'd be better to hard-code the expected final list here. Assuming we add more resource-types in versions after 2.1, the contents of oprPerms will continue to grow as defined here. Once that happens, this conditional will no longer match for any users upgrading from 2.0->2.N, so the notebooks & annotations resources won't get added to their tokens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, hadn't thought about the case where there'd be more than one generate of additional resource types to migrate into. Made that change!

sqlite/sqlite.go Outdated

// Now that the DB has been re-opened, release the lock so that the migrations
// can be run.
s.Mu.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as in the bolt code: are we open to a race condition here? And if so, could we refactor the migrator to avoid it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the same way as with the bolt code...run the migrations on the restored database before switching over the active database to the restored database. Refactoring the sqlite migrator to not use a lock of its own would be easier than refactoring the bolt one, but this was even easier than that still.

Copy link
Contributor

@danxmoran danxmoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Annotations/Notebooks token permissions issue when transitioning from 2.0.x -> 2.1.x
2 participants