Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix secondary root rotation on metadata expiry #115

Merged
merged 6 commits into from
Aug 13, 2024

Conversation

cajun-rat
Copy link
Collaborator

This fixes a bug that occurs when trying to update a secondary after the root metadata expires. There is a reproducer for the failure followed by a fix.

Thankfully, rebooting the system allows it to recover so we got lucky here.

The fix is already in the Toradex fork, bringing it over here too.

The problem was that the first call to checkMetaOffline() throws an exception
which results in image_repo_->checkMetaOffline() not being called.

There are a bunch of root causes that should be fixed at some point:
 - The *Repository classes are not fully initialized in the ctor
 - The required initialization is called 'checkMetaOffline()''
 - There is no documentation that says this is so
 - There are no asserts to enforce the above requirement at runtime.
There only one place that uses this default, so remove the default ctor and
explicitly express it in RepositoryCommon. This means that no-one has to
remember "What is the policy for a default-constructed Root object?"" any more.
This wasn't implicated in TOR-3452, but the new version is shorter and has
simple control flow.
@cajun-rat cajun-rat force-pushed the fix/secondary-root-rotation branch from f28bc0d to bda0573 Compare August 3, 2024 09:36
Copy link
Collaborator

@pattivacek pattivacek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look since this was some of the last code I was involved in substantially reworking. Looks good, I think!

src/virtual_secondary/bad_rotation_test.cc Show resolved Hide resolved
image_repo_->checkMetaOffline(*storage_);
} catch (const std::exception &e) {
LOG_INFO << "No valid metadata found in storage.";
// See above ^ image_repo_ is OK
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "OK" is a bit vague. Maybe should just say that it's stale? (Same with the comment above, really.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. 'OK' hints at a deeper problem with the types of state that DirectorRepository and ImageRepository and get into. After constructing they need someone to call checkMetaOffline() to get them into a state where they can accept updates. If that call throws and exception, then the thing is still in a state where it can receive updates, as long as someone starts by rotating the root metadata. We can/should tidy this up, but I didn't want to make sweeping changes when there was such a small fix available.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty out of the loop by this point but agree that this can definitely be done differently. Up to you but the comments wouldn't hurt.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'll add some more description to the docs for checkMetaOffline(). I think I've described the bug a few times now, it is probably worth a definitive description

Copy link
Collaborator

@rborn-tx rborn-tx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was reviewed earlier at Toradex so I'll just say that it looks good to me and I'll approve it since we already have this in Toradex' fork.

Minor question though: I see some commit message titles referencing a ticket number internal to Toradex (TOR-xxxx). Is that okay for the upstream project?

cajun-rat added a commit that referenced this pull request Aug 13, 2024
See discussion in #115

f#	../fail_state.tmpwb569uon/
@cajun-rat
Copy link
Collaborator Author

I created #118 to improve the documentation.

@cajun-rat cajun-rat merged commit 80a777d into master Aug 13, 2024
5 checks passed
@cajun-rat cajun-rat deleted the fix/secondary-root-rotation branch August 13, 2024 18:39
cajun-rat added a commit that referenced this pull request Aug 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants