labbott (Contributor) commented Dec 4, 2025

Currently, if there's a permanent error while writing to the M.2 drives, we may retry forever. This isn't great behavior, so attempt to break out of the loop if it looks like we aren't making progress writing.
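For illustration only, here is a minimal sketch of the "stop when we aren't making progress" idea. It is not the code in this PR: `WriteOutcome`, `write_until_stalled`, and the `attempt` callback are hypothetical names, and the sketch assumes each call to `attempt` returns how many drives were written successfully in that iteration.

```rust
/// Hypothetical result of the retry loop; not a type from installinator.
#[derive(Debug, PartialEq)]
enum WriteOutcome {
    /// Every drive was written successfully.
    AllWritten,
    /// An iteration did no better than the one before it.
    Stalled { successes: usize },
}

/// Calls `attempt` until it reports success on all `num_drives` drives,
/// or until an iteration has no more successes than the previous one.
/// Assumption: `attempt` returns the number of drives written in that iteration.
fn write_until_stalled<F>(num_drives: usize, mut attempt: F) -> WriteOutcome
where
    F: FnMut() -> usize,
{
    // Success count from the previous iteration, if there was one.
    let mut prev: Option<usize> = None;
    loop {
        let successes = attempt();
        if successes == num_drives {
            return WriteOutcome::AllWritten;
        }
        if let Some(p) = prev {
            // No improvement over the last iteration: treat as stuck.
            if successes <= p {
                return WriteOutcome::Stalled { successes };
            }
        }
        prev = Some(successes);
    }
}
```

With two drives and per-iteration success counts of 0, 1, 1, this sketch would give up on the third iteration. A real loop would presumably also be async and wait between attempts, which is omitted here.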

labbott (Contributor, Author) commented Dec 4, 2025

I found this while working on something else: I had a bad check for a create_dir error and the unit test looped forever. I tried to turn this into a smaller unit test, but that was difficult to do with the current setup, and the one I tried hit a different infinite loop in installinator.

labbott force-pushed the installinator_stop_loop branch from 69d14eb to 410c0ae on December 4, 2025 15:50
if success_this_iter == self.drives.len() || success_prev_iter > 0 {
// 3. We had the same number of successes as the previous iteration,
// which implies that we seem to be permanently stuck and unlikely
// to succeed
Contributor

Hmm, this is a pretty significant behavior change to the real installinator, right? I think the intent was to loop forever if we're not having success, because there's no way to restart this on failure other than aborting the entire mupdate and starting over.

That's pretty terrible for tests, though. I wonder if we should have a cap on the number of attempts (which this code implicitly has, I think, with the cap set to "2"), and allow prod to pass a cap of "loop forever" while tests can pass something much smaller?
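A rough sketch of what such a configurable cap could look like, purely as an illustration: `RetryLimit` and `retry_with_limit` are hypothetical names rather than anything in installinator, and the real loop would likely be async with a delay between attempts.

```rust
/// Hypothetical retry policy: prod could pass `Forever`, tests `AtMost(n)`.
#[derive(Clone, Copy, Debug)]
enum RetryLimit {
    /// Keep retrying until the operation succeeds.
    Forever,
    /// Give up after this many failed attempts (a value of 0 still tries once).
    AtMost(usize),
}

/// Runs `op` repeatedly, passing the 1-based attempt number, until it
/// succeeds or the limit is reached. On giving up, returns the last error.
fn retry_with_limit<T, E, F>(limit: RetryLimit, mut op: F) -> Result<T, E>
where
    F: FnMut(usize) -> Result<T, E>,
{
    let mut attempt = 0usize;
    loop {
        attempt += 1;
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) => {
                // Only a finite limit can end the loop on failure.
                if let RetryLimit::AtMost(max) = limit {
                    if attempt >= max {
                        return Err(err);
                    }
                }
            }
        }
    }
}
```

Under this sketch, a test could call `retry_with_limit(RetryLimit::AtMost(3), ...)` and get the last error back after three failures, while production would pass `RetryLimit::Forever` and keep today's behavior.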

Contributor Author

Yeah, we're getting into "timeouts timeouts always wrong" territory here. I don't actually mind looping forever for tests, because that is a sign I need to fix something: tests should finish. Looping forever in production actually seems worse, though: if there's some kind of permanent error, we'll just be stuck. Maybe wicket will give more information than tests, though? If we're actually okay with the current behavior, I can also just close this.
