Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry downloading the boot image if it fails non-prod environment #6

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

soohoonc
Copy link
Member

Retry downloading the boot image if it fails non-prod environment
The boot image download can fail due to various reasons, such as network. We
don't want it to cause failure in our E2E tests. This commit adds a retry logic
to the boot image download in the E2E tests. I didn't add the retry logic for
production case, because we might want to have a deeper look before blindly
retrying.

Clean up the temp file if image download fails
If the image download fails, the temp file is left behind, which breaks
the idempotency of the image download. This commit adds a cleanup step
to remove the temp file if the download fails.

byucesoy added 3 commits June 20, 2024 14:41
The boot image download can fail due to various reasons, such as network. We
don't want it to cause failure in our E2E tests. This commit adds a retry logic
to the boot image download in the E2E tests. I didn't add the retry logic for
production case, because we might want to have a deeper look before blindly
retrying.
If the image download fails, the temp file is left behind, which breaks
the idempotency of the image download. This commit adds a cleanup step
to remove the temp file if the download fails.
If two concurrent processes try to write to the same file, the second one would
be able to open the temp file and would wait until the first one finishes, but
the first one renames the temp file, which leaves the second process in a weird
state. We fixed this problem in control plane's safe_write_to_file method, but
we didn't apply the fix in the data plane version of it. This commit fixes the
problem in the data plane version of the method by locking a lock file before
opening the temp file.
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

  • Introduced retry logic for boot image downloads in non-production environments
  • Ensured cleanup of temporary files if download fails
  • Maintained idempotency by removing leftover temp files
  • Added retry mechanism to improve E2E test reliability
  • Changes do not affect production environments

4 file(s) reviewed, 1 comment(s)

else
f.puts(content)
File.write(temp_filename, content)
end
File.rename(temp_filename, filename)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding error handling for File.rename to ensure the temporary file is cleaned up in case of failure.

@soohoonc
Copy link
Member Author

this is the reference pr: ubicloud#1682

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants