-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry downloading the boot image if it fails non-prod environment #6
base: main
Are you sure you want to change the base?
Conversation
The boot image download can fail due to various reasons, such as network. We don't want it to cause failure in our E2E tests. This commit adds a retry logic to the boot image download in the E2E tests. I didn't add the retry logic for production case, because we might want to have a deeper look before blindly retrying.
If the image download fails, the temp file is left behind, which breaks the idempotency of the image download. This commit adds a cleanup step to remove the temp file if the download fails.
If two concurrent processes try to write to the same file, the second one would be able to open the temp file and would wait until the first one finishes, but the first one renames the temp file, which leaves the second process in a weird state. We fixed this problem in control plane's safe_write_to_file method, but we didn't apply the fix in the data plane version of it. This commit fixes the problem in the data plane version of the method by locking a lock file before opening the temp file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
- Introduced retry logic for boot image downloads in non-production environments
- Ensured cleanup of temporary files if download fails
- Maintained idempotency by removing leftover temp files
- Added retry mechanism to improve E2E test reliability
- Changes do not affect production environments
4 file(s) reviewed, 1 comment(s)
else | ||
f.puts(content) | ||
File.write(temp_filename, content) | ||
end | ||
File.rename(temp_filename, filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding error handling for File.rename
to ensure the temporary file is cleaned up in case of failure.
this is the reference pr: ubicloud#1682 |
Retry downloading the boot image if it fails non-prod environment
The boot image download can fail due to various reasons, such as network. We
don't want it to cause failure in our E2E tests. This commit adds a retry logic
to the boot image download in the E2E tests. I didn't add the retry logic for
production case, because we might want to have a deeper look before blindly
retrying.
Clean up the temp file if image download fails
If the image download fails, the temp file is left behind, which breaks
the idempotency of the image download. This commit adds a cleanup step
to remove the temp file if the download fails.