-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Corrupt GDP1h files when downloading Issue #531 #532
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files📢 Thoughts on this report? Let us know! |
I have taken a new approach to the problem. Instead of the verify/fix method I mentioned before, I've adjusted the logic a little in download_with_progress(). We make multiple attempts at connecting to datasets if we need to. However, as we're seeing, we can get 0 byte files for whatever reason. I changed things to start with a temporary file, drifter_hourly{id}.nc.part for example. The data are downloaded as usual, but there is a check to see if it is empty. If it is, it tries to download it again. It is set to do this 5 times at the moment. If it is not, it renames the file to the standard name. This seems to have fixed the issue, though I am not quite sure what has changed with some of the mypy tests failing, but it doesn't seem to be related to my code changes. |
os.rename(temp_output, output) # type: ignore | ||
if os.path.exists(output): # type: ignore | ||
os.remove(output) # type: ignore | ||
os.rename(temp_output, output) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mypy doesn't recognize that because temp_output is a str that output is a str. The definition of temp_out comes from determining is output is a str. I.e. if output is a str, define temp_output as a str.
Set up the mocks for the tests. | ||
''' | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, not sure why I changed these. Might change them back.
A description of the issue is laid out here in issue #531.
To address this, I'm going to create verify and fix functions that look for files with 0 bytes in the temp directory and reattempt to download them, respectively. Once the files are downloaded, a function such as to_raggedarray() can use these to verify and attempt to fix any issues with the downloaded dataset prior to attempting to create a ragged array.