Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrupt GDP1h files when downloading Issue #531 #532

Draft
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

KevinShuman
Copy link
Collaborator

A description of the issue is laid out here in issue #531.

To address this, I'm going to create verify and fix functions that look for files with 0 bytes in the temp directory and reattempt to download them, respectively. Once the files are downloaded, a function such as to_raggedarray() can use these to verify and attempt to fix any issues with the downloaded dataset prior to attempting to create a ragged array.

Copy link

codecov bot commented Nov 21, 2024

Codecov Report

Attention: Patch coverage is 95.55556% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
clouddrift/adapters/utils.py 95.34% 2 Missing ⚠️
Additional details and impacted files

📢 Thoughts on this report? Let us know!

@KevinShuman
Copy link
Collaborator Author

I have taken a new approach to the problem. Instead of the verify/fix method I mentioned before, I've adjusted the logic a little in download_with_progress(). We make multiple attempts at connecting to datasets if we need to. However, as we're seeing, we can get 0 byte files for whatever reason. I changed things to start with a temporary file, drifter_hourly{id}.nc.part for example. The data are downloaded as usual, but there is a check to see if it is empty. If it is, it tries to download it again. It is set to do this 5 times at the moment. If it is not, it renames the file to the standard name.

This seems to have fixed the issue, though I am not quite sure what has changed with some of the mypy tests failing, but it doesn't seem to be related to my code changes.

os.rename(temp_output, output) # type: ignore
if os.path.exists(output): # type: ignore
os.remove(output) # type: ignore
os.rename(temp_output, output) # type: ignore
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy doesn't recognize that because temp_output is a str that output is a str. The definition of temp_out comes from determining is output is a str. I.e. if output is a str, define temp_output as a str.

Set up the mocks for the tests.
'''
"""
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, not sure why I changed these. Might change them back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant