Corrupt GDP1h files when downloading Issue #531 #532

KevinShuman · 2024-11-21T16:42:17Z

The issue is described in #531.

To address this, I'm going to create verify and fix functions: the first looks for 0-byte files in the temp directory, and the second reattempts to download them. Once the files are downloaded, a function such as to_raggedarray() can use these to check for and attempt to fix problems with the downloaded dataset before trying to create a ragged array.
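A rough sketch of what such a verify/fix pair could look like is given here. The names `find_empty_files`, `refetch_empty_files`, and the `fetch` callable are hypothetical and are not taken from clouddrift; the real re-download step would depend on how the adapter maps a local file back to its URL.

```python
import os
from typing import Callable, List


def find_empty_files(directory: str) -> List[str]:
    """Verify step: return the paths of all 0-byte regular files in `directory`."""
    empty = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(path)
    return empty


def refetch_empty_files(directory: str, fetch: Callable[[str], None]) -> List[str]:
    """Fix step: call `fetch(path)` once for each empty file.

    `fetch` is a hypothetical callable that re-downloads the file to `path`.
    Returns the paths that are still empty afterwards.
    """
    for path in find_empty_files(directory):
        fetch(path)
    return find_empty_files(directory)
```

A caller such as to_raggedarray() could run the fix step first and stop with an error if the returned list is not empty.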

codecov · 2024-11-21T16:51:48Z

Codecov Report

Attention: Patch coverage is 95.55556% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
clouddrift/adapters/utils.py	95.34%	2 Missing ⚠️


KevinShuman · 2024-12-10T16:54:01Z

I have taken a new approach to the problem. Instead of the verify/fix functions I mentioned before, I've adjusted the logic in download_with_progress(). We already make multiple attempts at connecting to a dataset if we need to. However, as we're seeing, a download can still leave us with a 0-byte file for whatever reason. The download now writes to a temporary file first, drifter_hourly{id}.nc.part for example. The data are downloaded as usual, and then the file is checked to see whether it is empty. If it is empty, the download is tried again; this is currently set to happen 5 times. If it is not empty, the file is renamed to its standard name.

This seems to have fixed the issue. Some of the mypy checks are now failing and I am not quite sure what changed, but the failures don't seem to be related to my code changes.
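The flow described above can be sketched roughly as follows. This is not the code from the PR: `download_with_retry` and the `fetch` callable are hypothetical stand-ins, the sketch reads "5 times" as five attempts in total (the comment does not say whether the first download is counted), and it uses `os.replace` rather than a remove-then-rename step.

```python
import os
from typing import Callable


def download_with_retry(
    fetch: Callable[[str], None],
    output: str,
    max_attempts: int = 5,
) -> None:
    """Download to `output` through a temporary `.part` file.

    `fetch` is a hypothetical callable that writes the payload to the path
    it is given. The `.part` file is only moved to `output` once it is
    non-empty.
    """
    temp_output = output + ".part"
    for _ in range(max_attempts):
        fetch(temp_output)
        if os.path.exists(temp_output) and os.path.getsize(temp_output) > 0:
            # os.replace overwrites an existing destination file.
            os.replace(temp_output, output)
            return
    # Every attempt produced an empty (or missing) file: clean up and fail.
    if os.path.exists(temp_output):
        os.remove(temp_output)
    raise OSError(f"{output}: download still empty after {max_attempts} attempts")
```

Writing to a `.part` file first means a reader never sees a half-written or empty file under the final name.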

KevinShuman · 2024-12-16T14:31:08Z

clouddrift/adapters/utils.py

-                os.rename(temp_output, output) # type: ignore
+                if os.path.exists(output):  # type: ignore
+                    os.remove(output)  # type: ignore
+                os.rename(temp_output, output)  # type: ignore


Mypy doesn't recognize that output is a str whenever temp_output is a str. temp_output is only defined after checking whether output is a str, i.e. if output is a str, temp_output is defined as a str.
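For comparison, here is a minimal sketch of how an explicit `isinstance` check lets mypy narrow the type without `# type: ignore`. The function name `finalize` and the signature (a path string or a writable buffer, with an optional temporary path) are assumptions for illustration; the actual types in clouddrift/adapters/utils.py are not shown in this thread.

```python
import os
from io import BufferedIOBase
from typing import Optional, Union


def finalize(output: Union[str, BufferedIOBase], temp_output: Optional[str]) -> bool:
    """Move `temp_output` to `output` when `output` is a file path.

    Returns True if a file was moved, False if `output` is a buffer
    (nothing to rename in that case).
    """
    # Mypy narrows `output` to str inside this branch because the check
    # is made on `output` itself, not inferred from `temp_output`.
    if isinstance(output, str) and temp_output is not None:
        os.replace(temp_output, output)
        return True
    return False
```

Mypy narrows a variable only from checks on that variable, so a relationship such as "temp_output is a str implies output is a str" has to be restated as a direct check for it to type-check.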

KevinShuman · 2024-12-16T14:32:18Z

tests/adapters/utils_tests.py

        Set up the mocks for the tests.
-        '''
+        """


Frankly, not sure why I changed these. Might change them back.

Empty commit

8c4100a

KevinShuman added 8 commits November 21, 2024 10:20

Creates verify and fix functions w/ tests

8b65a76

Reformats verify and fix functions and their tests

f6ec330

Removes verify/fix functions & tests

5b7f1c4

Includes multi-attempt check for 0 byte files

5b18813

Modifies _download_with_progress() to pass tests

5d750d1

Modifies _download_with_progress() to pass mypy

25f9442

Adjusts _download_with_progress to handle buffer

476757e

Adjusts tests to work with changes with _download_with_progress

87d18ae

KevinShuman added 6 commits December 11, 2024 14:34

Ignores type errors for cut_str(), which were not resulting in errors previously

9c9a1f2

Ignores type errors for cut_str(), which were not resulting in errors previously

fc70232

Adds test for better coverage

6c92c02

Adds test for better coverage

d7dce77

Adds another test to improve coverage

44cbcfa

Resolves linting error

40e2292
