-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First version implementation of preload_data and the rest of CLMS data store #6
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Store Implementation
- The preload mechanism lacks the preload handle concept (but that's ok for time being)
- Do not limit the caching data store to
file
. Allow for anyMutableDataStore
instead.
Coding
- Using camel-case names, do not concatenate upper-case abbreviations, use camel-case instead:
CLMSAPITokenHandler
is not user friendly, useClmsApiTokenHandler
instead - Don't use
os
, usefsspec
instead. Actually, don't use specific local filesystem calls at all, if not urgently required.
Testing
- Test code is often hard to read because of the bloating mock setup
- Use mocking more carefully! It can cause architectural code changes to be very expensive in the end!
- Test with respect to correct results or behavious, not implementation.
- Build test class names like so:
${class_or_func}Test
, notTest${class_or_func}
- Using plain
assert
doesn't distinguish between expected and actual values - which makes it harder to understand what went wrong if tests fail. Prefer test classes that use assert methods fromunitest.TestCase
.
Docstrings
- Should start with a short, one-line sentence that follows immediately the
"""
and ends with a dot.
. Use active speech in this sentence. - Void methods are supposed to return
None
. Don't document this.
FYI @konstntokas
# for testing | ||
- numpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you import a package in non-test code, add it as a true project dependency.
Do not rely on transitive dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, I only use numpy
in the tests, so I have not added it in the project dependency.
@@ -28,8 +34,12 @@ exclude = ["test*", "doc*"] | |||
|
|||
[project.optional-dependencies] | |||
dev = [ | |||
"numpy", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move into dependencies
list and potentially also others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only use numpy
in the tests, so I have not added it in the dependencies
list. By others, do you mean the pytest, black, flake8 ... etc.?
Co-authored-by: Norman Fomferra <norman.fomferra@brockmann-consult.de>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved (since the preload api will change) with some comments which you may address or note down for a later PR.
xcube_clms/constants.py
Outdated
PENDING = "PENDING" | ||
COMPLETE = "COMPLETE" | ||
UNDEFINED = "UNDEFINED" | ||
CANCELLED = "CANCELLED" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you want to keep them? There was a comment in Normans review last time that this is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on what I understand on Norman's comment was to let the constants be here if they are used in multiple files, which is the case for these.
@@ -61,9 +60,6 @@ def get_data_store_params_schema(cls) -> JsonObjectSchema: | |||
) | |||
|
|||
params = dict( | |||
url=JsonStringSchema( | |||
title="URL of CLMS API", | |||
), | |||
credentials=JsonObjectSchema( | |||
dict(**credentials_params), | |||
required=("client_id", "user_id", "token_uri", "private_key"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the credentials json which I got from the CLMS store, there are more fields. Are the others not needed for the store? Please double check if required
is correct here.
_PORTAL_TYPE = {"portal_type": "DataSet"} | ||
_METADATA_FIELDS = "metadata_fields" | ||
_FULL_SCHEMA = "fullobjects" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I would like all the hardcoded strings to be defined as constants for easier manipulation later (if required).
xcube_clms/utils.py
Outdated
response.raise_for_status() | ||
response.raise_for_status() | ||
|
||
except HTTPError as e: | ||
# This is to make sure that the user gets to see the actual error | ||
# message which raise_for_status does not show | ||
except HTTPError: | ||
error_details = response.text | ||
if "application/json" in response.headers.get("Content-Type", "").lower(): | ||
error_details = response.text | ||
if "application/json" in response.headers.get("Content-Type", "").lower(): | ||
try: | ||
error_details = response.json() | ||
raise HTTPError(f"HTTP error {response.status_code}: {error_details}") | ||
|
||
except JSONDecodeError as e: | ||
raise JSONDecodeError(f"Invalid JSON: {e}", response.text, 0) | ||
except HTTPError as eh: | ||
raise HTTPError(f"HTTP error occurred: {eh}") | ||
except Timeout as et: | ||
raise Timeout(f"Timeout error occurred: {et}") | ||
except RequestException as e: | ||
raise RequestException(f"Request error occurred: {e}") | ||
except JSONDecodeError as json_e: | ||
LOG.error(f"Failed to decode JSON error response: {json_e}") | ||
new_error_message = ( | ||
f"HTTP error {response.status_code}: {error_details}. Original error: {e}" | ||
) | ||
LOG.error(new_error_message) | ||
raise HTTPError(new_error_message, response=e.response) from e | ||
|
||
except ( | ||
Timeout, | ||
RequestException, | ||
) as e: | ||
LOG.error(f"An error occurred during the request to {url}: {e}") | ||
raise | ||
|
||
except Exception as e: | ||
raise Exception(f"Unknown error occurred: {e}") | ||
LOG.error(f"Unknown error occurred: {e}") | ||
raise | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cannot you do something like this?
response = requests.request()
if response.ok:
return response # or do something else
else:
raise DataStoreError(response.raise_for_status())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can be done but if I dont do this, the actual error message is lost leaving the user confused about the reason for the exception.
I have added a comment as well explaining the reason I do this.
Closing this PR as #12 refactors this PR. |
This PR introduces the initial mechanism of preloading data, including cache management, downloading, and file processing.
The following classes (components) are responsible for the mechanism:
CLMS
CacheManager
DownloadTaskManager
CLMSAPITokenHandler
FileProcessor
PreloadData
notebook.tqdm
for displaying progress barsCurrently, only logs are shown in the jupyter cells that let the user know of the request status which is updated every 60 seconds to avoid sending several requests to the CLMS API. This is configurable, so it can be changed if we need a lower or higher waiting time between request status.
tqdm
is used here that shows the progress bar for the actual download, unzipping, and postprocessing.A tutorial notebook has also been added to the
examples
folder to show how to use this data store.There are some missing tests for a few classes, but they will be added in the next PRs
Closes #3, #4, #5