-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix to upload folder path #198
fix to upload folder path #198
Conversation
@@ -53,6 +56,35 @@ def _check_for_matching_filetype(pattern, filename): | |||
return file_dictionary | |||
|
|||
|
|||
def create_path_to_upload(metadata: dict) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this new function or could you use this to replace the check_for_matching_filetype
function above this? Since we have such a rigid structure now, I'm not sure we even need the config dictionary file.
mission, instrument, data_level, descriptor, startdate, enddate = filename.split("_")
# create your path from these, maybe raise if something doesn't match our requirements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_for_matching_filetype
checks that the filename follows one of the pattern in the config.json
. Once it finds that format, then it stops there and continue. If we remove check_for_matching_filetype
, then having config.json
is not useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so up to you whether you think we'll need that check/config, but now that we are more rigid in the filename requirements I'm not so sure we will was my suggestion to merge them together. Adding onto my above suggestion we could add checks from the config file directly here as well:
mission, instrument, data_level, descriptor, startdate, enddate = filename.split("_")
# create your path from these, maybe raise if something doesn't match our requirements
if not enddate.endswith((".pkts", ".cdf")):
raise ValueError("Bad filetype, the SDC requires '.pkts' or '.cdf' filetypes to be uploaded")
if instrument not in (...):
raise ValueError("Acceptable instruments are: ...")
# Even more validation on things if you want as well.
file_path_to_upload = same_as_you_have_below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I just had a few non-blocking suggestions.
|
||
Returns | ||
------- | ||
str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str | |
path_to_upload_file : str |
# path to upload file follows this format: | ||
# mission/instrument/data_level/descriptor/year/month/filename | ||
# NOTE: year and month is from startdate and startdate format is YYYYMMDD. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to add this info to the function docstring rather than an inline comment here. I think then it could be more easily referenced within the sphinx docs.
b1b55cc
to
f1faef5
Compare
@greglucas and @maxinelasp Can you review the |
sds_data_manager/config/config.json
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we were going to remove this file. Another benefit I thought of is that we won't need to download this onto the Lambda every single time if we put the configuration directly into the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, True.
Yeah, we will need to remove this file. I didn't do that in this PR yet, because removing this is going to effect other CDK stack and I didn't want to bring in more changes in this PR. I plan to remove this and stacks that are associated with this in upcoming PR:
- S3 bucket stack
- resources that uploads this file
- and may be others as well that I don't about.
# Check if the pattern matches 8 digits (YYYYMMDD) | ||
if not re.match(r"^\d{8}$", input_date): | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need this, or can you just let your try/except below catch this case too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Greg, below will catch only 8 digit times
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool. I will remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your changes, they look good to me. If config.json isn't needed anymore, then it should be removed before merging.
# Check if the pattern matches 8 digits (YYYYMMDD) | ||
if not re.match(r"^\d{8}$", input_date): | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Greg, below will catch only 8 digit times
@@ -124,3 +78,9 @@ def lambda_handler(event, context): | |||
} | |||
|
|||
return {"statusCode": 200, "body": json.dumps(url)} | |||
|
|||
|
|||
if __name__ == "__main__": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentionally included? Or just for testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. just for testing. good catch!
@bourque @maxinelasp @greglucas Can you review it again? No rush. I have to ping because this repo undoes your approval. |
|
||
# Validate if it's a real date | ||
try: | ||
# This checks if date is in YYYYMMDD format. | ||
# Sometimes, date is correct but not in the format we want | ||
if not re.match(r"^\d{8}$", input_date): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be removed, as the strptime
will also fail if the input date isn't 8 digits exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did remove it originally based on feedback from you and Greg. But when I input correct date such as 2023105
, strptime
accepted because it took that as valid date even though its document says otherwise.
https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Odd! Well, good job verifying 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this and won't block, but I do find this harder to follow by creating a class with a bunch of methods than individual functions would be. Since others like it this way that is fine though.
def create_upload_key(fname, date_validator=is_valid_date):
# Validate the pieces of our filename
mission, instrument, level = fname.split()
# call out to other validation functions
if not date_validator(start_date):
return False
...
# assemble parts after validations
def is_valid_date(datestr, date_format="YYYYMMDD"):
try:
datetime.strptime(datestr, date_format)
except ValueError:
return False
return True
pyproject.toml
Outdated
@@ -98,3 +98,4 @@ ignore = ["D203", "D212", "PLR0913", "PLR2004"] | |||
# TODO: Too many statements, this could be refactored to separate | |||
# the single stack out into a few smaller pieces | |||
"sds_data_manager/stacks/sds_data_manager_stack.py" = ["PLR0915"] | |||
"sds_data_manager/lambda_code/SDSCode/path_helper.py" = ["B008"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should avoid this error.
if not re.match(r"^\d{8}$", input_date): | ||
raise ValueError("Invalid date format.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The below strptime should take care of all the regex work for you.
if not re.match(r"^\d{8}$", input_date): | |
raise ValueError("Invalid date format.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove it since you guys are sure of this. As I mentioned above, strptime
takes in seven digit date as long as it's a valid date such as 2023101
. I feel like I am missing something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me know if you like me to remove it still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, it looks like I was wrong here! Good catch on your part and sorry for both of us mentioning you could remove it 🐑
Another option to avoid regex would be to just check the length: if len(input_date) != 8
since the following sequence will check the numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true. that's much simpler.
except ValueError: | ||
return False | ||
|
||
def validate_filename(self, file_pattern_config=FilenamePatternConfig()) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def validate_filename(self, file_pattern_config=FilenamePatternConfig()) -> bool: | |
def validate_filename(self, file_pattern_config=None) -> bool: | |
... | |
if file_pattern_config is None: | |
file_pattern_config = FilenamePatternConfig() |
if not is_valid: | ||
self.message = error_message | ||
return False | ||
print("done") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print("done") |
filename = "imap_glows_l0_raw_20231010_20231011_v01-01.pkts" | ||
file_parser = FilenameParser(filename) | ||
assert file_parser.check_date_input("20200101") | ||
assert file_parser.check_date_input("2020-01-01") is False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert file_parser.check_date_input("2020-01-01") is False | |
assert not file_parser.check_date_input("2020-01-01") |
def test_filename_validator(): | ||
"""Validate filenames""" | ||
filename = "imap_glows_l0_raw_20231010_20231011_v01-01.pkts" | ||
assert FilenameParser(filename).validate_filename() is True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comparisons should be assert something_is_true
or assert not something_is_false
and not comparing against the specific object.
assert FilenameParser(filename).validate_filename() is True | |
assert FilenameParser(filename).validate_filename() |
"""Tests date inputs""" | ||
filename = "imap_glows_l0_raw_20231010_20231011_v01-01.pkts" | ||
file_parser = FilenameParser(filename) | ||
assert file_parser.check_date_input("20200101") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to input a date to the fileparser? Shouldn't this be acting on self.date
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. but I wrote that function to take in an input date since we need to check for both startdate and enddate. And since I wanted to check the date format only, I figure I could call the function directly instead of calling class again and again.
assert file_parser.check_date_input("2023105") is False | ||
|
||
|
||
def test_filename_validator(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests might be easier to parameterize over your assertions.
https://docs.pytest.org/en/7.1.x/example/parametrize.html
def test_filename_validator(): | |
@pytest.mark.parametrize("filename,expected", [("imap_glows_l0_raw_20231010_20231011_v01-01.pkts", True), ...] | |
def test_filename_validator(filename, expected): | |
assert FilenameParser(filename).validate_filename() == expected |
ab17f6c
to
9b95900
Compare
Change Summary
Overview
Fix upload folder path
Updated Files