Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix to upload folder path #198

Merged

Conversation

tech3371
Copy link
Contributor

@tech3371 tech3371 commented Dec 1, 2023

Change Summary

Overview

Fix upload folder path

Updated Files

  • sds_data_manager/config/config.json
    • update and added new product
  • sds_data_manager/lambda_code/SDSCode/upload_api.py
    • Added fixes for upload folder path

@tech3371 tech3371 requested a review from a team December 1, 2023 20:55
@tech3371 tech3371 self-assigned this Dec 1, 2023
@tech3371 tech3371 requested review from bourque, sdhoyt, greglucas, bryan-harter, laspsandoval, GFMoraga and maxinelasp and removed request for a team December 1, 2023 20:55
@tech3371 tech3371 linked an issue Dec 1, 2023 that may be closed by this pull request
@@ -53,6 +56,35 @@ def _check_for_matching_filetype(pattern, filename):
return file_dictionary


def create_path_to_upload(metadata: dict) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this new function or could you use this to replace the check_for_matching_filetype function above this? Since we have such a rigid structure now, I'm not sure we even need the config dictionary file.

mission, instrument, data_level, descriptor, startdate, enddate = filename.split("_")
# create your path from these, maybe raise if something doesn't match our requirements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_for_matching_filetype checks that the filename follows one of the pattern in the config.json. Once it finds that format, then it stops there and continue. If we remove check_for_matching_filetype, then having config.json is not useful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so up to you whether you think we'll need that check/config, but now that we are more rigid in the filename requirements I'm not so sure we will was my suggestion to merge them together. Adding onto my above suggestion we could add checks from the config file directly here as well:

mission, instrument, data_level, descriptor, startdate, enddate = filename.split("_")
# create your path from these, maybe raise if something doesn't match our requirements
if not enddate.endswith((".pkts", ".cdf")):
    raise ValueError("Bad filetype, the SDC requires '.pkts' or '.cdf' filetypes to be uploaded")
if instrument not in (...):
    raise ValueError("Acceptable instruments are: ...")
# Even more validation on things if you want as well.
file_path_to_upload = same_as_you_have_below

bourque
bourque previously approved these changes Dec 4, 2023
Copy link
Collaborator

@bourque bourque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I just had a few non-blocking suggestions.


Returns
-------
str
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
str
path_to_upload_file : str

Comment on lines 73 to 75
# path to upload file follows this format:
# mission/instrument/data_level/descriptor/year/month/filename
# NOTE: year and month is from startdate and startdate format is YYYYMMDD.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to add this info to the function docstring rather than an inline comment here. I think then it could be more easily referenced within the sphinx docs.

@tech3371
Copy link
Contributor Author

tech3371 commented Dec 5, 2023

@greglucas and @maxinelasp Can you review the path_helper.py to see if I captured your suggestions? After your approval, I will work towards adding tests and incorporate into indexer.py too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we were going to remove this file. Another benefit I thought of is that we won't need to download this onto the Lambda every single time if we put the configuration directly into the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, True.
Yeah, we will need to remove this file. I didn't do that in this PR yet, because removing this is going to effect other CDK stack and I didn't want to bring in more changes in this PR. I plan to remove this and stacks that are associated with this in upcoming PR:

  • S3 bucket stack
  • resources that uploads this file
  • and may be others as well that I don't about.

Comment on lines 95 to 97
# Check if the pattern matches 8 digits (YYYYMMDD)
if not re.match(r"^\d{8}$", input_date):
return False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this, or can you just let your try/except below catch this case too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Greg, below will catch only 8 digit times

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool. I will remove this.

maxinelasp
maxinelasp previously approved these changes Dec 6, 2023
Copy link
Contributor

@maxinelasp maxinelasp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your changes, they look good to me. If config.json isn't needed anymore, then it should be removed before merging.

Comment on lines 95 to 97
# Check if the pattern matches 8 digits (YYYYMMDD)
if not re.match(r"^\d{8}$", input_date):
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Greg, below will catch only 8 digit times

@@ -124,3 +78,9 @@ def lambda_handler(event, context):
}

return {"statusCode": 200, "body": json.dumps(url)}


if __name__ == "__main__":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentionally included? Or just for testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. just for testing. good catch!

@tech3371
Copy link
Contributor Author

tech3371 commented Dec 6, 2023

@bourque @maxinelasp @greglucas Can you review it again? No rush. I have to ping because this repo undoes your approval.


# Validate if it's a real date
try:
# This checks if date is in YYYYMMDD format.
# Sometimes, date is correct but not in the format we want
if not re.match(r"^\d{8}$", input_date):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be removed, as the strptime will also fail if the input date isn't 8 digits exactly.

Copy link
Contributor Author

@tech3371 tech3371 Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did remove it originally based on feedback from you and Greg. But when I input correct date such as 2023105, strptime accepted because it took that as valid date even though its document says otherwise.
https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd! Well, good job verifying 😆

Copy link
Collaborator

@greglucas greglucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this and won't block, but I do find this harder to follow by creating a class with a bunch of methods than individual functions would be. Since others like it this way that is fine though.

def create_upload_key(fname, date_validator=is_valid_date):
    # Validate the pieces of our filename
    mission, instrument, level = fname.split()
    # call out to other validation functions
    if not date_validator(start_date):
        return False
    ...
    # assemble parts after validations

def is_valid_date(datestr, date_format="YYYYMMDD"):
    try:
        datetime.strptime(datestr, date_format)
    except ValueError:
        return False
    return True

pyproject.toml Outdated
@@ -98,3 +98,4 @@ ignore = ["D203", "D212", "PLR0913", "PLR2004"]
# TODO: Too many statements, this could be refactored to separate
# the single stack out into a few smaller pieces
"sds_data_manager/stacks/sds_data_manager_stack.py" = ["PLR0915"]
"sds_data_manager/lambda_code/SDSCode/path_helper.py" = ["B008"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should avoid this error.

Comment on lines 101 to 102
if not re.match(r"^\d{8}$", input_date):
raise ValueError("Invalid date format.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The below strptime should take care of all the regex work for you.

Suggested change
if not re.match(r"^\d{8}$", input_date):
raise ValueError("Invalid date format.")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove it since you guys are sure of this. As I mentioned above, strptime takes in seven digit date as long as it's a valid date such as 2023101. I feel like I am missing something here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me know if you like me to remove it still.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it looks like I was wrong here! Good catch on your part and sorry for both of us mentioning you could remove it 🐑

Another option to avoid regex would be to just check the length: if len(input_date) != 8 since the following sequence will check the numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. that's much simpler.

except ValueError:
return False

def validate_filename(self, file_pattern_config=FilenamePatternConfig()) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def validate_filename(self, file_pattern_config=FilenamePatternConfig()) -> bool:
def validate_filename(self, file_pattern_config=None) -> bool:
...
if file_pattern_config is None:
file_pattern_config = FilenamePatternConfig()

if not is_valid:
self.message = error_message
return False
print("done")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print("done")

filename = "imap_glows_l0_raw_20231010_20231011_v01-01.pkts"
file_parser = FilenameParser(filename)
assert file_parser.check_date_input("20200101")
assert file_parser.check_date_input("2020-01-01") is False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert file_parser.check_date_input("2020-01-01") is False
assert not file_parser.check_date_input("2020-01-01")

def test_filename_validator():
"""Validate filenames"""
filename = "imap_glows_l0_raw_20231010_20231011_v01-01.pkts"
assert FilenameParser(filename).validate_filename() is True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comparisons should be assert something_is_true or assert not something_is_false and not comparing against the specific object.

Suggested change
assert FilenameParser(filename).validate_filename() is True
assert FilenameParser(filename).validate_filename()

"""Tests date inputs"""
filename = "imap_glows_l0_raw_20231010_20231011_v01-01.pkts"
file_parser = FilenameParser(filename)
assert file_parser.check_date_input("20200101")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to input a date to the fileparser? Shouldn't this be acting on self.date?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. but I wrote that function to take in an input date since we need to check for both startdate and enddate. And since I wanted to check the date format only, I figure I could call the function directly instead of calling class again and again.

assert file_parser.check_date_input("2023105") is False


def test_filename_validator():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests might be easier to parameterize over your assertions.
https://docs.pytest.org/en/7.1.x/example/parametrize.html

Suggested change
def test_filename_validator():
@pytest.mark.parametrize("filename,expected", [("imap_glows_l0_raw_20231010_20231011_v01-01.pkts", True), ...]
def test_filename_validator(filename, expected):
assert FilenameParser(filename).validate_filename() == expected

@tech3371 tech3371 force-pushed the upload_folder_path_fix branch from ab17f6c to 9b95900 Compare December 8, 2023 23:04
@tech3371 tech3371 merged commit 67321d7 into IMAP-Science-Operations-Center:dev Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Update upload lambda to use filename to create upload path
4 participants