fix to upload folder path #198

tech3371 · 2023-12-01T20:55:34Z

Change Summary

Overview

Fix upload folder path

Updated Files

sds_data_manager/config/config.json
- Updated and added a new product
sds_data_manager/lambda_code/SDSCode/upload_api.py
- Added fixes for upload folder path

greglucas · 2023-12-01T21:08:45Z

sds_data_manager/lambda_code/SDSCode/upload_api.py

@@ -53,6 +56,35 @@ def _check_for_matching_filetype(pattern, filename):
    return file_dictionary


+def create_path_to_upload(metadata: dict) -> str:


Do we need this new function or could you use this to replace the check_for_matching_filetype function above this? Since we have such a rigid structure now, I'm not sure we even need the config dictionary file.

mission, instrument, data_level, descriptor, startdate, enddate = filename.split("_")
# create your path from these, maybe raise if something doesn't match our requirements

check_for_matching_filetype checks that the filename follows one of the patterns in config.json. Once it finds a matching pattern, it stops there and continues. If we remove check_for_matching_filetype, then having config.json is not useful.

Yeah, so it's up to you whether you think we'll need that check/config, but now that we are more rigid in the filename requirements I'm not so sure we will, which is why I suggested merging them together. Adding onto my suggestion above, we could also add checks from the config file directly here:

mission, instrument, data_level, descriptor, startdate, enddate = filename.split("_")
# create your path from these, maybe raise if something doesn't match our requirements
if not enddate.endswith((".pkts", ".cdf")):
    raise ValueError("Bad filetype, the SDC requires '.pkts' or '.cdf' filetypes to be uploaded")
if instrument not in (...):
    raise ValueError("Acceptable instruments are: ...")
# Even more validation on things if you want as well.
file_path_to_upload = same_as_you_have_below
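A runnable sketch of that merged "validate, then build the path" idea is below. It assumes the `imap_<instrument>_<level>_<descriptor>_<startdate>_<enddate>_<version>.<ext>` naming seen in the tests later in this PR, which splits into seven parts (so the six-name unpacking above would need an extra name for the version). The function name `validate_and_build_path` and the instrument tuple are hypothetical; the real list of accepted instruments is not given here.

```python
from datetime import datetime

# Illustrative subset only; the accepted instruments are not listed in this PR.
VALID_INSTRUMENTS = ("glows", "mag", "swe")
VALID_EXTENSIONS = (".pkts", ".cdf")


def validate_and_build_path(filename: str) -> str:
    """Validate an upload filename and return its upload path (hypothetical helper)."""
    stem, dot, extension = filename.rpartition(".")
    if not dot or f".{extension}" not in VALID_EXTENSIONS:
        raise ValueError("Bad filetype, the SDC requires '.pkts' or '.cdf' filetypes to be uploaded")

    parts = stem.split("_")
    if len(parts) != 7:
        raise ValueError(f"Expected 7 underscore-separated parts, got {len(parts)}")
    mission, instrument, data_level, descriptor, startdate, enddate, _version = parts

    if instrument not in VALID_INSTRUMENTS:
        raise ValueError(f"Acceptable instruments are: {', '.join(VALID_INSTRUMENTS)}")
    for date in (startdate, enddate):
        # Length check first: strptime on its own accepts 7-digit dates such as "2023105".
        if len(date) != 8:
            raise ValueError(f"Date must be in YYYYMMDD format: {date}")
        datetime.strptime(date, "%Y%m%d")  # raises ValueError if it is not a real date

    # mission/instrument/data_level/descriptor/year/month/filename
    year, month = startdate[:4], startdate[4:6]
    return "/".join([mission, instrument, data_level, descriptor, year, month, filename])
```

With this shape, every rejection surfaces as a `ValueError` carrying a readable message, and a valid name yields the upload key directly.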

bourque

Looks good! I just had a few non-blocking suggestions.

bourque · 2023-12-04T17:49:59Z

sds_data_manager/lambda_code/SDSCode/upload_api.py

+
+    Returns
+    -------
+    str


Suggested change

-    str
+    path_to_upload_file : str

bourque · 2023-12-04T17:51:08Z

sds_data_manager/lambda_code/SDSCode/upload_api.py

+    # path to upload file follows this format:
+    # mission/instrument/data_level/descriptor/year/month/filename
+    # NOTE: year and month is from startdate and startdate format is YYYYMMDD.


I suggest adding this info to the function docstring rather than as an inline comment here; that way it can be more easily referenced in the Sphinx docs.
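As a sketch of that suggestion, the path format can move into a NumPy-style docstring (the style the `Returns` section above already uses). This mirrors the `create_path_to_upload(metadata: dict) -> str` signature from the diff, but the dictionary keys are assumptions, not taken from the PR.

```python
def create_path_to_upload(metadata: dict) -> str:
    """Create the S3 key for an uploaded file.

    The path to the upload file follows this format:
    ``mission/instrument/data_level/descriptor/year/month/filename``.
    Year and month come from ``startdate``, whose format is YYYYMMDD.

    Parameters
    ----------
    metadata : dict
        Filename components. Keys assumed here: "mission", "instrument",
        "data_level", "descriptor", "startdate", and "filename".

    Returns
    -------
    path_to_upload_file : str
        The key under which the file should be uploaded.
    """
    startdate = metadata["startdate"]
    year, month = startdate[:4], startdate[4:6]
    return "/".join(
        [
            metadata["mission"],
            metadata["instrument"],
            metadata["data_level"],
            metadata["descriptor"],
            year,
            month,
            metadata["filename"],
        ]
    )
```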

tech3371 · 2023-12-05T16:33:16Z

@greglucas and @maxinelasp, can you review path_helper.py to see if I captured your suggestions? After your approval, I will work on adding tests and incorporating it into indexer.py too.

greglucas · 2023-12-05T19:15:10Z

sds_data_manager/config/config.json

I think we were going to remove this file. Another benefit I thought of is that we won't need to download this onto the Lambda every single time if we put the configuration directly into the code.

Yes, true.
Yeah, we will need to remove this file. I didn't do that in this PR yet because removing it is going to affect other CDK stacks, and I didn't want to bring more changes into this PR. I plan to remove it, along with the stack resources associated with it, in an upcoming PR:

- S3 bucket stack
- resources that upload this file
- possibly others as well that I don't know about.
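One way to put the configuration directly into the code is a frozen dataclass instantiated at module level. Module-level code runs when the Lambda execution environment imports the handler module (once per cold start), so nothing has to be fetched on every invocation. This is a minimal sketch: `FilenameRules`, `FILENAME_RULES`, `is_allowed`, and the field values are hypothetical, since the actual contents of config.json are not shown in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilenameRules:
    """Hypothetical stand-in for the settings previously read from config.json."""

    mission: str = "imap"
    # Illustrative subset; the real instrument list is not shown in this PR.
    valid_instruments: tuple[str, ...] = ("glows", "mag", "swe")
    valid_extensions: tuple[str, ...] = (".pkts", ".cdf")


# Created once when the module is imported, so there is no
# per-invocation download of a config file.
FILENAME_RULES = FilenameRules()


def is_allowed(instrument: str, filename: str) -> bool:
    """Check an instrument and file extension against the in-code rules."""
    return (
        instrument in FILENAME_RULES.valid_instruments
        and filename.endswith(FILENAME_RULES.valid_extensions)
    )
```

Making the dataclass frozen keeps handler code from accidentally mutating the shared rules between warm invocations.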

sds_data_manager/lambda_code/SDSCode/path_helper.py

greglucas · 2023-12-05T19:26:01Z

sds_data_manager/lambda_code/SDSCode/path_helper.py

+        # Check if the pattern matches 8 digits (YYYYMMDD)
+        if not re.match(r"^\d{8}$", input_date):
+            return False


Do you need this, or can you just let your try/except below catch this case too?

Agree with Greg; the strptime below will only accept 8-digit dates.

Cool, I will remove this.

sds_data_manager/lambda_code/SDSCode/path_helper.py

maxinelasp

I like your changes; they look good to me. If config.json isn't needed anymore, it should be removed before merging.

maxinelasp · 2023-12-06T15:45:00Z

sds_data_manager/lambda_code/SDSCode/path_helper.py

+        # Check if the pattern matches 8 digits (YYYYMMDD)
+        if not re.match(r"^\d{8}$", input_date):
+            return False


Agree with Greg; the strptime below will only accept 8-digit dates.

maxinelasp · 2023-12-06T15:45:42Z

sds_data_manager/lambda_code/SDSCode/upload_api.py

@@ -124,3 +78,9 @@ def lambda_handler(event, context):
        }

    return {"statusCode": 200, "body": json.dumps(url)}
+
+
+if __name__ == "__main__":


Is this intentionally included? Or just for testing?

Yes, just for testing. Good catch!

sds_data_manager/lambda_code/SDSCode/path_helper.py

tech3371 · 2023-12-06T22:54:44Z

@bourque @maxinelasp @greglucas Can you review it again? No rush. I have to ping you because this repo dismisses existing approvals when new commits are pushed.

maxinelasp · 2023-12-07T19:20:12Z

sds_data_manager/lambda_code/SDSCode/path_helper.py


        # Validate if it's a real date
        try:
+            # This checks if date is in YYYYMMDD format.
+            # Sometimes, date is correct but not in the format we want
+            if not re.match(r"^\d{8}$", input_date):


This can just be removed, as the strptime will also fail if the input date isn't 8 digits exactly.

I did remove it originally based on feedback from you and Greg. But when I input a real date in the wrong format, such as 2023105, strptime accepted it as valid. The documentation notes that, when parsing, the leading zero is optional for %m and %d, so a seven-digit string can still match %Y%m%d.
https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior

Odd! Well, good job verifying 😆
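The behavior is easy to reproduce with the standard library alone. Because the leading zero is optional for `%m` and `%d` when parsing, a seven-digit string is split as year, two-digit month, one-digit day:

```python
from datetime import datetime

# When parsing, the leading zero is optional for %m and %d, so "2023105"
# is read as 2023 / 10 / 5 instead of being rejected for having 7 digits.
parsed = datetime.strptime("2023105", "%Y%m%d")
print(parsed.date())  # 2023-10-05
print(datetime.strptime("2023101", "%Y%m%d").date())  # 2023-10-01
```

This is why an explicit format check (regex or length) is still needed in front of `strptime`.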

greglucas

I'm fine with this and won't block, but I find a class with a bunch of methods harder to follow than individual functions would be. Since others like it this way, that is fine though.

from datetime import datetime


def is_valid_date(datestr, date_format="%Y%m%d"):
    try:
        datetime.strptime(datestr, date_format)
    except ValueError:
        return False
    return True


def create_upload_key(fname, date_validator=is_valid_date):
    # Validate the pieces of our filename
    # (assumes the seven-part naming convention used in the tests)
    mission, instrument, level, descriptor, start_date, end_date, version = fname.split("_")
    # call out to other validation functions
    if not date_validator(start_date):
        return False
    ...
    # assemble parts after validations

greglucas · 2023-12-07T19:15:05Z

pyproject.toml

@@ -98,3 +98,4 @@ ignore = ["D203", "D212", "PLR0913", "PLR2004"]
 # TODO: Too many statements, this could be refactored to separate
 #       the single stack out into a few smaller pieces
 "sds_data_manager/stacks/sds_data_manager_stack.py" = ["PLR0915"]
+"sds_data_manager/lambda_code/SDSCode/path_helper.py" = ["B008"]


I don't think we should suppress this error (B008, a function call in an argument default) with a per-file ignore.

greglucas · 2023-12-07T19:16:13Z

sds_data_manager/lambda_code/SDSCode/path_helper.py

+            if not re.match(r"^\d{8}$", input_date):
+                raise ValueError("Invalid date format.")


The below strptime should take care of all the regex work for you.

Suggested change

-            if not re.match(r"^\d{8}$", input_date):
-                raise ValueError("Invalid date format.")

I will remove it since you are both sure of this. But as I mentioned above, strptime accepts a seven-digit date as long as it's a valid date, such as 2023101. I feel like I am missing something here.

Let me know if you'd still like me to remove it.

Nope, it looks like I was wrong here! Good catch on your part and sorry for both of us mentioning you could remove it 🐑

Another option to avoid regex would be to just check the length with if len(input_date) != 8, since the following strptime call will check the digits.

True, that's much simpler.
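Putting the thread's conclusion together, here is a minimal sketch of the date check with a length test in front of `strptime`. It is written as a standalone function that borrows the `check_date_input` name from the tests, which is an assumption about its final shape. It also requires ASCII digits, a small addition beyond the length check, so the result does not depend on which padding variants `strptime` tolerates.

```python
from datetime import datetime


def check_date_input(input_date: str) -> bool:
    """Return True if input_date is a real calendar date in YYYYMMDD form."""
    # strptime alone accepts unpadded dates such as "2023105" (2023-10-05),
    # so require exactly 8 ASCII digits first.
    if len(input_date) != 8 or not (input_date.isascii() and input_date.isdigit()):
        return False
    try:
        # Rejects impossible months and days, e.g. "20231301" or "20230230".
        datetime.strptime(input_date, "%Y%m%d")
    except ValueError:
        return False
    return True
```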

greglucas · 2023-12-07T19:18:06Z

sds_data_manager/lambda_code/SDSCode/path_helper.py

+        except ValueError:
+            return False
+
+    def validate_filename(self, file_pattern_config=FilenamePatternConfig()) -> bool:


Suggested change

-    def validate_filename(self, file_pattern_config=FilenamePatternConfig()) -> bool:
+    def validate_filename(self, file_pattern_config=None) -> bool:
+        ...
+        if file_pattern_config is None:
+            file_pattern_config = FilenamePatternConfig()
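The reason behind both the B008 lint and this suggestion: a default such as `FilenamePatternConfig()` is evaluated once, when the `def` statement runs, so every call that omits the argument shares that one instance. The sketch below uses a hypothetical mutable `PatternConfig` stand-in, since the real `FilenamePatternConfig` is not shown here.

```python
class PatternConfig:
    """Hypothetical stand-in for FilenamePatternConfig with mutable state."""

    def __init__(self):
        self.seen = []


def validate_shared(name, config=PatternConfig()):  # noqa: B008 - shows the pitfall on purpose
    # This default instance was created once, at definition time.
    config.seen.append(name)
    return config


def validate_fresh(name, config=None):
    # The suggested pattern: build a new default on each call.
    if config is None:
        config = PatternConfig()
    config.seen.append(name)
    return config


first = validate_shared("a")
second = validate_shared("b")
print(first is second, second.seen)  # True ['a', 'b']
print(validate_fresh("a").seen, validate_fresh("b").seen)  # ['a'] ['b']
```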

greglucas · 2023-12-07T19:19:11Z

sds_data_manager/lambda_code/SDSCode/path_helper.py

+            if not is_valid:
+                self.message = error_message
+                return False
+        print("done")


Suggested change

-        print("done")

greglucas · 2023-12-07T19:23:27Z

tests/lambda_endpoints/test_path_helper.py

+    filename = "imap_glows_l0_raw_20231010_20231011_v01-01.pkts"
+    file_parser = FilenameParser(filename)
+    assert file_parser.check_date_input("20200101")
+    assert file_parser.check_date_input("2020-01-01") is False


Suggested change

-    assert file_parser.check_date_input("2020-01-01") is False
+    assert not file_parser.check_date_input("2020-01-01")

greglucas · 2023-12-07T19:24:41Z

tests/lambda_endpoints/test_path_helper.py

+def test_filename_validator():
+    """Validate filenames"""
+    filename = "imap_glows_l0_raw_20231010_20231011_v01-01.pkts"
+    assert FilenameParser(filename).validate_filename() is True


Assertions should be assert something_is_true or assert not something_is_false, rather than comparing against the specific object (True or False).

Suggested change

-    assert FilenameParser(filename).validate_filename() is True
+    assert FilenameParser(filename).validate_filename()

greglucas · 2023-12-07T19:27:52Z

tests/lambda_endpoints/test_path_helper.py

+    """Tests date inputs"""
+    filename = "imap_glows_l0_raw_20231010_20231011_v01-01.pkts"
+    file_parser = FilenameParser(filename)
+    assert file_parser.check_date_input("20200101")


Why do you need to input a date to the fileparser? Shouldn't this be acting on self.date?

Yes, but I wrote that function to take an input date since we need to check both startdate and enddate. And since I only wanted to test the date format, I figured I could call the method directly instead of instantiating the class again and again.

greglucas · 2023-12-07T19:31:38Z

tests/lambda_endpoints/test_path_helper.py

+    assert file_parser.check_date_input("2023105") is False
+
+
+def test_filename_validator():


These tests might be easier to parameterize over your assertions.
https://docs.pytest.org/en/7.1.x/example/parametrize.html

Suggested change

-def test_filename_validator():
+@pytest.mark.parametrize("filename,expected", [("imap_glows_l0_raw_20231010_20231011_v01-01.pkts", True), ...])
+def test_filename_validator(filename, expected):
+    assert FilenameParser(filename).validate_filename() == expected
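A self-contained sketch of the parametrized form is below. `FilenameParser` is not reproduced here, so a hypothetical `is_valid_filename` stand-in (a seven-part, `.pkts`/`.cdf` check) takes its place; the point is the `pytest.mark.parametrize` structure, with one test case generated per tuple.

```python
import pytest


def is_valid_filename(filename: str) -> bool:
    """Hypothetical stand-in for FilenameParser(filename).validate_filename()."""
    stem, _, extension = filename.rpartition(".")
    return extension in ("pkts", "cdf") and len(stem.split("_")) == 7


@pytest.mark.parametrize(
    ("filename", "expected"),
    [
        ("imap_glows_l0_raw_20231010_20231011_v01-01.pkts", True),
        ("imap_glows_l0_raw_20231010_20231011_v01-01.txt", False),
        ("imap_glows_l0_20231010_20231011_v01-01.pkts", False),
    ],
)
def test_filename_validator(filename, expected):
    assert is_valid_filename(filename) == expected
```

Each failing tuple is reported as its own test ID, which makes it clear which filename broke, unlike a single test with several assertions.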

tech3371 requested a review from a team December 1, 2023 20:55

tech3371 self-assigned this Dec 1, 2023

tech3371 requested review from bourque, sdhoyt, greglucas, bryan-harter, laspsandoval, GFMoraga and maxinelasp and removed request for a team December 1, 2023 20:55

tech3371 linked an issue Dec 1, 2023 that may be closed by this pull request

Update upload lambda to use filename to create upload path #194

Closed

greglucas reviewed Dec 1, 2023

bourque previously approved these changes Dec 4, 2023

tech3371 dismissed bourque’s stale review via f1faef5 December 5, 2023 16:23

tech3371 force-pushed the upload_folder_path_fix branch from b1b55cc to f1faef5 December 5, 2023 16:23

greglucas reviewed Dec 5, 2023

maxinelasp previously approved these changes Dec 6, 2023

tech3371 dismissed maxinelasp’s stale review via 29d075f December 6, 2023 21:22

maxinelasp reviewed Dec 7, 2023

maxinelasp approved these changes Dec 7, 2023

greglucas reviewed Dec 7, 2023

tech3371 added 7 commits December 8, 2023 16:03

fix to upload folder path

0c9ffc3

added code for checking file pattern and create upload path

8eeccb4

add doc strings

5a08ce4

feedback changes

9d624b1

added tests

14e5a97

added doc strings to tests

b30093e

feedback changes

70e2681

one minor feedback change

9b95900

tech3371 force-pushed the upload_folder_path_fix branch from ab17f6c to 9b95900 December 8, 2023 23:04

tech3371 merged commit 67321d7 into IMAP-Science-Operations-Center:dev Dec 8, 2023
