Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1026 logical source parser #1044

Conversation

subagonsouth
Copy link
Contributor

@subagonsouth subagonsouth commented Oct 22, 2024

Change Summary

Overview

  • Added function to parse logical source into components
  • Added function to parse sensor number from string

Updated Files

  • imap_processing/hi/l1b/hi_l1b.py
    • Use new parser function
  • imap_processing/hi/l1c/hi_l1c.py
    • Use new parser function
  • imap_processing/hi/utils.py
    • Add utility function for parsing filename like strings such as logical source
    • Add function for parsing integer sensor number from string
  • imap_processing/tests/hi/test_utils.py
    • Add test coverage for two new functions

Closes: #1026

@subagonsouth subagonsouth added the Ins: Hi Related to the IMAP-Hi instrument label Oct 22, 2024
@subagonsouth subagonsouth self-assigned this Oct 22, 2024
Comment on lines 97 to 107
regex_str = (
r"^(?P<mission>imap)_" # Required mission
r"(?P<instrument>[^_]+)_" # Required instrument
r"(?P<data_level>[^_]+)_" # Required data level
r"((?P<sensor>\d{2}sensor)?-)?" # Optional sensor number
r"(?P<descriptor>[^_]+)" # Required descriptor
r"(_(?P<start_date>\d{8}))?" # Optional start date
r"(-repoint(?P<repointing>\d{5}))?" # Optional repointing field
r"(?:_v(?P<version>\d{3}))?" # Optional version
r"(?:\.(?P<extension>cdf|pkts))?$" # Optional extension
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need other fields or can you do something similar to this and then apply regex on descriptor? I am afraid of losing track of this in the future. I am wondering now if it's an feature we should implement in imap-data-access, especially update this function.

Suggested change
regex_str = (
r"^(?P<mission>imap)_" # Required mission
r"(?P<instrument>[^_]+)_" # Required instrument
r"(?P<data_level>[^_]+)_" # Required data level
r"((?P<sensor>\d{2}sensor)?-)?" # Optional sensor number
r"(?P<descriptor>[^_]+)" # Required descriptor
r"(_(?P<start_date>\d{8}))?" # Optional start date
r"(-repoint(?P<repointing>\d{5}))?" # Optional repointing field
r"(?:_v(?P<version>\d{3}))?" # Optional version
r"(?:\.(?P<extension>cdf|pkts))?$" # Optional extension
)
regex_str = (
r"((?P<sensor>\d{2}sensor)?-)?" # Optional sensor number
r"(?P<descriptor>[^_]+)" # Required descriptor
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, perhaps it is overkill to parse all of the fields. I doubt I will need the mission or instrument field but I could see needing the start_date or repoint. I based this regex on the one in the ScienceFilePath class. I don't think that one should be modified because it needs to be very strict to have invalid file names fail where I don't want to be strict. Parsing using split() could be done but it is lots of hard coding indices and gets messy with some of the optional fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like to keep it as it is, that's ok with me. I do want @laspsandoval and @sdhoyt to review these two function especially and see if they can benefit from it. If so, then we may want to move this in common utils function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't come across a situation where I need to parse anything from the logical source. I agree with Tim though about the start_date and pointing. I could see parsing those two fields being useful for instruments that need to reprocess because there's data spread across CCSDS files for two pointings and they need to put some bits of data from Day 2 into the Day 1 data product. In that case I think there would be multiple L0 dependency files with different dates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tech3371, I decided to move this function into the cdf/utils.py module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tim, that's a good place to put it since it's related to CDF attrs. Cool. I will approve this PR then. Then if others find that function helpful, it's there to use.

},
),
(
"imap_hi_l1c_90sensor-pset_20250415_v001.cdf",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add a pointing number to this test case if the regex parsing stays.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be helpful.

@subagonsouth subagonsouth force-pushed the 1026-logical-source-parser branch from 94324f5 to bb4115e Compare October 22, 2024 22:58
Copy link
Contributor

@tech3371 tech3371 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the previous comment about may be moving it to common utils, everything else looks good to me. I will let Sean and Laura approve this PR.

},
),
(
"imap_hi_l1c_90sensor-pset_20250415_v001.cdf",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be helpful.

@subagonsouth subagonsouth force-pushed the 1026-logical-source-parser branch from bb4115e to 9a4b6e9 Compare October 28, 2024 22:31
Copy link
Contributor

@tech3371 tech3371 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add test with repointing number in the filename like you were thinking? Otherwise, it looks good to me.

@subagonsouth subagonsouth merged commit 978bae9 into IMAP-Science-Operations-Center:dev Oct 29, 2024
17 checks passed
@subagonsouth subagonsouth deleted the 1026-logical-source-parser branch October 29, 2024 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ins: Hi Related to the IMAP-Hi instrument
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Logical source parser
3 participants