-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NPI-3294 - Updates to file properties extraction to allow broader use, addition of file name vs file contents check functionality #21
NPI-3294 - Updates to file properties extraction to allow broader use, addition of file name vs file contents check functionality #21
Conversation
…ediction code, to allow it to also be used in file checks
gnssanalysis/filenames.py
Outdated
return generate_IGS_long_filename(**name_properties) | ||
|
||
|
||
def determine_file_props(file_path: pathlib.Path, defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be: 'determine_file_properties`? Just to follow the general convention of expanding out all abbreviations
gnssanalysis/filenames.py
Outdated
@@ -722,6 +753,35 @@ def determine_name_props_from_filename(filename: str) -> Dict[str, Any]: | |||
} | |||
|
|||
|
|||
def warn_on_unexpected_filename(input_file: pathlib.Path) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A docstring would be good here
gnssanalysis/filenames.py
Outdated
f"didn't match expected: '{expected_file_name}'. " | ||
"Contents may be incorrect and lead to failures." | ||
) | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking about the name of the function warn_on...
, should this return as True
when it does find an issue? (i.e. when it should warn)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would concur and further suggest that warning is not really the responsibility of the library code. Detecting differences is probably in scope but warning is something that should be done in application code (like the ops comparison system).
Given that I'm wondering if this file should either just compare two names (at which point it's relatively simple and maybe could disappear?) or maybe tell you exactly which name properties differ between what's detected and what the current name is (if we can extract properties from the filename)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. Detecting which bit differed is relatively easy given the properties dict...
Would you surface that information through an exception, if not a log message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just return the information in some format (maybe a possible empty list of items that disagree). This is library code and so in my head these tests are equivalent to specialised versions of ==
and you wouldn't ever expect a comparison operator to log or throw an exception, it's just making a comparison and then what to do about that comparison is really up to application code, in our case scripts in ginan
or in the operations code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restructured to provide a discrepancy checker function here in gnssanalysis. That is now leveraged by a wrapper function in the other code which logs errors in that context. Makes it more generic, and also more streamlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it is not leveraged in other codebases at present. So for now the new and changed functionality appears unused.
gnssanalysis/filenames.py
Outdated
|
||
|
||
def check_file_timespan_as_claimed(input_file: pathlib.Path) -> bool: | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring
gnssanalysis/filenames.py
Outdated
return generate_IGS_long_filename(**name_properties) | ||
|
||
|
||
def determine_file_props(file_path: pathlib.Path, defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defaults
and overrides
can and probably should default to empty dictionaries (this is true for the parent function as well). To avoid mutable-default issues it probably then needs a defaults = None
if defaults is None: defaults = {}
idiom.
In hindsight the type can probably just be a Optional[Mapping[str, Any]]
and I can't remember whether that Any
can be tightened up further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
gnssanalysis/filenames.py
Outdated
return False # Assume it's bad, as we can't verify it's good. | ||
claimed_timespan:datetime.timedelta = determine_name_props_from_filename(input_file.name)["timespan"] | ||
|
||
if claimed_timespan != actual_timespan: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the above, I feel slightly uncomfortable around the logging messages in library code and the assumptions made for unsupported files. These have the feeling of being domain/application specific. If you then remove that logging and catching, this function boils down to a slightly nicer return determine_file_props(input_file)["timespan"] == determine_name_props_from_filename(input_file.name)["timespan"]
and I'm not sure whether that's justifying its existence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, see above comment.
…format spec violations
…n based on PR feedback
… properties function. Make sampling rate calculation consistent by disabling duplicated logic on one code path.
This PR hasn't been touched in a while. @treefern it looks like you've resolved all of @jashlearn 's concerns so I would just resolved the conflicts that now exist between this branch and |
gnssanalysis/filenames.py
Outdated
@@ -7,7 +7,7 @@ | |||
|
|||
# The collections.abc (rather than typing) versions don't support subscripting until 3.9 | |||
# from collections import Iterable | |||
from typing import Iterable | |||
from typing import Iterable, Mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line, and the one below are "from typing", can be merged
…check function, to allow it to account for files with one epoch less than a full timespan worth of data.
…eck to use sampling_rate_seconds not the string representation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the looks of it, the main change is a refactor determine_name_props_from_filename()
to determine_properties_from_filename()
and splitting out the determine_file_name()
function to include a new function called determine_properties_from_contents()
There are a couple functions introduced that aren't used anywhere as well.
@treefern If you've tested this and all seems to work as before, it might be good to write up unittests for functions touched by this PR
…e imported anywhere it is needed.
…rom contents or name. A bit more work could be done here
Initial set of unit tests added. One undiscovered bug was unearthed in the process, and fixed. It would be good to review what values should be returned by these functions when a definitive answer can't be found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is in a good enough state to push through. We'll need to sort out the values for the unit-tests but we can do that on the next iteration
|
||
derived_filename = filenames.determine_file_name(test_sp3_file) | ||
|
||
# Computed at time of wrting. Seems valid, but FIL and EXP are a bit odd. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me that it is coming out as FIL
because those are the first three characters of the input filename: file1.sp3
. Generally the first three characters are for the AC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Computed at time of wrting. Seems valid, but FIL and EXP are a bit odd. | |
# Computed at time of writing. Note that Analysis Centre is derived from the first three chars | |
# of the filename, which is why we ended up with `FIL` here. |
derived_filename = filenames.determine_file_name(test_sp3_file) | ||
|
||
# Computed at time of wrting. Seems valid, but FIL and EXP are a bit odd. | ||
expected_filename = "FIL0EXP_20242010000_05M_05M_ORB.SP3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The EXP
I am not sure of - perhaps because there is no further info in the filename, it assumes an experimental file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together so quickly! It looks good enough to me. We can work out the proper values for the unit-tests in the next iteration of PRs for this
Multiple issues identified after merging this. These should have no impact as the relevant code wasn't yet utilised. New PR to follow |
This PR splits out file properties extraction code from filename prediction code, to allow it to also be used in other checks (e.g. to check whether the timespan that a filename implies, matches what is found in the file).