NPI-3294 - Updates to file properties extraction to allow broader use, addition of file name vs file contents check functionality #21

treefern · 2024-05-07T05:35:13Z

This PR splits the file properties extraction code out from the filename prediction code, so that it can also be used in other checks (e.g. to check whether the timespan that a filename implies matches what is found in the file).

ronaldmaj · 2024-05-07T05:59:56Z

gnssanalysis/filenames.py

+    return generate_IGS_long_filename(**name_properties)
+
+
+def determine_file_props(file_path: pathlib.Path, defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:


This could be `determine_file_properties`? Just to follow the general convention of expanding out all abbreviations.

ronaldmaj · 2024-05-07T06:00:53Z

gnssanalysis/filenames.py

@@ -722,6 +753,35 @@ def determine_name_props_from_filename(filename: str) -> Dict[str, Any]:
    }


+def warn_on_unexpected_filename(input_file: pathlib.Path) -> bool:


A docstring would be good here

ronaldmaj · 2024-05-07T06:02:14Z

gnssanalysis/filenames.py

+                       f"didn't match expected: '{expected_file_name}'. "
+                       "Contents may be incorrect and lead to failures."
+                       )
+        return False


Just thinking about the name of the function, warn_on...: should this return True when it does find an issue (i.e. when it should warn)?

I would concur and further suggest that warning is not really the responsibility of the library code. Detecting differences is probably in scope but warning is something that should be done in application code (like the ops comparison system).
Given that, I'm wondering if this file should either just compare two names (at which point it's relatively simple and maybe could disappear?) or maybe tell you exactly which name properties differ between what's detected and what the current name is (if we can extract properties from the filename)?

Good points. Detecting which bit differed is relatively easy given the properties dict...
Would you surface that information through an exception, if not a log message?

I would just return the information in some format (maybe a possibly empty list of items that disagree). This is library code, so in my head these tests are equivalent to specialised versions of ==, and you wouldn't ever expect a comparison operator to log or throw an exception. It's just making a comparison; what to do about that comparison is really up to application code, in our case scripts in ginan or in the operations code.

Restructured to provide a discrepancy checker function here in gnssanalysis. That is now leveraged by a wrapper function in the other code which logs errors in that context. Makes it more generic, and also more streamlined.

Looks like it is not leveraged in other codebases at present. So for now the new and changed functionality appears unused.
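
The approach agreed on in this thread (return the disagreeing properties, leave logging to the caller) might look roughly like the following. This is a minimal sketch, not the code merged in this PR: `find_property_discrepancies` and the example property keys are hypothetical, and the two dicts stand in for whatever the name-based and contents-based extraction functions return.

```python
from typing import Any, Dict, Mapping, Tuple


def find_property_discrepancies(
    name_properties: Mapping[str, Any],
    content_properties: Mapping[str, Any],
) -> Dict[str, Tuple[Any, Any]]:
    """Hypothetical helper: compare properties derived from a file's name
    with properties derived from its contents.

    Returns {key: (value_from_name, value_from_contents)} for every key that
    is present in both mappings and whose values differ. An empty dict means
    no disagreement was found. Nothing is logged and nothing is raised.
    """
    shared_keys = name_properties.keys() & content_properties.keys()
    return {
        key: (name_properties[key], content_properties[key])
        for key in sorted(shared_keys)
        if name_properties[key] != content_properties[key]
    }


# Illustrative values only; the real property dicts may use different keys.
name_props = {"analysis_center": "IGS", "timespan_hours": 24}
content_props = {"analysis_center": "IGS", "timespan_hours": 23}

discrepancies = find_property_discrepancies(name_props, content_props)

# Application code decides what a discrepancy means, e.g. report each one.
for key, (from_name, from_contents) in discrepancies.items():
    print(f"{key}: name says {from_name!r}, contents say {from_contents!r}")
```

Returning a mapping rather than a bool keeps the library function a pure comparison, while still giving a wrapper in application code enough detail to log or raise as it sees fit.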

ronaldmaj · 2024-05-07T06:02:23Z

gnssanalysis/filenames.py

+
+
+def check_file_timespan_as_claimed(input_file: pathlib.Path) -> bool:
+    try:


jashlearn · 2024-05-07T06:28:20Z

gnssanalysis/filenames.py

+    return generate_IGS_long_filename(**name_properties)
+
+
+def determine_file_props(file_path: pathlib.Path, defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:


defaults and overrides can and probably should default to empty dictionaries (this is true for the parent function as well). To avoid mutable-default issues it probably then needs a `defaults=None` signature with an `if defaults is None: defaults = {}` idiom.
In hindsight the type can probably just be an Optional[Mapping[str, Any]], and I can't remember whether that Any can be tightened up further.
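
For reference, the `None`-default idiom described above can be sketched as follows. The function name `merge_properties` and the precedence order (defaults, then detected values, then overrides) are assumptions made for illustration; they are not taken from the PR.

```python
from typing import Any, Dict, Mapping, Optional


def merge_properties(
    detected: Mapping[str, Any],
    defaults: Optional[Mapping[str, Any]] = None,
    overrides: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the function under review.

    Using None as the default avoids sharing one mutable dict object
    between calls, which is what `defaults: Dict = {}` would do.
    """
    if defaults is None:
        defaults = {}
    if overrides is None:
        overrides = {}
    # Assumed precedence: later mappings win (defaults < detected < overrides).
    return {**defaults, **detected, **overrides}
```

Typing the parameters as `Mapping` rather than `Dict` also signals that the function only reads from them.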

jashlearn · 2024-05-07T06:35:24Z

gnssanalysis/filenames.py

+        return False # Assume it's bad, as we can't verify it's good.
+    claimed_timespan:datetime.timedelta = determine_name_props_from_filename(input_file.name)["timespan"]
+
+    if claimed_timespan != actual_timespan:


Similar to the above, I feel slightly uncomfortable about the logging messages in library code and the assumptions made for unsupported files. These have the feeling of being domain/application specific. If you then remove that logging and catching, this function boils down to a slightly nicer `return determine_file_props(input_file)["timespan"] == determine_name_props_from_filename(input_file.name)["timespan"]`, and I'm not sure whether that justifies its existence?

Updated, see above comment.
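
A later commit in this PR's history mentions letting the check account for files with one epoch less than a full timespan of data, and comparing on the sampling rate in seconds. A minimal sketch of such a comparison is below; `timespans_agree` is a hypothetical name and the exact tolerance rule is an assumption, not the merged implementation.

```python
import datetime


def timespans_agree(
    claimed: datetime.timedelta,
    actual: datetime.timedelta,
    sampling_rate_seconds: float,
) -> bool:
    """Hypothetical comparison of the timespan implied by a filename
    (claimed) with the timespan found in the file contents (actual).

    Assumed rule: accept an exact match, or an actual span that is exactly
    one sampling interval short. For example, a one-day file sampled every
    300 s with epochs from 00:00 to 23:55 spans 23 h 55 min first-to-last.
    """
    one_epoch = datetime.timedelta(seconds=sampling_rate_seconds)
    return actual == claimed or actual == claimed - one_epoch
```

Like the property comparison discussed earlier, this only returns a result; deciding whether a mismatch is an error is left to the caller.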

ronaldmaj · 2024-07-30T03:11:49Z

This PR hasn't been touched in a while. @treefern it looks like you've resolved all of @jashlearn's concerns, so I would just resolve the conflicts that now exist between this branch and main and we can probably push this through as well.

seballgeyer · 2024-07-30T12:29:34Z

gnssanalysis/filenames.py

@@ -7,7 +7,7 @@

 # The collections.abc (rather than typing) versions don't support subscripting until 3.9
 # from collections import Iterable
-from typing import Iterable
+from typing import Iterable, Mapping


This line and the one below are both "from typing" imports and can be merged.

ronaldmaj

From the looks of it, the main change is a refactor of determine_name_props_from_filename() to determine_properties_from_filename(), plus splitting up the determine_file_name() function to include a new function called determine_properties_from_contents().

There are a couple of functions introduced that aren't used anywhere as well.
@treefern If you've tested this and all seems to work as before, it might be good to write up unit tests for the functions touched by this PR.

treefern · 2024-08-02T10:55:54Z

Initial set of unit tests added. One previously undiscovered bug was unearthed in the process, and fixed.

It would be good to review what values should be returned by these functions when a definitive answer can't be found.

ronaldmaj

I think this is in a good enough state to push through. We'll need to sort out the values for the unit-tests but we can do that on the next iteration

ronaldmaj · 2024-08-02T12:18:59Z

tests/test_filenames.py

+
+        derived_filename = filenames.determine_file_name(test_sp3_file)
+
+        # Computed at time of wrting. Seems valid, but FIL and EXP are a bit odd.


Seems to me that it is coming out as FIL because those are the first three characters of the input filename: file1.sp3. Generally the first three characters are for the AC

Suggested change

# Computed at time of wrting. Seems valid, but FIL and EXP are a bit odd.

# Computed at time of writing. Note that Analysis Centre is derived from the first three chars

# of the filename, which is why we ended up with `FIL` here.
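
To make the observation concrete, the behaviour described here (Analysis Centre taken from the first three characters of the input filename) can be reproduced with a one-liner. This is only an illustration: `guess_analysis_centre` is a hypothetical name, and the upper-casing is an assumption inferred from `file1.sp3` yielding `FIL`.

```python
import pathlib


def guess_analysis_centre(file_path: pathlib.Path) -> str:
    """Hypothetical illustration: take the first three characters of the
    file name and upper-case them, as the test output suggests."""
    return file_path.name[:3].upper()


analysis_centre = guess_analysis_centre(pathlib.Path("file1.sp3"))
print(analysis_centre)  # prints "FIL"
```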

ronaldmaj · 2024-08-02T12:19:53Z

tests/test_filenames.py

+        derived_filename = filenames.determine_file_name(test_sp3_file)
+
+        # Computed at time of wrting. Seems valid, but FIL and EXP are a bit odd.
+        expected_filename = "FIL0EXP_20242010000_05M_05M_ORB.SP3"


The EXP I am not sure of - perhaps because there is no further info in the filename, it assumes an experimental file?

ronaldmaj · 2024-08-02T12:22:20Z

tests/test_filenames.py

Thanks for putting this together so quickly! It looks good enough to me. We can work out the proper values for the unit-tests in the next iteration of PRs for this

treefern · 2024-08-05T09:07:47Z

Multiple issues identified after merging this. These should have no impact as the relevant code wasn't yet utilised. New PR to follow

NPI-3294 - split out file properties extraction code from filename prediction code, to allow it to also be used in file checks

c625347

treefern requested a review from ronaldmaj May 7, 2024 05:35

treefern self-assigned this May 7, 2024

ronaldmaj previously approved these changes May 7, 2024

NPI-3294 - updated naming based on PR feedback

98cf0c3

jashlearn reviewed May 7, 2024

NPI-3294 - formatting improvements

71cc01a

jashlearn reviewed May 7, 2024

NPI-3294 - switched validation warnings to errors, as they represent format spec violations

e4f1295

treefern dismissed ronaldmaj’s stale review via e4f1295 May 7, 2024 06:39

treefern added 4 commits May 7, 2024 06:48

NPI-3294 - added docstrings per PR suggestion

92efba9

NPI-3294 - improved optional argument handling in response to PR suggestion

9858c31

NPI-3294 - removed helper functions better suited to more context specific code.

a99dcb3

NPI-3294 - added more generic file name vs contents checker function

f6f0ebd

treefern changed the title ~~NPI-3294 - Minor restructure of file properties extraction to allow broader reuse~~ NPI-3294 - Updates to file properties extraction to allow broader use, addition of file name vs file contents check functionality May 7, 2024

treefern added 2 commits May 8, 2024 05:38

NPI-3294 - streamlined filename vs contents discrepancy check function based on PR feedback

41ffaf5

NPI-3294 - add support for fetching samplingrate in seconds from file properties function. Make sampling rate calculation consistent by disabling duplicated logic on one code path.

59b7231

Merge branch 'main' into NPI-3294-split-out-file-properties-extraction

cb5346e

seballgeyer reviewed Jul 31, 2024

treefern added 4 commits July 31, 2024 06:08

NPI-3294 improvements to prototype filename and contents consistency check function, to allow it to account for files with one epoch less than a full timespan worth of data.

8b1ce8b

NPI-3294 tidied up imports as suggested. Fixed discrepant timespan check to use sampling_rate_seconds not the string representation

6b35560

NPI-3294 reorganised and renamed variables, updated comments to improve clarity

07f8bdc

NPI-3294 updated names of key functions for clarity and brevity

f2fb3d0

ronaldmaj requested changes Aug 2, 2024

treefern added 4 commits August 2, 2024 09:27

Merge branch 'main' into NPI-3294-split-out-file-properties-extraction

66d3737

NPI-3294 small bugfix identified in unit test development (yay)

0e26d85

NPI-3294 broke out unit test bulk test data into a separate file to be imported anywhere it is needed.

d3e5317

NPI-3294 initial tests for functions that determine file properties from contents or name. A bit more work could be done here

636225d

ronaldmaj approved these changes Aug 2, 2024

ronaldmaj merged commit ff25255 into main Aug 2, 2024
1 check passed

ronaldmaj deleted the NPI-3294-split-out-file-properties-extraction branch August 2, 2024 12:28

treefern mentioned this pull request Aug 5, 2024

Npi 3294 Fixes and better warnings in filename based property extraction, and filename vs contents checks #43

Merged
