Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDF Updates #660

Merged
merged 65 commits into from
Jul 1, 2024
Merged

Conversation

anamanica
Copy link
Contributor

@anamanica anamanica commented Jun 25, 2024

Change Summary

Overview

Cleaning up the CdfAttributeManager class code in order to ensure schema validation when a user is accessing global and variable attributes. This is done by changing .global_attributes and .variable_attributes to be private properties, establishing the methods .get_global_attributes and .get_variable_attributes as the way to retrieve a validated, specific dictionary.

This particular change has been slightly debated. Originally, the methods .get_global_attriubtes and .get_variable_attributes were going to be changed into Python properties, but that proved to be unintuitive with the functionality of getters and setters. Then, we were going to create a new class to place validation into the __getitem__ functionality of dictionaries, but this would require creating a class with multiple inheritance and doing some funky things. Overall, it was decided by Maxine and I that the best current solution is the one described above.

This branch was created from cdf-attribute-tests in order to carry the testing work from that branch onto this one.

New Dependencies

None

New Files

None

Deleted Files

None

Updated Files

-cdf_attribute_manger
- Changing .global_attributes and .variable_attributes to private methods.
- Finishing the code of the method .get_variable_attributes.

  • ultra_l1_utils.py
    • Changing lines 37 and 51 to use .get_variable_attributes instead of .variable_attributes
  • codice_l1b.py
    • Changing line 66 to use .get_variable_attributes instead of .variable_attributes
  • codice/utils.py
    • Changing line 158 to use .get_variable_attributes instead of .variable_attributes

Testing

All covered in test_cdf_attribute_manager.py.

closes #593

Copy link
Contributor

@maxinelasp maxinelasp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few changes, but looking good so far!

imap_processing/cdf/cdf_attribute_manager.py Outdated Show resolved Hide resolved
if variable_name in self._variable_attributes:
output = self._variable_attributes[variable_name]
elif (
variable_name is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think variable_name should ever be None, so this check is not needed. Actually, this whole elif block should be removed, because the person calling get_variable_attributes should always provide variable_name.

This was necessary in get_global_attributes because the instrument_id is an optional parameter, so it can be none.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is why your tests are failing too!

):
output = self._variable_attributes["variable_name"][attr_name]
elif (
variable_name not in self.variable_attribute_schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than checking variable_name against the schema, we should check the attributes against the schema. So you would rewrite this block as something like:

for attribute in self.variable_attribute_schema.keys():
  if attribute in self._variable_attributes[variable_name]:
    add to output
  if attribute not in self._variable_attributes[variable_name] and variable is required:
    add to output as None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahah, yes, thank you so much! I caught this momentarily after I submitted the PR request. Felt very silly. I should have probably started this one as a draft.

@@ -11,11 +11,28 @@


class ImapCdfAttributes(CdfAttributeManager):
"""Contains IMAP specific tools and settings for CDF management."""
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this PR includes some stuff from your test PR - that will go away once you merge the test PR.

Copy link
Contributor

@subagonsouth subagonsouth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a ton of work and things are looking pretty good. I agree with the decision to remove the property accessors.

Let me know if you need any clarification on the comments that I left.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting on the __init__.py file but this is relevant to all files in this directory...

All .py files in this directory should be moved to imap_processing/tests/cdf/
All .yaml files should be moved under the parent directory imap_processing/tests/cdf/test_data

Basically, we want the directory structure under imap_processing/tests to mirror the directory structure of under imap_processing with the exception of test data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet! I had to leave one .yaml file under /tests/cdf due to the way the code is written in __init__, but I will be sure to ask Maxine about that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we left the tests where they were, is because this code is going to be moved to another repo very soon and hopefully removed from here. So, I wanted it to be isolated from everything else.

but, in general, all tests should go under imap_processing/tests. So I don't feel strongly about this, but I'd slightly prefer to keep it all together to make it easier to move and feel confident I got all of it.

imap_processing/cdf/cdf_attribute_manager.py Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest reducing this test schema file to have the minimum number of entries needed for complete test coverage. I think that you can get away with as few as two entries under attribute_key. They can be completely made up entries with values specified such that you can test everything that you need to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes! Initially I had made a default_test_schema file, but then was asked to use the actual schema file since that is a bit more concrete that the individual instrument files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see where some of the confusion is coming from. Either you should use the test_schema file (if there is some testing reason to keep it) and point to it in the tests (so change the code in Tim's other comment) OR continue using the real schema, leave the tests the same, and remove these unused schemas.

If I'm missing something here, please let me know!

imap_processing/cdf/tests/test_cdf_attribute_manager.py Outdated Show resolved Hide resolved
"""

# Initialize CdfAttributeManager object which loads in default info
cdf_manager = CdfAttributeManager(Path(__file__).parent.parent / "config")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to first load the default (not for testing) schema and default global attributes? It makes this test a bit confusing to have that cross pollination that isn't very obvious. Can the test schema be used instead so that this is totally isolated to using test specific schema and global attribute defaults?

This comment applies to all of the test functions in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the reason we are using the default schema for these tests is because they are actually loaded into the CdfAttributeManager class from the __init__ function. I would have to change the code here if I wanted to use test schema, which I have been told is not ideal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to use the real schema against the test data so we can avoid the real schema and test schema from getting out of sync, so I asked Ana to do it this way.

But, since you got a comment on it, can you add a short comment in the code explaining why you're loading in the default schema?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxinelasp, Does this mean that you are considering these tests a validation of the real schema? I think of unit tests as only covering the code implementation, not validating the schema.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good question. I can see either argument, but in this case, I think it makes sense to have the real schema be part of the unit tests because it's closely tied to the code. But, I can definitely see the philosophical argument that it's not really part of the "unit" so it shouldn't be tested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job implementing succinct tests in this file that cover only the functionality added by the ImapCdfAttributes class!

Copy link
Contributor

@maxinelasp maxinelasp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional comments but it's definitely getting close!!

imap_processing/cdf/cdf_attribute_manager.py Outdated Show resolved Hide resolved
imap_processing/cdf/cdf_attribute_manager.py Outdated Show resolved Hide resolved
imap_processing/cdf/cdf_attribute_manager.py Show resolved Hide resolved
imap_processing/cdf/cdf_attribute_manager.py Outdated Show resolved Hide resolved
# Case to handle DEPEND_i schema issues
elif attr_name == "DEPEND_i":
# range(3) because the highest DEPEND_i value is 3.
for i in range(3):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't assume that there are only 3 DEPEND_i cases. To address your todo comment below, I also believe depend_0 does need to exist and it needs to be epoch.

Suggested rewrite:

elif attr_name == "DEPEND_i":
  # Find all the attributes of variable_name that contain "DEPEND"
  variable_depend_attrs = [key for key in self._variable_attributes[variable_name].keys() if "DEPEND" in key] 
  # Confirm that each DEPEND_i attribute is unique
  if len(set(variable_depend_attrs)) != len(variable_depend_attrs):
    logger.warn(f"Found duplicate DEPEND_i attribute in variable {variable_name}: {variable_depend_attrs}")
  for variable_depend_attr in variable_depend_attrs: 
    output[variable_depend_attr] = self._variable_attributes[variable_name][variable_depend_attr]

Also as a TODO, please add a comment to add some additional validation for the depend attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! This works way better. Much more sophisticated. Thanks!

Regarding DEFAULT_0, I actually talked with Sean about his tests for l0_l1a, and that is were he told me that DEPEND_0 does not need to be present in some cases he is working with when epoch is not one of the dimensions. I am doing a poor job of explaining this, so I can send you the message he sent me if you would like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, for posterity: according to Bryan Harter, Depend_0 must be an epoch, but it isn't actually required for every variable. So we will be updating the schema to make it not required and removing the special cases from the code.

imap_processing/cdf/cdf_attribute_manager.py Outdated Show resolved Hide resolved
imap_processing/cdf/cdf_attribute_manager.py Outdated Show resolved Hide resolved
imap_processing/hi/l1c/hi_l1c.py Outdated Show resolved Hide resolved
Copy link
Contributor

@subagonsouth subagonsouth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for persevering through this challenging PR. Sorry that my off-nominal use of the CdfAttributeManager caused problems. I'm happy with this end result.

imap_processing/hi/l1a/science_direct_event.py Outdated Show resolved Hide resolved
@@ -21,6 +23,7 @@ def test_hi_l1b_hk():
assert l1b_dataset.attrs["Logical_source"] == "imap_hi_l1b_45sensor-hk"


@pytest.mark.xfail()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will write myself a ticket to fix this test.

Copy link
Contributor

@maxinelasp maxinelasp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job!

@anamanica anamanica merged commit a9debee into IMAP-Science-Operations-Center:dev Jul 1, 2024
17 checks passed
@anamanica anamanica deleted the cdf-updates branch July 1, 2024 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CDF Attribute manager: Move methods to properties
3 participants