-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CDF Updates #660
CDF Updates #660
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few changes, but looking good so far!
if variable_name in self._variable_attributes: | ||
output = self._variable_attributes[variable_name] | ||
elif ( | ||
variable_name is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think variable_name
should ever be None, so this check is not needed. Actually, this whole elif
block should be removed, because the person calling get_variable_attributes should always provide variable_name.
This was necessary in get_global_attributes because the instrument_id is an optional parameter, so it can be none.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is why your tests are failing too!
): | ||
output = self._variable_attributes["variable_name"][attr_name] | ||
elif ( | ||
variable_name not in self.variable_attribute_schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than checking variable_name
against the schema, we should check the attributes against the schema. So you would rewrite this block as something like:
for attribute in self.variable_attribute_schema.keys():
if attribute in self._variable_attributes[variable_name]:
add to output
if attribute not in self._variable_attributes[variable_name] and variable is required:
add to output as None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hahah, yes, thank you so much! I caught this momentarily after I submitted the PR request. Felt very silly. I should have probably started this one as a draft.
@@ -11,11 +11,28 @@ | |||
|
|||
|
|||
class ImapCdfAttributes(CdfAttributeManager): | |||
"""Contains IMAP specific tools and settings for CDF management.""" | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this PR includes some stuff from your test PR - that will go away once you merge the test PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a ton of work and things are looking pretty good. I agree with the decision to remove the property accessors.
Let me know if you need any clarification on the comments that I left.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commenting on the __init__.py
file but this is relevant to all files in this directory...
All .py
files in this directory should be moved to imap_processing/tests/cdf/
All .yaml
files should be moved under the parent directory imap_processing/tests/cdf/test_data
Basically, we want the directory structure under imap_processing/tests
to mirror the directory structure of under imap_processing
with the exception of test data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet! I had to leave one .yaml file under /tests/cdf
due to the way the code is written in __init__
, but I will be sure to ask Maxine about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we left the tests where they were, is because this code is going to be moved to another repo very soon and hopefully removed from here. So, I wanted it to be isolated from everything else.
but, in general, all tests should go under imap_processing/tests
. So I don't feel strongly about this, but I'd slightly prefer to keep it all together to make it easier to move and feel confident I got all of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest reducing this test schema file to have the minimum number of entries needed for complete test coverage. I think that you can get away with as few as two entries under attribute_key
. They can be completely made up entries with values specified such that you can test everything that you need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes! Initially I had made a default_test_schema file, but then was asked to use the actual schema file since that is a bit more concrete that the individual instrument files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see where some of the confusion is coming from. Either you should use the test_schema file (if there is some testing reason to keep it) and point to it in the tests (so change the code in Tim's other comment) OR continue using the real schema, leave the tests the same, and remove these unused schemas.
If I'm missing something here, please let me know!
imap_processing/cdf/tests/shared/default_global_cdf_attrs_schema.yaml
Outdated
Show resolved
Hide resolved
""" | ||
|
||
# Initialize CdfAttributeManager object which loads in default info | ||
cdf_manager = CdfAttributeManager(Path(__file__).parent.parent / "config") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to first load the default (not for testing) schema and default global attributes? It makes this test a bit confusing to have that cross pollination that isn't very obvious. Can the test schema be used instead so that this is totally isolated to using test specific schema and global attribute defaults?
This comment applies to all of the test functions in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the reason we are using the default schema for these tests is because they are actually loaded into the CdfAttributeManager class from the __init__
function. I would have to change the code here if I wanted to use test schema, which I have been told is not ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to use the real schema against the test data so we can avoid the real schema and test schema from getting out of sync, so I asked Ana to do it this way.
But, since you got a comment on it, can you add a short comment in the code explaining why you're loading in the default schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxinelasp, Does this mean that you are considering these tests a validation of the real schema? I think of unit tests as only covering the code implementation, not validating the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good question. I can see either argument, but in this case, I think it makes sense to have the real schema be part of the unit tests because it's closely tied to the code. But, I can definitely see the philosophical argument that it's not really part of the "unit" so it shouldn't be tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job implementing succinct tests in this file that cover only the functionality added by the ImapCdfAttributes
class!
…processing into cdf-updates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some additional comments but it's definitely getting close!!
# Case to handle DEPEND_i schema issues | ||
elif attr_name == "DEPEND_i": | ||
# range(3) because the highest DEPEND_i value is 3. | ||
for i in range(3): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't assume that there are only 3 DEPEND_i cases. To address your todo comment below, I also believe depend_0 does need to exist and it needs to be epoch
.
Suggested rewrite:
elif attr_name == "DEPEND_i":
# Find all the attributes of variable_name that contain "DEPEND"
variable_depend_attrs = [key for key in self._variable_attributes[variable_name].keys() if "DEPEND" in key]
# Confirm that each DEPEND_i attribute is unique
if len(set(variable_depend_attrs)) != len(variable_depend_attrs):
logger.warn(f"Found duplicate DEPEND_i attribute in variable {variable_name}: {variable_depend_attrs}")
for variable_depend_attr in variable_depend_attrs:
output[variable_depend_attr] = self._variable_attributes[variable_name][variable_depend_attr]
Also as a TODO, please add a comment to add some additional validation for the depend attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! This works way better. Much more sophisticated. Thanks!
Regarding DEFAULT_0, I actually talked with Sean about his tests for l0_l1a, and that is were he told me that DEPEND_0 does not need to be present in some cases he is working with when epoch is not one of the dimensions. I am doing a poor job of explaining this, so I can send you the message he sent me if you would like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, for posterity: according to Bryan Harter, Depend_0 must be an epoch, but it isn't actually required for every variable. So we will be updating the schema to make it not required and removing the special cases from the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for persevering through this challenging PR. Sorry that my off-nominal use of the CdfAttributeManager caused problems. I'm happy with this end result.
@@ -21,6 +23,7 @@ def test_hi_l1b_hk(): | |||
assert l1b_dataset.attrs["Logical_source"] == "imap_hi_l1b_45sensor-hk" | |||
|
|||
|
|||
@pytest.mark.xfail() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will write myself a ticket to fix this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job!
…processing into cdf-updates Updating branch
a9debee
into
IMAP-Science-Operations-Center:dev
Change Summary
Overview
Cleaning up the CdfAttributeManager class code in order to ensure schema validation when a user is accessing global and variable attributes. This is done by changing
.global_attributes
and.variable_attributes
to be private properties, establishing the methods.get_global_attributes
and.get_variable_attributes
as the way to retrieve a validated, specific dictionary.This particular change has been slightly debated. Originally, the methods
.get_global_attriubtes
and.get_variable_attributes
were going to be changed into Python properties, but that proved to be unintuitive with the functionality of getters and setters. Then, we were going to create a new class to place validation into the__getitem__
functionality of dictionaries, but this would require creating a class with multiple inheritance and doing some funky things. Overall, it was decided by Maxine and I that the best current solution is the one described above.This branch was created from cdf-attribute-tests in order to carry the testing work from that branch onto this one.
New Dependencies
None
New Files
None
Deleted Files
None
Updated Files
-cdf_attribute_manger
- Changing
.global_attributes
and.variable_attributes
to private methods.- Finishing the code of the method
.get_variable_attributes
..get_variable_attributes
instead of.variable_attributes
.get_variable_attributes
instead of.variable_attributes
.get_variable_attributes
instead of.variable_attributes
Testing
All covered in test_cdf_attribute_manager.py.
closes #593