Mag L1D Repair attributes #2441
Conversation
imap_processing/cli.py (outdated)
```python
version = self.version

output_filename = (
    imap_data_access.AncillaryFilePath.generate_from_inputs(
```
Rather than recreating this through the subclass, are you able to put this logic into the write_cdf() function instead? It seems like the only difference may be the file-naming convention here, so you could put a cut-out for that in write_cdf() where you check for MAG L1D. This just reads as a lot of duplicated code, with potential for confusion from calling super() in a few different paths here.
That was my initial thought, but it would require skipping a lot of write_cdf, and most of this code would still exist because it actually is pretty different from what write_cdf does. write_cdf reads these values from the attributes of the dataset rather than from the class definition, so write_cdf doesn't even have access to the values I use here. Moving this into write_cdf would therefore require assigning these attributes into the dataset anyway (which is about as much code) and then adding another special case into write_cdf that skips most of it. I didn't want to clutter up shared code with a special case when it's not needed.
You have a good point about calling super(); I can move all the calls to the bottom so we have this cut-out, which pulls out the ancillary files and writes them, but in most cases the call just routes through the parent class's function as normal.
The only code that actually matches write_cdf is the split() call on the Logical_source and the xarray_to_cdf call at the end (where I pass in different kwargs).
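For readers following along, a minimal sketch of the distinction being drawn here. The function and class names are hypothetical, and the exact unpacking of Logical_source is an assumption; the point is only where the values live:

```python
import xarray as xr

def write_cdf_style_filename(dataset: xr.Dataset) -> str:
    # write_cdf() builds its naming from global attributes carried on the
    # dataset itself, e.g. by splitting the Logical_source attribute.
    source, level, descriptor = dataset.attrs["Logical_source"].split("_", 2)
    return f"{source}_{level}_{descriptor}.cdf"

class MagL1dProcessor:
    # The MAG L1D subclass instead holds the equivalent values as instance
    # state (e.g. self.version), which write_cdf() never sees unless they
    # are copied into dataset.attrs first.
    def __init__(self, version: str):
        self.version = version
```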
> Moving this into write_cdf would therefore require assigning these attributes into the dataset anyway (which is about as much code) and then adding another special case into write_cdf that skips most of it. I didn't want to clutter up shared code with a special case when it's not needed.
This sounds like it might be nice to do anyway, adding those attributes? I agree, though: if it isn't a one- or two-line cut-out for the ancillary files, it probably doesn't make sense to put it into the write_cdf() function. I thought it would only be this one naming line, since they are CDF files as well.
imap_processing/cli.py (outdated)
```python
try:
    self.upload_products(ancillary_files)
except Exception as e:
    logger.warning(f"Failed to upload ancillary products due to error {e}")
```
If these are output products, don't we think they should be handled with the same care as science products? I'm not sure why these would fail when a science product wouldn't.
Well, they are currently failing and blocking science product release. Since these are internal watchdog products, I do think they are a lower priority than the science files... but I can remove the excepts if that doesn't line up with the rest of the SDC or what MAG wants.
I'm just curious whether there is a reason they are failing. Can we identify this earlier in the process somehow, and not include these in the list of data products to upload if there is bad content? In other words, the upload isn't what should fail, and if it does, we should fix that at the source.
That is what the test I added is checking! But part of the problem is that they are trying to get uploaded as science files, and they aren't science files. So I don't anticipate this happening again, but if it does, I was thinking it would be preferable to NOT block science products. However, this does reduce visibility into these products if they start failing, so I understand your concern.
Really, these checks aren't for bad content; they're checking (a) that the filename is constructed properly and (b) that the ancillary file upload is functioning properly.
But I am definitely open to removing them if, from the SDC perspective, the increased stability isn't worth the risk of missing errors.
In the post_processing() method there is a callout for the Path vs Dataset difference in the list of products. So my personal preference would be to write the ancillary products here as you have (logging/ignoring any errors in that portion if you want), then call super().post_process() with the mixed list of all science datasets and ancillary file paths, and let the uploads happen there.
Oh yeah, that's a good catch that makes this cleaner as well. I'll make that change.
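A sketch of the agreed shape, under the assumption that the parent's post-processing accepts a mixed list of datasets and paths. The base-class and helper names here are stand-ins, not the real imap_processing API:

```python
from pathlib import Path

import xarray as xr

class ProcessorBase:
    # Stand-in for the real base class: assumed to take a mixed list of
    # Datasets and Paths, uploading Paths as-is and writing Datasets first.
    def post_processing(self, products: list, dependencies) -> None: ...

class MagProcessor(ProcessorBase):
    def post_processing(self, datasets: list, dependencies) -> None:
        products: list[xr.Dataset | Path] = []
        for ds in datasets:
            if self._is_ancillary(ds):
                # Write the ancillary product here and hand its Path to the
                # shared upload path alongside the science datasets.
                products.append(self._write_ancillary(ds))
            else:
                products.append(ds)
        # The parent handles the Path vs Dataset split and the uploads.
        super().post_processing(products, dependencies)

    def _is_ancillary(self, ds: xr.Dataset) -> bool: ...
    def _write_ancillary(self, ds: xr.Dataset) -> Path: ...
```

This way the ancillary files get the same upload path (and the same failure handling) as the science products, instead of a separate try/except branch.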
Co-authored-by: Greg Lucas <greg.m.lucas@gmail.com>
greglucas left a comment
Minor nit on preferring a mixed list[Dataset, Path] to try to re-use the upload capability, but I'm not blocking on that, and it looks good to me otherwise, so take or leave that comment as you see fit.
alastairtree left a comment
Thanks Maxine! Added a couple of small inline bits and looking forward to a working L1D!
```yaml
vector_attrs_rtn:
  <<: *vectors_default
  CATDESC: Magnetic field vectors with x y z varying by time in Radial-Tangential-Normal Reference Frame
```
@mhairifin can you confirm whether these are correct for me? Wouldn't we label these as r, t, and n, not x, y, z?
That's correct, RTN specifically should be labelled as r, t, and n
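Presumably the fix is just to the description text; a sketch, assuming the same anchor and that only the axis naming changes:

```yaml
vector_attrs_rtn:
  <<: *vectors_default
  CATDESC: Magnetic field vectors with r t n varying by time in Radial-Tangential-Normal Reference Frame
```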
```python
    self,
    attribute_manager: ImapCdfAttributes,
    day: np.datetime64,
    data_level: str = "l1d",
```
Should this be a field on MagL2L1dBase that is overridden in each subclass? I can't see why you would want to be able to do MagL1d(..).generate_dataset(attributes, day, "some-other-level").
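The suggested pattern, sketched with the class names from this diff; the base-class body is assumed:

```python
import numpy as np

class MagL2L1dBase:
    # Each subclass pins its own data level, so callers can no longer pass
    # an inconsistent level through generate_dataset().
    data_level: str

    def generate_dataset(self, attribute_manager, day: np.datetime64):
        level = self.data_level  # read from the subclass, not a parameter
        ...

class MagL1d(MagL2L1dBase):
    data_level = "l1d"
```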
```diff
-# Call parent generate_dataset method
-dataset = super().generate_dataset(attribute_manager, day)
+# Call parent generate_dataset method with L1D data level
+dataset = super().generate_dataset(attribute_manager, day, data_level="l1d")
```
This doesn't need the default value if the default is set by the method definition? Or, as I said above, move it to a field/property.
```python
mock_dependencies = ProcessingInputCollection()

with patch("imap_processing.cdf.utils.xarray_to_cdf"):
    mag_processor.post_processing(l1d_datasets, mock_dependencies)
```
Any value in tests for the burst files? A parameterised test? Similar for L2.
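A sketch of what that parameterisation could look like; the mode names and the factory fixture are assumptions, not existing test helpers:

```python
from unittest.mock import patch

import pytest

@pytest.mark.parametrize("mode", ["norm", "burst"])
def test_post_processing(mode, mag_processor_factory):
    # mag_processor_factory is a hypothetical fixture that would build the
    # processor, datasets, and dependencies for the requested mode.
    processor, datasets, dependencies = mag_processor_factory(mode)
    with patch("imap_processing.cdf.utils.xarray_to_cdf"):
        processor.post_processing(datasets, dependencies)
```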
```python
        )
        # update the dataset in processed_data to point to a path
        processed_data[index] = output_filepath
    except Exception as e:
```
Can you catch anything more specific? I don't think you want to catch and continue on a MemoryError, for example.
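One possible narrowing, assuming the known failure modes are filename construction and the write itself; the helper name and the exact exception types are assumptions:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def write_ancillary(dataset) -> Path: ...  # stand-in for the real writer

def save_ancillary(processed_data: list, index: int, dataset) -> None:
    try:
        output_filepath = write_ancillary(dataset)
        # update the dataset in processed_data to point to a path
        processed_data[index] = output_filepath
    except (ValueError, OSError) as e:
        # Swallow only the expected failure modes: bad filename construction
        # (ValueError) and failed writes (OSError). Anything else, such as
        # MemoryError, still propagates instead of being silenced.
        logger.warning(f"Failed to write ancillary product: {e}")
```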
Change Summary
Fixes #2297 along with another bug introduced in #2378. Also completes the attributes for L1D.
Overview
Testing
Added a test for the bugs that I saw causing this issue.