
Conversation

@maxinelasp (Contributor)

Change Summary

Fixes #2297 along with another bug introduced in #2378. Also completes the attributes for L1D.

Overview

  • Add new attributes for L1D, which are very similar to the ones for L2
  • Fix the ancillary files so they do not upload through the write_cdf and upload flow
  • Fix the ancillary files so errors do not interrupt science processing upload

Testing

Added a test for the bugs that I saw causing this issue.

maxinelasp self-assigned this Nov 18, 2025
maxinelasp requested review from lacoak21 and subagonsouth and removed the request for a team on Nov 18, 2025
maxinelasp added the "Ins: MAG" (Related to the MAG instrument) label on Nov 18, 2025
maxinelasp added this to IMAP on Nov 18, 2025
version = self.version

output_filename = (
    imap_data_access.AncillaryFilePath.generate_from_inputs(

Collaborator:

Rather than recreating this through the subclass, are you able to put this logic into the write_cdf() function instead? It seems like maybe the only difference is the filenaming convention here, so you could put a cut-out for that in write_cdf(), where you check for MAG L1D. This just reads as a lot of duplicate code, with potential for confusion from calling super() in a few different paths here.

Contributor Author:

That was my initial thought, but it would require skipping a lot of write_cdf, and most of this code would still exist because it actually is pretty different from what write_cdf does. write_cdf reads these values from the attributes of the dataset rather than from the class definition, so write_cdf doesn't even have access to the values I use here. So, moving this into write_cdf would require assigning these attributes into the dataset anyway (which is about as much code) and then adding another special case into write_cdf which skips most of it. I didn't want to clutter up shared code with a special case when it's not needed.

You have a good point about calling super(). I can move all the calls to the bottom so we have this cut-out which pulls out the ancillary files and writes them, but then in most cases the call just routes through the parent class's function like normal.
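
A minimal sketch of that shape (post_processing() and upload_products() appear in this PR; is_ancillary() and _write_ancillary_cdf() are assumed helper names for illustration):

# Hypothetical sketch, not the actual imap_processing code: peel off the
# ancillary datasets, write and upload them with the ancillary naming
# convention, and route everything else through the parent unchanged.
def post_processing(self, processed_data, dependencies):
    ancillary_files = []
    science_datasets = []
    for dataset in processed_data:
        if is_ancillary(dataset):
            # Ancillary products use a different filenaming convention,
            # so they are written here rather than through write_cdf().
            ancillary_files.append(self._write_ancillary_cdf(dataset))
        else:
            science_datasets.append(dataset)
    self.upload_products(ancillary_files)
    # In most cases the call just routes through the parent like normal.
    super().post_processing(science_datasets, dependencies)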

Contributor Author:

The only code that actually matches write_cdf is the split() call on the Logical_source and the xarray_to_cdf call at the end (where I pass in different kwargs).
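
For concreteness, the two shared pieces sketched out (variable names and the exact kwargs here are assumptions; xarray_to_cdf is cdflib's):

from cdflib.xarray import xarray_to_cdf

# Parse the descriptor pieces out of the Logical_source global attribute.
source_parts = dataset.attrs["Logical_source"].split("_")
# Final write; the kwargs differ from the ones write_cdf() passes.
xarray_to_cdf(dataset, str(output_filepath), terminate_on_warning=True)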

Collaborator:

> So, moving this into write_cdf would require assigning these attributes into the dataset anyway (which is about as much code) and then adding another special case into write_cdf which skips most of it. I didn't want to clutter up shared code with a special case when it's not needed.

This sounds like it might be nice anyway, to add those attributes? I agree, though: if it isn't a one- or two-line cut-out for ancillary files, then it probably doesn't make sense to put it into the write_cdf() function. I thought it would only be this one naming line, because they are CDF files as well.

try:
    self.upload_products(ancillary_files)
except Exception as e:
    logger.warning(f"Failed to upload ancillary products due to error {e}")

Collaborator:

If these are output products, don't we think they should be handled with the same care as science products? I'm not sure why these would fail when a science product wouldn't.

Contributor Author:

Well, they are currently failing and blocking science product release. Since these are internal watchdog products, I do think they are a lower priority than the science files... but I can remove the excepts if that doesn't line up with the rest of the SDC or what MAG wants.

Collaborator:

I'm just curious whether there is a reason they are failing. Can we identify this earlier in the process somehow and not include these in the list of data products to upload if there is bad content? I.e., the upload isn't what should fail, and if it does, we should fix that at the source.

Contributor Author (@maxinelasp, Nov 18, 2025):

That is what the test I added is checking! But part of the problem is that they were being uploaded as science files, which they aren't. So I don't anticipate this happening again, but if it does, I was thinking it would be preferable NOT to block science products. However, this does reduce visibility into these products if they start failing, so I understand your concern.
Really, these checks aren't for bad content; they're for (A) whether the filename is constructed properly and (B) whether the ancillary file upload is functioning properly.

But I am definitely open to removing them if the increased stability isn't worth the risk of missing errors from the SDC perspective.
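
For illustration, a hedged sketch of check (B); every fixture and name here is hypothetical rather than the actual test added in this PR:

from unittest.mock import patch

def test_ancillary_failure_does_not_block_science(mag_processor, l1d_datasets):
    # (B) a failing ancillary upload should be logged, not raised, so
    # science post-processing still completes. Assumes the ancillary
    # upload is the only upload exercised by these fixtures.
    with patch.object(
        type(mag_processor), "upload_products", side_effect=RuntimeError("boom")
    ):
        # Should not raise: the error is caught and logged as a warning.
        mag_processor.post_processing(l1d_datasets, ProcessingInputCollection())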

Collaborator:

In the post_processing() method there is a callout for the Path vs. Dataset difference in the list of products. So my personal preference would be to write the ancillary products here as you have (logging/ignoring errors in that portion if you want), then call super().post_processing() with the mixed list of all science datasets and ancillary filepaths, and let the uploads happen there.

Contributor Author:

Oh yeah, that's a good catch, and it makes this cleaner as well. I'll make that change.
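
A sketch of the agreed shape (is_ancillary() and _write_ancillary_cdf() are assumed names): write the ancillary CDFs here, then hand the parent one mixed list and let it do all the uploading.

# Hypothetical sketch: build a mixed list of science Datasets and
# ancillary Paths, then let super().post_processing() upload both.
def post_processing(self, processed_data, dependencies):
    products = []
    for dataset in processed_data:
        if is_ancillary(dataset):
            try:
                products.append(self._write_ancillary_cdf(dataset))  # Path
            except (OSError, ValueError) as e:
                logger.warning(f"Skipping ancillary product: {e}")
        else:
            products.append(dataset)  # Dataset; the parent writes/uploads it
    super().post_processing(products, dependencies)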

Collaborator @greglucas left a review:

Minor nit on preferring a mixed list[Dataset, Path] to try to re-use the upload capability, but not blocking on that; it looks good to me otherwise, so you can take or leave that comment as you see fit.

Contributor @alastairtree left a review:

Thanks Maxine! Added a couple of small inline bits and looking forward to a working L1D!


vector_attrs_rtn:
  <<: *vectors_default
  CATDESC: Magnetic field vectors with x y z varying by time in Radial-Tangential-Normal Reference Frame

Contributor:

@mhairifin can you confirm these are correct for me? Wouldn't we label these as r, t, and n, not x, y, z?

Contributor (@mhairifin):

That's correct: RTN specifically should be labelled as r, t, and n.
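
In code terms (a sketch; the variable name is an assumption), the component labels for the RTN frame would be:

# RTN components are labelled r, t, n rather than x, y, z.
rtn_component_labels = ["r", "t", "n"]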

    self,
    attribute_manager: ImapCdfAttributes,
    day: np.datetime64,
    data_level: str = "l1d",

Contributor:

Should this be a field on MagL2L1dBase that is overridden in each subclass? I can't see why you would want to be able to do MagL1d(..).generate_dataset(attributes, day, "some-other-level").
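
A sketch of that suggestion (MagL2L1dBase and MagL1d are from the diff; the attribute mechanics here are an assumption):

class MagL2L1dBase:
    data_level = "l2"

    def generate_dataset(self, attribute_manager, day):
        level = self.data_level  # each subclass pins its own level
        ...

class MagL1d(MagL2L1dBase):
    data_level = "l1d"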

-# Call parent generate_dataset method
-dataset = super().generate_dataset(attribute_manager, day)
+# Call parent generate_dataset method with L1D data level
+dataset = super().generate_dataset(attribute_manager, day, data_level="l1d")

Contributor:

This call doesn't need to pass the value if the default is already set by the method definition, does it? Or, as I said above, move it to a field/property.

mock_dependencies = ProcessingInputCollection()

with patch("imap_processing.cdf.utils.xarray_to_cdf"):
    mag_processor.post_processing(l1d_datasets, mock_dependencies)

Contributor:

Any value in tests for the burst files? A parameterised test? Similar for L2.
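
Something like this parameterisation, for instance (a sketch; the mode values and fixture names are assumptions):

import pytest
from unittest.mock import patch

@pytest.mark.parametrize("mode", ["norm", "burst"])
def test_post_processing_uploads(mode, mag_processor):
    datasets = make_l1d_datasets(mode=mode)  # hypothetical fixture helper
    with patch("imap_processing.cdf.utils.xarray_to_cdf"):
        mag_processor.post_processing(datasets, ProcessingInputCollection())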

    )
    # update the dataset in processed_data to point to a path
    processed_data[index] = output_filepath
except Exception as e:

Contributor:

Can you catch anything more specific? I don't think you want to catch and continue on a MemoryError, for example.
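
One way to narrow the handler (the exception set here is an assumption about what the file write and naming code can raise):

try:
    ...
    processed_data[index] = output_filepath
except (OSError, ValueError) as e:
    # A MemoryError or KeyboardInterrupt now propagates instead of
    # being swallowed.
    logger.warning(f"Failed to write ancillary product: {e}")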

Successfully merging this pull request may close: BUG - MAG L1D jobs always fail