Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle Telem Def Nans #161

Merged
merged 4 commits into from
Sep 20, 2023
Merged

Conversation

sdhoyt
Copy link
Contributor

@sdhoyt sdhoyt commented Sep 14, 2023

Change Summary

Overview

I was having an issue with Lo and HIT where there were nan values (floating point type nans) in the short/long descriptions that were causing an error when the xml tried to write. I added a couple lines to replace those with empty strings.

New Dependencies

None

New Files

None

Deleted Files

None

Updated Files

  • telemetry generator.py
    • forced short/long description nans to strings and set as empty strings to avoid write errors

Testing

Tested by creating HIT and Lo XTCE

@sdhoyt sdhoyt requested a review from a team September 14, 2023 22:13
@sdhoyt sdhoyt self-assigned this Sep 14, 2023
@sdhoyt sdhoyt requested review from bourque, greglucas, tech3371, bryan-harter, laspsandoval, GFMoraga and maxinelasp and removed request for a team September 14, 2023 22:13
@@ -250,9 +250,19 @@ def create_remaining_parameters(self, parameter_set):

description = Et.SubElement(parameter, "xtce:LongDescription")

description.text = (
row.get("shortDescription") or row.get("longDescription") or ""
# handle nan floating point values and replace with empty strings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know short_description and long_description has been causing issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean using those variable names were causing issues before?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you have is good. I mean to say thank you for adding these checks. I thought mine will take care this issue but looks like it didn't. I should have tested better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha. No problem! Hopefully these changes don't cause issues for anyone.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually an issue upstream of here, so it actually needs the fix to be in the parser somewhere? I would have expected row.get("shortDescription") to always return a string, and never a floating point NaN, so I'm kind of curious if there is something more fundamental about this happening that might cause issues in other places too.

Copy link
Contributor Author

@sdhoyt sdhoyt Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yeah, I can do the .astype(str) and check for a "nan" string (I didn't see that comment when I responded previously). It's not an entirely empty column though. It parses it as type object (code below). I can try just handling NaN and see if I get any issues generating xtce for SWE or CODICE, but we'll still need to check for both if others are parsing as None for whatever reason.

xls = pd.ExcelFile("/Users/seho5886/Desktop/IMAP/hit/TLM_HIT_v20220524.xls")
df = xls.parse("P_HIT_HSKP", converters={"longDescription":str})
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   packetName        80 non-null     object
 1   mnemonic          80 non-null     object
 2   sequence          80 non-null     int64 
 3   lengthInBits      80 non-null     int64 
 4   startBit          80 non-null     int64 
 5   dataType          80 non-null     object
 6   convertAs         80 non-null     object
 7   units             80 non-null     object
 8   source            69 non-null     object
 9   shortDescription  78 non-null     object
 10  longDescription   8 non-null      object
dtypes: int64(3), object(8)
memory usage: 7.0+ KB
df.longDescription.isnull()
0     False
1     False
2     False
3     False
4     False
      ...  
75     True
76     True
77     True
78     True
79     True
Name: longDescription, Length: 80, dtype: bool

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, interesting that HIT is the only instrument with blank cells to cause this! This definitely seems like something others may produce in the future as well...

What about adding a converter like this? https://stackoverflow.com/questions/32591466/python-pandas-how-to-specify-data-types-when-reading-an-excel-file

May be splitting hairs here though, so your current solution is good, it just feels like there should be a way to push this upstream a bit further as well.

Copy link
Contributor Author

@sdhoyt sdhoyt Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured out what's going on. An empty cell in a column is parsed as NaN, but if row.get() tries to access data from a column that doesn't exist it returns None. SWE doesn't have a shortDescription column which is where the None was coming from before. Would it be better to specify column names needed by pd.ExcelFile.parse("packet name", usecols=["mnemonic", "startByte", ...] ) and return an error saying the spreadsheet should be updated with the missing column? That could also be helpful in catching future spreadsheet issues, but it might be more annoying than it's worth.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense now, nice investigation!

Would it be better to specify column names needed by pd.ExcelFile.parse("packet name", usecols=["mnemonic", "startByte", ...] ) and return an error saying the spreadsheet should be updated with the missing column? That could also be helpful in catching future spreadsheet issues, but it might be more annoying than it's worth.

I don't think we want to force people to add short and long descriptions if they aren't in the spreadsheets, that sounds like a lot of headache like you say. Yet another option to consider would be to return np.nan from the get lookup fallback:

if not np.isnan(row.get("longDescription", np.nan)):
    ...

So then we don't have to check for both nan and None. Also see my comment below about adding both short and long descriptions as two separate if clauses, which might simplify some of the logic here as well if you want to implement that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Thanks! I also just found out that pd.isna() / pd.notna() handles NaN and None the same.

tools/xtce_generation/telemetry_generator.py Outdated Show resolved Hide resolved
@@ -250,9 +250,19 @@ def create_remaining_parameters(self, parameter_set):

description = Et.SubElement(parameter, "xtce:LongDescription")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GFMoraga and @tech3371 Do we want to put the short description into the long description tag?

Looking at this and searching through it looks like there is both short and long description supported. https://public.ccsds.org/Pubs/660x2g2.pdf

So, would it make sense to do something like the following so you could have both short and/or long descriptions supported?

Suggested change
description = Et.SubElement(parameter, "xtce:LongDescription")
if row.get("shortDescription"):
description = Et.SubElement(parameter, "xtce:ShortDescription")
description.text = row.get("shortDescription")
if row.get("longDescription"):
description = Et.SubElement(parameter, "xtce:LongDescription")
description.text = row.get("longDescription")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @sdhoyt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great idea. I know most will have BOTH, but not all will have the descriptions. As long as the this resolves the NaN error in the descriptions that are blank. I will say that the xtce.xml file won't look the best with both, and we need to decide if both are required...?

Copy link
Collaborator

@greglucas greglucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a nice simplification now to handle a lot of the cases. But note, that I think it will change people's XTCEs that were already generated because it may include additional description tags. So we should verify that @tech3371 and @GFMoraga are getting good outputs with this version too.

@greglucas greglucas requested a review from tech3371 September 19, 2023 17:23
Copy link
Contributor

@GFMoraga GFMoraga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a comment, but it looks great. Pandas is supported, so it should be fine. If there are kinks, we can debug it again.

@sdhoyt sdhoyt merged commit 0d3271c into IMAP-Science-Operations-Center:dev Sep 20, 2023
12 checks passed
@sdhoyt sdhoyt deleted the xtce-nans branch September 20, 2023 19:11
maxinelasp pushed a commit to maxinelasp/imap_processing that referenced this pull request Sep 29, 2023
* added generators for hit and lo, updated telem generator

* removed hit and lo files

* better None/NAN handling, using both short and long description tags
maxinelasp pushed a commit to maxinelasp/imap_processing that referenced this pull request Oct 2, 2023
* added generators for hit and lo, updated telem generator

* removed hit and lo files

* better None/NAN handling, using both short and long description tags
laspsandoval pushed a commit to laspsandoval/imap_processing that referenced this pull request Nov 15, 2023
* added generators for hit and lo, updated telem generator

* removed hit and lo files

* better None/NAN handling, using both short and long description tags
@bourque
Copy link
Collaborator

bourque commented Jan 31, 2024

@all-contributors please add @sdhoyt for maintenance

Copy link
Contributor

@bourque

I've put up a pull request to add @sdhoyt! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants