Handle Telem Def Nans #161

sdhoyt · 2023-09-14T22:13:46Z

Change Summary

Overview

I was having an issue with Lo and HIT where there were nan values (floating point type nans) in the short/long descriptions that were causing an error when the xml tried to write. I added a couple lines to replace those with empty strings.

New Dependencies

None

New Files

None

Deleted Files

None

Updated Files

telemetry generator.py
- forced short/long description nans to strings and set as empty strings to avoid write errors

Testing

Tested by creating HIT and Lo XTCE

tech3371 · 2023-09-14T22:25:57Z

tools/xtce_generation/telemetry_generator.py

@@ -250,9 +250,19 @@ def create_remaining_parameters(self, parameter_set):

            description = Et.SubElement(parameter, "xtce:LongDescription")

-            description.text = (
-                row.get("shortDescription") or row.get("longDescription") or ""
+            # handle nan floating point values and replace with empty strings


I know short_description and long_description has been causing issues.

You mean using those variable names were causing issues before?

What you have is good. I mean to say thank you for adding these checks. I thought mine will take care this issue but looks like it didn't. I should have tested better.

Ah, gotcha. No problem! Hopefully these changes don't cause issues for anyone.

Is this actually an issue upstream of here, so it actually needs the fix to be in the parser somewhere? I would have expected row.get("shortDescription") to always return a string, and never a floating point NaN, so I'm kind of curious if there is something more fundamental about this happening that might cause issues in other places too.

And yeah, I can do the .astype(str) and check for a "nan" string (I didn't see that comment when I responded previously). It's not an entirely empty column though. It parses it as type object (code below). I can try just handling NaN and see if I get any issues generating xtce for SWE or CODICE, but we'll still need to check for both if others are parsing as None for whatever reason.

xls = pd.ExcelFile("/Users/seho5886/Desktop/IMAP/hit/TLM_HIT_v20220524.xls")

df = xls.parse("P_HIT_HSKP", converters={"longDescription":str})

df.info()

<class 'pandas.core.frame.DataFrame'> RangeIndex: 80 entries, 0 to 79 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 packetName 80 non-null object 1 mnemonic 80 non-null object 2 sequence 80 non-null int64 3 lengthInBits 80 non-null int64 4 startBit 80 non-null int64 5 dataType 80 non-null object 6 convertAs 80 non-null object 7 units 80 non-null object 8 source 69 non-null object 9 shortDescription 78 non-null object 10 longDescription 8 non-null object dtypes: int64(3), object(8) memory usage: 7.0+ KB

df.longDescription.isnull()

0 False 1 False 2 False 3 False 4 False ... 75 True 76 True 77 True 78 True 79 True Name: longDescription, Length: 80, dtype: bool

OK, interesting that HIT is the only instrument with blank cells to cause this! This definitely seems like something others may produce in the future as well...

What about adding a converter like this? https://stackoverflow.com/questions/32591466/python-pandas-how-to-specify-data-types-when-reading-an-excel-file

May be splitting hairs here though, so your current solution is good, it just feels like there should be a way to push this upstream a bit further as well.

I figured out what's going on. An empty cell in a column is parsed as NaN, but if row.get() tries to access data from a column that doesn't exist it returns None. SWE doesn't have a shortDescription column which is where the None was coming from before. Would it be better to specify column names needed by pd.ExcelFile.parse("packet name", usecols=["mnemonic", "startByte", ...] ) and return an error saying the spreadsheet should be updated with the missing column? That could also be helpful in catching future spreadsheet issues, but it might be more annoying than it's worth.

That makes sense now, nice investigation!

Would it be better to specify column names needed by pd.ExcelFile.parse("packet name", usecols=["mnemonic", "startByte", ...] ) and return an error saying the spreadsheet should be updated with the missing column? That could also be helpful in catching future spreadsheet issues, but it might be more annoying than it's worth.

I don't think we want to force people to add short and long descriptions if they aren't in the spreadsheets, that sounds like a lot of headache like you say. Yet another option to consider would be to return np.nan from the get lookup fallback:

if not np.isnan(row.get("longDescription", np.nan)): ...

So then we don't have to check for both nan and None. Also see my comment below about adding both short and long descriptions as two separate if clauses, which might simplify some of the logic here as well if you want to implement that.

Good idea! Thanks! I also just found out that pd.isna() / pd.notna() handles NaN and None the same.

tools/xtce_generation/telemetry_generator.py

greglucas · 2023-09-18T20:17:53Z

tools/xtce_generation/telemetry_generator.py

@@ -250,9 +250,19 @@ def create_remaining_parameters(self, parameter_set):

            description = Et.SubElement(parameter, "xtce:LongDescription")


@GFMoraga and @tech3371 Do we want to put the short description into the long description tag?

Looking at this and searching through it looks like there is both short and long description supported. https://public.ccsds.org/Pubs/660x2g2.pdf

So, would it make sense to do something like the following so you could have both short and/or long descriptions supported?

Suggested change

description = Et.SubElement(parameter, "xtce:LongDescription")

if row.get("shortDescription"):

description = Et.SubElement(parameter, "xtce:ShortDescription")

description.text = row.get("shortDescription")

if row.get("longDescription"):

description = Et.SubElement(parameter, "xtce:LongDescription")

description.text = row.get("longDescription")

I think this is a great idea. I know most will have BOTH, but not all will have the descriptions. As long as the this resolves the NaN error in the descriptions that are blank. I will say that the xtce.xml file won't look the best with both, and we need to decide if both are required...?

greglucas

This seems like a nice simplification now to handle a lot of the cases. But note, that I think it will change people's XTCEs that were already generated because it may include additional description tags. So we should verify that @tech3371 and @GFMoraga are getting good outputs with this version too.

GFMoraga

I made a comment, but it looks great. Pandas is supported, so it should be fine. If there are kinks, we can debug it again.

* added generators for hit and lo, updated telem generator * removed hit and lo files * better None/NAN handling, using both short and long description tags

bourque · 2024-01-31T16:55:02Z

@all-contributors please add @sdhoyt for maintenance

allcontributors · 2024-01-31T16:55:11Z

@bourque

I've put up a pull request to add @sdhoyt! 🎉

sdhoyt added 3 commits September 14, 2023 15:50

added generators for hit and lo, updated telem generator

e16d4b5

updated generator after new merge

97f5a71

removed hit and lo files

4590630

sdhoyt requested a review from a team September 14, 2023 22:13

sdhoyt self-assigned this Sep 14, 2023

sdhoyt requested review from bourque, greglucas, tech3371, bryan-harter, laspsandoval, GFMoraga and maxinelasp and removed request for a team September 14, 2023 22:13

tech3371 reviewed Sep 14, 2023

View reviewed changes

tech3371 approved these changes Sep 14, 2023

View reviewed changes

greglucas reviewed Sep 18, 2023

View reviewed changes

better None/NAN handling, using both short and long description tags

e7cdbca

greglucas approved these changes Sep 19, 2023

View reviewed changes

greglucas requested a review from tech3371 September 19, 2023 17:23

bryan-harter approved these changes Sep 19, 2023

View reviewed changes

GFMoraga approved these changes Sep 19, 2023

View reviewed changes

laspsandoval approved these changes Sep 20, 2023

View reviewed changes

tech3371 approved these changes Sep 20, 2023

View reviewed changes

sdhoyt merged commit 0d3271c into IMAP-Science-Operations-Center:dev Sep 20, 2023
12 checks passed

sdhoyt deleted the xtce-nans branch September 20, 2023 19:11

allcontributors bot mentioned this pull request Jan 31, 2024

docs: add sdhoyt as a contributor for maintenance #323

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle Telem Def Nans #161

Handle Telem Def Nans #161

sdhoyt commented Sep 14, 2023

tech3371 Sep 14, 2023

sdhoyt Sep 14, 2023

tech3371 Sep 14, 2023

sdhoyt Sep 14, 2023

greglucas Sep 18, 2023

sdhoyt Sep 18, 2023 •

edited

Loading

greglucas Sep 19, 2023

sdhoyt Sep 19, 2023 •

edited

Loading

greglucas Sep 19, 2023

sdhoyt Sep 19, 2023

greglucas Sep 18, 2023

greglucas Sep 19, 2023

GFMoraga Sep 19, 2023

greglucas left a comment

GFMoraga left a comment

bourque commented Jan 31, 2024

allcontributors bot commented Jan 31, 2024

		@@ -250,9 +250,19 @@ def create_remaining_parameters(self, parameter_set):

		description = Et.SubElement(parameter, "xtce:LongDescription")

-            description = Et.SubElement(parameter, "xtce:LongDescription")
+            if row.get("shortDescription"):
+                description = Et.SubElement(parameter, "xtce:ShortDescription")
+                description.text = row.get("shortDescription")
+            if row.get("longDescription"):
+                description = Et.SubElement(parameter, "xtce:LongDescription")
+                description.text = row.get("longDescription")

Handle Telem Def Nans #161

Handle Telem Def Nans #161

Conversation

sdhoyt commented Sep 14, 2023

Change Summary

Overview

New Dependencies

New Files

Deleted Files

Updated Files

Testing

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sdhoyt Sep 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sdhoyt Sep 19, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

greglucas left a comment

Choose a reason for hiding this comment

GFMoraga left a comment

Choose a reason for hiding this comment

bourque commented Jan 31, 2024

allcontributors bot commented Jan 31, 2024

sdhoyt Sep 18, 2023 •

edited

Loading

sdhoyt Sep 19, 2023 •

edited

Loading