-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle Telem Def Nans #161
Conversation
@@ -250,9 +250,19 @@ def create_remaining_parameters(self, parameter_set): | |||
|
|||
description = Et.SubElement(parameter, "xtce:LongDescription") | |||
|
|||
description.text = ( | |||
row.get("shortDescription") or row.get("longDescription") or "" | |||
# handle nan floating point values and replace with empty strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know short_description
and long_description
has been causing issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean using those variable names were causing issues before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you have is good. I mean to say thank you for adding these checks. I thought mine will take care this issue but looks like it didn't. I should have tested better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, gotcha. No problem! Hopefully these changes don't cause issues for anyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually an issue upstream of here, so it actually needs the fix to be in the parser somewhere? I would have expected row.get("shortDescription")
to always return a string, and never a floating point NaN, so I'm kind of curious if there is something more fundamental about this happening that might cause issues in other places too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yeah, I can do the .astype(str)
and check for a "nan"
string (I didn't see that comment when I responded previously). It's not an entirely empty column though. It parses it as type object
(code below). I can try just handling NaN and see if I get any issues generating xtce for SWE or CODICE, but we'll still need to check for both if others are parsing as None
for whatever reason.
xls = pd.ExcelFile("/Users/seho5886/Desktop/IMAP/hit/TLM_HIT_v20220524.xls")
df = xls.parse("P_HIT_HSKP", converters={"longDescription":str})
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 packetName 80 non-null object
1 mnemonic 80 non-null object
2 sequence 80 non-null int64
3 lengthInBits 80 non-null int64
4 startBit 80 non-null int64
5 dataType 80 non-null object
6 convertAs 80 non-null object
7 units 80 non-null object
8 source 69 non-null object
9 shortDescription 78 non-null object
10 longDescription 8 non-null object
dtypes: int64(3), object(8)
memory usage: 7.0+ KB
df.longDescription.isnull()
0 False
1 False
2 False
3 False
4 False
...
75 True
76 True
77 True
78 True
79 True
Name: longDescription, Length: 80, dtype: bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, interesting that HIT is the only instrument with blank cells to cause this! This definitely seems like something others may produce in the future as well...
What about adding a converter like this? https://stackoverflow.com/questions/32591466/python-pandas-how-to-specify-data-types-when-reading-an-excel-file
May be splitting hairs here though, so your current solution is good, it just feels like there should be a way to push this upstream a bit further as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured out what's going on. An empty cell in a column is parsed as NaN, but if row.get()
tries to access data from a column that doesn't exist it returns None
. SWE doesn't have a shortDescription
column which is where the None
was coming from before. Would it be better to specify column names needed by pd.ExcelFile.parse("packet name", usecols=["mnemonic", "startByte", ...] )
and return an error saying the spreadsheet should be updated with the missing column? That could also be helpful in catching future spreadsheet issues, but it might be more annoying than it's worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense now, nice investigation!
Would it be better to specify column names needed by pd.ExcelFile.parse("packet name", usecols=["mnemonic", "startByte", ...] ) and return an error saying the spreadsheet should be updated with the missing column? That could also be helpful in catching future spreadsheet issues, but it might be more annoying than it's worth.
I don't think we want to force people to add short and long descriptions if they aren't in the spreadsheets, that sounds like a lot of headache like you say. Yet another option to consider would be to return np.nan
from the get lookup fallback:
if not np.isnan(row.get("longDescription", np.nan)):
...
So then we don't have to check for both nan and None. Also see my comment below about adding both short and long descriptions as two separate if clauses, which might simplify some of the logic here as well if you want to implement that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Thanks! I also just found out that pd.isna()
/ pd.notna()
handles NaN and None the same.
@@ -250,9 +250,19 @@ def create_remaining_parameters(self, parameter_set): | |||
|
|||
description = Et.SubElement(parameter, "xtce:LongDescription") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GFMoraga and @tech3371 Do we want to put the short description into the long description tag?
Looking at this and searching through it looks like there is both short and long description supported. https://public.ccsds.org/Pubs/660x2g2.pdf
So, would it make sense to do something like the following so you could have both short and/or long descriptions supported?
description = Et.SubElement(parameter, "xtce:LongDescription") | |
if row.get("shortDescription"): | |
description = Et.SubElement(parameter, "xtce:ShortDescription") | |
description.text = row.get("shortDescription") | |
if row.get("longDescription"): | |
description = Et.SubElement(parameter, "xtce:LongDescription") | |
description.text = row.get("longDescription") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @sdhoyt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great idea. I know most will have BOTH, but not all will have the descriptions. As long as the this resolves the NaN error in the descriptions that are blank. I will say that the xtce.xml
file won't look the best with both, and we need to decide if both are required...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a nice simplification now to handle a lot of the cases. But note, that I think it will change people's XTCEs that were already generated because it may include additional description tags. So we should verify that @tech3371 and @GFMoraga are getting good outputs with this version too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a comment, but it looks great. Pandas
is supported, so it should be fine. If there are kinks, we can debug it again.
* added generators for hit and lo, updated telem generator * removed hit and lo files * better None/NAN handling, using both short and long description tags
* added generators for hit and lo, updated telem generator * removed hit and lo files * better None/NAN handling, using both short and long description tags
* added generators for hit and lo, updated telem generator * removed hit and lo files * better None/NAN handling, using both short and long description tags
@all-contributors please add @sdhoyt for maintenance |
I've put up a pull request to add @sdhoyt! 🎉 |
Change Summary
Overview
I was having an issue with Lo and HIT where there were nan values (floating point type nans) in the short/long descriptions that were causing an error when the xml tried to write. I added a couple lines to replace those with empty strings.
New Dependencies
None
New Files
None
Deleted Files
None
Updated Files
Testing
Tested by creating HIT and Lo XTCE