-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Init commit of IDEX packet parser #79
Changes from 3 commits
0fdc769
30694b8
eadbad2
623fd94
f04831c
e6ecfbe
f606807
f1376b3
03b14fa
5a63649
e4d293f
055e335
c2f2d61
00c247a
1c9b818
a3194f3
9fcc75b
a8edd9f
a588de6
10d257d
69b985c
fdde569
9d773ed
b2b6721
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,120 @@ | ||||||
import bitstring | ||||||
import numpy as np | ||||||
from space_packet_parser import parser, xtcedef | ||||||
from imap_processing import packet_definition_directory | ||||||
|
||||||
|
||||||
class IDEXPacketParser(): | ||||||
def __init__(self, idex_packet_file: str): | ||||||
|
||||||
idex_xtce = f"{packet_definition_directory}/idex_packet_definition.xml" | ||||||
idex_definition = xtcedef.XtcePacketDefinition(xtce_document=idex_xtce) | ||||||
idex_parser = parser.PacketParser(idex_definition) | ||||||
|
||||||
idex_binary_data = bitstring.ConstBitStream(filename=idex_packet_file) | ||||||
idex_packet_generator = idex_parser.generator(idex_binary_data) | ||||||
|
||||||
self.epochs = [] | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was wondering what is difference between There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "epochs" is a list of times that the dust events have occurred, and the data is the actual data from those events. epoch is generally just the CDF/SPDFs term for "time". |
||||||
self.data = {} | ||||||
|
||||||
evtnum = 0 | ||||||
for pkt in idex_packet_generator: | ||||||
if 'IDX__SCI0TYPE' in pkt.data: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This isn't really a comment to this specific PR, but what are the chances of packet field names changing? If there's a possibility that they could change down the line would it make sense to create some type of adapter so we wouldn't need to go through and change these strings? Or if they did change would we not care since we can call them whatever we want in the xtce files? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So I'm certainly not an expert at decom or XTCE, but I think we're just assigning a human-readable name to those specific 8 bits in the file. I went through and changed the code and the packet definition xml file to call it 'IDX__SCI0TYPE_X' instead of "IDX__SCI0TYPE", and everything still decoded fine. |
||||||
scitype = pkt.data['IDX__SCI0TYPE'].raw_value | ||||||
if scitype == 1: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What does scitype==1 means in concept? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are different types of science packets we could expect, numbered [1, 2, 4, 8, 16, 32, 64]. "1" contains information about the events, and 2,3,8,16,32 and 64 contain the different 6 variables we're interested in. Maybe we can change these numbers to be more human-readable here. |
||||||
evtnum += 1 | ||||||
self.epochs.append(pkt.data['SHCOARSE'].derived_value + 20*(10**(-6))*pkt.data['SHFINE'].derived_value) # Use this as the CDF epoch | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It might be good to add a comment or assign those numbers (20, 10, -6) to variables, so it's clear where they're coming from. |
||||||
self._log_packet_info(evtnum, pkt) # These are for our logs, don't use | ||||||
|
||||||
if scitype in [2, 4, 8, 16, 32, 64]: | ||||||
if scitype not in self.data: | ||||||
self.data.update({scitype : {}}) | ||||||
if evtnum not in self.data[scitype]: | ||||||
self.data[scitype][evtnum] = pkt.data['IDX__SCI0RAW'].raw_value | ||||||
else: | ||||||
self.data[scitype][evtnum] += pkt.data['IDX__SCI0RAW'].raw_value | ||||||
|
||||||
# Parse the waveforms according to the scitype present (high gain and low gain channels encode waveform data differently). | ||||||
self.scitype_to_names = {2: "TOF_High", 4: "TOF_Low", 8: "TOF_Mid", 16: "Target_Low", | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It might be nice to add this in |
||||||
32: "Target_High", 64: "Ion_Grid"} | ||||||
datastore = {} | ||||||
self.time_low_sr = [] | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we spell out There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure thing, "sr" just stands for "sampling rate". 3 of the important variables use a very high sampling rate, and the 3 of the other ones use a (comparitavely) very low sampling rate. |
||||||
self.time_high_sr = [] | ||||||
for scitype in self.data: | ||||||
datastore[self.scitype_to_names[scitype]] = [] | ||||||
for evt in self.data[scitype]: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does evt stands for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes! |
||||||
datastore[self.scitype_to_names[scitype]].append(self._parse_waveform_data(self.data[scitype][evt], scitype)) | ||||||
if self.scitype_to_names[scitype] == 'Target_Low': | ||||||
self.time_low_sr.append(np.linspace(0, len(datastore['Target_Low'][0]), len(datastore['Target_Low'][0]))) | ||||||
if self.scitype_to_names[scitype] == 'TOF_Low': | ||||||
self.time_high_sr.append(np.linspace(0, len(datastore['TOF_Low'][0]), len(datastore['TOF_Low'][0]))) | ||||||
|
||||||
self.data = datastore | ||||||
self.numevents = evtnum | ||||||
|
||||||
def _log_packet_info(self, evtnum, pkt): | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should use a logger here instead of prints, and add DEBUG log level if they aren't necessarily needed but helpful when we want to debug things. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm being nitpicky here, but could we use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah for sure |
||||||
print(f"^*****Event header {evtnum}******^") | ||||||
# Extract the 17-22-bit integer (usually 8) | ||||||
self.lspretrigblocks = (pkt.data['IDX__TXHDRBLOCKS'].derived_value >> 16) & 0b1111 | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is more than logging at this point by assigning to attributes. Do we need these attributes added to the class, or can they just be local variables? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah yeah I think it's possible they could be added at some point, but I haven't figured out if they actually want that stuff in the CDF. My guess is they won't want it, so I'll remove the "self" stuff |
||||||
# Extract the next 4-bit integer (usually 8) | ||||||
self.lsposttrigblocks = (pkt.data['IDX__TXHDRBLOCKS'].derived_value >> 12) & 0b1111 | ||||||
# Extract the next 6 bits integer (usually 32) | ||||||
self.hspretrigblocks = (pkt.data['IDX__TXHDRBLOCKS'].derived_value >> 6) & 0b111111 | ||||||
# Extract the first 6 bits (usually 32) | ||||||
self.hsposttrigblocks = (pkt.data['IDX__TXHDRBLOCKS'].derived_value) & 0b111111 | ||||||
print("HS pre trig sampling blocks: ", self.hspretrigblocks) | ||||||
print("LS pre trig sampling blocks: ", self.lspretrigblocks) | ||||||
print("HS post trig sampling blocks: ", self.hsposttrigblocks) | ||||||
print("LS post trig sampling blocks: ", self.lsposttrigblocks) | ||||||
self.TOFdelay = pkt.data['IDX__TXHDRSAMPDELAY'].raw_value >> 2 # First two bits are padding | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
We should try and stick to python variable naming conventions when possible. |
||||||
mask = 0b1111111111 | ||||||
self.lgdelay = (self.TOFdelay) & mask | ||||||
self.mgdelay = (self.TOFdelay >> 10) & mask | ||||||
self.hgdelay = (self.TOFdelay >> 20) & mask | ||||||
print(f"High gain delay = {self.hgdelay} samples.") | ||||||
print(f"Mid gain delay = {self.mgdelay} samples.") | ||||||
print(f"Low gain delay = {self.lgdelay} samples.") | ||||||
if(pkt.data['IDX__TXHDRLSTRIGMODE'].derived_value!=0): # If this was a LS (Target Low Gain) trigger | ||||||
self.Triggerorigin = 'LS' | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What will these Trigger variables be used for? In the case that some of these There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah these are only used for logging really, so I don't know if it matters that they are like this. I'm going to get rid of the "self." because I don't think the rest of the class needs to know or care about them. |
||||||
print("Low sampling trigger mode enabled.") | ||||||
print("Packet trigger mode = ", pkt.data['IDX__TXHDRLGTRIGMODE'].derived_value, pkt.data['IDX__TXHDRMGTRIGMODE'].derived_value, pkt.data['IDX__TXHDRHGTRIGMODE'].derived_value) | ||||||
if(pkt.data['IDX__TXHDRLGTRIGMODE'].derived_value!=0): | ||||||
print("Low gain TOF trigger mode enabled.") | ||||||
self.Triggerorigin = 'LG' | ||||||
if(pkt.data['IDX__TXHDRMGTRIGMODE'].derived_value!=0): | ||||||
print("Mid gain TOF trigger mode enabled.") | ||||||
self.Triggerorigin = 'MG' | ||||||
if(pkt.data['IDX__TXHDRHGTRIGMODE'].derived_value!=0): | ||||||
print("High gain trigger mode enabled.") | ||||||
self.Triggerorigin = 'HG' | ||||||
print(f"AID = {pkt.data['IDX__SCI0AID'].derived_value}") # Instrument event number | ||||||
print(f"Event number = {pkt.data['IDX__SCI0EVTNUM'].raw_value}") # Event number out of how many events constitute the file | ||||||
print(f"Rice compression enabled = {bool(pkt.data['IDX__SCI0COMP'].raw_value)}") | ||||||
compressed = bool(pkt.data['IDX__SCI0COMP'].raw_value) # If we need to decompress the data | ||||||
print(f"Timestamp = {self.epochs[evtnum-1]} seconds since epoch (Midnight January 1st, 2012)") | ||||||
|
||||||
def _parse_hs_waveform(self, waveform_raw: str): | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You aren't accessing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is not clear to me what There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. high sampling, I'll spell it out in the next commit! |
||||||
"""Parse a binary string representing a high gain waveform""" | ||||||
w = bitstring.ConstBitStream(bin=waveform_raw) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
ints = [] | ||||||
while w.pos < len(w): | ||||||
w.read('pad:2') # skip 2 | ||||||
ints += w.readlist(['uint:10']*3) | ||||||
return ints | ||||||
|
||||||
def _parse_ls_waveform(self, waveform_raw: str): | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is not clear to me what |
||||||
"""Parse a binary string representing a low gain waveform""" | ||||||
w = bitstring.ConstBitStream(bin=waveform_raw) | ||||||
ints = [] | ||||||
while w.pos < len(w): | ||||||
w.read('pad:8') # skip 2 | ||||||
ints += w.readlist(['uint:12']*2) | ||||||
return ints | ||||||
|
||||||
def _parse_waveform_data(self, waveform: str, scitype: int): | ||||||
"""Parse the binary string that represents a waveform""" | ||||||
print(f'Parsing waveform for scitype={scitype}') | ||||||
if scitype in (2, 4, 8): | ||||||
return self._parse_hs_waveform(waveform) | ||||||
else: | ||||||
return self._parse_ls_waveform(waveform) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
import pytest | ||
|
||
from imap_processing.idex.IDEXPacketParser import IDEXPacketParser | ||
|
||
|
||
@pytest.fixture(scope="session") | ||
def decom_test_data(): | ||
return IDEXPacketParser("imap_processing/idex/tests/imap_idex_l0_20230725_v01-00.pkts") | ||
|
||
def test_idex_decom_length(decom_test_data): | ||
assert len(decom_test_data.data) == 6 | ||
|
||
def test_idex_decom_event_num(decom_test_data): | ||
for var in decom_test_data.data: | ||
assert len(decom_test_data.data[var]) == 19 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add a docstring here explaining what this class does, what attributes it has, what methods.
Right now, everything happens within instantiation which seems like we don't really need a class, but possibly just a function that returns the data you want? Perhaps there is a separate
class IDEXEvent
that stores event information, orIDEXWaveform
for the waveform data, where you'd be creating these dataclasses as you process a packet. So, the actual processing of a file would be a function call like in SWE:imap_processing/imap_processing/swe/decom_swe.py
Line 4 in 406de26
We should decide how we want to architect these for consistency across instruments though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah everything interesting basically happens inside init() right now. I actually had several other methods here before which may or may not make it worth it to keep it as a class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add high level explanation of what this class is doing? It would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is your waveforms data associated with whatever packet file came in, or with an individual event/packet? I guess I'm getting at what level of granularity do we want for the class.
I think you could put those methods you describe onto the
IDEXEvent
class, where it would be plotting a waveform from each event, storing what type of science data it was in this event, etc...? I haven't looked super close, but that is just what my mind goes to. Or even separate functions within the module,idex.plot_waveforms(pass_in_data)
We are going to get dumps from the POC every 3 days, which means a packet file may be 3-days long I think? Or partial days, etc... I'm not sure we want our CDF creation lined up with those semi-arbitrary boundaries, but rather we could have a separate
create_l1a_cdf(list_of_packets)
function that takes in all of the events you want going into your l1a CDF and not necessarily tied to this current class.I won't block on this, but I do think it is important to think about because you're the first and sort of setting up the template for others to follow. It looks like others like this large class though, so maybe I'm the odd one out 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes a docstring is incoming! I should have included that first thing, sorry.
As far as the structure of the class, to my knowledge the L0 data we get from the MOC/POC will be sliced up grouped up into packets that span a single day, midnight to midnight UTC. So if there are 3 days between contacts, we should receive 3 separate L0 files. The CDF files that we generate should also span a single day, so there should be a 1-to-1 correspondence from a packet file to a l1a CDF file.
The waveforms are associated with a single dust event, so perhaps the class could be sub-divided into specific events rather than an entire L0 file (for everyone's reference, a single "event" in this context is a dust impact, which may occur roughly ~1-10 times a day). My concern with only looking at the files by a single event is:
If we wanted, we could make DustEvent classes that this decom class uses to populate, but that might be a little overkill. Overall, the data we need to get out of the packet is pretty simple (just 6 1D arrays).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we can count on this... I think we can hope for it, but I would think that if a contact comes down at noon and they get data up until noon I'd expect them to deliver the files through noon and not wait until 3 days from then during the next contact to send down a complete days worth of data. That is a good question for how we want things to happen because I think we can somewhat dictate this.
Fair enough, I think I was viewing Events as the critical assembly unit and CDF files being something we put together, where the event is the logical structure.
I think I'm just not being very clear with what is in my mind because I actually think it would be easier to parse (IMO) what is going on here rather than going to nested dictionaries with appended lists to try and get at the events (i.e. those 6 1D arrays would be stored in the
DustEvent
class. The datastore would then just work on this list of daily events.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll experiment with the classes a little more, it probably will make it more readable to do it like you have with DailyDustEvents and DustEvent.
As for
That does happen frequently on MAVEN and EMM, but then the next downlink we receive "v02" of the pkts file and processing starts anew. Typically that means that v01-r00 of the file might be incomplete, but the next revision probably contains the full amount of data. I don't know what we want to do on this mission, but thats basically just what I was assuming would happen.