Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Init commit of IDEX packet parser #79

Merged
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
0fdc769
Init commit of IDEX packet parser
bryan-harter Aug 25, 2023
30694b8
Committing test data
bryan-harter Aug 25, 2023
eadbad2
Adding xarray as a dependency
bryan-harter Aug 25, 2023
623fd94
Responding to code review
bryan-harter Aug 31, 2023
f04831c
Fixing the dependencies
bryan-harter Aug 31, 2023
e6ecfbe
updating poetry.lock to sync with main branch
bryan-harter Aug 31, 2023
f606807
Merge branch 'dev' into idex_decom
bryan-harter Aug 31, 2023
f1376b3
Getting rid of old import statement
bryan-harter Aug 31, 2023
03b14fa
Updating poetry lock file
bryan-harter Aug 31, 2023
5a63649
Fixing the lock file, for some reason bitstring is
bryan-harter Aug 31, 2023
e4d293f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 31, 2023
055e335
Updating to latest version of space packet parser
bryan-harter Aug 31, 2023
c2f2d61
Changing packet parser to use individual Dust
bryan-harter Sep 1, 2023
00c247a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 1, 2023
1c9b818
Changing the requirements in poetry for xarray
bryan-harter Sep 1, 2023
a3194f3
Responding to code review comments, adding a unit
bryan-harter Sep 5, 2023
9fcc75b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 5, 2023
a8edd9f
Updating poetry.lock
bryan-harter Sep 5, 2023
a588de6
Merge branch 'idex_decom' of https://github.com/bryan-harter/imap_pro…
bryan-harter Sep 5, 2023
10d257d
Logging an error if a new science type is received
bryan-harter Sep 5, 2023
69b985c
Determining High and Low sample now
bryan-harter Sep 5, 2023
fdde569
Add more comments and make the variables clearer
bryan-harter Sep 6, 2023
9d773ed
Fixing some of the variable names, adding comments
bryan-harter Sep 7, 2023
b2b6721
Bug fix, wasn't returning right thing in new func
bryan-harter Sep 7, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
120 changes: 120 additions & 0 deletions imap_processing/idex/IDEXPacketParser.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
import bitstring
import numpy as np
from space_packet_parser import parser, xtcedef
from imap_processing import packet_definition_directory


class IDEXPacketParser():
Copy link
Collaborator

@greglucas greglucas Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add a docstring here explaining what this class does, what attributes it has, what methods.

Right now, everything happens within instantiation which seems like we don't really need a class, but possibly just a function that returns the data you want? Perhaps there is a separate class IDEXEvent that stores event information, or IDEXWaveform for the waveform data, where you'd be creating these dataclasses as you process a packet. So, the actual processing of a file would be a function call like in SWE:

def decom_packets(packet_file: str):

def decom_idex(...):
    list_of_idex_events = []
    for pkt in idex_packet_generator:
        # instantiate your event classes and fill them in as needed
        evt = IDEXEvent(pkt)
        list_of_idex_events.append(evt)
    return list_of_idex_events

We should decide how we want to architect these for consistency across instruments though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah everything interesting basically happens inside init() right now. I actually had several other methods here before which may or may not make it worth it to keep it as a class

  • plot_waveforms: A function to quickly plot the data in the self.data structure in matplotlib as a quicklook to verify accuracy (and this might even be extended into a quicklook product in the future? Or maybe we want to quicklook directly from the L1a)
  • create_l1a_cdf: the method called to populate data into an xarray object, and then convert it to cdf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add high level explanation of what this class is doing? It would be helpful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your waveforms data associated with whatever packet file came in, or with an individual event/packet? I guess I'm getting at what level of granularity do we want for the class.

I think you could put those methods you describe onto the IDEXEvent class, where it would be plotting a waveform from each event, storing what type of science data it was in this event, etc...? I haven't looked super close, but that is just what my mind goes to. Or even separate functions within the module, idex.plot_waveforms(pass_in_data)

We are going to get dumps from the POC every 3 days, which means a packet file may be 3-days long I think? Or partial days, etc... I'm not sure we want our CDF creation lined up with those semi-arbitrary boundaries, but rather we could have a separate create_l1a_cdf(list_of_packets) function that takes in all of the events you want going into your l1a CDF and not necessarily tied to this current class.

I won't block on this, but I do think it is important to think about because you're the first and sort of setting up the template for others to follow. It looks like others like this large class though, so maybe I'm the odd one out 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes a docstring is incoming! I should have included that first thing, sorry.

As far as the structure of the class, to my knowledge the L0 data we get from the MOC/POC will be sliced up grouped up into packets that span a single day, midnight to midnight UTC. So if there are 3 days between contacts, we should receive 3 separate L0 files. The CDF files that we generate should also span a single day, so there should be a 1-to-1 correspondence from a packet file to a l1a CDF file.

The waveforms are associated with a single dust event, so perhaps the class could be sub-divided into specific events rather than an entire L0 file (for everyone's reference, a single "event" in this context is a dust impact, which may occur roughly ~1-10 times a day). My concern with only looking at the files by a single event is:

  • I don't know if the packets within the L0 file are required to be grouped by event, so we may end up populating a bunch of classes at once. Right now the packets are all time-ordered, but their algorithm document specifies that technically we should be looking at the AID of each packet to ensure we're filling in data to the right event.
  • I think it's helpful to group all of the events together like this because this collection of events is ultimately what will make the l1a CDF file
  • If we're hoping to keep the structure of the decom consistent across all of the instruments, IDEX would look like the oddball of the bunch if it grouped data by event rather than by a time range. I believe none of the other instruments are event based, and just continuously take data.

If we wanted, we could make DustEvent classes that this decom class uses to populate, but that might be a little overkill. Overall, the data we need to get out of the packet is pretty simple (just 6 1D arrays).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CDF files that we generate should also span a single day, so there should be a 1-to-1 correspondence from a packet file to a l1a CDF file.

I'm not sure we can count on this... I think we can hope for it, but I would think that if a contact comes down at noon and they get data up until noon I'd expect them to deliver the files through noon and not wait until 3 days from then during the next contact to send down a complete days worth of data. That is a good question for how we want things to happen because I think we can somewhat dictate this.

I think it's helpful to group all of the events together like this because this collection of events is ultimately what will make the l1a CDF file

Fair enough, I think I was viewing Events as the critical assembly unit and CDF files being something we put together, where the event is the logical structure.

If we wanted, we could make DustEvent classes that this decom class uses to populate, but that might be a little overkill. Overall, the data we need to get out of the packet is pretty simple (just 6 1D arrays).

I think I'm just not being very clear with what is in my mind because I actually think it would be easier to parse (IMO) what is going on here rather than going to nested dictionaries with appended lists to try and get at the events (i.e. those 6 1D arrays would be stored in the DustEvent class. The datastore would then just work on this list of daily events.

daily_events = []
for packet in packet_dump:
    # DustEvent would be able to process any event and handle the logic for what science type it is
    # how to parse waveforms, etc...
    event = DustEvent(packet)
    daily_events.append(event)

# How many events?
len(daily_events)

# Filter by events of a certain type
len([event for event in daily_events if event.science_type == "COOL_SCIENCE_MODE"])

# Now further down if you want to make a CDF file with daily_events:
create_idex_cdf_l1a(daily_events)

# if you want you could have another class just to store groupings of events too
class DailyDustEvents:
    def __init__(events):
        self.events = events

    def create_l1a(self):
        pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll experiment with the classes a little more, it probably will make it more readable to do it like you have with DailyDustEvents and DustEvent.

As for

I'm not sure we can count on this... I think we can hope for it, but I would think that if a contact comes down at noon and they get data up until noon I'd expect them to deliver the files through noon and not wait until 3 days from then during the next contact to send down a complete days worth of data. That is a good question for how we want things to happen because I think we can somewhat dictate this.

That does happen frequently on MAVEN and EMM, but then the next downlink we receive "v02" of the pkts file and processing starts anew. Typically that means that v01-r00 of the file might be incomplete, but the next revision probably contains the full amount of data. I don't know what we want to do on this mission, but thats basically just what I was assuming would happen.

def __init__(self, idex_packet_file: str):

idex_xtce = f"{packet_definition_directory}/idex_packet_definition.xml"
idex_definition = xtcedef.XtcePacketDefinition(xtce_document=idex_xtce)
idex_parser = parser.PacketParser(idex_definition)

idex_binary_data = bitstring.ConstBitStream(filename=idex_packet_file)
idex_packet_generator = idex_parser.generator(idex_binary_data)

self.epochs = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering what is difference between epochs and data? This might relate to question I had about scitype == 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"epochs" is a list of times that the dust events have occurred, and the data is the actual data from those events. epoch is generally just the CDF/SPDFs term for "time".

self.data = {}

evtnum = 0
for pkt in idex_packet_generator:
if 'IDX__SCI0TYPE' in pkt.data:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really a comment to this specific PR, but what are the chances of packet field names changing? If there's a possibility that they could change down the line would it make sense to create some type of adapter so we wouldn't need to go through and change these strings? Or if they did change would we not care since we can call them whatever we want in the xtce files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm certainly not an expert at decom or XTCE, but I think we're just assigning a human-readable name to those specific 8 bits in the file. I went through and changed the code and the packet definition xml file to call it 'IDX__SCI0TYPE_X' instead of "IDX__SCI0TYPE", and everything still decoded fine.

scitype = pkt.data['IDX__SCI0TYPE'].raw_value
if scitype == 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does scitype==1 means in concept?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are different types of science packets we could expect, numbered [1, 2, 4, 8, 16, 32, 64].

"1" contains information about the events, and 2,3,8,16,32 and 64 contain the different 6 variables we're interested in. Maybe we can change these numbers to be more human-readable here.

evtnum += 1
self.epochs.append(pkt.data['SHCOARSE'].derived_value + 20*(10**(-6))*pkt.data['SHFINE'].derived_value) # Use this as the CDF epoch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to add a comment or assign those numbers (20, 10, -6) to variables, so it's clear where they're coming from.

self._log_packet_info(evtnum, pkt) # These are for our logs, don't use

if scitype in [2, 4, 8, 16, 32, 64]:
if scitype not in self.data:
self.data.update({scitype : {}})
if evtnum not in self.data[scitype]:
self.data[scitype][evtnum] = pkt.data['IDX__SCI0RAW'].raw_value
else:
self.data[scitype][evtnum] += pkt.data['IDX__SCI0RAW'].raw_value

# Parse the waveforms according to the scitype present (high gain and low gain channels encode waveform data differently).
self.scitype_to_names = {2: "TOF_High", 4: "TOF_Low", 8: "TOF_Mid", 16: "Target_Low",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to add this in _parse_waveform_data function doc string because I was wondering what those (2, 4, 8) means.

32: "Target_High", 64: "Ion_Grid"}
datastore = {}
self.time_low_sr = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we spell out time_low_sr? more specifically, spell out sr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, "sr" just stands for "sampling rate". 3 of the important variables use a very high sampling rate, and the 3 of the other ones use a (comparitavely) very low sampling rate.

self.time_high_sr = []
for scitype in self.data:
datastore[self.scitype_to_names[scitype]] = []
for evt in self.data[scitype]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does evt stands for event?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

datastore[self.scitype_to_names[scitype]].append(self._parse_waveform_data(self.data[scitype][evt], scitype))
if self.scitype_to_names[scitype] == 'Target_Low':
self.time_low_sr.append(np.linspace(0, len(datastore['Target_Low'][0]), len(datastore['Target_Low'][0])))
if self.scitype_to_names[scitype] == 'TOF_Low':
self.time_high_sr.append(np.linspace(0, len(datastore['TOF_Low'][0]), len(datastore['TOF_Low'][0])))

self.data = datastore
self.numevents = evtnum

def _log_packet_info(self, evtnum, pkt):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use a logger here instead of prints, and add DEBUG log level if they aren't necessarily needed but helpful when we want to debug things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm being nitpicky here, but could we use event_num and packet here instead of evtnum and pkt? I find that code is less readable when variables are unnecessarily abbreviated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah for sure

print(f"^*****Event header {evtnum}******^")
# Extract the 17-22-bit integer (usually 8)
self.lspretrigblocks = (pkt.data['IDX__TXHDRBLOCKS'].derived_value >> 16) & 0b1111
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more than logging at this point by assigning to attributes. Do we need these attributes added to the class, or can they just be local variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah I think it's possible they could be added at some point, but I haven't figured out if they actually want that stuff in the CDF. My guess is they won't want it, so I'll remove the "self" stuff

# Extract the next 4-bit integer (usually 8)
self.lsposttrigblocks = (pkt.data['IDX__TXHDRBLOCKS'].derived_value >> 12) & 0b1111
# Extract the next 6 bits integer (usually 32)
self.hspretrigblocks = (pkt.data['IDX__TXHDRBLOCKS'].derived_value >> 6) & 0b111111
# Extract the first 6 bits (usually 32)
self.hsposttrigblocks = (pkt.data['IDX__TXHDRBLOCKS'].derived_value) & 0b111111
print("HS pre trig sampling blocks: ", self.hspretrigblocks)
print("LS pre trig sampling blocks: ", self.lspretrigblocks)
print("HS post trig sampling blocks: ", self.hsposttrigblocks)
print("LS post trig sampling blocks: ", self.lsposttrigblocks)
self.TOFdelay = pkt.data['IDX__TXHDRSAMPDELAY'].raw_value >> 2 # First two bits are padding
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.TOFdelay = pkt.data['IDX__TXHDRSAMPDELAY'].raw_value >> 2 # First two bits are padding
self.tof_delay = pkt.data['IDX__TXHDRSAMPDELAY'].raw_value >> 2 # First two bits are padding

We should try and stick to python variable naming conventions when possible.

mask = 0b1111111111
self.lgdelay = (self.TOFdelay) & mask
self.mgdelay = (self.TOFdelay >> 10) & mask
self.hgdelay = (self.TOFdelay >> 20) & mask
print(f"High gain delay = {self.hgdelay} samples.")
print(f"Mid gain delay = {self.mgdelay} samples.")
print(f"Low gain delay = {self.lgdelay} samples.")
if(pkt.data['IDX__TXHDRLSTRIGMODE'].derived_value!=0): # If this was a LS (Target Low Gain) trigger
self.Triggerorigin = 'LS'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will these Trigger variables be used for? In the case that some of these if conditions are not satisfied, they might not be defined when something tries to access them from the object, so should be initialized under init.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah these are only used for logging really, so I don't know if it matters that they are like this. I'm going to get rid of the "self." because I don't think the rest of the class needs to know or care about them.

print("Low sampling trigger mode enabled.")
print("Packet trigger mode = ", pkt.data['IDX__TXHDRLGTRIGMODE'].derived_value, pkt.data['IDX__TXHDRMGTRIGMODE'].derived_value, pkt.data['IDX__TXHDRHGTRIGMODE'].derived_value)
if(pkt.data['IDX__TXHDRLGTRIGMODE'].derived_value!=0):
print("Low gain TOF trigger mode enabled.")
self.Triggerorigin = 'LG'
if(pkt.data['IDX__TXHDRMGTRIGMODE'].derived_value!=0):
print("Mid gain TOF trigger mode enabled.")
self.Triggerorigin = 'MG'
if(pkt.data['IDX__TXHDRHGTRIGMODE'].derived_value!=0):
print("High gain trigger mode enabled.")
self.Triggerorigin = 'HG'
print(f"AID = {pkt.data['IDX__SCI0AID'].derived_value}") # Instrument event number
print(f"Event number = {pkt.data['IDX__SCI0EVTNUM'].raw_value}") # Event number out of how many events constitute the file
print(f"Rice compression enabled = {bool(pkt.data['IDX__SCI0COMP'].raw_value)}")
compressed = bool(pkt.data['IDX__SCI0COMP'].raw_value) # If we need to decompress the data
print(f"Timestamp = {self.epochs[evtnum-1]} seconds since epoch (Midnight January 1st, 2012)")

def _parse_hs_waveform(self, waveform_raw: str):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You aren't accessing self at all, so this could be a static method on the class, but again this seems like a helper function within this module so isn't needed on the class explicitly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me what hs stands for here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high sampling, I'll spell it out in the next commit!

"""Parse a binary string representing a high gain waveform"""
w = bitstring.ConstBitStream(bin=waveform_raw)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
w = bitstring.ConstBitStream(bin=waveform_raw)
waveform = bitstring.ConstBitStream(bin=waveform_raw)

ints = []
while w.pos < len(w):
w.read('pad:2') # skip 2
ints += w.readlist(['uint:10']*3)
return ints

def _parse_ls_waveform(self, waveform_raw: str):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me what ls stands for here

"""Parse a binary string representing a low gain waveform"""
w = bitstring.ConstBitStream(bin=waveform_raw)
ints = []
while w.pos < len(w):
w.read('pad:8') # skip 2
ints += w.readlist(['uint:12']*2)
return ints

def _parse_waveform_data(self, waveform: str, scitype: int):
"""Parse the binary string that represents a waveform"""
print(f'Parsing waveform for scitype={scitype}')
if scitype in (2, 4, 8):
return self._parse_hs_waveform(waveform)
else:
return self._parse_ls_waveform(waveform)
Empty file.
Empty file.
Binary file not shown.
15 changes: 15 additions & 0 deletions imap_processing/idex/tests/test_decom.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
import pytest

from imap_processing.idex.IDEXPacketParser import IDEXPacketParser


@pytest.fixture(scope="session")
def decom_test_data():
return IDEXPacketParser("imap_processing/idex/tests/imap_idex_l0_20230725_v01-00.pkts")

def test_idex_decom_length(decom_test_data):
assert len(decom_test_data.data) == 6

def test_idex_decom_event_num(decom_test_data):
for var in decom_test_data.data:
assert len(decom_test_data.data[var]) == 19
Loading