
Init commit of IDEX packet parser #79

Merged

Conversation

bryan-harter
Contributor

@bryan-harter bryan-harter commented Aug 25, 2023

Change Summary

Overview

These changes allow IDEX L0 files to be decommed and stored in a temporary data structure.

This is done via a new class, IDEXPacketParser, which is initialized with a packet file. It creates a generator object with space_packet_parser and loops through the file, appending data to temporary data structures.

New Dependencies

bitstring must currently be pinned below its latest version
Adding the xarray dependency (a little pre-emptively, but it brings in numpy, which I currently need)

New Files

  • imap_processing/idex/IDEXPacketParser.py
    • Contains a single class which reads in data from 1 day of IDEX pkts and creates a dictionary data structure from it.
  • imap_processing/packet_definitions/idex_packet_definition.xml
    • The XML packet definition file for the IDEX pkt files used in space_packet_parser
  • imap_processing/idex/tests/test_decom.py
    • A unit test that decoms the test data in the repo and verifies its accuracy
  • imap_processing/idex/tests/imap_idex_l0_20230725_v01-00.pkts
    • L0 data taken from the IDEX instrument last month from instrument testing

Testing

The unit tests currently decom the packets and verify that the lengths of the resulting arrays are accurate. At some point the output should be checked against the data Ethan has in his HDF5 files.

If you want to test it locally, it should be as simple as calling these functions:

from imap_processing.idex.IDEXPacketParser import IDEXPacketParser
packets = IDEXPacketParser('imap_processing/idex/tests/imap_idex_l0_20230725_v01-00.pkts')
print(packets.data)

@bryan-harter bryan-harter requested review from a team, bourque, sdhoyt, greglucas, tech3371, laspsandoval, GFMoraga and maxinelasp and removed request for a team August 28, 2023 17:22
Contributor

@GFMoraga GFMoraga left a comment

I think this overall looks great! I didn't test it on my end, but the parsing looks good.

Contributor

This looks great!

@bryan-harter
Contributor Author

I need to get "black" and the other linting tools running so the formatting looks nicer; right now it's kind of a mess

from imap_processing import packet_definition_directory


class IDEXPacketParser():
Collaborator

@greglucas greglucas Aug 28, 2023

It would be good to add a docstring here explaining what this class does, what attributes it has, what methods.

Right now, everything happens within instantiation which seems like we don't really need a class, but possibly just a function that returns the data you want? Perhaps there is a separate class IDEXEvent that stores event information, or IDEXWaveform for the waveform data, where you'd be creating these dataclasses as you process a packet. So, the actual processing of a file would be a function call like in SWE:

def decom_packets(packet_file: str):

def decom_idex(...):
    list_of_idex_events = []
    for pkt in idex_packet_generator:
        # instantiate your event classes and fill them in as needed
        evt = IDEXEvent(pkt)
        list_of_idex_events.append(evt)
    return list_of_idex_events

We should decide how we want to architect these for consistency across instruments though.

Contributor Author

Yeah, everything interesting basically happens inside __init__() right now. I actually had several other methods here before, which may or may not make it worth keeping as a class:

  • plot_waveforms: A function to quickly plot the data in the self.data structure in matplotlib as a quicklook to verify accuracy (and this might even be extended into a quicklook product in the future? Or maybe we want to quicklook directly from the L1a)
  • create_l1a_cdf: the method called to populate data into an xarray object, and then convert it to cdf

Contributor

Can you add high level explanation of what this class is doing? It would be helpful.

Collaborator

Is your waveforms data associated with whatever packet file came in, or with an individual event/packet? I guess I'm getting at what level of granularity do we want for the class.

I think you could put those methods you describe onto the IDEXEvent class, where it would be plotting a waveform from each event, storing what type of science data it was in this event, etc...? I haven't looked super close, but that is just what my mind goes to. Or even separate functions within the module, idex.plot_waveforms(pass_in_data)

We are going to get dumps from the POC every 3 days, which means a packet file may be 3-days long I think? Or partial days, etc... I'm not sure we want our CDF creation lined up with those semi-arbitrary boundaries, but rather we could have a separate create_l1a_cdf(list_of_packets) function that takes in all of the events you want going into your l1a CDF and not necessarily tied to this current class.

I won't block on this, but I do think it is important to think about because you're the first and sort of setting up the template for others to follow. It looks like others like this large class though, so maybe I'm the odd one out 😄

Contributor Author

Yes a docstring is incoming! I should have included that first thing, sorry.

As far as the structure of the class: to my knowledge, the L0 data we get from the MOC/POC will be grouped into packet files that each span a single day, midnight to midnight UTC. So if there are 3 days between contacts, we should receive 3 separate L0 files. The CDF files that we generate should also span a single day, so there should be a 1-to-1 correspondence from a packet file to a l1a CDF file.

The waveforms are associated with a single dust event, so perhaps the class could be sub-divided into specific events rather than an entire L0 file (for everyone's reference, a single "event" in this context is a dust impact, which may occur roughly ~1-10 times a day). My concerns with organizing the files by single events are:

  • I don't know if the packets within the L0 file are required to be grouped by event, so we may end up populating a bunch of classes at once. Right now the packets are all time-ordered, but their algorithm document specifies that technically we should be looking at the AID of each packet to ensure we're filling in data to the right event.
  • I think it's helpful to group all of the events together like this because this collection of events is ultimately what will make the l1a CDF file
  • If we're hoping to keep the structure of the decom consistent across all of the instruments, IDEX would look like the oddball of the bunch if it grouped data by event rather than by a time range. I believe none of the other instruments are event based, and just continuously take data.

If we wanted, we could make DustEvent classes that this decom class uses to populate, but that might be a little overkill. Overall, the data we need to get out of the packet is pretty simple (just 6 1D arrays).
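For concreteness, a minimal sketch of what such a (possibly overkill) DustEvent container could look like; all names here are hypothetical, holding just the 6 1D arrays mentioned above:

```python
from dataclasses import dataclass, field

# Hypothetical container for one dust impact; field names are assumptions
# based on the scitype-to-name mapping discussed in this PR.
@dataclass
class DustEvent:
    tof_high: list = field(default_factory=list)
    tof_low: list = field(default_factory=list)
    tof_mid: list = field(default_factory=list)
    target_low: list = field(default_factory=list)
    target_high: list = field(default_factory=list)
    ion_grid: list = field(default_factory=list)
```

Whether this earns its keep over a nested dictionary is exactly the trade-off being discussed here.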

Collaborator

The CDF files that we generate should also span a single day, so there should be a 1-to-1 correspondence from a packet file to a l1a CDF file.

I'm not sure we can count on this... I think we can hope for it, but I would think that if a contact comes down at noon and they get data up until noon I'd expect them to deliver the files through noon and not wait until 3 days from then during the next contact to send down a complete days worth of data. That is a good question for how we want things to happen because I think we can somewhat dictate this.

I think it's helpful to group all of the events together like this because this collection of events is ultimately what will make the l1a CDF file

Fair enough, I think I was viewing Events as the critical assembly unit and CDF files being something we put together, where the event is the logical structure.

If we wanted, we could make DustEvent classes that this decom class uses to populate, but that might be a little overkill. Overall, the data we need to get out of the packet is pretty simple (just 6 1D arrays).

I think I'm just not being very clear about what is in my mind, because I actually think it would be easier to parse (IMO) what is going on here rather than going to nested dictionaries with appended lists to try and get at the events (i.e. those 6 1D arrays would be stored in the DustEvent class). The datastore would then just work on this list of daily events.

daily_events = []
for packet in packet_dump:
    # DustEvent would be able to process any event and handle the logic for what science type it is
    # how to parse waveforms, etc...
    event = DustEvent(packet)
    daily_events.append(event)

# How many events?
len(daily_events)

# Filter by events of a certain type
len([event for event in daily_events if event.science_type == "COOL_SCIENCE_MODE"])

# Now further down if you want to make a CDF file with daily_events:
create_idex_cdf_l1a(daily_events)

# if you want you could have another class just to store groupings of events too
class DailyDustEvents:
    def __init__(self, events):
        self.events = events

    def create_l1a(self):
        pass

Contributor Author

I'll experiment with the classes a little more, it probably will make it more readable to do it like you have with DailyDustEvents and DustEvent.

As for

I'm not sure we can count on this... I think we can hope for it, but I would think that if a contact comes down at noon and they get data up until noon I'd expect them to deliver the files through noon and not wait until 3 days from then during the next contact to send down a complete days worth of data. That is a good question for how we want things to happen because I think we can somewhat dictate this.

That does happen frequently on MAVEN and EMM, but then on the next downlink we receive "v02" of the pkts file and processing starts anew. Typically that means v01-r00 of the file might be incomplete, but the next revision probably contains the full amount of data. I don't know what we want to do on this mission, but that's basically what I was assuming would happen.

self.data = datastore
self.numevents = evtnum

def _log_packet_info(self, evtnum, pkt):
Collaborator

We should use a logger here instead of prints, and add DEBUG log level if they aren't necessarily needed but helpful when we want to debug things.
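As a sketch of that suggestion (the logger name and message are illustrative, not taken from the PR code):

```python
import logging

# Module-level logger instead of bare print() calls; the name is illustrative.
logger = logging.getLogger("imap_processing.idex")

def log_packet_info(event_num: int) -> None:
    # DEBUG level: hidden by default, visible when debugging is enabled
    logger.debug("^*****Event header %d******^", event_num)
```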

Contributor Author

Sounds good

def _log_packet_info(self, evtnum, pkt):
print(f"^*****Event header {evtnum}******^")
# Extract the 17-22-bit integer (usually 8)
self.lspretrigblocks = (pkt.data['IDX__TXHDRBLOCKS'].derived_value >> 16) & 0b1111
Collaborator

This is more than logging at this point by assigning to attributes. Do we need these attributes added to the class, or can they just be local variables?

Contributor Author

Ah yeah, I think it's possible they could be added at some point, but I haven't figured out if they actually want that stuff in the CDF. My guess is they won't, so I'll remove the "self" stuff

print("LS pre trig sampling blocks: ", self.lspretrigblocks)
print("HS post trig sampling blocks: ", self.hsposttrigblocks)
print("LS post trig sampling blocks: ", self.lsposttrigblocks)
self.TOFdelay = pkt.data['IDX__TXHDRSAMPDELAY'].raw_value >> 2 # First two bits are padding
Collaborator

Suggested change
self.TOFdelay = pkt.data['IDX__TXHDRSAMPDELAY'].raw_value >> 2 # First two bits are padding
self.tof_delay = pkt.data['IDX__TXHDRSAMPDELAY'].raw_value >> 2 # First two bits are padding

We should try and stick to python variable naming conventions when possible.

compressed = bool(pkt.data['IDX__SCI0COMP'].raw_value) # If we need to decompress the data
print(f"Timestamp = {self.epochs[evtnum-1]} seconds since epoch (Midnight January 1st, 2012)")

def _parse_hs_waveform(self, waveform_raw: str):
Collaborator

You aren't accessing self at all, so this could be a static method on the class, but again this seems like a helper function within this module so isn't needed on the class explicitly.

pyproject.toml Outdated
@@ -30,6 +30,8 @@ classifiers = [
[tool.poetry.dependencies]
python = ">=3.9,<3.12"
space_packet_parser = ">=4.0.2,<5"
bitstring = '<=4.0.1'
xarray = '>=2022.11.0'
Collaborator

I don't think we are using it yet. I realize we will likely later, but I think we should wait to add dependencies until necessary.

pyproject.toml Outdated
@@ -30,6 +30,8 @@ classifiers = [
[tool.poetry.dependencies]
python = ">=3.9,<3.12"
space_packet_parser = ">=4.0.2,<5"
bitstring = '<=4.0.1'
Collaborator

I think bitstring works with any version of your code, it is the combination of bitstring + space_packet_parser that isn't working. I'm not sure how to explicitly call that out here though... Just indicating that I don't think it is ideal to pin bitstring back when it is really space_packet_parser that is causing the failures.

Contributor Author

Indeed, but I don't know what else to do. If you do a regular poetry install without specifying the bitstring version here, the code doesn't work. Gavin is working on a fix; we could just wait for that before merging.
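If it helps, one hedged option (the version number below is illustrative, not the actual fixed release) would be to raise the space_packet_parser lower bound once the fix is published, instead of holding bitstring back:

```toml
[tool.poetry.dependencies]
python = ">=3.9,<3.12"
# Hypothetical: bump to whichever release contains the fix,
# then drop the bitstring pin entirely.
space_packet_parser = ">=4.0.3,<5"
```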

Contributor Author

Gavin already pushed a new version to pypi!

Contributor

@tech3371 tech3371 left a comment

I ran it locally and it seems to work! First algorithm implementation. Nice work!

from imap_processing import packet_definition_directory


class IDEXPacketParser():
Contributor

Can you add high level explanation of what this class is doing? It would be helpful.

self.data[scitype][evtnum] += pkt.data['IDX__SCI0RAW'].raw_value

# Parse the waveforms according to the scitype present (high gain and low gain channels encode waveform data differently).
self.scitype_to_names = {2: "TOF_High", 4: "TOF_Low", 8: "TOF_Mid", 16: "Target_Low",
Contributor

It might be nice to add this to the _parse_waveform_data function docstring, because I was wondering what those (2, 4, 8) mean.

for pkt in idex_packet_generator:
if 'IDX__SCI0TYPE' in pkt.data:
scitype = pkt.data['IDX__SCI0TYPE'].raw_value
if scitype == 1:
Contributor

What does scitype == 1 mean in concept?

Contributor Author

There are different types of science packets we could expect, numbered [1, 2, 4, 8, 16, 32, 64].

"1" contains information about the events, and 2, 4, 8, 16, 32, and 64 contain the 6 different variables we're interested in. Maybe we can change these numbers to be more human-readable here.
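One hedged way to make those numbers human-readable, if the team goes that route (the class and member names are assumptions, not from the PR):

```python
from enum import IntEnum

# Maps the raw IDX__SCI0TYPE codes to readable names; member names follow
# the scitype_to_names mapping in this PR, but are otherwise assumptions.
class SciType(IntEnum):
    EVENT_HEADER = 1
    TOF_HIGH = 2
    TOF_LOW = 4
    TOF_MID = 8
    TARGET_LOW = 16
    TARGET_HIGH = 32
    ION_GRID = 64
```

Because IntEnum members compare equal to plain integers, checks like `scitype == SciType.EVENT_HEADER` keep working with the raw packet values.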

idex_binary_data = bitstring.ConstBitStream(filename=idex_packet_file)
idex_packet_generator = idex_parser.generator(idex_binary_data)

self.epochs = []
Contributor

I was wondering, what is the difference between epochs and data? This might relate to the question I had about scitype == 1.

Contributor Author

"epochs" is a list of the times at which the dust events occurred, and the data is the actual data from those events. "Epoch" is generally just the CDF/SPDF term for "time".

self.scitype_to_names = {2: "TOF_High", 4: "TOF_Low", 8: "TOF_Mid", 16: "Target_Low",
32: "Target_High", 64: "Ion_Grid"}
datastore = {}
self.time_low_sr = []
Contributor

Can we spell out time_low_sr? More specifically, spell out sr.

Contributor Author

Sure thing; "sr" just stands for "sampling rate". 3 of the important variables use a very high sampling rate, and the other 3 use a (comparatively) very low sampling rate.

self.time_high_sr = []
for scitype in self.data:
datastore[self.scitype_to_names[scitype]] = []
for evt in self.data[scitype]:
Contributor

Does evt stand for event?

Contributor Author

Yes!

Contributor

@laspsandoval laspsandoval left a comment

Nice. This gives me ideas for what to do with mine. Try to get sample output data associated with packets to confirm output is as expected.


evtnum = 0
for pkt in idex_packet_generator:
if 'IDX__SCI0TYPE' in pkt.data:
Contributor

This isn't really a comment to this specific PR, but what are the chances of packet field names changing? If there's a possibility that they could change down the line would it make sense to create some type of adapter so we wouldn't need to go through and change these strings? Or if they did change would we not care since we can call them whatever we want in the xtce files?

Contributor Author

So I'm certainly not an expert at decom or XTCE, but I think we're just assigning a human-readable name to those specific 8 bits in the file. I went through and changed the code and the packet definition xml file to call it 'IDX__SCI0TYPE_X' instead of "IDX__SCI0TYPE", and everything still decoded fine.

print(f"Mid gain delay = {self.mgdelay} samples.")
print(f"Low gain delay = {self.lgdelay} samples.")
if(pkt.data['IDX__TXHDRLSTRIGMODE'].derived_value!=0): # If this was a LS (Target Low Gain) trigger
self.Triggerorigin = 'LS'
Contributor

What will these Trigger variables be used for? If some of these if conditions are not satisfied, the attributes might not be defined when something tries to access them on the object, so they should be initialized in __init__.

Contributor Author

Yeah, these are really only used for logging, so I don't know if it matters that they are like this. I'm going to get rid of the "self." because I don't think the rest of the class needs to know or care about them.

scitype = pkt.data['IDX__SCI0TYPE'].raw_value
if scitype == 1:
evtnum += 1
self.epochs.append(pkt.data['SHCOARSE'].derived_value + 20*(10**(-6))*pkt.data['SHFINE'].derived_value) # Use this as the CDF epoch
Contributor

It might be good to add a comment or assign those numbers (20, 10, -6) to variables, so it's clear where they're coming from.
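A sketch of what that might look like (the constant and function names are assumptions):

```python
# Name the magic numbers in the timestamp calculation above;
# one SHFINE tick is 20 microseconds per the quoted code.
FINE_TICK_SECONDS = 20e-6

def compute_impact_time(shcoarse: int, shfine: int) -> float:
    """Seconds since the epoch (midnight January 1st, 2012)."""
    return shcoarse + FINE_TICK_SECONDS * shfine
```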

@sdhoyt
Contributor

sdhoyt commented Aug 29, 2023

Great job! This will be a good reference for my decom as well

self.data = datastore
self.numevents = evtnum

def _log_packet_info(self, evtnum, pkt):
Collaborator

I'm being nitpicky here, but could we use event_num and packet here instead of evtnum and pkt? I find that code is less readable when variables are unnecessarily abbreviated.

Contributor Author

yeah for sure


def _parse_hs_waveform(self, waveform_raw: str):
"""Parse a binary string representing a high gain waveform"""
w = bitstring.ConstBitStream(bin=waveform_raw)
Collaborator

Suggested change
w = bitstring.ConstBitStream(bin=waveform_raw)
waveform = bitstring.ConstBitStream(bin=waveform_raw)

ints += w.readlist(['uint:10']*3)
return ints

def _parse_ls_waveform(self, waveform_raw: str):
Collaborator

It is not clear to me what ls stands for here

compressed = bool(pkt.data['IDX__SCI0COMP'].raw_value) # If we need to decompress the data
print(f"Timestamp = {self.epochs[evtnum-1]} seconds since epoch (Midnight January 1st, 2012)")

def _parse_hs_waveform(self, waveform_raw: str):
Collaborator

It is not clear to me what hs stands for here

Contributor Author

high sampling, I'll spell it out in the next commit!

Collaborator

@bourque bourque left a comment

Nice work! I just had a few nitpicky suggestions, but otherwise it looks good to me!

@bryan-harter
Contributor Author

I think I just responded to everyone's comments, with the exception of Greg's comments about perhaps using classes to encapsulate a dust event rather than this nested dictionary structure. I'm still going to play around with that suggestion today.

@bryan-harter
Contributor Author

pre-commit.ci autofix

@bryan-harter
Contributor Author

pre-commit.ci autofix

@bryan-harter
Contributor Author

I just made an update so that the data structure uses classes for individual dust events, as Greg suggested. I kind of like it, since it seems more intuitive to me. Plus it now outputs an xarray data object which (spoiler alert) will be used in my next commit. But it's pretty unlike the other instruments, since the rest won't rely on individual events.

If y'all decide the other way is better though, I'm happy to revert to 055e335

Collaborator

@greglucas greglucas left a comment

Very nice, I like this better personally! I've got quite a few nits and suggestions, but nothing major. The one thing I'd suggest is continuing to go down this route and pass packet into your DustEvent constructor and methods and let DustEvent figure out what needs to happen with the packet. So, your PacketParser class has as little IDEX logic in it as possible and rather defers most of that to the events themselves.

Comment on lines 55 to 56
idex_binary_data = bitstring.ConstBitStream(filename=idex_packet_file)
idex_packet_generator = idex_parser.generator(idex_binary_data)
Collaborator

Suggested change
idex_binary_data = bitstring.ConstBitStream(filename=idex_packet_file)
idex_packet_generator = idex_parser.generator(idex_binary_data)
with open(packet_file, "rb") as binary_data:
packet_generator = parser.generator(binary_data)

Following the examples from space_packet_parser: https://space-packet-parser.readthedocs.io/en/latest/users.html#basic-usage
where this will cleanup the files for you automatically.



class IDEXPacketParser:
"""
Collaborator

Follow numpydoc conventions according to the style guide: https://imap-processing.readthedocs.io/en/latest/development/style-guide/api-documentation.html

Parameters
------------

There is also no filename parameter; I think parameters can be documented within the constructor below, with attributes listed here.

* IDEX_Trigger is a placeholder variable for now
"""

idex_xtce = f"{packet_definition_directory}/idex_packet_definition.xml"
Collaborator

You are within IDEX already throughout this module, so I'd leave it as xtce_file or something like that without the idex_ prefix everywhere.

idex_binary_data = bitstring.ConstBitStream(filename=idex_packet_file)
idex_packet_generator = idex_parser.generator(idex_binary_data)

self.coords = {}
Collaborator

Suggested change
self.coords = {}

not used anywhere?

idex_packet_generator = idex_parser.generator(idex_binary_data)

self.coords = {}
self.data = {}
Collaborator

Suggested change
self.data = {}

Just set it at the end like you're already doing and then the types won't get conflicted either since it is an xarray dataset later on.

Comment on lines 93 to 95
processed_dust_impact_list = []
for event_number in self.dust_events:
processed_dust_impact_list.append(self.dust_events[event_number].process())
Collaborator

one-liner?

Suggested change
processed_dust_impact_list = []
for event_number in self.dust_events:
processed_dust_impact_list.append(self.dust_events[event_number].process())
processed_dust_impact_list = [dust_event.process() for dust_event in self.dust_events.values()]

packet, nothing here should affect the data
"""
event_num = pkt.data["IDX__SCI0EVTNUM"].derived_value
logging.info(f"^*****Event header {event_num}******^")
Collaborator

Do we always want these logs, or do we only want them while debugging?

logging.debug()

elif scitype == 32:
self.Target_High_bits += bits
elif scitype == 64:
self.Ion_Grid_bits += bits
Collaborator

Suggested change
self.Ion_Grid_bits += bits
self.Ion_Grid_bits += bits
else:
logging.warning("Unknown science type received: [%s]", scitype)

Add some warnings/raises if we are getting into the edge cases?

pyproject.toml Outdated
@@ -29,7 +29,8 @@ classifiers = [

[tool.poetry.dependencies]
python = ">=3.9,<3.12"
space_packet_parser = ">=4.0.2,<5"
xarray = '>=0.0.0'
Collaborator

Just remove the version constraint if you're going to put it at 0s? Or pin a more recent xarray, which is likely to be required anyway.

Suggested change
xarray = '>=0.0.0'
xarray = '>=2023.0.0'

)


class IDEXRawDustEvent:
Collaborator

I'm not sure how we feel about naming classes. I guess I'd vote for dropping IDEX personally. So that if I were to import it later it'd be: from imap_processing.idex import RawDustEvent and not duplicating IDEX. But, I don't have a strong preference.

@bryan-harter
Contributor Author

pre-commit.ci autofix

@bryan-harter
Contributor Author

Responded to more comments; also added a unit test that actually tests a small amount of the data (rather than a placeholder)
Contributor

@tech3371 tech3371 left a comment

It looks good to me! I just flagged a couple of places where we could use comments, but it looks great!

header_packet.data["IDX__TXHDRSAMPDELAY"].raw_value >> 2
) # First two bits are padding
mask = 0b1111111111
hgdelay = (tof_delay >> 20) & mask
Contributor

What is hgdelay? Can we spell out hg?

) # First two bits are padding
mask = 0b1111111111
hgdelay = (tof_delay >> 20) & mask
lspretrigblocks = (
Contributor

Can you add comment about why those bits are shifted? It helps to give a high level context to what is happening.
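For context, the shifts quoted above can be read as unpacking packed 10-bit delay fields from a single header word. A hedged sketch follows; the exact field order for the mid- and low-gain channels is inferred from the snippets in this PR, not confirmed against the instrument documentation.

```python
def unpack_sample_delays(raw_value: int):
    """Unpack three 10-bit sample delays from IDX__TXHDRSAMPDELAY.

    Assumed layout (inferred from this PR's snippets): 2 leading padding
    bits, then 10-bit high-, mid-, and low-gain delays.
    """
    tof_delay = raw_value >> 2   # first two bits are padding
    mask = 0b1111111111          # 10-bit field mask
    high_gain_delay = (tof_delay >> 20) & mask
    mid_gain_delay = (tof_delay >> 10) & mask
    low_gain_delay = tof_delay & mask
    return high_gain_delay, mid_gain_delay, low_gain_delay
```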

logging.debug(f"^*****Event header {event_num}******^")
logging.debug(
f"Timestamp = {self.impact_time} seconds since epoch \
(Midnight January 1st, 2012)"
Contributor

is this IMAP specific epoch?

Contributor Author

I don't believe so, it's just the epoch that they use to keep track of time in their header packets.

Contributor

I see
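A quick sketch of what that epoch convention implies for converting header timestamps. Assumes the epoch is midnight UTC; the timezone is not stated in this thread.

```python
from datetime import datetime, timedelta, timezone

# Epoch taken from the log message above; UTC is an assumption.
IDEX_EPOCH = datetime(2012, 1, 1, tzinfo=timezone.utc)

def impact_time_to_utc(seconds_since_epoch: float) -> datetime:
    """Convert seconds since midnight January 1st, 2012 to a datetime."""
    return IDEX_EPOCH + timedelta(seconds=seconds_since_epoch)
```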

packet.data["IDX__TXHDRSAMPDELAY"].raw_value >> 2
) # First two bits are padding
mask = 0b1111111111
lgdelay = (tof_delay) & mask
Contributor

I see. hg means high gain, and mg means mid gain.

Can we spell those out? I think it will help readability.

def _parse_high_sample_waveform(self, waveform_raw: str):
"""
Parse a binary string representing a high sample waveform
Data arrives in 10 bit chunks
Contributor

Good to know!

while w.pos < len(w):
w.read("pad:2") # skip 2
ints += w.readlist(["uint:10"] * 3)
return ints[:-4]
Contributor

In SWE, I was doing `["uint:8"] * 1260` because I had 1260 8-bit chunks. Relating that to here, you are skipping 2 bits and then reading three `uint:10` values at a time. So in total, you are reading 32 bits per iteration of this while loop? Is that right?

Why do you take the last 4 and not the others? Is there a reason for that?

@bryan-harter (Contributor Author) Sep 6, 2023

Yes you're correct, 32-bits are being read at a time!

The last 4 numbers in the array are removed. I should leave a comment explaining it in the code, but basically it sounds like those last 4 numbers are not very useful. Sometimes they have data in them and sometimes they don't, but overall the instrument engineer said those last 4 numbers are not where the interesting data is. It's possible that we could add code in the future to consider those values (if they exist), but the sample code given to me just has them removed.

Contributor

I see, I had it wrong. You are keeping everything except the last four values. Can you add a comment about that in there? I think it will be helpful.
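To make the agreed-upon behavior concrete, here is an illustrative pure-Python version of the unpacking discussed above (a sketch, not the PR's bitstring code): each 32-bit word carries 2 padding bits followed by three 10-bit samples, and the last 4 samples are dropped per the instrument engineer's guidance.

```python
def parse_high_sample_waveform(waveform_raw: str) -> list[int]:
    """Parse a binary string of 32-bit words into 10-bit samples."""
    samples = []
    for i in range(0, len(waveform_raw), 32):
        word = waveform_raw[i : i + 32]
        # skip 2 padding bits, then read three 10-bit unsigned ints
        samples.extend(int(word[j : j + 10], 2) for j in range(2, 32, 10))
    # drop the last 4 samples; per the discussion above they carry no
    # useful data
    return samples[:-4]
```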

w = bitstring.ConstBitStream(bin=waveform_raw)
ints = []
while w.pos < len(w):
w.read("pad:8") # skip 2
Contributor

Did you mean # skip 8-bits here?

Contributor Author

Yes!
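Pulling that correction into a sketch: with the fixed comment, the low-sample loop reads each 32-bit word as 8 padding bits followed by two 12-bit samples. An illustrative pure-Python equivalent (not the PR's bitstring code):

```python
def parse_low_sample_waveform(waveform_raw: str) -> list[int]:
    """Parse a binary string of 32-bit words into 12-bit samples."""
    samples = []
    for i in range(0, len(waveform_raw), 32):
        word = waveform_raw[i : i + 32]
        # skip 8 padding bits, then read two 12-bit unsigned ints
        samples.extend(int(word[j : j + 12], 2) for j in (8, 20))
    return samples
```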

ints += w.readlist(["uint:12"] * 2)
return ints

def _calc_low_sample_resolution(self, num_samples: int):
Contributor

Can you add a doc string for these two as well? It's been so helpful to follow!

Here, we are creating an array of num_samples elements, equally spaced between 0 and num_samples. Then we apply the formula self.LOW_SAMPLE_FREQUENCY * time_low_sr_init - self.hstriggertime to each element. Is that right?

Do you know why we multiply by the frequency and subtract the trigger time? I am asking out of curiosity.

Contributor Author

Honestly, I have no idea why these times are calculated like this; that's just how Ethan was calculating them in his initial analysis code. I'm going to ask him about it tomorrow when we tag up!

Contributor

Cool.
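For reference, the calculation being discussed can be sketched as follows. The constant values here are placeholders (the real ones live on the event class), and the rationale for the formula is exactly the open question above.

```python
import numpy as np

LOW_SAMPLE_FREQUENCY = 0.25  # placeholder; the real constant is a class attribute
hstriggertime = 10.0         # placeholder high-sample trigger time

def calc_low_sample_resolution(num_samples: int) -> np.ndarray:
    # Evenly spaced values from 0 to num_samples, scaled and then
    # shifted by the trigger time (per the formula quoted above)
    time_low_sr_init = np.linspace(0, num_samples, num_samples)
    return LOW_SAMPLE_FREQUENCY * time_low_sr_init - hstriggertime
```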


def process(self):
"""
To be called after all packets for the IDEX event have been parsed
Contributor

Suggested change
To be called after all packets for the IDEX event have been parsed
To be called after all packets for the IDEX event have been parsed.

"Time_Low_SR": time_low_sr_xr,
"Time_High_SR": time_high_sr_xr,
},
)
Contributor

Really nice work!!!

@maxinelasp left a comment

Kind of sneaking in a late review now, but it all looks good to me!

@tech3371 left a comment

Thank you! It still looks good to me!

@bryan-harter bryan-harter merged commit e3a093c into IMAP-Science-Operations-Center:dev Sep 7, 2023
@bryan-harter bryan-harter deleted the idex_decom branch September 7, 2023 22:46
laspsandoval pushed a commit to laspsandoval/imap_processing that referenced this pull request Nov 15, 2023
* Init commit of IDEX packet parser

* Committing test data

* Adding xarray as a dependency

* Responding to code review

* Fixing the dependencies

* updating poetry.lock to sync with main branch

* Getting rid of old import statement

* Updating poetry lock file

* Fixing the lock file, for some reason bitstring is
still causing issues

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating to latest version of space packet parser

* Changing packet parser to use individual Dust
Impact classes and output an xarray object

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changing the requirements in poetry for xarray

* Responding to code review comments, adding a unit
test to compare actual data

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating poetry.lock

* Logging an error if a new science type is received

* Determining High and Low sample now
(rather than placeholder)

* Add more comments and make the variables clearer

* Fixing some of the variable names, adding comments

* Bug fix, wasn't returning right thing in new func

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Successfully merging this pull request may close these issues.

- Create Level 1 processing code for IDEX
- Create XTCE definition for IDEX
8 participants