Init commit of IDEX packet parser #79
Conversation
I think this overall looks great! I didn't test it on my end, but the parsing looks good.
This looks great!
I need to get "black" and the other linting stuff running so the formatting looks nicer; right now it's kind of a mess.
```python
from imap_processing import packet_definition_directory


class IDEXPacketParser():
```
It would be good to add a docstring here explaining what this class does, what attributes it has, what methods.
Right now, everything happens within instantiation, which seems like we don't really need a class, but possibly just a function that returns the data you want? Perhaps there is a separate class `IDEXEvent` that stores event information, or `IDEXWaveform` for the waveform data, where you'd be creating these dataclasses as you process a packet. So, the actual processing of a file would be a function call like in SWE:
```python
def decom_packets(packet_file: str):
```

```python
def decom_idex(...):
    list_of_idex_events = []
    for pkt in idex_packet_generator:
        # instantiate your event classes and fill them in as needed
        evt = IDEXEvent(pkt)
        list_of_idex_events.append(evt)
    return list_of_idex_events
```
We should decide how we want to architect these for consistency across instruments though.
Yeah, everything interesting basically happens inside `__init__()` right now. I actually had several other methods here before, which may or may not make it worth keeping as a class:
- plot_waveforms: A function to quickly plot the data in the self.data structure in matplotlib as a quicklook to verify accuracy (and this might even be extended into a quicklook product in the future? Or maybe we want to quicklook directly from the L1a)
- create_l1a_cdf: the method called to populate data into an xarray object, and then convert it to cdf
Can you add high level explanation of what this class is doing? It would be helpful.
Is your waveforms data associated with whatever packet file came in, or with an individual event/packet? I guess I'm getting at what level of granularity do we want for the class.
I think you could put those methods you describe onto the `IDEXEvent` class, where it would be plotting a waveform from each event, storing what type of science data was in this event, etc...? I haven't looked super close, but that is just where my mind goes. Or even separate functions within the module: `idex.plot_waveforms(pass_in_data)`
We are going to get dumps from the POC every 3 days, which means a packet file may be 3 days long, I think? Or partial days, etc... I'm not sure we want our CDF creation lined up with those semi-arbitrary boundaries; rather, we could have a separate `create_l1a_cdf(list_of_packets)` function that takes in all of the events you want going into your l1a CDF and is not necessarily tied to this current class.
I won't block on this, but I do think it is important to think about because you're the first and sort of setting up the template for others to follow. It looks like others like this large class though, so maybe I'm the odd one out 😄
Yes a docstring is incoming! I should have included that first thing, sorry.
As far as the structure of the class, to my knowledge the L0 data we get from the MOC/POC will be grouped into packets that span a single day, midnight to midnight UTC. So if there are 3 days between contacts, we should receive 3 separate L0 files. The CDF files that we generate should also span a single day, so there should be a 1-to-1 correspondence from a packet file to a l1a CDF file.
The waveforms are associated with a single dust event, so perhaps the class could be sub-divided into specific events rather than an entire L0 file (for everyone's reference, a single "event" in this context is a dust impact, which may occur roughly 1-10 times a day). My concern with only looking at the files by a single event is:
- I don't know if the packets within the L0 file are required to be grouped by event, so we may end up populating a bunch of classes at once. Right now the packets are all time-ordered, but their algorithm document specifies that technically we should be looking at the AID of each packet to ensure we're filling in data to the right event.
- I think it's helpful to group all of the events together like this because this collection of events is ultimately what will make the l1a CDF file
- If we're hoping to keep the structure of the decom consistent across all of the instruments, IDEX would look like the oddball of the bunch if it grouped data by event rather than by a time range. I believe none of the other instruments are event based, and just continuously take data.
If we wanted, we could make DustEvent classes that this decom class uses to populate, but that might be a little overkill. Overall, the data we need to get out of the packet is pretty simple (just 6 1D arrays).
> The CDF files that we generate should also span a single day, so there should be a 1-to-1 correspondence from a packet file to a l1a CDF file.
I'm not sure we can count on this... I think we can hope for it, but I would think that if a contact comes down at noon and they get data up until noon I'd expect them to deliver the files through noon and not wait until 3 days from then during the next contact to send down a complete days worth of data. That is a good question for how we want things to happen because I think we can somewhat dictate this.
> I think it's helpful to group all of the events together like this because this collection of events is ultimately what will make the l1a CDF file
Fair enough, I think I was viewing Events as the critical assembly unit and CDF files being something we put together, where the event is the logical structure.
> If we wanted, we could make DustEvent classes that this decom class uses to populate, but that might be a little overkill. Overall, the data we need to get out of the packet is pretty simple (just 6 1D arrays).
I think I'm just not being very clear with what is in my mind, because I actually think it would be easier to parse (IMO) what is going on here than going to nested dictionaries with appended lists to try and get at the events (i.e. those 6 1D arrays would be stored in the `DustEvent` class). The datastore would then just work on this list of daily events.
```python
daily_events = []
for packet in packet_dump:
    # DustEvent would be able to process any event and handle the logic
    # for what science type it is, how to parse waveforms, etc...
    event = DustEvent(packet)
    daily_events.append(event)

# How many events?
len(daily_events)

# Filter by events of a certain type
len([event for event in daily_events if event.science_type == "COOL_SCIENCE_MODE"])

# Now further down, if you want to make a CDF file with daily_events:
create_idex_cdf_l1a(daily_events)

# If you want, you could have another class just to store groupings of events too
class DailyDustEvents:
    def __init__(self, events):
        self.events = events

    def create_l1a(self):
        pass
```
I'll experiment with the classes a little more; it probably will make it more readable to do it like you have with `DailyDustEvents` and `DustEvent`.
As for
> I'm not sure we can count on this... I think we can hope for it, but I would think that if a contact comes down at noon and they get data up until noon I'd expect them to deliver the files through noon and not wait until 3 days from then during the next contact to send down a complete days worth of data. That is a good question for how we want things to happen because I think we can somewhat dictate this.
That does happen frequently on MAVEN and EMM, but then in the next downlink we receive "v02" of the pkts file and processing starts anew. Typically that means v01-r00 of the file might be incomplete, but the next revision probably contains the full amount of data. I don't know what we want to do on this mission, but that's basically just what I was assuming would happen.
```python
self.data = datastore
self.numevents = evtnum

def _log_packet_info(self, evtnum, pkt):
```
We should use a logger here instead of prints, and add DEBUG log level if they aren't necessarily needed but helpful when we want to debug things.
Sounds good
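For reference, a minimal sketch of the suggested switch from prints to a module-level logger at DEBUG level (the function and names here are illustrative, not the PR's actual code):

```python
import logging

# Module-level logger; the application configures handlers/levels once
logger = logging.getLogger(__name__)


def log_packet_info(event_num: int) -> None:
    # DEBUG level: silent in normal runs, helpful when debugging
    logger.debug("^*****Event header %s******^", event_num)
```

Using `%s`-style lazy formatting avoids building the string when DEBUG logging is disabled.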
```python
def _log_packet_info(self, evtnum, pkt):
    print(f"^*****Event header {evtnum}******^")
    # Extract the 17-22-bit integer (usually 8)
    self.lspretrigblocks = (pkt.data['IDX__TXHDRBLOCKS'].derived_value >> 16) & 0b1111
```
This is more than logging at this point by assigning to attributes. Do we need these attributes added to the class, or can they just be local variables?
Ah yeah I think it's possible they could be added at some point, but I haven't figured out if they actually want that stuff in the CDF. My guess is they won't want it, so I'll remove the "self" stuff
```python
print("LS pre trig sampling blocks: ", self.lspretrigblocks)
print("HS post trig sampling blocks: ", self.hsposttrigblocks)
print("LS post trig sampling blocks: ", self.lsposttrigblocks)
self.TOFdelay = pkt.data['IDX__TXHDRSAMPDELAY'].raw_value >> 2  # First two bits are padding
```
```diff
- self.TOFdelay = pkt.data['IDX__TXHDRSAMPDELAY'].raw_value >> 2  # First two bits are padding
+ self.tof_delay = pkt.data['IDX__TXHDRSAMPDELAY'].raw_value >> 2  # First two bits are padding
```
We should try and stick to Python variable naming conventions when possible.
```python
compressed = bool(pkt.data['IDX__SCI0COMP'].raw_value)  # If we need to decompress the data
print(f"Timestamp = {self.epochs[evtnum-1]} seconds since epoch (Midnight January 1st, 2012)")

def _parse_hs_waveform(self, waveform_raw: str):
```
You aren't accessing `self` at all, so this could be a static method on the class. But again, this seems like a helper function within this module, so it isn't needed on the class explicitly.
pyproject.toml (Outdated)
```diff
@@ -30,6 +30,8 @@ classifiers = [
 [tool.poetry.dependencies]
 python = ">=3.9,<3.12"
 space_packet_parser = ">=4.0.2,<5"
+bitstring = '<=4.0.1'
+xarray = '>=2022.11.0'
```
I don't think we are using it yet. I realize we will likely later, but I think we should wait to add dependencies until necessary.
pyproject.toml (Outdated)
```diff
@@ -30,6 +30,8 @@ classifiers = [
 [tool.poetry.dependencies]
 python = ">=3.9,<3.12"
 space_packet_parser = ">=4.0.2,<5"
+bitstring = '<=4.0.1'
```
I think `bitstring` works with any version of your code; it is the combination of `bitstring` + `space_packet_parser` that isn't working. I'm not sure how to explicitly call that out here, though... Just indicating that I don't think it is ideal to pin `bitstring` back when it is really `space_packet_parser` that is causing the failures.
Indeed, I don't know what else to do though. If you do a regular `poetry install` without specifying the bitstring version here, the code doesn't work. Gavin is working on a fix; we could just wait for that before merging.
Gavin already pushed a new version to pypi!
I ran it locally and it seems to work! First algorithm implementation. Nice work!
```python
self.data[scitype][evtnum] += pkt.data['IDX__SCI0RAW'].raw_value

# Parse the waveforms according to the scitype present (high gain and low gain channels encode waveform data differently).
self.scitype_to_names = {2: "TOF_High", 4: "TOF_Low", 8: "TOF_Mid", 16: "Target_Low",
```
It might be nice to add this in the `_parse_waveform_data` function docstring, because I was wondering what those (2, 4, 8) mean.
```python
for pkt in idex_packet_generator:
    if 'IDX__SCI0TYPE' in pkt.data:
        scitype = pkt.data['IDX__SCI0TYPE'].raw_value
        if scitype == 1:
```
What does `scitype == 1` mean, conceptually?
There are different types of science packets we could expect, numbered [1, 2, 4, 8, 16, 32, 64].
"1" contains information about the events, and 2, 4, 8, 16, 32, and 64 contain the 6 different variables we're interested in. Maybe we can change these numbers to be more human-readable here.
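One hedged sketch of making those numbers human-readable is an `IntEnum` whose names borrow from the scitype-to-name mapping quoted elsewhere in this PR (the enum itself is illustrative, not the PR's code):

```python
from enum import IntEnum


class Scitype(IntEnum):
    """Science packet types in the IDEX stream, per this thread."""
    FIRST_PACKET = 1   # carries event header information
    TOF_HIGH = 2
    TOF_LOW = 4
    TOF_MID = 8
    TARGET_LOW = 16
    TARGET_HIGH = 32
    ION_GRID = 64


# IntEnum members compare equal to the raw integer values from the packet
raw_scitype = 1
assert raw_scitype == Scitype.FIRST_PACKET
```

Because `IntEnum` members are ints, the existing `== 1` comparisons keep working while the code reads as `Scitype.FIRST_PACKET`.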
```python
idex_binary_data = bitstring.ConstBitStream(filename=idex_packet_file)
idex_packet_generator = idex_parser.generator(idex_binary_data)

self.epochs = []
```
I was wondering: what is the difference between `epochs` and `data`? This might relate to the question I had about `scitype == 1`.
"epochs" is a list of times that the dust events occurred, and the data is the actual data from those events. "epoch" is generally just the CDF/SPDF term for "time".
```python
self.scitype_to_names = {2: "TOF_High", 4: "TOF_Low", 8: "TOF_Mid", 16: "Target_Low",
                         32: "Target_High", 64: "Ion_Grid"}
datastore = {}
self.time_low_sr = []
```
Can we spell out `time_low_sr`? More specifically, spell out `sr`.
Sure thing, "sr" just stands for "sampling rate". 3 of the important variables use a very high sampling rate, and the other 3 use a (comparatively) very low sampling rate.
```python
self.time_high_sr = []
for scitype in self.data:
    datastore[self.scitype_to_names[scitype]] = []
    for evt in self.data[scitype]:
```
Does `evt` stand for `event`?
Yes!
Nice. This gives me ideas for what to do with mine: try to get sample output data associated with packets to confirm the output is as expected.
```python
evtnum = 0
for pkt in idex_packet_generator:
    if 'IDX__SCI0TYPE' in pkt.data:
```
This isn't really a comment to this specific PR, but what are the chances of packet field names changing? If there's a possibility that they could change down the line would it make sense to create some type of adapter so we wouldn't need to go through and change these strings? Or if they did change would we not care since we can call them whatever we want in the xtce files?
So I'm certainly not an expert at decom or XTCE, but I think we're just assigning a human-readable name to those specific 8 bits in the file. I went through and changed the code and the packet definition xml file to call it 'IDX__SCI0TYPE_X' instead of "IDX__SCI0TYPE", and everything still decoded fine.
```python
print(f"Mid gain delay = {self.mgdelay} samples.")
print(f"Low gain delay = {self.lgdelay} samples.")
if(pkt.data['IDX__TXHDRLSTRIGMODE'].derived_value!=0):  # If this was a LS (Target Low Gain) trigger
    self.Triggerorigin = 'LS'
```
What will these Trigger variables be used for? In the case that some of these `if` conditions are not satisfied, they might not be defined when something tries to access them from the object, so they should be initialized under `__init__`.
yeah these are only used for logging really, so I don't know if it matters that they are like this. I'm going to get rid of the "self." because I don't think the rest of the class needs to know or care about them.
```python
scitype = pkt.data['IDX__SCI0TYPE'].raw_value
if scitype == 1:
    evtnum += 1
    self.epochs.append(pkt.data['SHCOARSE'].derived_value + 20*(10**(-6))*pkt.data['SHFINE'].derived_value)  # Use this as the CDF epoch
```
It might be good to add a comment or assign those numbers (20, 10, -6) to variables, so it's clear where they're coming from.
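A sketch of how that could look with named constants, assuming (per the code comment above) that SHFINE counts 20-microsecond ticks; the constant and function names are illustrative:

```python
# Assumption from the original expression: SHFINE is in 20-microsecond ticks
FINE_TICK_MICROSECONDS = 20
SECONDS_PER_MICROSECOND = 10 ** -6


def to_epoch_seconds(shcoarse: float, shfine: float) -> float:
    """Seconds since the epoch used by the packets (midnight January 1st, 2012)."""
    return shcoarse + shfine * FINE_TICK_MICROSECONDS * SECONDS_PER_MICROSECOND
```

The arithmetic is identical to `SHCOARSE + 20*(10**(-6))*SHFINE`; only the provenance of the magic numbers is now spelled out.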
Great job! This will be a good reference for my decom as well.
```python
self.data = datastore
self.numevents = evtnum

def _log_packet_info(self, evtnum, pkt):
```
I'm being nitpicky here, but could we use `event_num` and `packet` here instead of `evtnum` and `pkt`? I find that code is less readable when variables are unnecessarily abbreviated.
yeah for sure
```python
def _parse_hs_waveform(self, waveform_raw: str):
    """Parse a binary string representing a high gain waveform"""
    w = bitstring.ConstBitStream(bin=waveform_raw)
```
```diff
- w = bitstring.ConstBitStream(bin=waveform_raw)
+ waveform = bitstring.ConstBitStream(bin=waveform_raw)
```
```python
    ints += w.readlist(['uint:10']*3)
    return ints

def _parse_ls_waveform(self, waveform_raw: str):
```
It is not clear to me what `ls` stands for here.
```python
compressed = bool(pkt.data['IDX__SCI0COMP'].raw_value)  # If we need to decompress the data
print(f"Timestamp = {self.epochs[evtnum-1]} seconds since epoch (Midnight January 1st, 2012)")

def _parse_hs_waveform(self, waveform_raw: str):
```
It is not clear to me what `hs` stands for here.
High sampling; I'll spell it out in the next commit!
Nice work! I just had a few nitpicky suggestions, but otherwise it looks good to me!
I think I just responded to everyone's comments, with the exception of Greg's comments about perhaps using classes to encapsulate a dust event rather than this nested dictionary structure. I'm still going to play around with that suggestion today.
still causing issues
pre-commit.ci autofix
for more information, see https://pre-commit.ci
Impact classes and output an xarray object
pre-commit.ci autofix
I just made an update so that the data structure uses classes for individual dust events as Greg suggested. I kinda like it, since it seems more intuitive to me. Plus it now outputs an xarray data object which (spoiler alert) will be used in my next commit. But it's pretty unlike the other instruments, since the rest won't rely on individual events. If y'all decide the other way is better though, I'm happy to revert to 055e335.
Very nice, I like this better personally! I've got quite a few nits and suggestions, but nothing major. The one thing I'd suggest is continuing to go down this route: pass `packet` into your `DustEvent` constructor and methods, and let `DustEvent` figure out what needs to happen with the packet. So, your `PacketParser` class has as little IDEX logic in it as possible and rather defers most of that to the events themselves.
```python
idex_binary_data = bitstring.ConstBitStream(filename=idex_packet_file)
idex_packet_generator = idex_parser.generator(idex_binary_data)
```

```diff
- idex_binary_data = bitstring.ConstBitStream(filename=idex_packet_file)
- idex_packet_generator = idex_parser.generator(idex_binary_data)
+ with open(packet_file, "rb") as binary_data:
+     packet_generator = parser.generator(binary_data)
```

Following the examples from space_packet_parser: https://space-packet-parser.readthedocs.io/en/latest/users.html#basic-usage
where this will clean up the files for you automatically.
```python
class IDEXPacketParser:
    """
```
Follow numpydoc conventions according to the style guide: https://imap-processing.readthedocs.io/en/latest/development/style-guide/api-documentation.html

```
Parameters
------------
```

There is also no filename parameter, and I think parameters can be documented within the constructor below, putting attributes here.
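For reference, a minimal numpydoc-style skeleton of what that could look like (the attribute names are illustrative, borrowed from this PR's discussion, not the final API):

```python
class IDEXPacketParser:
    """Parse an IDEX L0 packet file into dust impact events.

    Parameters
    ----------
    idex_packet_file : str
        Path to the L0 packet file to decommutate.

    Attributes
    ----------
    epochs : list
        Times of the dust events, in seconds since the mission epoch.
    data : dict
        Parsed waveform data, keyed by science type name.
    """

    def __init__(self, idex_packet_file: str):
        self.epochs = []
        self.data = {}
```

Note the `----------` underline must match the section title length for numpydoc tooling to parse it.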
```python
    * IDEX_Trigger is a placeholder variable for now
    """

    idex_xtce = f"{packet_definition_directory}/idex_packet_definition.xml"
```
You are within IDEX already throughout this module, so I'd leave it as `xtce_file` or something like that, without the idex_ prefix everywhere.
```python
idex_binary_data = bitstring.ConstBitStream(filename=idex_packet_file)
idex_packet_generator = idex_parser.generator(idex_binary_data)

self.coords = {}
```
```diff
- self.coords = {}
```

not used anywhere?
```python
idex_packet_generator = idex_parser.generator(idex_binary_data)

self.coords = {}
self.data = {}
```
```diff
- self.data = {}
```

Just set it at the end like you're already doing, and then the types won't get conflicted either, since it is an xarray dataset later on.
```python
processed_dust_impact_list = []
for event_number in self.dust_events:
    processed_dust_impact_list.append(self.dust_events[event_number].process())
```

one-liner?

```diff
- processed_dust_impact_list = []
- for event_number in self.dust_events:
-     processed_dust_impact_list.append(self.dust_events[event_number].process())
+ processed_dust_impact_list = [dust_event.process() for dust_event in self.dust_events.values()]
```
```python
    packet, nothing here should affect the data
    """
    event_num = pkt.data["IDX__SCI0EVTNUM"].derived_value
    logging.info(f"^*****Event header {event_num}******^")
```
Do we always want these logs, or do we only want them while debugging? `logging.debug()`
```python
elif scitype == 32:
    self.Target_High_bits += bits
elif scitype == 64:
    self.Ion_Grid_bits += bits
```

```diff
  elif scitype == 64:
      self.Ion_Grid_bits += bits
+ else:
+     logging.warning("Unknown science type received: [%s]", scitype)
```

Add some warnings/raises if we are getting into the edge cases?
pyproject.toml (Outdated)
```diff
@@ -29,7 +29,8 @@ classifiers = [

 [tool.poetry.dependencies]
 python = ">=3.9,<3.12"
 space_packet_parser = ">=4.0.2,<5"
+xarray = '>=0.0.0'
```
Just remove the version dependency if you're going to put it at 0s? Or, more likely, some more recent xarray is required:

```diff
- xarray = '>=0.0.0'
+ xarray = '>=2023.0.0'
```
```python
)


class IDEXRawDustEvent:
```
I'm not sure how we feel about naming classes. I guess I'd vote for dropping IDEX personally, so that if I were to import it later it'd be `from imap_processing.idex import RawDustEvent`, not duplicating IDEX. But I don't have a strong preference.
test to compare actual data
pre-commit.ci autofix
for more information, see https://pre-commit.ci
Responded to more comments, also added a unit test that actually tests a small amount of the data (rather than placeholder).
It looks good to me! I just flagged a couple of places where we could use comments, but it looks great!
```python
    header_packet.data["IDX__TXHDRSAMPDELAY"].raw_value >> 2
)  # First two bits are padding
mask = 0b1111111111
hgdelay = (tof_delay >> 20) & mask
```
What is hgdelay? Can we spell out `hg`?
```python
)  # First two bits are padding
mask = 0b1111111111
hgdelay = (tof_delay >> 20) & mask
lspretrigblocks = (
```
Can you add comment about why those bits are shifted? It helps to give a high level context to what is happening.
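To illustrate the kind of comments being asked for, here is a sketch of the extraction with the assumed field layout spelled out (the layout is inferred from this thread's hgdelay/lgdelay discussion, not from flight documentation, and the value is fabricated for illustration):

```python
# Assumed layout of IDX__TXHDRSAMPDELAY after the 2 padding bits are stripped:
#   bits 20-29: high gain delay
#   bits 10-19: mid gain delay
#   bits  0-9 : low gain delay
TEN_BIT_MASK = 0b1111111111

tof_delay = 0b0000000011_0000000101_0000000111  # example value, not real telemetry

high_gain_delay = (tof_delay >> 20) & TEN_BIT_MASK  # top 10-bit field
mid_gain_delay = (tof_delay >> 10) & TEN_BIT_MASK   # middle 10-bit field
low_gain_delay = tof_delay & TEN_BIT_MASK           # bottom 10-bit field
```

With the layout documented once at the top, each shift-and-mask line only needs to say which field it pulls out.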
```python
logging.debug(f"^*****Event header {event_num}******^")
logging.debug(
    f"Timestamp = {self.impact_time} seconds since epoch \
    (Midnight January 1st, 2012)"
```
Is this an IMAP-specific epoch?
I don't believe so; it's just the epoch that they use to keep track of time in their header packets.
I see
```python
    packet.data["IDX__TXHDRSAMPDELAY"].raw_value >> 2
)  # First two bits are padding
mask = 0b1111111111
lgdelay = (tof_delay) & mask
```
I see. hg means high gain, and mg means mid gain. Can we spell those out? I think it will help readability.
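For readability, the packed delay register could be unpacked into named fields in one helper. A minimal sketch, assuming the bit layout implied by the snippets in this diff (2 padding bits, then 10-bit fields, with the high-gain delay at bits 20-29); the mid-gain field position and the helper name are my assumptions, not confirmed by the PR:

```python
def parse_sample_delays(raw_value: int):
    """Unpack 10-bit delay fields from the 32-bit IDX__TXHDRSAMPDELAY
    register (layout assumed from the diff snippets above)."""
    tof_delay = raw_value >> 2  # first two bits are padding
    mask = 0b1111111111  # 10-bit mask
    high_gain_delay = (tof_delay >> 20) & mask
    mid_gain_delay = (tof_delay >> 10) & mask  # assumed middle field
    low_gain_delay = tof_delay & mask
    return high_gain_delay, mid_gain_delay, low_gain_delay
```

Spelled-out names like these would address the readability comments without changing the bit arithmetic.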
def _parse_high_sample_waveform(self, waveform_raw: str):
"""
Parse a binary string representing a high sample waveform
Data arrives in 10 bit chunks
Good to know!
while w.pos < len(w):
w.read("pad:2") # skip 2
ints += w.readlist(["uint:10"] * 3)
return ints[:-4]
In SWE, I was doing ["uint:8"] * 1260 because I had 1260 8-bit chunks. Relating that to here, you are skipping 2 bits and then reading three uint:10 at a time. So in total, you are reading 32 bits at a time in this while loop? Is that right?
Why do you take the last 4 and not the others? Is there a reason for that?
Yes, you're correct, 32 bits are being read at a time!
The last 4 numbers in the array are removed. I should leave a comment explaining it in the code, but basically it sounds like those last 4 numbers are not very useful. Sometimes they have data in them and sometimes they don't, but overall the instrument engineer said those last 4 numbers are not where the interesting data is. It's possible that we could add code in the future to consider those values (if they exist), but the sample code given to me just has them removed.
I see, I had it wrong. You are taking everything except the last four values. Can you add that in there? I think it will be helpful.
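To make the structure and the dropped trailing values explicit, the 10-bit unpacking discussed above can also be written with plain integer slicing instead of bitstring. A sketch of the same logic (each 32-bit word holds 2 padding bits plus three 10-bit samples); the standalone function name is mine, not from the PR:

```python
def parse_high_sample_waveform(waveform_raw: str):
    """Decode a binary string where every 32-bit word contains
    2 padding bits followed by three 10-bit unsigned samples."""
    ints = []
    for i in range(0, len(waveform_raw), 32):
        word = waveform_raw[i:i + 32]
        # skip the 2 padding bits, then read three 10-bit ints
        for j in range(2, 32, 10):
            ints.append(int(word[j:j + 10], 2))
    # drop the last four samples; per the instrument engineer,
    # they do not carry the interesting data
    return ints[:-4]
```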
w = bitstring.ConstBitStream(bin=waveform_raw)
ints = []
while w.pos < len(w):
w.read("pad:8") # skip 2
Did you mean # skip 8-bits here?
Yes!
ints += w.readlist(["uint:12"] * 2)
return ints

def _calc_low_sample_resolution(self, num_samples: int):
Can you add a docstring for these two as well? It's been so helpful to follow!
Here, we are creating a num_samples array where each element is an equally spaced value between 0 and num_samples. Then we apply the formula self.LOW_SAMPLE_FREQUENCY * time_low_sr_init - self.hstriggertime to each element. Is that right?
Do you know why we multiply by the frequency and subtract the trigger time? I am asking out of curiosity.
Honestly I have no idea why these times are calculated like this; that's just how Ethan was calculating them in his initial analysis code. I'm going to ask him about it tomorrow when we tag up!
Cool.
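For reference, the time-axis calculation being discussed can be sketched as a standalone function. The constant and the names here are placeholders I chose, not the instrument's real values:

```python
import numpy as np

LOW_SAMPLE_FREQUENCY = 4.0625  # placeholder sample period, not the real value

def calc_low_sample_resolution(num_samples: int, hs_trigger_time: float):
    # num_samples equally spaced values between 0 and num_samples,
    # scaled to time units and shifted so t=0 is the high-sample trigger
    time_low_sr_init = np.linspace(0, num_samples, num_samples)
    return LOW_SAMPLE_FREQUENCY * time_low_sr_init - hs_trigger_time
```

The shift by the trigger time just re-zeroes the axis on the dust impact, which is presumably why every sample time has it subtracted.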
def process(self):
"""
To be called after all packets for the IDEX event have been parsed
Suggested change:
- To be called after all packets for the IDEX event have been parsed
+ To be called after all packets for the IDEX event have been parsed.
"Time_Low_SR": time_low_sr_xr, | ||
"Time_High_SR": time_high_sr_xr, | ||
}, | ||
) |
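For anyone unfamiliar with the pattern, the coords snippet above is assembling an xarray Dataset. A self-contained sketch of the same idea, with placeholder variable names and data standing in for the decommed waveforms:

```python
import numpy as np
import xarray as xr

# placeholder decommed data for one dust impact event
time_low_sr = np.arange(8) * 0.25  # hypothetical low-rate sample times
tof_low = np.zeros(8)              # hypothetical waveform values

# data variables are indexed by named coordinate arrays,
# mirroring the Time_Low_SR / Time_High_SR coords in the diff
ds = xr.Dataset(
    data_vars={"TOF_Low": (["Time_Low_SR"], tof_low)},
    coords={"Time_Low_SR": time_low_sr},
)
```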
Really nice work!!!
Kind of sneaking in a late review now, but it all looks good to me!
Thank you! It still looks good to me!
* Init commit of IDEX packet parser
* Committing test data
* Adding xarray as a dependency
* Responding to code review
* Fixing the dependencies
* updating poetry.lock to sync with main branch
* Getting rid of old import statement
* Updating poetry lock file
* Fixing the lock file, for some reason bitstring is still causing issues
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Updating to latest version of space packet parser
* Changing packet parser to use individual Dust Impact classes and output an xarray object
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Changing the requirements in poetry for xarray
* Responding to code review comments, adding a unit test to compare actual data
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Updating poetry.lock
* Logging an error if a new science type is received
* Determining High and Low sample now (rather than placeholder)
* Add more comments and make the variables clearer
* Fixing some of the variable names, adding comments
* Bug fix, wasn't returning right thing in new func

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Change Summary
Overview
These changes allow for IDEX L0 files to be decommed and stored in a temporary data structure.
This is done via a new class, IDEXPacketParser, which is initialized with a packet file. It creates a generator object in space_packet_parser and loops through the file, appending data to these temporary data structures.
New Dependencies
bitstring must be pinned below the latest version for now
Adding in the xarray dependency (a little pre-emptively, but it comes with numpy, which I currently need)
New Files
Testing
The unit tests right now decom the packet and verify that the lengths of things are accurate; at some point the output should be checked against the data Ethan has in his HDF5 files
If you want to test it locally, it should be as simple as calling these functions: