Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Science Direct Event Decompression Re-write #397

Merged
merged 14 commits into from
Apr 16, 2024

Conversation

sdhoyt
Copy link
Contributor

@sdhoyt sdhoyt commented Apr 11, 2024

Change Summary

Overview

This PR re-writes the Lo Science Direct Event Decompression code based on recent discussions with and updated documents from Lo Flight Software. This PR also removes the use of the bitstring library and creates fake data in the unit tests so they can be run without sample data.

New Dependencies

None

New Files

  • imap_processing/lo/l0/utils/binary_string.py
    • simple class to iterate through a binary string
  • imap_processing/lo/l0/utils/set_dataclass_attr.py
    • sets object attributes to matching space packet parser output fields

Deleted Files

  • imap_processing/lo/l0/lo_base.py
    • This parent class was removed and put into the set_dataclass_attr.py function

Updated Files

  • imap_processing/lo/l0/data_classes/science_direct_events.py
    • Updated decompression algorithm based on updated document and conversations with Lo FSW
    • Added MODE field
  • imap_processing/lo/l0/decompression_tables/decompression_tables.py
    • Updated tables based on updated documents

Testing

  • Re-wrote imap_processing/tests/lo/test_science_direct_events.py unit tests
  • Added imap_processing/tests/lo/test_binary_string.py to test the BinaryString class

@sdhoyt sdhoyt added Ins: Lo Related to the IMAP-Lo instrument Level: L0 Level 0 processing labels Apr 11, 2024
@sdhoyt sdhoyt requested review from joycecj, KenWFairchild and a team April 11, 2024 21:15
@sdhoyt sdhoyt self-assigned this Apr 11, 2024
@sdhoyt sdhoyt requested review from bourque, greglucas, subagonsouth, tech3371, vmartinez-cu, laspsandoval and maxinelasp and removed request for a team April 11, 2024 21:15
self.software_version = software_version
self.packet_file_name = packet_file_name
self.ccsds_header = CcsdsData(packet.header)
# TODO: Is there a better way to initialize these arrays?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you initialize this to an array of fill values right away. This might also simplify some of your if/append logic down below to not have to set this every time and only update a specific position in the array if it is present.

np.ones(self.COUNT) * GlobalConstants.DOUBLE_FILLVAL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good idea!


def __init__(self, packet, software_version: str, packet_file_name: str):
"""Initialize Science Direct Events Data class."""
set_attributes(self, packet)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you bring this function into this class?

    self.set_attributes(packet)

def _set_attributes(self, packet):
    # inline in this file instead of elsewhere in another module

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this an outside function because I am going to re-use it for every data class. Do you think it would still be better within the class?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I personally think so. One other suggestion then would be to inherit your dataclasses and set this function in the base of the inheritance.

class BaseLoClass:
    def _set_attributes(self, packet):
        ...

@dataclass
class ScienceDE(BaseLoClass):
    # Should have your method here

class ScienceMessageClass(BaseLoClass):
    # Another class with the same method inherited

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that what it's doing prior to this PR. LoBase is currently a parent class with that function in it and the direct events data class is inheriting from it. I removed that in this PR because during a discussion about the ENA direct event data classes, I got convinced that the LoBase class isn't a true parent class of the data classes (and having that as a parent class would complicate things if ENAs are able to create parent classes for our direct events), so I created the outside function instead.

I'm not crazy about importing that either, so I might revert back to using the LoBase and change that in the future if we start using a DE parent class across ENAs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂 Sorry I missed all of these discussions! I personally think that is a good decision, and re-evaluating it later seems reasonable. Funny that you had what I suggested earlier and I just didn't realize it :)

# determine which case number we are working with.
# The case number is used to determine how to
# decompress the TOF values.
case_number = int(data.next_bits(4), 2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you able to do loop indexing into the data field and keep track of the pointer directly here/passed in somehow? I think data[start:start + 4] is a bit easier to follow than having to create a separate class to keep track of the position, but that is just a personal preference and can be ignored if this is easier/makes more sense to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll remove that class. When I wrote the class I was iterating over the bit string across a few different methods which made me feel like it was easier to keep track with the class. I didn't rethink that after I simplified things more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I changed my mind about this class. I'd rather avoid incrementing the bit position so many times inside the data class. I think it's a little distracting from the purpose of the method it's happening in. I should be able to reuse the class for a bunch of other Lo classes and possible HIT as well. I'll think about it a little more though

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that is fine. It is somewhat just a personal preference I think.

Another thing I just thought about though is whether you should bring the for _ in range(self.COUNT): self._parse_data(data) inside this function directly and iterate here (only calling it once in the initializer then)?

        index = 0
        for i in range(self.COUNT):
            self.TIME[i] = int(data.next_bits(DATA_BITS.TIME), 2)
            # Or keeping track of index without the class
            self.TIME[i] = int(data[index:index + DATA_BITS.TIME), 2)
            index += DATA_BITS.TIME
            ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good point. I'll move everything in _parse_data into _decompress_data and remove _parse_data entirely.

Comment on lines 184 to 185
# transmitted for this case. Then grab the bits from
# the binary and perform a bit shift to the left. The
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bitshift description is a little confusing to me currently. I think you're bitshifting the integer representation of those bits, where initially I read this as shifting the binary data you're reading one to the left at first.

Suggested change
# transmitted for this case. Then grab the bits from
# the binary and perform a bit shift to the left. The
# transmitted for this case. Then grab the bits from
# the binary, turn these into an integer, and perform a bit shift to the left on that integer value. The

Comment on lines 189 to 191
if case_decoder.TOF0:
tof0 = int(data.next_bits(DATA_BITS.TOF0), 2) << DE_BIT_SHIFT
self.TOF0 = np.append(self.TOF0, tof0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you take the previous suggestion of creating the array initially with fill values, where index would need to be passed in for your current case.

Suggested change
if case_decoder.TOF0:
tof0 = int(data.next_bits(DATA_BITS.TOF0), 2) << DE_BIT_SHIFT
self.TOF0 = np.append(self.TOF0, tof0)
if case_decoder.TOF0:
self.TOF0[index] = int(data.next_bits(DATA_BITS.TOF0), 2) << DE_BIT_SHIFT

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will just say my preference to avoid so many modules again, but feel free to ignore this comment.

I liked keeping this as lo/l0/decompression_tables.py without the extra nesting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I nested this because the direct events decompression is different than the decompression for science counts and star sensor packets, so the decompression_tables folder will have multiple files in it. In the Star Sensor / Science Counts PR I have three CSVs in there for that decompression algorithm. I don't have a strong preference on how to organize this though, so if you think those CSVs should be in a nested folder and this decompression_tables.py file should be in l0, I can make that change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not seeing the other PRs that came in!

I would put all of those into a single file if that makes sense.
decompression_tables.py that has all of those various ScienceCountsDecompression, DirectEventsDecompression, ...

The data files could be in l0/data or just in l0 directly, which it looks like SWE is doing: https://github.com/IMAP-Science-Operations-Center/imap_processing/tree/dev/imap_processing/swe/l1b

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above about potentially eliminating this altogether, but if you keep it my preference would be for this to be l0/utils.py and this be a class within that utils.py file rather than nested further.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest moving this into the class that uses above directly rather than the indirection to another module.

@@ -82,6 +82,7 @@ def test_houskeeping():
]

for pkt_idx, packet in enumerate(packets):
print(packet)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove?

"POS": [],
"CKSM": [],
}
absent = "1010" # case 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really nice way to do this testing :) Well done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree!

Copy link
Contributor

@tech3371 tech3371 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments and questions. Looks great!

# must be shifted by 1 bit to the left
DE_BIT_SHIFT = 1

DataFields = namedtuple(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add note here that explanation of how these are used is explained in detailed in science_direct_events.py

(12, 1): TOFFields(False, False, True, False, False, True),
(12, 0): TOFFields(False, False, True, True, False, False),
(13, 0): TOFFields(False, False, True, False, False, False),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice way of organizing these different scenarios!


Bit Shifting:
TOF0, TOF1, TOF2, TOF3, and CKSM all must be shifted by one bit to the
left. All other fields do not need to be bit shifted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice documentation!


Methods
-------
__init__(packet, software_vesion, packet_file_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a docstring about packet parameter? I think it's a list of decom packets, is that right?

Copy link
Contributor Author

@sdhoyt sdhoyt Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Yes, I'll add that to the __init__ docstring


def _decompress_data(self):
"""Decompress the Lo Science Direct Events data."""
data = BinaryString(self.DATA)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A stupid question. Where is this self.DATA and self.COUNT initialized? I feel like I am missing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a stupid question! Right now it's being initialized in the set_attributes function which gets passed self and the packet. I made that update in this PR, but I'm going to go back to having a parent class LoBase that has a method that works the same as set_attributes. So it will work the same, but instead of calling set_attributes it will call that as a parent method.

# decompress the TOF values.
case_number = int(data.next_bits(4), 2)

# time, energy, and mode are always transmitted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice comments!

next_four_bits = data.next_bits(4)

assert first_three_bits == "000"
assert next_four_bits == "1110"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

"POS": [],
"CKSM": [],
}
absent = "1010" # case 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree!

Copy link
Collaborator

@greglucas greglucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments and all can be addressed in future PRs/restructuring if desired.

# cases, so these can be initialized to the
# CDF fill val and stored with this value for
# those cases.
self.TIME = np.ones(self.COUNT) * GlobalConstants.DOUBLE_FILLVAL
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to be a specific datatype like "uint8" rather than "float"?

As an aside and ignored for this PR: Another thought I had after looking at some of this repetition is if we should think about Structured/named Arrays instead of dataclasses?
https://numpy.org/doc/stable/user/basics.rec.html#
So then everything would be in a single numpy array, but lookup via the name. self.data["energy"] for example.

self.POS = np.ones(self.COUNT) * GlobalConstants.DOUBLE_FILLVAL
self._decompress_data()

def _decompress_data(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I like that this is more consolidated now and I think it flows nicely with the indexing to set the variables.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a class it seems out of place in the utils area to me. What about putting these things into larger files to avoid the imports and needing to figure out where things are coming from?

lo/l0/packets.py (or whatever name would make sense to hold all of your packet class definitions
which would contain:

class LoBase:
    ...

class ScienceDirectEvent(LoBase):
    ...

class OtherPacketClasses(LoBase):
    ...

Compared to right now I think each of those classes is in a dedicated file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely have a preference for smaller, more specific files. I'm ok with grouping things together in the same file if that's the style we're aiming for. However, for this in particular, I'm going to have at least 12 data classes, many of which have a lot more attributes than the direct events class, so that file would get very large and I think it would be a pain to find things in it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Sounds good, I think most other instruments have been going the more files route on this project than I'm used to, so I think this does match up with the rest of the code base and is consistent.

I still would have a slight preference for moving this class/file that serves as the base for inheritance to be moved into the same folder as the other classes. But, I'm not blocking on that and this seems good as-is.

@sdhoyt sdhoyt merged commit 8a8278c into IMAP-Science-Operations-Center:dev Apr 16, 2024
15 checks passed
@sdhoyt sdhoyt deleted the sci-de-redo branch April 16, 2024 18:20
@sdhoyt sdhoyt linked an issue Apr 19, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ins: Lo Related to the IMAP-Lo instrument Level: L0 Level 0 processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove Bitstring from Lo Sci DE data class
3 participants