Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor parse_direct_events to remove for loop #1220

Conversation

subagonsouth
Copy link
Contributor

Add test coverage for parse_direct_events

Change Summary

This PR changes how the direct events are parsed from the binary blob in Hi SCI_DE packets. It interprets the bytes data as 16-bit unsigned integers and then reshapes them into a 3xN ndarray. This is done to take advantage of numpy broadcasting. Bit-shifting and bitwise_and with bitmasks are used to extract the unsigned integer values from the 16-bit integers.

Updated Files

  • imap_processing/hi/l1a/science_direct_event.py
    • Interpret bytes as uint16 array and use numpy bit-shifts and bitwise_and to extract list of direct events.
  • imap_processing/tests/hi/test_science_direct_event.py
    • Add specific test coverage for the parse_direct_events function.

Closes: #1217

@subagonsouth subagonsouth added Ins: Hi Related to the IMAP-Hi instrument Level: L0 Level 0 processing labels Dec 13, 2024
@subagonsouth subagonsouth requested review from nkerman and a team December 13, 2024 19:36
@subagonsouth subagonsouth self-assigned this Dec 13, 2024
@subagonsouth subagonsouth requested review from bourque, sdhoyt and tech3371 and removed request for a team December 13, 2024 19:36
de_dict["tof_2"] = ((data_uint16[1] & int(b"00000000_00001111", 2)) << 6) + (
data_uint16[2] >> (2 + 8)
)
de_dict["tof_3"] = data_uint16[2] & int(b"00000011_11111111", 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice new way! Nice refactor. It makes sense to me on high level but I will definitely ask Nat to approve it to get one more person to double check this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nkerman or @lacoak21 Can you review this? I think guys are know much more about numpy or bit shifting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in progress right now! Sorry it took a bit to get to!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! This makes sense to me based on the docstring
trigger_id - first word gets shifted right 14 bits (discarding them) leaving 2 bits
de_tag - gets the rest of the bits from the first the first word that are not the trigger id and the first 2 bits of word 2 (16 bits)
tof-1 - uses a mask to skip the first two bits of the second word and keep the middle 10 bits - then shifts right to clear the bottom 4 zeroed bits. (10bits)
tof-2 - keeps last 4 bits and shifts left to clear the first 12 then gets the first 6 bits of the third word (10 bits)
tof-3 - keeps last 10 bits of the third word (10 bits)

Had to write that all out to keep track 😁

).astype(np.uint16)

de_dict = dict()
de_dict["trigger_id"] = (data_uint16[0] >> (6 + 8)).astype(np.uint8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in above line, you were ordering data by column. Does this mean that, data_uint16 would look something like this?

de_data = 'word_0word_1word_2word_3word_4word5'
data_uint16 = [
    [word_0, word_3],
    [word_1, word_4],
    [word_2, word_5],
]

Now by getting first row, you are able to unpack trigger id and so on? Each row containing some information about DE. This is me talking out my understanding out loud.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you have it!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't there only 3, 2-byte words from the 6-bytes of a direct event? Or do they come in pairs of 2 DEs? This might explain my previous confusion, if words 2,4,5 represent another 6-byte DE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think based on the 'order="F"' column ordering, that each column represents a 6-byte DE. So they don't come in pairs but instead, the number of columns is the number of DEs.

Copy link
Contributor

@nkerman nkerman Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking closer, I'm wondering if my confusion came from the use of the word... "word" 😁 , being used to describe individual bytes, while the comment describes:

"Each DE consists of 6-bytes. Considering the data as 3 2-byte words,"

Is the following correct?:

For each DE, we only read 3x 2-byte words as values into the data_uint16 array.

# hex buffer data of 6 bytes:
de_data = b"\x00\x00\x00\x01\x00\x02"

# read these as 3x, 2-byte words into array
data_uint16 => [
    0, # \x00\x00 => "0b00000000"
    1, # \x00\x01 => "0b00000001"
    258, # \x01\x02 = 16**2 +2  => "0b10000010"
]

Then we get whatever bits we need from these 3 "words".


Note: I now realize that this example was for a 1 DE input string, and Tenzin's example was a 2 DE case, with 6 words!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like everyone is saying almost the same thing, but let me try to clarify. I can never remember if numpy does rows or columns first, so I will avoid those terms and instead use first and second axis. I will see if I can clean up the comments in the code too.

The de_data is a binary blob with Nx6 bytes of data where N = number of direct events encoded into the binary blob. Interpreting the data as big-endian uint16 data and reshaping into a (3, -1) ndarray results in an array with shape (3, N). Indexing the first axis of that array (e.g. data_uint16[i]) gives the ith 2-bytes of data for each of the N direct events. In this way, indexing the first axis and applying bitwise operators to the resulting N-element uint16 array allows extraction of the encoded DE data without looping.

Does that help any?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! The word_3 comment we talked about, combined with Tenzin using a 2-DE example threw me off, but this all makes sense to me now.

Comment on lines 77 to 82
# Each DE consists of 6-bytes. Considering the data as 3 2-byte words,
# each word contains the following:
# word_0: 2-bits of Trigger ID, upper 14-bits of de_tag
# word_1: lower 2-bits of de_tag, 10-bits tof_1, upper 4-bits of tof_2
# word_3: lower 6-bits of tof_2, 10-bits of tof_3
# Interpret binary blob as uint16 array and reshape to (3, n)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand why it's 0,1,3 here? It seems intentional and to denote some conceptual distinction, but I can't make sense of why it's not 0,1,2 or 1,2,3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. As implemented below, it should be word_2 not word_3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Everything makes sense to me now!

).astype(np.uint16)

de_dict = dict()
de_dict["trigger_id"] = (data_uint16[0] >> (6 + 8)).astype(np.uint8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't there only 3, 2-byte words from the 6-bytes of a direct event? Or do they come in pairs of 2 DEs? This might explain my previous confusion, if words 2,4,5 represent another 6-byte DE.

de_dict["tof_2"] = ((data_uint16[1] & int(b"00000000_00001111", 2)) << 6) + (
data_uint16[2] >> (2 + 8)
)
de_dict["tof_3"] = data_uint16[2] & int(b"00000011_11111111", 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in progress right now! Sorry it took a bit to get to!

Copy link
Contributor

@nkerman nkerman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! It took me a while (and a few stream of conscious comments) to work out the mechanism here, but I think it all works for efficiently parsing arbitrary numbers of DEs.

The only thing I'm still confused by is why you count words as 0,1,3.

de_dict["de_tag"] = (data_uint16[0] << 2) + (data_uint16[1] >> (6 + 8))
de_dict["tof_1"] = (data_uint16[1] & int(b"00111111_11110000", 2)) >> 4
de_dict["tof_2"] = ((data_uint16[1] & int(b"00000000_00001111", 2)) << 6) + (
data_uint16[2] >> (2 + 8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the bitwise ops look good!

Why not hardcode in 10 instead of (2+8) and 14 instead of (6+8), given that the comment specifies that the skipped bits are for single, 10 bit (tof_3) and 14 bit (de_tag) values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good point. That is an artifact from how I first implemented it.

@subagonsouth subagonsouth merged commit 2781cee into IMAP-Science-Operations-Center:dev Dec 20, 2024
17 checks passed
@subagonsouth subagonsouth deleted the 1217-hi-l1a-refactor-parsing-of-de_tof-binary-blob branch December 20, 2024 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ins: Hi Related to the IMAP-Hi instrument Level: L0 Level 0 processing
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

HI L1A - Refactor parsing of DE_TOF binary blob
4 participants