Refactor parse_direct_events to remove for loop #1220

subagonsouth · 2024-12-13T19:36:02Z

Add test coverage for parse_direct_events

Change Summary

This PR changes how the direct events are parsed from the binary blob in Hi SCI_DE packets. It interprets the bytes data as 16-bit unsigned integers and then reshapes them into a 3xN ndarray. This is done to take advantage of numpy broadcasting. Bit-shifting and bitwise_and with bitmasks are used to extract the unsigned integer values from the 16-bit integers.

Updated Files

imap_processing/hi/l1a/science_direct_event.py
- Interpret bytes as uint16 array and use numpy bit-shifts and bitwise_and to extract list of direct events.
imap_processing/tests/hi/test_science_direct_event.py
- Add specific test coverage for the parse_direct_events function.

Closes: #1217

Add test coverage for parse_direct_events

tech3371 · 2024-12-13T20:54:20Z

imap_processing/hi/l1a/science_direct_event.py

+    de_dict["tof_2"] = ((data_uint16[1] & int(b"00000000_00001111", 2)) << 6) + (
+        data_uint16[2] >> (2 + 8)
+    )
+    de_dict["tof_3"] = data_uint16[2] & int(b"00000011_11111111", 2)


Nice new way! Nice refactor. It makes sense to me on high level but I will definitely ask Nat to approve it to get one more person to double check this.

@nkerman or @lacoak21 Can you review this? I think guys are know much more about numpy or bit shifting.

Yes, in progress right now! Sorry it took a bit to get to!

Cool! This makes sense to me based on the docstring
trigger_id - first word gets shifted right 14 bits (discarding them) leaving 2 bits
de_tag - gets the rest of the bits from the first the first word that are not the trigger id and the first 2 bits of word 2 (16 bits)
tof-1 - uses a mask to skip the first two bits of the second word and keep the middle 10 bits - then shifts right to clear the bottom 4 zeroed bits. (10bits)
tof-2 - keeps last 4 bits and shifts left to clear the first 12 then gets the first 6 bits of the third word (10 bits)
tof-3 - keeps last 10 bits of the third word (10 bits)

Had to write that all out to keep track 😁

imap_processing/hi/l1a/science_direct_event.py

tech3371 · 2024-12-13T21:31:33Z

imap_processing/hi/l1a/science_direct_event.py

+    ).astype(np.uint16)
+
+    de_dict = dict()
+    de_dict["trigger_id"] = (data_uint16[0] >> (6 + 8)).astype(np.uint8)


in above line, you were ordering data by column. Does this mean that, data_uint16 would look something like this?

de_data = 'word_0word_1word_2word_3word_4word5' data_uint16 = [ [word_0, word_3], [word_1, word_4], [word_2, word_5], ]

Now by getting first row, you are able to unpack trigger id and so on? Each row containing some information about DE. This is me talking out my understanding out loud.

Yes, you have it!

Aren't there only 3, 2-byte words from the 6-bytes of a direct event? Or do they come in pairs of 2 DEs? This might explain my previous confusion, if words 2,4,5 represent another 6-byte DE.

I think based on the 'order="F"' column ordering, that each column represents a 6-byte DE. So they don't come in pairs but instead, the number of columns is the number of DEs.

Looking closer, I'm wondering if my confusion came from the use of the word... "word" 😁 , being used to describe individual bytes, while the comment describes:

"Each DE consists of 6-bytes. Considering the data as 3 2-byte words,"

Is the following correct?:

For each DE, we only read 3x 2-byte words as values into the data_uint16 array.

# hex buffer data of 6 bytes: de_data = b"\x00\x00\x00\x01\x00\x02" # read these as 3x, 2-byte words into array data_uint16 => [ 0, # \x00\x00 => "0b00000000" 1, # \x00\x01 => "0b00000001" 258, # \x01\x02 = 16**2 +2 => "0b10000010" ]

Then we get whatever bits we need from these 3 "words".

Note: I now realize that this example was for a 1 DE input string, and Tenzin's example was a 2 DE case, with 6 words!

It seems like everyone is saying almost the same thing, but let me try to clarify. I can never remember if numpy does rows or columns first, so I will avoid those terms and instead use first and second axis. I will see if I can clean up the comments in the code too.

The de_data is a binary blob with Nx6 bytes of data where N = number of direct events encoded into the binary blob. Interpreting the data as big-endian uint16 data and reshaping into a (3, -1) ndarray results in an array with shape (3, N). Indexing the first axis of that array (e.g. data_uint16[i]) gives the ith 2-bytes of data for each of the N direct events. In this way, indexing the first axis and applying bitwise operators to the resulting N-element uint16 array allows extraction of the encoded DE data without looping.

Does that help any?

Yes! The word_3 comment we talked about, combined with Tenzin using a 2-DE example threw me off, but this all makes sense to me now.

nkerman · 2024-12-19T17:48:57Z

imap_processing/hi/l1a/science_direct_event.py

+    # Each DE consists of 6-bytes. Considering the data as 3 2-byte words,
+    # each word contains the following:
+    # word_0: 2-bits of Trigger ID, upper 14-bits of de_tag
+    # word_1: lower 2-bits of de_tag, 10-bits tof_1, upper 4-bits of tof_2
+    # word_3: lower 6-bits of tof_2, 10-bits of tof_3
+    # Interpret binary blob as uint16 array and reshape to (3, n)


Can you help me understand why it's 0,1,3 here? It seems intentional and to denote some conceptual distinction, but I can't make sense of why it's not 0,1,2 or 1,2,3.

Good catch. As implemented below, it should be word_2 not word_3.

Great! Everything makes sense to me now!

nkerman · 2024-12-19T17:52:12Z

imap_processing/hi/l1a/science_direct_event.py

+    ).astype(np.uint16)
+
+    de_dict = dict()
+    de_dict["trigger_id"] = (data_uint16[0] >> (6 + 8)).astype(np.uint8)


Aren't there only 3, 2-byte words from the 6-bytes of a direct event? Or do they come in pairs of 2 DEs? This might explain my previous confusion, if words 2,4,5 represent another 6-byte DE.

nkerman · 2024-12-19T17:54:14Z

imap_processing/hi/l1a/science_direct_event.py

+    de_dict["tof_2"] = ((data_uint16[1] & int(b"00000000_00001111", 2)) << 6) + (
+        data_uint16[2] >> (2 + 8)
+    )
+    de_dict["tof_3"] = data_uint16[2] & int(b"00000011_11111111", 2)


Yes, in progress right now! Sorry it took a bit to get to!

nkerman

Looks good! It took me a while (and a few stream of conscious comments) to work out the mechanism here, but I think it all works for efficiently parsing arbitrary numbers of DEs.

The only thing I'm still confused by is why you count words as 0,1,3.

nkerman · 2024-12-19T18:50:38Z

imap_processing/hi/l1a/science_direct_event.py

+    de_dict["de_tag"] = (data_uint16[0] << 2) + (data_uint16[1] >> (6 + 8))
+    de_dict["tof_1"] = (data_uint16[1] & int(b"00111111_11110000", 2)) >> 4
+    de_dict["tof_2"] = ((data_uint16[1] & int(b"00000000_00001111", 2)) << 6) + (
+        data_uint16[2] >> (2 + 8)


All the bitwise ops look good!

Why not hardcode in 10 instead of (2+8) and 14 instead of (6+8), given that the comment specifies that the skipped bits are for single, 10 bit (tof_3) and 14 bit (de_tag) values.

Yep, good point. That is an artifact from how I first implemented it.

subagonsouth added 2 commits December 13, 2024 10:09

Refactor parse_direct_events to remove for loop

649bc46

Add test coverage for parse_direct_events

clean up changes

25b7fa0

subagonsouth added Ins: Hi Related to the IMAP-Hi instrument Level: L0 Level 0 processing labels Dec 13, 2024

subagonsouth requested review from nkerman and a team December 13, 2024 19:36

subagonsouth self-assigned this Dec 13, 2024

subagonsouth requested review from bourque, sdhoyt and tech3371 and removed request for a team December 13, 2024 19:36

fix bad f-strings

a9b42a2

tech3371 reviewed Dec 13, 2024

View reviewed changes

nkerman reviewed Dec 19, 2024

View reviewed changes

nkerman approved these changes Dec 19, 2024

View reviewed changes

Improve comments?

2ef2092

subagonsouth merged commit 2781cee into IMAP-Science-Operations-Center:dev Dec 20, 2024
17 checks passed

subagonsouth deleted the 1217-hi-l1a-refactor-parsing-of-de_tof-binary-blob branch December 20, 2024 15:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor parse_direct_events to remove for loop #1220

Refactor parse_direct_events to remove for loop #1220

subagonsouth commented Dec 13, 2024

tech3371 Dec 13, 2024

tech3371 Dec 19, 2024

nkerman Dec 19, 2024

lacoak21 Dec 19, 2024

tech3371 Dec 13, 2024

subagonsouth Dec 13, 2024

nkerman Dec 19, 2024

lacoak21 Dec 19, 2024

nkerman Dec 19, 2024 •

edited

Loading

subagonsouth Dec 19, 2024

nkerman Dec 19, 2024

nkerman Dec 19, 2024

subagonsouth Dec 19, 2024

nkerman Dec 19, 2024

nkerman Dec 19, 2024

nkerman Dec 19, 2024

nkerman left a comment

nkerman Dec 19, 2024

subagonsouth Dec 19, 2024

Refactor parse_direct_events to remove for loop #1220

Refactor parse_direct_events to remove for loop #1220

Conversation

subagonsouth commented Dec 13, 2024

Change Summary

Updated Files

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nkerman Dec 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nkerman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nkerman Dec 19, 2024 •

edited

Loading