-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor parse_direct_events to remove for loop #1220
Refactor parse_direct_events to remove for loop #1220
Conversation
Add test coverage for parse_direct_events
de_dict["tof_2"] = ((data_uint16[1] & int(b"00000000_00001111", 2)) << 6) + ( | ||
data_uint16[2] >> (2 + 8) | ||
) | ||
de_dict["tof_3"] = data_uint16[2] & int(b"00000011_11111111", 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice new way! Nice refactor. It makes sense to me on high level but I will definitely ask Nat to approve it to get one more person to double check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in progress right now! Sorry it took a bit to get to!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! This makes sense to me based on the docstring
trigger_id - first word gets shifted right 14 bits (discarding them) leaving 2 bits
de_tag - gets the rest of the bits from the first the first word that are not the trigger id and the first 2 bits of word 2 (16 bits)
tof-1 - uses a mask to skip the first two bits of the second word and keep the middle 10 bits - then shifts right to clear the bottom 4 zeroed bits. (10bits)
tof-2 - keeps last 4 bits and shifts left to clear the first 12 then gets the first 6 bits of the third word (10 bits)
tof-3 - keeps last 10 bits of the third word (10 bits)
Had to write that all out to keep track 😁
).astype(np.uint16) | ||
|
||
de_dict = dict() | ||
de_dict["trigger_id"] = (data_uint16[0] >> (6 + 8)).astype(np.uint8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in above line, you were ordering data by column. Does this mean that, data_uint16 would look something like this?
de_data = 'word_0word_1word_2word_3word_4word5'
data_uint16 = [
[word_0, word_3],
[word_1, word_4],
[word_2, word_5],
]
Now by getting first row, you are able to unpack trigger id and so on? Each row containing some information about DE. This is me talking out my understanding out loud.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you have it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't there only 3, 2-byte words from the 6-bytes of a direct event? Or do they come in pairs of 2 DEs? This might explain my previous confusion, if words 2,4,5 represent another 6-byte DE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think based on the 'order="F"' column ordering, that each column represents a 6-byte DE. So they don't come in pairs but instead, the number of columns is the number of DEs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking closer, I'm wondering if my confusion came from the use of the word... "word" 😁 , being used to describe individual bytes, while the comment describes:
"Each DE consists of 6-bytes. Considering the data as 3 2-byte words,"
Is the following correct?:
For each DE, we only read 3x 2-byte words as values into the data_uint16
array.
# hex buffer data of 6 bytes:
de_data = b"\x00\x00\x00\x01\x00\x02"
# read these as 3x, 2-byte words into array
data_uint16 => [
0, # \x00\x00 => "0b00000000"
1, # \x00\x01 => "0b00000001"
258, # \x01\x02 = 16**2 +2 => "0b10000010"
]
Then we get whatever bits we need from these 3 "words".
Note: I now realize that this example was for a 1 DE input string, and Tenzin's example was a 2 DE case, with 6 words!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like everyone is saying almost the same thing, but let me try to clarify. I can never remember if numpy does rows or columns first, so I will avoid those terms and instead use first and second axis. I will see if I can clean up the comments in the code too.
The de_data
is a binary blob with Nx6 bytes of data where N = number of direct events encoded into the binary blob. Interpreting the data as big-endian uint16 data and reshaping into a (3, -1) ndarray results in an array with shape (3, N). Indexing the first axis of that array (e.g. data_uint16[i]
) gives the i
th 2-bytes of data for each of the N direct events. In this way, indexing the first axis and applying bitwise operators to the resulting N-element uint16 array allows extraction of the encoded DE data without looping.
Does that help any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! The word_3
comment we talked about, combined with Tenzin using a 2-DE example threw me off, but this all makes sense to me now.
# Each DE consists of 6-bytes. Considering the data as 3 2-byte words, | ||
# each word contains the following: | ||
# word_0: 2-bits of Trigger ID, upper 14-bits of de_tag | ||
# word_1: lower 2-bits of de_tag, 10-bits tof_1, upper 4-bits of tof_2 | ||
# word_3: lower 6-bits of tof_2, 10-bits of tof_3 | ||
# Interpret binary blob as uint16 array and reshape to (3, n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand why it's 0,1,3
here? It seems intentional and to denote some conceptual distinction, but I can't make sense of why it's not 0,1,2
or 1,2,3
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. As implemented below, it should be word_2
not word_3
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Everything makes sense to me now!
).astype(np.uint16) | ||
|
||
de_dict = dict() | ||
de_dict["trigger_id"] = (data_uint16[0] >> (6 + 8)).astype(np.uint8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't there only 3, 2-byte words from the 6-bytes of a direct event? Or do they come in pairs of 2 DEs? This might explain my previous confusion, if words 2,4,5 represent another 6-byte DE.
de_dict["tof_2"] = ((data_uint16[1] & int(b"00000000_00001111", 2)) << 6) + ( | ||
data_uint16[2] >> (2 + 8) | ||
) | ||
de_dict["tof_3"] = data_uint16[2] & int(b"00000011_11111111", 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in progress right now! Sorry it took a bit to get to!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! It took me a while (and a few stream of conscious comments) to work out the mechanism here, but I think it all works for efficiently parsing arbitrary numbers of DEs.
The only thing I'm still confused by is why you count words as 0,1,3.
de_dict["de_tag"] = (data_uint16[0] << 2) + (data_uint16[1] >> (6 + 8)) | ||
de_dict["tof_1"] = (data_uint16[1] & int(b"00111111_11110000", 2)) >> 4 | ||
de_dict["tof_2"] = ((data_uint16[1] & int(b"00000000_00001111", 2)) << 6) + ( | ||
data_uint16[2] >> (2 + 8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the bitwise ops look good!
Why not hardcode in 10 instead of (2+8) and 14 instead of (6+8), given that the comment specifies that the skipped bits are for single, 10 bit (tof_3) and 14 bit (de_tag) values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good point. That is an artifact from how I first implemented it.
2781cee
into
IMAP-Science-Operations-Center:dev
Add test coverage for parse_direct_events
Change Summary
This PR changes how the direct events are parsed from the binary blob in Hi SCI_DE packets. It interprets the bytes data as 16-bit unsigned integers and then reshapes them into a 3xN ndarray. This is done to take advantage of numpy broadcasting. Bit-shifting and bitwise_and with bitmasks are used to extract the unsigned integer values from the 16-bit integers.
Updated Files
parse_direct_events
function.Closes: #1217