SWE science L1A algorithm #154
Conversation
…d to add it back in.
…mmitted by mistake
…ounts (IMAP-Science-Operations-Center#158) Added pkt definitions for housekeeping, sectored, angular, and priority counts.
- updated way of checking if description exists or not
- updated generator template
- updated generator to use correct IntegerParameterType
Looks good, just a few minor suggestions to move towards more numpy functions where possible, but nothing blocking, and those can be made later if you want as well.
```python
bit_array = bitstring.ConstBitStream(bin=binary_data)
# chunk bit array into 1260 units, each with 8-bits
byte_data = np.frombuffer(bit_array.bytes, dtype=np.uint8)
```
Do you need to create the bit_array at all? Or can you just grab the bytes from binary_data directly? I don't know what the `.bytes` attribute is doing (whether it returns utf-8 bytes or something else), but I'm guessing there is a way to avoid bitstring here.
Suggested change:
```diff
-bit_array = bitstring.ConstBitStream(bin=binary_data)
-# chunk bit array into 1260 units, each with 8-bits
-byte_data = np.frombuffer(bit_array.bytes, dtype=np.uint8)
+# chunk bit array into 1260 units, each with 8-bits
+byte_data = np.frombuffer(bytes(binary_data), dtype=np.uint8)
```
Yeah, I tried it but it wouldn't work. If I call `bytes(binary_data)`, it expects a bytes-like object. And to convert a binary string to a bytes-like object, it looks like I need to do this:

```python
binary_string = "0b0110011001100001011011000110011001100101"
bytes_object = bytes(int(binary_string, 2).to_bytes((len(binary_string) + 7) // 8, byteorder='big'))
```

It looks more complicated and confusing. That's why I didn't do it. Do you know another way though?
Uggh, I forgot that it was a string :( I was thinking it was already a BytesIO object. I think you're right that that is what you'd have to do. I agree it looks more complicated and confusing, but it removes a potentially slow step and an import. But I think you could get rid of some of the bit padding and checks, since you know the exact length in bytes already:

```python
int(s, 2).to_bytes(len(s) // 8, byteorder='big')
# Or since you know the exact length already, just use it directly
int(s, 2).to_bytes(1260, byteorder='big')
```

One other question: is `bit_array` really a string, or was it a bitstream already that `space_packet_parser` returns to you, so you could just call `binary_data.bytes` directly on that object instead?
The binary_data that `space_packet_parser` returns is a string type, so I can't call `.bytes`. The example I wrote above was wrong. It actually looks like this:

```python
binary_string = "0000000011"  # without 0b at start
```

Sorry for the confusion. But I think this suggestion makes sense:

```python
int(s, 2).to_bytes(1260, byteorder='big')
```
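Putting the pieces of this thread together, a minimal sketch of the full conversion (the 16-bit example string here is made up for illustration; 1260 is the real packet length discussed below):

```python
import numpy as np

# e.g. what space_packet_parser hands back: a plain "0"/"1" string, no "0b" prefix
binary_string = "0000000011110011"

# interpret the bit string as one big integer, then pack it into bytes
byte_data = int(binary_string, 2).to_bytes(len(binary_string) // 8, byteorder="big")
assert byte_data == b"\x00\xf3"

# view those bytes as an array of uint8 counts
raw_counts = np.frombuffer(byte_data, dtype=np.uint8)
assert raw_counts.tolist() == [0, 243]
```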
```python
dataset = xr.Dataset(
    {"SCIENCE_DATA": science_xarray},
    coords={"met_time": metadata_arrays["SHCOARSE"]},
```
I think we will all be using some form of "met_time", so is that what we are using on all of the other instruments as well? Or do we want to call the variable "MET", which I've seen in most of the POC documents?
We should also put a short_description tag in, with a label of "Mission Elapsed Time" and a units field of "seconds since start of the mission" or something like that as well.
I am not sure how we use this, tbh. I thought that this packet creation time could be something of use.
But I did change it to have those inputs. I created an xr.DataArray for SHCOARSE and added those attrs.
```python
# read raw data as binary array using bitstring
# Eg. "0b0101010111"
bit_array = bitstring.ConstBitStream(bin=binary_data)
# chunk bit array into 1260 units, each with 8-bits
byte_data = np.frombuffer(bit_array.bytes, dtype=np.uint8)
```
Suggested change:
```diff
-# read raw data as binary array using bitstring
-# Eg. "0b0101010111"
-bit_array = bitstring.ConstBitStream(bin=binary_data)
-# chunk bit array into 1260 units, each with 8-bits
-byte_data = np.frombuffer(bit_array.bytes, dtype=np.uint8)
+# read raw data
+binary_data = data_packet.data["SCIENCE_DATA"].raw_value
+# read binary string as an int and then convert it to bytes.
+# Eg. "0000000011110011" --> b'\x00\xf3'
+byte_data = int(binary_data, 2).to_bytes(1260, byteorder='big')
+# convert bytes to numpy array of uint8
+raw_counts = np.frombuffer(byte_data, dtype=np.uint8)
```
@greglucas Does this change look ok?
Yeah. I personally think that looks better and then you can remove the explicit dependency in our requirements. But looks good either way to me!
Perfect. I can remove the import in mine, but I can't remove bitstring from the dependencies because the IDEX code is using it.
```python
science_xarray = xr.DataArray(
    science_array,
    dims=["number_of_packets", "seconds", "energy_steps", "cem_counts"],
```
I'm not quite sure how to change this around, but when we run the xarray->cdf conversion, time should be in the first dimension.
Actually, should the first dimension be number_of_packets? It seems like "number_of_packets" could just be "met_time"? Then you don't need the intermediate dimension of "number_of_packets". You'd also need to change the dimensions of "met_time" below to just be "met_time" (a variable is allowed to be its own dimension within xarray).
That's good to know. I didn't know that. I will write a suggested change based on this discussion in a new conversation.
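As a small illustration of that xarray behavior (toy values, not code from this PR):

```python
import numpy as np
import xarray as xr

# a coordinate variable may share its name with its own dimension
met_time = xr.DataArray([10.0, 20.0, 30.0], name="met_time", dims=["met_time"])
data = xr.DataArray(np.zeros((3, 4)), dims=["met_time", "energy_steps"])
dataset = xr.Dataset({"SCIENCE_DATA": data}, coords={"met_time": met_time})

print(dataset["SCIENCE_DATA"].dims)  # ('met_time', 'energy_steps')
```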
```python
# We know we can only have 8 bit numbers input, so iterate over all
# possibilities once up front
decompression_table = np.array([uncompress_counts(i) for i in range(256)])
```
If the table is static and doesn't change, does it make sense to just have it live in a csv file somewhere? I guess it's less flexible in case the table needs to be changed. But it might help people understand what is going on a little better.
They said that it will hardly change. But do you mean run the uncompression once for the 256 values and store them in a csv, then read the csv and use it directly? That could work.
I will be using a lookup table for l1b that could change sometimes. I was going to ask you about how to store that lookup table, and I think you gave me ideas for that. Thank you!
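A rough sketch of that idea (the csv file name is made up; uncompress_counts is the function from this PR):

```python
import numpy as np

# one-time generation: uncompress every possible 8-bit value and save it
decompression_table = np.array([uncompress_counts(i) for i in range(256)])
np.savetxt("swe_decompression_table.csv", decompression_table, fmt="%d")

# at runtime: load the table instead of recomputing it
decompression_table = np.loadtxt("swe_decompression_table.csv", dtype=np.int64)
```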
```python
        description="Mission elapsed time",
        units="seconds since start of the mission",
    ),
)
```
Is this time since launch? At some point we should convert the time to a unix timestamp (seconds since 1970). The xarray_to_cdf routine has a flag to convert unix timestamps to CDF_TT2000 time.
I totally agree. I am not sure of that myself. I asked Nick Dutton, and his guess is that it may be the time when we start getting the first series of data packets from the spacecraft, or the time when we start getting science data. We don't know for sure though.
That is a good question of when we want to convert time. Because you can't get to a unix timestamp from MET without using SPICE, I think we'll need @laspsandoval's routines to handle that conversion.
Can we convert to Unix time right away in L1a? Or do we need to carry MET around in every file for the SPICE queries? We can presumably carry around both MET and Unix time, just call them different variable names.
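Purely to illustrate carrying both quantities side by side (the epoch here is a placeholder constant, not a real mission value; the actual MET-to-UTC conversion needs SPICE as noted above):

```python
# placeholder epoch in unix seconds; the real conversion requires SPICE kernels
HYPOTHETICAL_EPOCH_UNIX = 1_000_000_000

met = metadata_arrays["SHCOARSE"]          # MET seconds, kept as its own variable
unix_time = HYPOTHETICAL_EPOCH_UNIX + met  # second variable carried alongside MET
```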
```python
# where 1260 = 15 seconds x 12 energy steps x 7 CEMs
# Take the "raw_counts" indices/counts mapping from
# decompression_table and then reshape the return
uncompress_data = np.take(decompression_table, raw_counts).reshape(15, 12, 7)
```
I had to look up what the np.take function does, I guess it does the same thing as the "fancy indexing" I'm used to! I believe that `decompression_table[raw_counts]` actually returns the same thing.
Same! I had to look it up too.
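A quick check of that equivalence with toy values:

```python
import numpy as np

table = np.arange(256) * 3                        # stand-in lookup table
raw_counts = np.array([0, 7, 255], dtype=np.uint8)

assert np.array_equal(np.take(table, raw_counts), table[raw_counts])
```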
imap_processing/swe/l1a/swe_l1a.py (Outdated)
```python
# If appId is science, then the file should contain all data of science appId
if decom_data[0].header["PKT_APID"].raw_value == 1344:
    logging.info("Processing science data")
    swe_science(decom_data=decom_data)
```
Should this function return the output of swe_science?
Yes, good catch. I should return the data here. Then we can think about how we go from this to CDF in the future.
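In other words, something along these lines (a sketch of the fix based on the snippet above, not the final code):

```python
# If appId is science, then the file should contain all data of science appId
if decom_data[0].header["PKT_APID"].raw_value == 1344:
    logging.info("Processing science data")
    return swe_science(decom_data=decom_data)
```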
```python
science_xarray = xr.DataArray(
    science_array,
    dims=["number_of_packets", "seconds", "energy_steps", "cem_counts"],
)

met_time = xr.DataArray(
    metadata_arrays["SHCOARSE"],
    name="MET",
    dims=["number_of_packets"],
    attrs=dict(
        description="Mission elapsed time",
        units="seconds since start of the mission",
    ),
)

dataset = xr.Dataset(
    {"SCIENCE_DATA": science_xarray},
    coords={"met_time": met_time},
)

# create xarray dataset for each metadata field
for key, value in metadata_arrays.items():
    if key == "SHCOARSE":
        continue

    dataset[key] = xr.DataArray(
        value,
        dims=["number_of_packets"],
    )
```
@bryan-harter will it look something like this when we add time to be the first dimension?
Suggested change:
```diff
-science_xarray = xr.DataArray(
-    science_array,
-    dims=["number_of_packets", "seconds", "energy_steps", "cem_counts"],
-)
-met_time = xr.DataArray(
-    metadata_arrays["SHCOARSE"],
-    name="MET",
-    dims=["number_of_packets"],
-    attrs=dict(
-        description="Mission elapsed time",
-        units="seconds since start of the mission",
-    ),
-)
-dataset = xr.Dataset(
-    {"SCIENCE_DATA": science_xarray},
-    coords={"met_time": met_time},
-)
-# create xarray dataset for each metadata field
-for key, value in metadata_arrays.items():
-    if key == "SHCOARSE":
-        continue
-    dataset[key] = xr.DataArray(
-        value,
-        dims=["number_of_packets"],
-    )
+met_time = xr.DataArray(
+    metadata_arrays["SHCOARSE"],
+    name="met_time",
+    dims=["met_time"],
+    attrs=dict(
+        description="Mission elapsed time",
+        units="seconds since start of the mission",
+    ),
+)
+science_xarray = xr.DataArray(
+    science_array,
+    dims=["met_time", "seconds", "energy_steps", "cem_counts"],
+)
+dataset = xr.Dataset(
+    {"SCIENCE_DATA": science_xarray},
+    coords={"met_time": met_time},
+)
+# create xarray dataset for each metadata field
+for key, value in metadata_arrays.items():
+    if key == "SHCOARSE":
+        continue
+    dataset[key] = xr.DataArray(
+        value,
+        dims=["met_time"],
+    )
```
* added generators for hit and lo, updated telem generator
* removed hit and lo files
* better None/NAN handling, using both short and long description tags
Great job!
```python
# read binary string as an int and then convert it to bytes
# Eg. "0000000011110011" --> b'\x00\xf3'
byte_data = int(binary_data, 2).to_bytes(1260, byteorder="big")
```
Where are you getting the 1260 byte length from?
1260 comes from 15 seconds x 12 energy steps x 7 CEMs. I had a comment about it before but removed it in recent changes. I will add it back. Thank you!
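One way to make that relationship self-documenting in the code (the constant names here are illustrative, not from the PR):

```python
SECONDS_PER_PACKET = 15
ENERGY_STEPS = 12
CEM_COUNT = 7
# 15 seconds x 12 energy steps x 7 CEMs = 1260 bytes of science data
DATA_LENGTH_BYTES = SECONDS_PER_PACKET * ENERGY_STEPS * CEM_COUNT

byte_data = int(binary_data, 2).to_bytes(DATA_LENGTH_BYTES, byteorder="big")
```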
I have addressed all of the feedback and created an issue for the time conversion question; will revisit in a future PR. Going to squash commits and merge this PR.
Merged b1a7b65 into IMAP-Science-Operations-Center:dev
First implementation of the SWE L1A algorithm. This decoms the raw data, uncompresses the raw counts, and saves the data to an xarray dataset.
Change Summary
This contains code for the SWE science data L1A algorithm implementation. This implements issue #103.
Overview
I moved files around to match the folder structure we decided to use: imap_processing/{instrument}/packet_definitions/.
New Files
Updated Files
Testing