SWE science L1A algorithm #154

Merged · 104 commits · Sep 21, 2023
Conversation

@tech3371 (Contributor) commented Sep 12, 2023

Change Summary

This PR contains the L1A algorithm implementation for SWE science data. It implements issue #103.

Overview

I moved files around to match the folder structure we decided to use: imap_processing/{instrument}/packet_definitions/.

New Files

  • swe_l1a.py
    • This code reads the L0 file and, based on the ApID, starts processing science or housekeeping data and so on.
  • swe_science.py
    • This code contains the decompression algorithm and stores the processed data in an xarray dataset.

Updated Files

  • swe_packet_definition.xml
    • Updated to match the latest packet definition.
  • decom_swe.py and test_decom.py
    • Updated decom with the new folder path and updated the import in the test file.
  • pyproject.toml
    • Updated to include bitstring.

Testing

  • test_swe_science.py
    • Wrote basic tests for SWE science (a sketch of this style of test follows below).
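A minimal sketch of this style of test, under the assumption that decom_swe.decom_packets is the decom entry point and that swe_science returns the xarray dataset (the sample file path is hypothetical):

```python
from imap_processing.swe import decom_swe
from imap_processing.swe.l1a.swe_science import swe_science


def test_swe_science_shape():
    # hypothetical sample packet file; the real test data path differs
    decom_data = decom_swe.decom_packets("tests/swe_science_sample.bin")
    dataset = swe_science(decom_data=decom_data)
    # each packet holds 15 seconds x 12 energy steps x 7 CEMs of counts
    assert dataset["SCIENCE_DATA"].shape[1:] == (15, 12, 7)
```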

@tech3371 added the "Ins: SWE" (Related to the SWE instrument) and "Level: L1" (Level 1 processing) labels on Sep 12, 2023
@tech3371 added this to the SDC SIT-3 milestone on Sep 12, 2023
@tech3371 requested a review from a team on Sep 12, 2023
@tech3371 self-assigned this on Sep 12, 2023
@tech3371 requested review from bourque and removed the request for a team on Sep 12, 2023
GFMoraga and others added 2 commits September 13, 2023 17:30
…ounts (IMAP-Science-Operations-Center#158)

Added packet definitions for housekeeping, sectored, angular, and priority counts.
- Updated the way of checking whether a description exists; updated the generator template.
- Updated the generator to use the correct IntegerParameterType.
@greglucas (Collaborator) left a comment


Looks good; just a few minor suggestions to move toward more numpy functions where possible, but nothing blocking, and those can be made later if you want as well.

imap_processing/swe/l1a/swe_science.py (outdated; resolved)
Comment on lines 142 to 144
bit_array = bitstring.ConstBitStream(bin=binary_data)
# chunk bit array into 1260 units, each with 8-bits
byte_data = np.frombuffer(bit_array.bytes, dtype=np.uint8)
Collaborator:
Do you need to create the bit_array at all, or can you just grab the bytes from binary_data directly? I don't know what the .bytes attribute is doing, whether it returns utf-8 bytes or something else, but I'm guessing there is a way to avoid bitstring here.

Suggested change:

Before:
bit_array = bitstring.ConstBitStream(bin=binary_data)
# chunk bit array into 1260 units, each with 8-bits
byte_data = np.frombuffer(bit_array.bytes, dtype=np.uint8)

After:
# chunk bit array into 1260 units, each with 8-bits
byte_data = np.frombuffer(bytes(binary_data), dtype=np.uint8)

Contributor (author):
Yeah, I tried that, but it wouldn't work. If I call bytes(binary_data), it expects a bytes-like object. And to convert a binary string to a bytes-like object, it looks like I need to do this:

binary_string = "0b0110011001100001011011000110011001100101"
bytes_object = bytes(int(binary_string, 2).to_bytes((len(binary_string) + 7) // 8, byteorder='big'))

That looks more complicated and confusing, which is why I didn't do it. Do you know another way, though?

Collaborator:

Ugh, I forgot that it was a string :( I was thinking it was already a BytesIO object. I think you're right that that is what you'd have to do. I agree it looks more complicated and confusing, but it removes a potentially slow step and an import. You could also get rid of some of the bit padding and checks, though, since you already know the exact length in bytes:

int(s, 2).to_bytes(len(s) // 8, byteorder='big')
# Or since you know the exact length already just use it directly
int(s, 2).to_bytes(1260, byteorder='big')

One other question: is binary_data really a string, or is it already a bitstream that space_packet_parser returns to you, so that you could just call binary_data.bytes directly on that object instead?

Contributor (author):

The binary_data that space_packet_parser returns is a string type, so I can't call .bytes on it. The example I wrote above was wrong; it actually looks like this:

binary_string = "0000000011"  # without 0b at the start

Sorry for the confusion. But I think this suggestion makes sense:

int(s, 2).to_bytes(1260, byteorder='big')
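For reference, a self-contained sketch of that round trip (the short string here is illustrative; the real SWE payload is exactly 1260 bytes):

```python
import numpy as np

# illustrative binary string as returned by space_packet_parser (no "0b" prefix)
binary_data = "0000000011110011"
n_bytes = len(binary_data) // 8  # 1260 for the real SWE science payload

# parse the base-2 digits into an int, then serialize as big-endian bytes
byte_data = int(binary_data, 2).to_bytes(n_bytes, byteorder="big")
print(byte_data)  # b'\x00\xf3'

# view the bytes as unsigned 8-bit compressed counts
raw_counts = np.frombuffer(byte_data, dtype=np.uint8)
print(raw_counts)  # [  0 243]
```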

imap_processing/swe/l1a/swe_science.py (outdated; resolved)
imap_processing/swe/l1a/swe_science.py (outdated; resolved)

dataset = xr.Dataset(
    {"SCIENCE_DATA": science_xarray},
    coords={"met_time": metadata_arrays["SHCOARSE"]},
Collaborator:

I think we will all be using some form of "met_time", so is that what we are using on all of the other instruments as well? Or do we want to call the variable "MET", which I've seen in most of the POC documents?

We should also put in a short_description tag with the label "Mission Elapsed Time" and a units field of "seconds since start of the mission", or something like that, as well.

Contributor (author):

I am not sure how we will use this, tbh. I thought that this packet creation time could be something of use.

But I did change it to have those inputs: I created an xr.DataArray for SHCOARSE and added those attrs.

imap_processing/swe/l1a/swe_science.py (outdated; resolved)
Comment on lines 140 to 144
# read raw data as binary array using bitstring
# Eg. "0b0101010111"
bit_array = bitstring.ConstBitStream(bin=binary_data)
# chunk bit array into 1260 units, each with 8-bits
byte_data = np.frombuffer(bit_array.bytes, dtype=np.uint8)
Contributor (author):

Suggested change:

Before:
# read raw data as binary array using bitstring
# Eg. "0b0101010111"
bit_array = bitstring.ConstBitStream(bin=binary_data)
# chunk bit array into 1260 units, each with 8-bits
byte_data = np.frombuffer(bit_array.bytes, dtype=np.uint8)

After:
# read raw data
binary_data = data_packet.data["SCIENCE_DATA"].raw_value
# read binary string as an int and then convert it to bytes.
# Eg. "0000000011110011" --> b'\x00\xf3'
byte_data = int(binary_data, 2).to_bytes(1260, byteorder='big')
# convert bytes to numpy array of uint8
raw_counts = np.frombuffer(byte_data, dtype=np.uint8)

@greglucas Does this change look ok?

Collaborator:

Yeah, I personally think that looks better, and then you can remove the explicit dependency from our requirements. But it looks good either way to me!

Contributor (author):

Perfect. I can remove the import in my code, but I can't remove bitstring from the dependencies because the IDEX code is using it.


science_xarray = xr.DataArray(
    science_array,
    dims=["number_of_packets", "seconds", "energy_steps", "cem_counts"],
Contributor:

I'm not quite sure how to change this around, but when we run the xarray-to-CDF conversion, time should be the first dimension.

Contributor:

Actually, should the first dimension be number_of_packets? It seems like "number_of_packets" could just be "met_time"; then you don't need the intermediate dimension of "number_of_packets". You'd also need to change the dimensions of "met_time" below to just be "met_time" (a variable is allowed to be its own dimension within xarray).

Contributor (author):

That's good to know. I didn't know that. I will write a suggested change based on this discussion in a new conversation.


# We know we can only have 8 bit numbers input, so iterate over all
# possibilities once up front
decompression_table = np.array([uncompress_counts(i) for i in range(256)])
Contributor:

If the table is static and doesn't change, does it make sense to just have it live in a csv file somewhere? I guess it's less flexible in case the table needs to be changed, but it might help people understand what is going on a little better.

Contributor (author):

They said that it will hardly ever change. But do you mean run the uncompression once for all 256 values and store the results in a csv, then read the csv and use it directly? That could work (see the sketch below).

I will be using a lookup table for L1B that could change sometimes. I was going to ask you about how to store that lookup table, and I think you gave me ideas for that. Thank you!
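A minimal sketch of that idea, assuming uncompress_counts (this PR's 8-bit decompression function) is importable from swe_science and using a hypothetical file name:

```python
import numpy as np

from imap_processing.swe.l1a.swe_science import uncompress_counts  # from this PR

# run the uncompression once for every possible 8-bit input value
decompression_table = np.array([uncompress_counts(i) for i in range(256)])

# store it next to the code (hypothetical file name)
np.savetxt("swe_decompression_table.csv", decompression_table, fmt="%d", delimiter=",")

# at runtime, read the csv and use it directly instead of recomputing
decompression_table = np.loadtxt("swe_decompression_table.csv", dtype=np.int64, delimiter=",")
```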

description="Mission elapsed time",
units="seconds since start of the mission",
),
)
Contributor:

Is this time since launch? At some point we should convert the time to a Unix timestamp (seconds since 1970). The xarray_to_cdf routine has a flag to convert a Unix timestamp to CDF_TT2000 time.

Contributor (author):

I totally agree; I am not sure of that myself. I asked Nick Dutton, and his guess is that it may be the time when we start getting the first series of data packets from the spacecraft, or the time when we start getting science data. We don't know for sure, though.

Collaborator:

That is a good question of when we want to convert time, because you can't get to a Unix timestamp from MET without using SPICE. So I think we'll need @laspsandoval's routines to handle that conversion.

Can we convert to Unix time right away in L1A? Or do we need to carry MET around in every file for the SPICE queries? We can presumably carry around both MET and Unix time, just under different variable names.
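A minimal sketch of carrying both along, with a placeholder epoch offset (the real MET-to-Unix conversion needs SPICE, so the constant below is purely illustrative):

```python
import numpy as np
import xarray as xr

met = np.array([100, 101, 102], dtype=np.int64)  # toy MET values in seconds

# HYPOTHETICAL: a fixed offset stands in for the real SPICE-based
# conversion (which must account for leap seconds); do not use as-is
MET_EPOCH_UNIX = 0
unix_time = met + MET_EPOCH_UNIX

# keep MET as the coordinate and carry Unix time as a separate variable
dataset = xr.Dataset(coords={"met_time": met})
dataset["unix_time"] = xr.DataArray(unix_time, dims=["met_time"])
```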

# where 1260 = 15 seconds x 12 energy steps x 7 CEMs
# Take the "raw_counts" indices/counts mapping from
# decompression_table and then reshape the return
uncompress_data = np.take(decompression_table, raw_counts).reshape(15, 12, 7)
Contributor:

I had to look up what the np.take function does; I guess it does the same thing as the "fancy indexing" I'm used to! I believe that decompression_table[raw_counts] actually returns the same thing.

Contributor (author):

Same! I had to look it up too.
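A toy illustration of the equivalence (the table values here are made up):

```python
import numpy as np

# toy 4-entry "decompression" table and some compressed byte values
table = np.array([0, 10, 100, 1000])
raw_counts = np.array([3, 0, 2, 2, 1], dtype=np.uint8)

# np.take and fancy indexing give identical results for this use
assert np.array_equal(np.take(table, raw_counts), table[raw_counts])
print(table[raw_counts])  # [1000    0  100  100   10]
```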



# If appId is science, then the file should contain all data of science appId
if decom_data[0].header["PKT_APID"].raw_value == 1344:
    logging.info("Processing science data")
    swe_science(decom_data=decom_data)
Contributor:

Should this function return the output of swe_science?

Contributor (author):

Yes, good catch. I should return the data here. Then we can think about how we go from this to CDF in the future.
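A sketch of that fix, returning the processed dataset so a future caller can hand it to the CDF step:

```python
# If appId is science, then the file should contain all data of science appId
if decom_data[0].header["PKT_APID"].raw_value == 1344:
    logging.info("Processing science data")
    return swe_science(decom_data=decom_data)
```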







Comment on lines 138 to 166
science_xarray = xr.DataArray(
    science_array,
    dims=["number_of_packets", "seconds", "energy_steps", "cem_counts"],
)

met_time = xr.DataArray(
    metadata_arrays["SHCOARSE"],
    name="MET",
    dims=["number_of_packets"],
    attrs=dict(
        description="Mission elapsed time",
        units="seconds since start of the mission",
    ),
)

dataset = xr.Dataset(
    {"SCIENCE_DATA": science_xarray},
    coords={"met_time": met_time},
)

# create xarray dataset for each metadata field
for key, value in metadata_arrays.items():
    if key == "SHCOARSE":
        continue

    dataset[key] = xr.DataArray(
        value,
        dims=["number_of_packets"],
    )
@tech3371 (Contributor, author) commented Sep 20, 2023:

@bryan-harter, will it look something like this when we make time the first dimension?

Suggested change:

Before:
science_xarray = xr.DataArray(
    science_array,
    dims=["number_of_packets", "seconds", "energy_steps", "cem_counts"],
)
met_time = xr.DataArray(
    metadata_arrays["SHCOARSE"],
    name="MET",
    dims=["number_of_packets"],
    attrs=dict(
        description="Mission elapsed time",
        units="seconds since start of the mission",
    ),
)
dataset = xr.Dataset(
    {"SCIENCE_DATA": science_xarray},
    coords={"met_time": met_time},
)
# create xarray dataset for each metadata field
for key, value in metadata_arrays.items():
    if key == "SHCOARSE":
        continue
    dataset[key] = xr.DataArray(
        value,
        dims=["number_of_packets"],
    )

After:
met_time = xr.DataArray(
    metadata_arrays["SHCOARSE"],
    name="met_time",
    dims=["met_time"],
    attrs=dict(
        description="Mission elapsed time",
        units="seconds since start of the mission",
    ),
)
science_xarray = xr.DataArray(
    science_array,
    dims=["met_time", "seconds", "energy_steps", "cem_counts"],
)
dataset = xr.Dataset(
    {"SCIENCE_DATA": science_xarray},
    coords={"met_time": met_time},
)
# create xarray dataset for each metadata field
for key, value in metadata_arrays.items():
    if key == "SHCOARSE":
        continue
    dataset[key] = xr.DataArray(
        value,
        dims=["met_time"],
    )

description="Mission elapsed time",
units="seconds since start of the mission",
),
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree. I am not sure of that myself. I asked Nick Dutton and his guess is that it may be the time when we start getting first series of data packet from spacecraft or time when we start getting science data. We don't know for sure though.

* added generators for hit and lo, updated telem generator

* removed hit and lo files

* better None/NAN handling, using both short and long description tags
@sdhoyt (Contributor) left a comment

Great job!

# read binary string to an int and then convert it to
# bytes. This is to convert the string to bytes.
# Eg. "0000000011110011" --> b'\x00\xf3'
byte_data = int(binary_data, 2).to_bytes(1260, byteorder="big")
Contributor:

Where are you getting the 1260 byte length from?

Contributor (author):

1260 comes from 15 seconds x 12 energy steps x 7 CEMs. I had a comment about it before but removed it in recent changes; I will add it back. Thank you!
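Putting the pieces of this thread together, a runnable sketch of the decode path (the all-zero payload is a stand-in for real packet data):

```python
import numpy as np

# stand-in payload: 1260 bytes (10080 bits) of zeros instead of real data
binary_data = "0" * (1260 * 8)

# binary string -> big-endian bytes -> uint8 compressed counts
byte_data = int(binary_data, 2).to_bytes(1260, byteorder="big")
raw_counts = np.frombuffer(byte_data, dtype=np.uint8)

# 1260 = 15 seconds x 12 energy steps x 7 CEMs
print(raw_counts.reshape(15, 12, 7).shape)  # (15, 12, 7)
```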

@tech3371 (Contributor, author):

I have addressed all of the feedback and created an issue for the time-conversion question; we will revisit it in a future PR. Going to squash the commits and merge this PR.

@tech3371 merged commit b1a7b65 into IMAP-Science-Operations-Center:dev on Sep 21, 2023
12 checks passed
@tech3371 deleted the dev branch on September 21, 2023 23:45
@tech3371 restored the dev branch on September 21, 2023 23:53
maxinelasp pushed a commit to maxinelasp/imap_processing that referenced this pull request on Sep 29, 2023:
First implementation of the SWE L1A algorithm. This decoms raw data, uncompresses raw counts, and saves the data to an xarray dataset.
maxinelasp pushed a commit to maxinelasp/imap_processing that referenced this pull request on Oct 2, 2023:
First implementation of the SWE L1A algorithm. This decoms raw data, uncompresses raw counts, and saves the data to an xarray dataset.
@tech3371 linked an issue on Oct 4, 2023 that may be closed by this pull request
@tech3371 modified the milestones (SDC SIT-3; Decom and L1A Complete) on Oct 4, 2023
laspsandoval pushed a commit to laspsandoval/imap_processing that referenced this pull request on Nov 15, 2023:
First implementation of the SWE L1A algorithm. This decoms raw data, uncompresses raw counts, and saves the data to an xarray dataset.
@tech3371 removed the "Ins: SWE" (Related to the SWE instrument) label on Oct 14, 2024
Labels: Level: L1 (Level 1 processing)
Projects: Status: Done
Development: Successfully merging this pull request may close these issues: "Uncompress counts for SWE_SCIENCE", "Implement L1A algorithm for SWE"
5 participants