Add ICESat-2 data read-in functionality #222
Note: the current Travis build failure should be fixed by #213.
Thanks @JessicaS11 for all the work! Overall, I think the icepyx.Read class and Read.load() method are a great start. Really wish this had been implemented 2 years ago for my PhD research, but anyway... there are a lot of suggested changes below, mostly minor stylistic changes for read.py at the moment. I've also added a binder link to your original comment at the top and some instructions for others on how to test things.
To make it easier to review the examples/ICESat-2_Data_Read-in_Example.ipynb though, could you consider using jupytext to turn the .ipynb file into a .py file? It will be easier to comment and make suggested changes on plaintext instead of JSON. The steps would be:
pip install jupytext
jupytext --set-formats ipynb,py:percent examples/ICESat-2_Data_Read-in_Example.ipynb
# there should now be an ICESat-2_Data_Read-in_Example.py file
git add examples/ICESat-2_Data_Read-in_Example.py
git commit -m "Add jupytext converted ICESat-2_Data_Read-in_Example.py file"
git push
My other main concern is the data type of some of the coordinates of the output xarray.Dataset, which are of object or binary dtypes when they could be str/int or datetime64.
<xarray.Dataset>
Dimensions: (delta_time: 55654, gran_idx: 3, spot: 2)
Coordinates:
* delta_time (delta_time) datetime64[ns] 2019-02-22T01:06:07.5054...
* gran_idx (gran_idx) object '084902' '090202' '091002'
* spot (spot) int64 2 5
source_file (gran_idx) <U95 '/home/username/Documents/github/icepyx...
gt (gran_idx, spot) object 'gt3r' 'gt1l' ... 'gt3r' 'gt1l'
Data variables:
sc_orient (gran_idx) int8 0 0 0
cycle_number (gran_idx) int8 2 2 2
rgt (gran_idx) int16 849 902 910
atlas_sdp_gps_epoch (gran_idx) datetime64[ns] 2018-01-01T00:00:18 ... 20...
data_start_utc (gran_idx) |S27 b'2019-02-22T01:03:44.199777Z' ... b...
data_end_utc (gran_idx) |S27 b'2019-02-22T01:07:38.112327Z' ... b...
h_li (spot, gran_idx, delta_time) float32 nan nan ... nan
latitude (spot, gran_idx, delta_time) float64 nan nan ... nan
longitude (spot, gran_idx, delta_time) float64 nan nan ... nan
Attributes:
data_product: ATL06
Description: The land_ice_height group contains the primary set of deri...
data_rate: Data within this group are sparse. Data values are provid...
Specifically, the gran_idx coordinate should preferably have a str/int dtype (instead of object), while data_start_utc and data_end_utc should preferably have a datetime64 dtype (instead of S27/binary). I'll need to step through the code a bit more to see where this can be fixed, but maybe you have some ideas.
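For illustration only, here is a minimal sketch of the kind of conversion being asked for, using plain NumPy rather than the actual read.py code. The helper name `bytes_to_datetime64` is hypothetical, and the sample values are copied from the Dataset printout above. The trailing "Z" is stripped before parsing because NumPy's datetime64 is timezone-naive.

```python
import numpy as np


def bytes_to_datetime64(values):
    """Convert byte strings like b'2019-02-22T01:03:44.199777Z' to datetime64[ns].

    Hypothetical helper for illustration; not part of icepyx.
    """
    # Decode each byte string and drop the trailing 'Z' (UTC marker),
    # since numpy.datetime64 stores timezone-naive timestamps.
    return np.array(
        [np.datetime64(v.decode("ascii").rstrip("Z"), "ns") for v in values]
    )


# Values as they appear in the printout above (|S27 and object dtypes).
raw_times = np.array(
    [b"2019-02-22T01:03:44.199777Z", b"2019-02-22T01:07:38.112327Z"], dtype="S27"
)
raw_gran_idx = np.array(["084902", "090202", "091002"], dtype=object)

stamps = bytes_to_datetime64(raw_times)
# Cast the zero-padded strings to unsigned integers (leading zeros are dropped).
gran_idx = np.array([int(s) for s in raw_gran_idx], dtype=np.uint64)

print(stamps.dtype, gran_idx.dtype)
```

Applied to an xarray.Dataset, the converted arrays would replace the original variable or coordinate values; where exactly that belongs in the reader is left open here.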
Thanks, as always, for the great feedback and edits @weiji14! They are so appreciated. Making notebook review easier is on my long dev-to-do list, and was something (along with the dormant gallery work) I hoped to pick up again in the coming weeks. Thanks for adding the binder link in the meantime, and I'll keep moving forward with your suggestions for using Great point about fixing the type of the coordinates. There's a |
…t pattern_as_path input needs
…bles object on the read object
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
Codecov Report
@@ Coverage Diff @@
## development #222 +/- ##
===============================================
- Coverage 54.94% 51.11% -3.84%
===============================================
Files 20 24 +4
Lines 1547 1839 +292
Branches 321 386 +65
===============================================
+ Hits 850 940 +90
- Misses 639 840 +201
- Partials 58 59 +1
@icetianli @weiji14 What do you think? Can we merge?
Hi @JessicaS11, I think we're just about ready 😄 The xarray.Dataset's coordinate data types look much better now:
<xarray.Dataset>
Dimensions: (gran_idx: 1, spot: 2, delta_time: 25428)
Coordinates:
* gran_idx (gran_idx) uint64 83903
* spot (spot) uint8 2 5
* delta_time (delta_time) datetime64[ns] 2019-05-23T05:39:00.8367...
source_file (gran_idx) <U41 './ATL06_20190523053429_08390310_004...
Data variables:
sc_orient (gran_idx) int8 0
cycle_number (gran_idx) int8 3
rgt (gran_idx) int16 839
h_li (spot, gran_idx, delta_time) float32 -52.24 ... nan
latitude (spot, gran_idx, delta_time) float64 -67.12 ... nan
longitude (spot, gran_idx, delta_time) float64 165.9 ... nan
gt (gran_idx, spot) <U4 'gt3r' 'gt1l'
atlas_sdp_gps_epoch (gran_idx) datetime64[ns] 2018-01-01T00:00:18
data_start_utc (gran_idx) |S27 b'2019-05-23T05:39:00.477122Z'
data_end_utc (gran_idx) |S27 b'2019-05-23T05:42:10.569813Z'
Attributes:
data_product: ATL06
Description: The land_ice_height group contains the primary set of deri...
data_rate: Data within this group are sparse. Data values are provid...
The data_start_utc and data_end_utc variables still seem to be a byte-string type for some reason; didn't you fix that as mentioned in #222 (comment)? I'd see if that can be fixed quickly, otherwise just do it in a follow-up Pull Request, since those two variables are not that commonly used compared to delta_time.
Oh, and I did spot a few minor typos and an ImportError, but once those are fixed, I think this PR is in a good enough state to bring to the world 🚀
Good catch. I had, but numpy was issuing a warning, so I'd added a silencer for the warning, which apparently was making it not complete the |
* add github action to add binder badge to PRs (#229)
* use the binder badge action directly (instead of a manual implementation of it) (#233)
  See the discussion in #230 for more details on this switch.
* preliminary AWS access (#213)
* update links for travis badge (#234)
* Fix failing test_visualization_date_range check for ATL07 (#241)
* By default, no email status updates to users when ordering granules (#240)
* remove extra cell causing errors in example notebook
* Set default page size for orders to 2000 per NSIDC recommendation (#239)
* Add ICESat-2 data read-in functionality (#222)
* update examples from 2020 Hackweek tutorials
* update add and commit GitHub Action version (and UML diagrams) (#244)
* merge traffic (GitHub and PyPI) and bib updates (#245)
* Release 0.5.0 (#246)
* release v0.5.0 CI fixes (#251)
* fix Travis CI label in readme
* update earthdata login fixture for testing
* add required input to pytest fixture (#252)
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
Co-authored-by: trey-stafford <trey.stafford@colorado.edu>
A long-awaited and exciting addition, this PR introduces basic data read-in functionality to icepyx. The primary workhorse is the read.py module, which implements a Read class object.
Read is initialized with an input file, directory, or s3 bucket url; your ICESat-2 data product; an intake-style filename pattern to match which files you'd like to read in; and an optional intake catalog if you have one you'd really like to use (and are only trying to read in one group). Similar to NSIDC variable subsetting, the Read object uses icepyx's ICESat-2 variables.py module to determine which variables are available and then create a list of those you'd like to actually read in (this is usually hard-coded into most readers). Then, it will iterate through all of the granules (files) you've provided and create an Xarray Dataset with all of the variables you've requested. To do this, under the hood it creates an Intake catalog for each variable group and constructs a list of per-granule Xarray Datasets that contain ALL of the variables you've asked for. Then, it will merge them into a single Dataset. Future efforts will then extend default Xarray functionality to do ICESat-2 specific tasks like "get_strong_beams".
Please see the new ICESat-2_Data_Read-In_Example.ipynb, which explains the functionality in more detail and provides a quick start guide for using the new module.
I'm looking for any and all feedback on and contributions to the code, example, explanations, tests, etc. I've no doubt there will be lots of new bugs to fix as we move outside my super-small test case environment, and I'm eager to hear all of your thoughts. Please join the conversation - no detail is too small!
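As a rough illustration of the filename-matching idea described above (not the actual icepyx or intake implementation), the sketch below uses a standard-library regular expression in place of an intake-style pattern. The filename is hypothetical and assumes the usual ATLxx_yyyymmddhhmmss_ttttccss_vvv_rr.h5 granule naming convention. The function name `parse_granule_name` and the way `gran_idx` is built (RGT digits followed by cycle digits) are assumptions inferred from the Dataset printouts in this thread.

```python
import re
from datetime import datetime

# Hypothetical filename used only for this example.
EXAMPLE_NAME = "ATL06_20190222010344_08490210_003_01.h5"

# Named groups play the role of the fields in an intake-style pattern.
GRANULE_RE = re.compile(
    r"ATL(?P<product>\d{2})_(?P<datetime>\d{14})_"
    r"(?P<rgt>\d{4})(?P<cycle>\d{2})(?P<segment>\d{2})_"
    r"(?P<version>\d{3})_(?P<revision>\d{2})\.h5$"
)


def parse_granule_name(name):
    """Extract product, start time, RGT, and cycle from a granule filename."""
    match = GRANULE_RE.search(name)
    if match is None:
        raise ValueError(f"unrecognized granule filename: {name!r}")
    fields = match.groupdict()
    return {
        "product": "ATL" + fields["product"],
        "start": datetime.strptime(fields["datetime"], "%Y%m%d%H%M%S"),
        "rgt": int(fields["rgt"]),
        "cycle": int(fields["cycle"]),
        # Assumption: the granule index concatenates the RGT and cycle digits.
        "gran_idx": int(fields["rgt"] + fields["cycle"]),
    }


info = parse_granule_name(EXAMPLE_NAME)
print(info)
```

Files whose names do not match the pattern raise a ValueError here; a reader could just as reasonably skip them instead.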
Review on ReviewNB: https://app.reviewnb.com/icesat2py/icepyx/pull/222/
Binder link:
Instructions to test: