Add ICESat-2 data read-in functionality #222

JessicaS11 · 2021-09-10T17:57:54Z

A long-awaited and exciting addition, this PR introduces basic data read-in functionality to icepyx. The primary workhorse is the read.py module, which implements a Read class.

Read is initialized with an input file, directory, or S3 bucket URL; your ICESat-2 data product; an intake-style filename pattern to match the files you'd like to read in; and an optional intake catalog if you have one you'd really like to use (and are only trying to read in one group). Similar to NSIDC variable subsetting, the Read object uses icepyx's ICESat-2 variables.py module to determine which variables are available and then create a list of those you'd like to actually read in (this list is usually hard-coded in most readers). It then iterates through all of the granules (files) you've provided and creates an Xarray Dataset with all of the variables you've requested. Under the hood, it creates an Intake catalog for each variable group and constructs a list of per-granule Xarray Datasets that contain ALL of the variables you've asked for, then merges them into a single Dataset. Future efforts will extend default Xarray functionality to do ICESat-2 specific tasks like "get_strong_beams".
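As a rough illustration of the filename-pattern step described above, the sketch below turns an intake-style pattern into a glob string and filters a list of filenames with it. The function name `pattern_to_glob`, the field names in the pattern, and the example filenames are all assumptions made for illustration; this is not the code in read.py.

```python
import fnmatch
import re

# Hypothetical intake-style pattern; the field names are illustrative assumptions.
PATTERN = (
    "ATL{product:2}_{datetime:%Y%m%d%H%M%S}_"
    "{rgt:4}{cycle:2}{orbitsegment:2}_{version:3}_{revision:2}.h5"
)


def pattern_to_glob(pattern: str) -> str:
    """Replace every {field} or {field:format} placeholder with a '*' wildcard."""
    glob = re.sub(r"\{[^{}]*\}", "*", pattern)
    # Collapse runs of adjacent wildcards ("***" becomes "*").
    return re.sub(r"\*+", "*", glob)


# Made-up filenames, used only to exercise the matching.
filenames = [
    "ATL06_20200101000000_00010101_004_01.h5",
    "processing_notes.txt",
]

glob_path = pattern_to_glob(PATTERN)
matches = fnmatch.filter(filenames, glob_path)
print(glob_path)
print(matches)
```

A glob built this way is deliberately loose: it only checks the fixed parts of the name, so any stricter validation of the individual fields would have to happen afterwards.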

Please see the new ICESat-2_Data_Read-In_Example.ipynb, which explains the functionality in more detail as well as provides a quick start guide for using the new module.

I'm looking for any and all feedback on and contributions to the code, example, explanations, tests, etc. I've no doubt there will be lots of new bugs to fix as we move outside my super-small test case environment, and I'm eager to hear all of your thoughts. Please join the conversation - no detail is too small!

Review on ReviewNB: https://app.reviewnb.com/icesat2py/icepyx/pull/222/

Binder link:

Instructions to test:

cd icepyx/  # change working directory to local clone of icepyx repo
git fetch --all  # fetch all branches from remote icepyx repo
git switch --track upstream/dataobj  # or use git checkout --track origin/dataobj
git switch dataobj  # switch to the dataobj branch
pip install -e .  # install this branch and its required dependencies, should pull in h5netcdf and so on

pip install jupyterlab  # if you haven't already
jupyter lab --no-browser  # launch jupyter lab to view ICESat-2_Data_Read-In_Example.ipynb

JessicaS11 · 2021-09-10T18:14:01Z

Note: the current Travis build failure should be fixed by #213.

weiji14

Thanks @JessicaS11 for all the work! Overall, I think the icepyx.Read class and Read.load() method are a great start. Really wish this was implemented 2 years ago for my PhD research, but anyways... there are a lot of suggested changes below, mostly minor stylistic changes for read.py at the moment. I've also added a binder link to your original comment at the top and some instructions for others on how to test things.

To make it easier to review the examples/ICESat-2_Data_Read-in_Example.ipynb though, could you consider using jupytext to turn the .ipynb file to a .py file? It will be easier to comment and make suggested changes on plaintext instead of a JSON. Steps would be:

pip install jupytext
jupytext --set-formats ipynb,py:percent examples/ICESat-2_Data_Read-in_Example.ipynb
# there should now be an ICESat-2_Data_Read-in_Example.py file
git add examples/ICESat-2_Data_Read-in_Example.py
git commit -m "Add jupytext converted ICESat-2_Data_Read-in_Example.py file"
git push

My other main concern is the data type of some of the coordinates of the output xarray.Dataset which are of object or binary dtypes when they could be str/int or datetime64.

<xarray.Dataset>
Dimensions:              (delta_time: 55654, gran_idx: 3, spot: 2)
Coordinates:
  * delta_time           (delta_time) datetime64[ns] 2019-02-22T01:06:07.5054...
  * gran_idx             (gran_idx) object '084902' '090202' '091002'
  * spot                 (spot) int64 2 5
    source_file          (gran_idx) <U95 '/home/username/Documents/github/icepyx...
    gt                   (gran_idx, spot) object 'gt3r' 'gt1l' ... 'gt3r' 'gt1l'
Data variables:
    sc_orient            (gran_idx) int8 0 0 0
    cycle_number         (gran_idx) int8 2 2 2
    rgt                  (gran_idx) int16 849 902 910
    atlas_sdp_gps_epoch  (gran_idx) datetime64[ns] 2018-01-01T00:00:18 ... 20...
    data_start_utc       (gran_idx) |S27 b'2019-02-22T01:03:44.199777Z' ... b...
    data_end_utc         (gran_idx) |S27 b'2019-02-22T01:07:38.112327Z' ... b...
    h_li                 (spot, gran_idx, delta_time) float32 nan nan ... nan
    latitude             (spot, gran_idx, delta_time) float64 nan nan ... nan
    longitude            (spot, gran_idx, delta_time) float64 nan nan ... nan
Attributes:
    data_product:  ATL06
    Description:   The land_ice_height group contains the primary set of deri...
    data_rate:     Data within this group are sparse.  Data values are provid...

Specifically, the gran_idx coordinate should preferably have a str/int dtype (instead of object), while data_start_utc and data_end_utc should preferably have a datetime64 dtype (instead of S27/binary). I'll need to step through the code a bit more to see where this can be fixed, but maybe you have some ideas.
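To make the requested dtypes concrete, here is a minimal NumPy-only sketch of the two conversions, using values copied from the Dataset repr above. The variable names are illustrative, and this is not necessarily where or how the cast should happen inside read.py.

```python
import numpy as np

# Values copied from the repr above; array names are illustrative only.
gran_idx = np.array(["084902", "090202", "091002"], dtype=object)
data_start_utc = np.array([b"2019-02-22T01:03:44.199777Z"], dtype="S27")

# object -> unsigned integer (leading zeros are dropped in the process).
gran_idx_int = np.array([int(g) for g in gran_idx], dtype=np.uint64)

# bytes -> str, strip the trailing "Z" (NumPy datetimes are timezone-naive),
# then parse the ISO 8601 text into datetime64.
start_text = np.char.rstrip(data_start_utc.astype(str), "Z")
data_start = start_text.astype("datetime64[ns]")

print(gran_idx_int.dtype, gran_idx_int)
print(data_start.dtype, data_start)
```

Stripping the "Z" before parsing is one way to avoid handing timezone-flavoured text to NumPy; if the leading zeros in gran_idx matter, a fixed-width str dtype would be the alternative to an integer.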

requirements.txt

icepyx/core/read.py

JessicaS11 · 2021-09-13T16:27:28Z

Thanks, as always, for the great feedback and edits @weiji14! They are so appreciated.

Making notebook review easier is on my long dev-to-do list, and was something (along with the dormant gallery work) I hoped to pick up again in the coming weeks. Thanks for adding the binder link in the meantime, and I'll keep moving forward with your suggestions for using jupytext.

Great point about fixing the type of the coordinates. There's a Read._build_dataset_template method that will probably be the best spot for the gran_idx. We'll have to see where in the read-in process it works best to specify the type for the dates/times. I hadn't yet gotten to thinking about where/when in the read-in process we should handle date computations, but I know that we'll ultimately want to make it easy to calculate dates based on the epoch (would it be too costly in memory to just have this be one of the xarray extension functions?)...
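For reference, the epoch-based calculation mentioned above could look roughly like the following standard-library sketch. The GPS epoch date and the 1198800018-second offset are assumptions taken from general GPS and ICESat-2 product documentation rather than from this thread, the function names are hypothetical, and the arithmetic is naive (no leap-second correction), which is why the resulting epoch matches the 2018-01-01T00:00:18 value shown for atlas_sdp_gps_epoch in the Dataset repr.

```python
from datetime import datetime, timedelta

# Assumptions (not stated in this thread): the GPS epoch date and the
# ATLAS SDP epoch offset, expressed in seconds after the GPS epoch.
GPS_EPOCH = datetime(1980, 1, 6)
ATLAS_SDP_GPS_EPOCH_S = 1198800018


def atlas_epoch() -> datetime:
    """Naively convert the epoch offset to a datetime (no leap-second handling)."""
    return GPS_EPOCH + timedelta(seconds=ATLAS_SDP_GPS_EPOCH_S)


def delta_time_to_datetime(delta_time_s: float, epoch: datetime) -> datetime:
    """Add elapsed seconds (a delta_time value) to the epoch."""
    return epoch + timedelta(seconds=delta_time_s)


epoch = atlas_epoch()
# 60.5 is a made-up delta_time value used only to show the arithmetic.
example = delta_time_to_datetime(60.5, epoch)
print(epoch.isoformat())
print(example.isoformat())
```

Whether the 18-second GPS/UTC offset should be removed depends on the time scale the final timestamps are meant to use; that decision is outside this sketch.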

icepyx/core/read.py

review-notebook-app · 2021-09-27T20:55:50Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

codecov-commenter · 2021-11-15T21:38:07Z

Codecov Report

Merging #222 (b57d247) into development (5b8112d) will decrease coverage by 3.83%.
The diff coverage is 29.93%.

@@               Coverage Diff               @@
##           development     #222      +/-   ##
===============================================
- Coverage        54.94%   51.11%   -3.84%     
===============================================
  Files               20       24       +4     
  Lines             1547     1839     +292     
  Branches           321      386      +65     
===============================================
+ Hits               850      940      +90     
- Misses             639      840     +201     
- Partials            58       59       +1

Impacted Files	Coverage Δ
icepyx/core/is2cat.py	`8.57% <8.57%> (ø)`
icepyx/core/is2ref.py	`23.52% <9.52%> (-8.53%)`	⬇️
icepyx/core/variables.py	`10.10% <16.66%> (+1.72%)`	⬆️
icepyx/core/read.py	`30.35% <30.35%> (ø)`
icepyx/tests/test_read.py	`95.45% <95.45%> (ø)`
icepyx/__init__.py	`100.00% <100.00%> (ø)`
icepyx/tests/test_variables.py	`100.00% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b8112d...b57d247. Read the comment docs.

JessicaS11 · 2021-11-15T21:57:19Z

@icetianli @weiji14 What do you think? Can we merge?

weiji14

Hi @JessicaS11, I think we're just about ready 😄 The xarray.Dataset's coordinate data types look much better now:

<xarray.Dataset>
Dimensions:              (gran_idx: 1, spot: 2, delta_time: 25428)
Coordinates:
  * gran_idx             (gran_idx) uint64 83903
  * spot                 (spot) uint8 2 5
  * delta_time           (delta_time) datetime64[ns] 2019-05-23T05:39:00.8367...
    source_file          (gran_idx) <U41 './ATL06_20190523053429_08390310_004...
Data variables:
    sc_orient            (gran_idx) int8 0
    cycle_number         (gran_idx) int8 3
    rgt                  (gran_idx) int16 839
    h_li                 (spot, gran_idx, delta_time) float32 -52.24 ... nan
    latitude             (spot, gran_idx, delta_time) float64 -67.12 ... nan
    longitude            (spot, gran_idx, delta_time) float64 165.9 ... nan
    gt                   (gran_idx, spot) <U4 'gt3r' 'gt1l'
    atlas_sdp_gps_epoch  (gran_idx) datetime64[ns] 2018-01-01T00:00:18
    data_start_utc       (gran_idx) |S27 b'2019-05-23T05:39:00.477122Z'
    data_end_utc         (gran_idx) |S27 b'2019-05-23T05:42:10.569813Z'
Attributes:
    data_product:  ATL06
    Description:   The land_ice_height group contains the primary set of deri...
    data_rate:     Data within this group are sparse.  Data values are provid...

The data_start_utc and data_end_utc variables still seem to be a string type for some reason; didn't you fix that as mentioned in #222 (comment)? I'd see if that can be fixed quickly, otherwise just do it in a follow-up Pull Request, since those two variables are not that commonly used compared to delta_time.

Oh, and I did spot a few minor typos and an ImportError, but once those are fixed, I think this PR is in a good enough state to bring to the world 🚀

examples/ICESat-2_Data_Read-in_Example.ipynb

icepyx/core/is2cat.py

icepyx/core/read.py

JessicaS11 · 2021-11-23T21:41:11Z

> The data_start_utc and data_end_utc variables still seem to be a string type for some reason; didn't you fix that as mentioned in #222 (comment)? I'd see if that can be fixed quickly, otherwise just do it in a follow-up Pull Request, since those two variables are not that commonly used compared to delta_time.

Good catch. I had, but numpy was issuing a warning, so I'd added a silencer for it, which apparently kept the try block that sets the type from completing.
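The exact code involved isn't shown in this thread, but one common way to silence a warning without affecting the surrounding logic is to scope the filter tightly with `warnings.catch_warnings()`, so the conversion itself still runs and returns its result. The sketch below uses only the standard library; `noisy_convert` is a hypothetical stand-in for a conversion that warns but succeeds.

```python
import warnings


def noisy_convert(value: bytes) -> str:
    """Hypothetical stand-in: emits a warning but still returns a result."""
    warnings.warn("parsing timezone-aware text is deprecated", DeprecationWarning)
    return value.decode("ascii").rstrip("Z")


def convert_quietly(value: bytes) -> str:
    """Silence only DeprecationWarning, and only around the conversion call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=DeprecationWarning)
        return noisy_convert(value)


result = convert_quietly(b"2019-05-23T05:39:00.477122Z")
print(result)
```

Because the filter is restored when the `with` block exits, warnings raised elsewhere in the read-in process remain visible.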

* add github action to add binder badge to PRs (#229)
* use the binder badge action directly (instead of a manual implementation of it) (#233) See the discussion in #230 for more details on this switch.
* preliminary AWS access (#213)
* update links for travis badge (#234)
* Fix failing test_visualization_date_range check for ATL07 (#241)
* By default, no email status updates to users when ordering granules (#240)
* remove extra cell causing errors in example notebook
* Set default page size for orders to 2000 per NSIDC recommendation (#239)
* Add ICESat-2 data read-in functionality (#222)
* update examples from 2020 Hackweek tutorials
* update add and commit GitHub Action version (and UML diagrams) (#244)
* merge traffic (GitHub and PyPI) and bib updates (#245)
* Release 0.5.0 (#246)
* release v0.5.0 CI fixes (#251)
* fix Travis CI label in readme
* update earthdata login fixture for testing
* add required input to pytest fixture (#252)

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
Co-authored-by: trey-stafford <trey.stafford@colorado.edu>

JessicaS11 requested review from icetianli, tsutterley and weiji14 September 10, 2021 18:14

JessicaS11 mentioned this pull request Sep 10, 2021

Get filepaths to downloaded hdf5 granule files #59

Closed

JessicaS11 linked an issue Sep 10, 2021 that may be closed by this pull request

An intake catalog for ICESat-2 ATLAS data #106

Closed

weiji14 added the enhancement (New feature or request) and longer_contribution (Issues that will take more than an afternoon to implement) labels Sep 10, 2021

weiji14 reviewed Sep 11, 2021

weiji14 self-assigned this Sep 27, 2021

JessicaS11 commented Sep 27, 2021

JessicaS11 force-pushed the development branch from 130528b to 247725c Compare September 30, 2021 15:39

JessicaS11 force-pushed the dataobj branch from c04bfbb to d859f28 Compare September 30, 2021 17:46

JessicaS11 added 15 commits September 30, 2021 13:50

add read class module and basic outline

ff7f807

start input check

38a33d2

add more infrastructure to read class and needed intake dependencies

a478825

write file/dir and path matching functions. Still in progress for ini…

7acde07

…t pattern_as_path input needs

outline basic tests file for read module validation functions

78ef804

add function to create glob path from pattern

cf42fe9

iterate over directories and confirm files with specified pattern exi…

581aee3

…st in source

add basic build catalog function to read data object

5edcce1

start turning notebook into example; add docstring to read.build_catalog

df4fef3

begin debugging save-load and var_path_params issues

4af16dc

add rough example notebook for data read in

f4f95a2

debug var_path_params entry to build_catalog

667f817

separate intake catalog functions to another module

552fd3f

add ability to get list of variables from a file and create the varia…

fcae6d6

…bles object on the read object

commit ongoing changes so can merge dev updates

eee1f2e

JessicaS11 and others added 10 commits October 26, 2021 15:55

fix typos

3ba8220

use glob_path instead of glob

1b06422

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

fix or silence xarray deprecation warnings

db253da

make imports alphebetical

c96b1dd

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

specify unsigned int for some dimensions

377ca1d

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

re-cast dtypes after merge; turn gt into a variable

9ce555c

add explicit parameter name

68b9fa3

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

some code and notebook cleanup

45cb1d1

Merge branch 'development' into dataobj

c2a9a48

Merge branch 'development' into dataobj

e800ba3

fix example notebook error

714d065

JessicaS11 requested review from weiji14 and icetianli November 15, 2021 21:57

weiji14 approved these changes Nov 16, 2021

JessicaS11 and others added 7 commits November 23, 2021 15:41

Apply suggestions from code review

a4ec72d

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

Update examples/ICESat-2_Data_Read-in_Example.ipynb

2eaba79

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

Update examples/ICESat-2_Data_Read-in_Example.ipynb

7f71117

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

Update examples/ICESat-2_Data_Read-in_Example.ipynb

ad2968d

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

Update examples/ICESat-2_Data_Read-in_Example.ipynb

6672910

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

Update examples/ICESat-2_Data_Read-in_Example.ipynb

2ac4e2e

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

fix last few typos and issues

745e24e

add sorting to list comparisons

b57d247

JessicaS11 merged commit 63feb76 into development Nov 23, 2021

JessicaS11 deleted the dataobj branch November 23, 2021 22:14

weiji14 mentioned this pull request Aug 30, 2023

Remove intake catalog from Read module #438

Merged

2 tasks

weiji14 mentioned this pull request Nov 20, 2023

Update read module coordinate dimension manipulations to use new xarray index #473

Merged
