Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get filepaths to downloaded hdf5 granule files #59

Closed
wants to merge 36 commits into from
Closed

Get filepaths to downloaded hdf5 granule files #59

wants to merge 36 commits into from

Conversation

weiji14
Copy link
Member

@weiji14 weiji14 commented Jun 4, 2020

Before h5py, h5netcdf, xarray, and other readers can open the HDF5 files are, we'll need to figure out where the filepaths to actually are! This is the first step in resolving #35. Users can access a filepaths property in the Icesat2Data.granules class after they've downloaded the granules (or they can set the Icesat2data.granules.filepaths property if they've downloaded the files already).

region_a = icepyx.icesat2data.Icesat2Data(
    dataset="ATL06",
    spatial_extent=[-55, 68, -48, 71],
    date_range=["2019-02-20", "2019-02-28"],
    version="003",
)
region_a.earthdata_login(uid=uid, email=email)

region_a.download_granules(path="icepyx_data/")
filepaths = region_a.granules.filepaths
print(filepaths)

gives:

['icepyx_data/5000000123456/processed_ATL06_20190221121851_08410203_003_01.h5',
 'icepyx_data/5000000123456/processed_ATL06_20190225121032_09020203_003_01.h5',
 'icepyx_data/5000000123456/processed_ATL06_20190222010344_08490205_003_01.h5',
 'icepyx_data/5000000123456/processed_ATL06_20190226005526_09100205_003_01.h5']

Users can then load the data files from the filepaths, e.g. as follows:

# h5py
import h5py

for file in filepaths:
    h5file = h5py.File(file, mode="r")
    gt2l = h5file["gt2l"]["land_ice_segments"]

# xarray
import xarray as xr

gt2l: xr.Dataset = xr.open_mfdataset(
    paths=filepaths,
    engine="h5netcdf",
    group="gt2l/land_ice_segments",
    combine="by_coords",
)

I can write unit tests for this if someone finds a way to securely authenticate to the NSIDC servers without putting my uid/password up 😆 Done: Unit tests written!

Copy link
Member

@JessicaS11 JessicaS11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the additions you've made here, but I have concerns about how modular/flexible the filepath list would be given that it is explicitly tied to the download process (and thus cannot be used for locally available files that have already been downloaded). I can think of a few different ways to approach this that remedy that, but overall I think it would be better to have filepaths be a property of the granules class that is filled in during the download process (if a download occurs) OR can be filled in during initialization (of the icesat2data object, which would consequently initialize a granules object). In either case you would access it using region_a.granules.filepaths, similar to what you've included. This also removes the need for a login/authentication in order to write a unit test (unit tests for new functions are encouraged and would be super appreciated - thanks for offering! As an aside, thinking about how to unit test things has done wonders for my code-writing, given the authentication issue you note. Our current approach is to have as many functions as possible be independent of a login, and then we test the login independently using a secure key in Travis CI).

Would you be able to refactor this with this flexibility in mind? Apologies for not getting you on the dev team list yesterday!

@weiji14
Copy link
Member Author

weiji14 commented Jun 4, 2020

I think it would be better to have filepaths be a property of the granules class that is filled in during the download process (if a download occurs)

Ok done, but I'll probably need to refactor it later to account for the items below 👇

..., but I have concerns about how modular/flexible the filepath list would be given that it is explicitly tied to the download process (and thus cannot be used for locally available files that have already been downloaded).
...
This also removes the need for a login/authentication in order to write a unit test (unit tests for new functions are encouraged and would be super appreciated - thanks for offering!

I'll need to have a rethink about how to do this. I see you've added some new tests for granules.py (which is great!), and there was a line obs_grans = [gran['producer_granule_id'] for gran in reg_a.granules] that seems to get a list of would-be downloaded files (though I don't see how reg_a.granules is iterable, Edit: opened #61 to fix this). Would that be the way to go? I.e. get the filepaths before download starts, and test for that. Will need set a reg_a.granules.download_path property perhaps.

As an aside, thinking about how to unit test things has done wonders for my code-writing, given the authentication issue you note. Our current approach is to have as many functions as possible be independent of a login, and then we test the login independently using a secure key in Travis CI).

Wait until you start writing integration tests 😆 But yes, good to have unit tests be as independent as possible.

@weiji14 weiji14 marked this pull request as draft June 7, 2020 03:00
JessicaS11 and others added 8 commits June 24, 2020 13:13
update documentation and readme with clearer install instructions (including for windows users)
Update codecov badge to point to 'development' branch
Fix test failures in test_granules.py
mac OS upgrades can lead to machines unable to get IP from hostname.  this is a fix to revert to using 'localhost' in case of a gaierror from socket.gethostbyname
* add black pre-commit hook and reformat files using black

* add github action workflow with flake8 on PRs
* moved examples to top level dir and updated docs links

* grammar edits

* add pypi deployment to travis

* add notes about development branch to contribution_guidelines

* migrate from travis-ci.org to travis-ci.com and start using Travis-CI GitHub App to do CI checks (because webhook from travis-ci.org wasn't reporting the status back to GH, making it challenging to merge some PRs given the branch protection rules).
@weiji14 weiji14 changed the base branch from master to development July 16, 2020 09:46
Do so via a `download_path` property that can be set by the user, either before or after a download has occured. The `filepaths` property would then return the full paths to each of the granule files for loading into h5py or xarray.
Copy link

@fspaolo fspaolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is being handled here:

from pathlib import Path

and

# NOTE: This is just a (quick and dirty) test for the data object

We are still discussing the best strategy for interoperability between modules that may require some of the same methods (e.g. query and data). Our next Team Meeting would be a good place to discuss these ideas before officially implement them.

@weiji14
Copy link
Member Author

weiji14 commented Jul 16, 2020

Sorry, must have missed that branch! Probably should refactor and port the unit tests here over to there, but I'll take a closer look first and probably come to the next meeting (if my timezone works out) to see where things are at for icepyx.

@JessicaS11
Copy link
Member

@weiji14 Browsing through some of these older/open PRs (I was remembering that this one was waiting for read-in functionality consideration, which #222 does). Updated thoughts on this (and if we want to pursue any of it, could you update the base so we can see where the differences are more easily)?

@weiji14
Copy link
Member Author

weiji14 commented Mar 12, 2022

Closing this as it's not really relevant anymore, especially with the new icepyx.Read module at https://github.com/icesat2py/icepyx/blob/v0.6.1/icepyx/core/read.py

@weiji14 weiji14 closed this Mar 12, 2022
@weiji14 weiji14 deleted the get_granule_filepaths branch March 12, 2022 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants