Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start Harmony-py subset example with PO.DAAC data #187

Merged
merged 11 commits into from
May 23, 2023
1 change: 0 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,4 +37,3 @@ At our 2nd hackday we had a brief overview of cookbook progress since we\'d last

- [Cookbook GitHub Issues](https://github.com/NASA-Openscapes/earthdata-cloud-cookbook/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc) - began organizing ideas in Issues

##
120 changes: 117 additions & 3 deletions how-tos/subset.qmd
Original file line number Diff line number Diff line change
@@ -1,10 +1,124 @@
---
title: How do I subset data granules?
execute:
eval: false
---

How do I subset a data granule using Harmony?
How do I subset an OPeNDAP granule in the cloud?
How do I subset a data granule using xarray?
## How do I subset a data granule using Harmony?

::: {.panel-tabset group="language"}

## Python

Install the [harmony-py]("https://github.com/nasa/harmony-py") package:

```{bash}
# Install harmony-py
pip install -U harmony-py
```

Import packages:

```{python}
import datetime as dt

from harmony import BBox, Client, Collection, Request, LinkType

import s3fs
import xarray as xr
```

### Set up Harmony client and authentication

We will authenticate the following Harmony request using a netrc file. See the [appendix]("https://nasa-openscapes.github.io/earthdata-cloud-cookbook/appendix/authentication.html") for more information on [Earthdata Login]("https://urs.earthdata.nasa.gov/") and netrc setup. This basic line below to create a Harmony Client assumes that we have a .netrc available.

```{python}
harmony_client = Client()
```
### Create and submit Harmony request

We are interested in the [GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis dataset](https://doi.org/10.5067/GHGMR-4FJ04). We are subsetting over the Pacific Ocean to the west of Mexico during 1:00 - 2:00 on 10 March 2021. The dataset is organized into daily files, so while we are specifying a single hour in our request, this will return that full day's worth of data.

```{python}
dataset_short_name = 'MUR-JPL-L4-GLOB-v4.1'

request = Request(
collection=Collection(id=dataset_short_name),
spatial=BBox(-125.469,15.820,-99.453,35.859),
temporal={
'start': dt.datetime(2021, 3, 10, 1),
'stop': dt.datetime(2021, 3, 10, 2)
}
)

job_id = harmony_client.submit(request)

harmony_client.wait_for_processing(job_id)
```
### Open and read the subsetted file in `xarray`

Harmony data outputs can be accessed within the cloud using the s3 URLs and AWS credentials provided in the Harmony job response. Using `aws_credentials` we can retrieve the credentials needed to access the Harmony s3 staging bucket and its contents. We then use the AWS `s3fs` package to create a file system that can then be read by xarray.

```{python}
results = harmony_client.result_urls(job_id, link_type=LinkType.s3)
urls = list(results)
url = urls[0]

creds = harmony_client.aws_credentials()

s3_fs = s3fs.S3FileSystem(
key=creds['aws_access_key_id'],
secret=creds['aws_secret_access_key'],
token=creds['aws_session_token'],
client_kwargs={'region_name':'us-west-2'},
)

f = s3_fs.open(url, mode='rb')
ds = xr.open_dataset(f)
ds
```
### Plot data

Use the xarray built in plotting function to create a simple plot along the x and y dimensions of the dataset:

```{python}
ds.analysed_sst.plot() ;
```

## R

R code coming soon!

```{r}
# Coming soon!
```

## Matlab

Matlab code coming soon!

```{bash}
#| echo: true
# Coming soon!
```

## Command Line

With `wget` and `curl`:

```{bash}
# Coming soon!
```



:::

## How do I subset an OPeNDAP granule in the cloud?


## How do I subset a data granule using xarray?


## How do I download a subset of NetCDF-4?
*this might be a deprecated idea*