Draft of noaa-oisst recipe #20

Merged
merged 14 commits into from
May 18, 2021
6 changes: 6 additions & 0 deletions .github/actions/process_recipe/Dockerfile
@@ -0,0 +1,6 @@
FROM pangeo/pangeo-notebook:latest

COPY action/process_recipe.py /process_recipe.py
COPY entrypoint.sh /entrypoint.sh

ENTRYPOINT [ "sh", "/entrypoint.sh" ]
34 changes: 34 additions & 0 deletions .github/actions/process_recipe/README.md
@@ -0,0 +1,34 @@
# Process Recipe Action

This action's purpose is to take a Recipe contributed via a PR, convert it to a Prefect Flow, register it with the Bakery specified in its metadata file, and run a test invocation of the Flow on a subset of the input data.

## Inputs

### `path_to_recipe_py`

**Required** The path to the `recipe.py` file within the PR. This is relative to the root of the repository.

### `path_to_meta_yaml`

**Required** The path to the `meta.yaml` file within the PR. This is relative to the root of the repository.


## Outputs

N/A

## Example usage

```yaml
# If using this action within the pangeo-forge/staged-recipes repository
uses: ./.github/actions/process_recipe
with:
  path_to_recipe_py: "recipes/my_recipe/recipe.py"
  path_to_meta_yaml: "recipes/my_recipe/meta.yaml"

# If using this action in any other repository
uses: pangeo-forge/staged-recipes/.github/actions/process_recipe@master
with:
  path_to_recipe_py: "recipes/my_recipe/recipe.py"
  path_to_meta_yaml: "recipes/my_recipe/meta.yaml"
```
19 changes: 19 additions & 0 deletions .github/actions/process_recipe/action.yaml
@@ -0,0 +1,19 @@
name: 'Process Recipe'
description: >
  For a given recipe.py and its meta.yaml file: convert it to a Prefect Flow,
  register the Flow with the Bakery specified in the meta.yaml, and test a pruned run
  of the Flow on the Bakery.
inputs:
  path_to_recipe_py:
    description: 'Path to the recipe.py file of the recipe to process'
    required: true
  path_to_meta_yaml:
    description: 'Path to the meta.yaml file of the recipe to process'
    required: true
runs:
  using: 'docker'
  image: 'Dockerfile'
  args:
    - ${{ inputs.path_to_recipe_py }}
    - ${{ inputs.path_to_meta_yaml }}
7 changes: 7 additions & 0 deletions .github/actions/process_recipe/action/process_recipe.py
@@ -0,0 +1,7 @@
import sys


def main():
    # The action passes its two inputs as positional arguments (see `args` in action.yaml).
    print(f"recipe.py: {sys.argv[1]} meta.yaml: {sys.argv[2]}")


if __name__ == "__main__":
    main()
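The placeholder script only echoes its arguments. A natural next step is importing the recipe object that meta.yaml points to (`module: recipe.py`, `name: recipe`). The sketch below shows one way to do that with the standard library's `importlib.util`; the helper name `load_recipe_object` is hypothetical and not part of pangeo-forge.

```python
import importlib.util
from pathlib import Path


def load_recipe_object(recipe_py, object_name="recipe"):
    """Import the file at `recipe_py` and return its attribute `object_name`.

    `object_name` plays the role of the `name:` field of a `recipes` entry
    in meta.yaml (hypothetical helper, not a pangeo-forge API).
    """
    path = Path(recipe_py)
    # Build a module spec straight from the file path, so the recipe directory
    # does not need to be an importable package or be on sys.path.
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Executes recipe.py top to bottom, which builds the recipe object.
    spec.loader.exec_module(module)
    if not hasattr(module, object_name):
        raise AttributeError(f"{path} defines no object named {object_name!r}")
    return getattr(module, object_name)
```

Inside `main()` this would be called as `load_recipe_object(sys.argv[1])`. Loading from a file location rather than via `import recipe` avoids depending on the working directory the Docker action happens to run in.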
7 changes: 7 additions & 0 deletions .github/actions/process_recipe/entrypoint.sh
@@ -0,0 +1,7 @@
#!/bin/sh -l

# Install any dependencies the recipe needs here. This assumes the recipe has either:
# - a conda environment .yaml file attached, or
# - its dependencies listed in the meta.yaml

# Quote the arguments so paths containing spaces are passed through intact.
python3 /process_recipe.py "$1" "$2"
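The entrypoint comment names two possible sources of dependencies. A minimal sketch of choosing between them, written in Python for consistency with the rest of the action: the environment file names, the `dependencies` key in meta.yaml, and the target environment name `notebook` are all assumptions, not something this PR defines.

```python
from pathlib import Path
from typing import List, Optional

# Assumption: name of the conda environment inside the pangeo-notebook image.
TARGET_ENV = "notebook"
# Hypothetical names for a conda environment file attached to the recipe.
ENV_FILE_NAMES = ("environment.yaml", "environment.yml")


def dependency_install_command(recipe_dir: str, meta: dict) -> Optional[List[str]]:
    """Return the command that would install a recipe's dependencies, or None."""
    # Case 1: a conda environment file sits next to recipe.py.
    for name in ENV_FILE_NAMES:
        env_file = Path(recipe_dir) / name
        if env_file.is_file():
            return ["conda", "env", "update", "--name", TARGET_ENV, "--file", str(env_file)]
    # Case 2: packages listed under a hypothetical `dependencies` key in meta.yaml.
    deps = meta.get("dependencies") or []
    if deps:
        return ["conda", "install", "--yes", "--name", TARGET_ENV, *deps]
    # Neither source is present: nothing to install.
    return None
```

The returned list could be run with `subprocess.run(cmd, check=True)`. The sketch prefers the environment file when both sources exist, on the reasoning that a full environment file is the more complete specification.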
46 changes: 46 additions & 0 deletions .github/workflows/process-recipe.yaml
@@ -0,0 +1,46 @@
name: Process PR Recipe

on:
  pull_request:
    branches: master

jobs:
  process-recipe:
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v2

      - id: changed-files
        name: Get the files changed in the Pull Request
        uses: jitterbit/get-changed-files@v1

      - name: Locate the Pull Request's recipe.py
        id: recipe
        run: |
          for changed_file in ${{ steps.changed-files.outputs.all }}; do
            if [[ ${changed_file} =~ ^.*recipes\/.*recipe\.py$ ]];
            then
              echo "::set-output name=RECIPE_PY::${changed_file}";
              exit 0
            fi
          done
          exit 1 # Fail - recipe.py required

      - name: Locate the Pull Request's meta.yaml
        id: meta
        run: |
          for changed_file in ${{ steps.changed-files.outputs.all }}; do
            if [[ ${changed_file} =~ ^.*recipes\/.*meta\.(yml|yaml)$ ]];
            then
              echo "::set-output name=META_YAML::${changed_file}";
              exit 0
            fi
          done
          exit 1 # Fail - meta.yaml required

      - name: Process the Pull Request's recipe
        if: ${{ steps.recipe.outputs.RECIPE_PY != null && steps.meta.outputs.META_YAML != null }}
        uses: ./.github/actions/process_recipe
        with:
          path_to_recipe_py: ${{ steps.recipe.outputs.RECIPE_PY }}
          path_to_meta_yaml: ${{ steps.meta.outputs.META_YAML }}
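The two "Locate" steps scan the space-separated list of changed files and keep the first path matching a regular expression. The same selection logic, sketched in Python with the workflow's patterns (the helper `first_match` is hypothetical), makes it easy to check which paths the patterns accept:

```python
import re
from typing import Iterable, Optional

# The same patterns used by the workflow's bash `[[ ... =~ ... ]]` tests.
RECIPE_PY_RE = re.compile(r"^.*recipes/.*recipe\.py$")
META_YAML_RE = re.compile(r"^.*recipes/.*meta\.(yml|yaml)$")


def first_match(changed_files: Iterable[str], pattern: re.Pattern) -> Optional[str]:
    """Return the first changed path matching `pattern`, as the workflow loops do."""
    for path in changed_files:
        if pattern.match(path):
            return path
    return None


# `steps.changed-files.outputs.all` is a space-separated string of paths.
changed = "README.md recipes/noaa-oisst/meta.yaml recipes/noaa-oisst/recipe.py".split()
print(first_match(changed, RECIPE_PY_RE), first_match(changed, META_YAML_RE))
```

Because both loops stop at the first match, a PR that touches several recipes has only one recipe.py and one meta.yaml processed, and nothing guarantees the two come from the same recipe directory.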
22 changes: 22 additions & 0 deletions recipes/noaa-oisst/meta.yaml
@@ -0,0 +1,22 @@
title: "NOAA Optimum Interpolated SST"
description: "Analysis-ready Zarr datasets derived from NOAA OISST NetCDF"
pangeo_forge_version: "0.0.1" # do we need a separate spec version?
recipes:
  - id: noaa-oisst-avhrr-only
    module: recipe.py # the module in which to find the recipe
    name: recipe # the name of the object to import
provenance:
  providers:
    - name: "NOAA NCEI"
      description: "National Oceanographic & Atmospheric Administration National Centers for Environmental Information"
      roles:
        - producer
        - licensor
      url: https://www.ncdc.noaa.gov/oisst
  license: "CC-BY-4.0"
maintainers:
  - name: "Ryan Abernathey"
    orcid: "0000-0001-5999-4917"
    github: rabernat
bakeries:
  - id: "pangeo-aws-west-1" # must come from a valid list of bakeries
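The comment on `bakeries` implies that meta.yaml will be checked before a recipe is processed. A minimal validation sketch, operating on the dict a YAML loader such as PyYAML's `yaml.safe_load` would return: treating every top-level key of this file as required, and the one-entry `KNOWN_BAKERIES` set, are assumptions standing in for a real schema and bakery registry.

```python
from typing import List

# Assumption: every top-level key used in this meta.yaml is required.
REQUIRED_KEYS = ("title", "description", "pangeo_forge_version", "recipes",
                 "provenance", "maintainers", "bakeries")
# Hypothetical stand-in for "a valid list of bakeries".
KNOWN_BAKERIES = {"pangeo-aws-west-1"}


def validate_meta(meta: dict) -> List[str]:
    """Return a list of problems found in a parsed meta.yaml (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in meta]
    for entry in meta.get("recipes", []):
        for field in ("id", "module", "name"):
            if field not in entry:
                problems.append(f"recipe entry missing {field!r}: {entry}")
    for bakery in meta.get("bakeries", []):
        if bakery.get("id") not in KNOWN_BAKERIES:
            problems.append(f"unknown bakery: {bakery.get('id')!r}")
    return problems


# A trimmed copy of the meta.yaml above, as a loader would parse it.
META = {
    "title": "NOAA Optimum Interpolated SST",
    "description": "Analysis-ready Zarr datasets derived from NOAA OISST NetCDF",
    "pangeo_forge_version": "0.0.1",
    "recipes": [{"id": "noaa-oisst-avhrr-only", "module": "recipe.py", "name": "recipe"}],
    "provenance": {"license": "CC-BY-4.0"},
    "maintainers": [{"name": "Ryan Abernathey", "github": "rabernat"}],
    "bakeries": [{"id": "pangeo-aws-west-1"}],
}
print(validate_meta(META))
```

Returning a list of problems rather than raising on the first one lets a PR check report every issue in a single run.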
20 changes: 20 additions & 0 deletions recipes/noaa-oisst/recipe.py
@@ -0,0 +1,20 @@
import pandas as pd
from pangeo_forge.recipe import NetCDFtoZarrSequentialRecipe

input_url_pattern = (
    "https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation"
    "/v2.1/access/avhrr/{yyyymm}/oisst-avhrr-v02r01.{yyyymmdd}.nc"
)
dates = pd.date_range("1981-09-01", "2021-01-05", freq="D")
input_urls = [
    input_url_pattern.format(
        yyyymm=day.strftime("%Y%m"), yyyymmdd=day.strftime("%Y%m%d")
    )
    for day in dates
]

recipe = NetCDFtoZarrSequentialRecipe(
    input_urls=input_urls,
    sequence_dim="time",
    inputs_per_chunk=20
)
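The recipe builds one input URL per day, so `inputs_per_chunk=20` means each chunk along `time` gathers 20 daily files. The sketch below reproduces the URL expansion with the standard library (`datetime` in place of `pd.date_range`) and adds a hypothetical `pruned` helper illustrating the "subset of the input data" that the action's test run is meant to use. The chunk count assumes one output chunk per `inputs_per_chunk` consecutive inputs, with a shorter final chunk.

```python
import math
from datetime import date, timedelta
from typing import List

INPUT_URL_PATTERN = (
    "https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation"
    "/v2.1/access/avhrr/{yyyymm}/oisst-avhrr-v02r01.{yyyymmdd}.nc"
)


def daily_urls(start: date, end: date) -> List[str]:
    """One URL per day from start to end inclusive, like pd.date_range(freq="D")."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [
        INPUT_URL_PATTERN.format(yyyymm=d.strftime("%Y%m"), yyyymmdd=d.strftime("%Y%m%d"))
        for d in days
    ]


def pruned(urls: List[str], inputs_per_chunk: int, n_chunks: int = 2) -> List[str]:
    """Hypothetical pruning for a test run: keep only the first few chunks' inputs."""
    return urls[: inputs_per_chunk * n_chunks]


urls = daily_urls(date(1981, 9, 1), date(2021, 1, 5))
# Assumption: one output chunk per 20 consecutive inputs, the last one possibly shorter.
n_chunks_total = math.ceil(len(urls) / 20)
test_urls = pruned(urls, inputs_per_chunk=20)
print(len(urls), n_chunks_total, len(test_urls))
print(test_urls[0])
```

Pruning by whole chunks keeps the test run's chunk boundaries identical to those of the full run, so a pruned run still exercises the same chunking logic.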