Merge pull request #4 from leap-stc/test_github_action
Baseline test for deploy-action and CMIP6 recipe
jbusecke authored Aug 7, 2023
2 parents b52209b + 03288a5 commit 919ded9
Showing 8 changed files with 161 additions and 194 deletions.
25 changes: 3 additions & 22 deletions .github/workflows/deploy.yaml
@@ -22,28 +22,9 @@ jobs:
         with:
           credentials_json: "${{ secrets.GCP_DATAFLOW_SERVICE_KEY }}"
       - name: "Deploy recipes"
-        uses: "pangeo-forge/deploy-recipe-action@v0.1"
+        uses: "pangeo-forge/deploy-recipe-action@file-based-config"
         with:
-          select_recipe_by_label: false
-          pangeo_forge_runner_config: >
-            {
-              "Bake": {
-                "bakery_class": "pangeo_forge_runner.bakery.dataflow.DataflowBakery"
-              },
-              "DataflowBakery": {
-                "use_public_ips": true,
-                "service_account_email": "julius-leap-dataflow@leap-pangeo.iam.gserviceaccount.com",
-                "project_id": "leap-pangeo",
-                "temp_gcs_location": "gs://leap-scratch/data-library/temp"
-              },
-              "TargetStorage": {
-                "fsspec_class": "gcsfs.GCSFileSystem",
-                "root_path": "gs://leap-persistent-ro/data-library/cmip6-testing/{job_name}"
-              },
-              "InputCacheStorage": {
-                "fsspec_class": "gcsfs.GCSFileSystem",
-                "root_path": "gs://leap-scratch/data-library/cache"
-              }
-            }
+          select_recipe_by_label: true
+          pangeo_forge_runner_config: "./configs/config-pgf-runner-leap-dataflow.json"
         env:
           GOOGLE_APPLICATION_CREDENTIALS: "${{ steps.auth.outputs.credentials_file_path }}"
11 changes: 11 additions & 0 deletions README.md
@@ -2,6 +2,17 @@

This repository is similar to [data-management](https://github.com/leap-stc/data-management) but due to the sheer size of the CMIP archive, we chose to keep this feedstock separate to enable custom solutions and fast development not necessary for other data ingestion recipes.

+## How to run recipes locally (with PGF runner)
+- Make sure to set up the environment (TODO: Add this as docs on pangeo-forge-runner)
+- Create a scratch dir (e.g. on the desktop; it should not be within a repo)
+- Call pgf-runner with a local path: `pangeo-forge-runner bake --repo path_to_repo -f path_to_config.json`
+- Data will be generated in this (scratch) dir.
+> Example call: `pangeo-forge-runner bake --repo=/Users/juliusbusecke/Code/CMIP6-LEAP-feedstock -f /Users/juliusbusecke/Code/CMIP6-LEAP-feedstock/configs/config_local.json --Bake.job_name=cmip6test`
+- TODO: In pgf-runner, raise an error if any of the storage locations is not just an abstract filesystem
+- From Charles: install pgf recipes locally with the editable flag
+- Get a debugger running within the pgf code (TODO: ask Charles again how to do it).
+
## Dev Guide

- Set up a local conda environment with `mamba env create -f environment.yml`
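
The local-run steps from the README addition above can be sketched in Python. This is a minimal illustration, not part of the commit: it only assembles the command line using the flags that appear in the README (`bake`, `--repo`, `-f`, `--Bake.job_name`). The helper name `build_bake_command` and the `/path/to/...` locations are hypothetical.

```python
import shlex
from pathlib import Path


def build_bake_command(repo: Path, config: Path, job_name: str) -> list[str]:
    """Assemble the argv list for a local `pangeo-forge-runner bake` call.

    Only flags shown in the README are used; nothing is executed here.
    """
    return [
        "pangeo-forge-runner",
        "bake",
        f"--repo={repo}",
        "-f",
        str(config),
        f"--Bake.job_name={job_name}",
    ]


# Hypothetical checkout location; replace with your own path.
repo = Path("/path/to/CMIP6-LEAP-feedstock")
command = build_bake_command(repo, repo / "configs" / "config_local.json", "cmip6test")

# Print a copy-pasteable command line.
print(shlex.join(command))
```

To actually run the command, one option is `subprocess.run(command, cwd=scratch_dir, check=True)` with `scratch_dir` pointing at a directory outside any repository, as the README recommends.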
20 changes: 20 additions & 0 deletions configs/config-pgf-runner-leap-dataflow.json
@@ -0,0 +1,20 @@
+{
+    "Bake": {
+        "bakery_class": "pangeo_forge_runner.bakery.dataflow.DataflowBakery"
+    },
+    "DataflowBakery": {
+        "use_public_ips": true,
+        "service_account_email": "julius-leap-dataflow@leap-pangeo.iam.gserviceaccount.com",
+        "project_id": "leap-pangeo",
+        "temp_gcs_location": "gs://leap-scratch/data-library/temp",
+        "machine_type": "e2-highmem-8"
+    },
+    "TargetStorage": {
+        "fsspec_class": "gcsfs.GCSFileSystem",
+        "root_path": "gs://leap-persistent-ro/data-library/cmip6-testing/{job_name}"
+    },
+    "InputCacheStorage": {
+        "fsspec_class": "gcsfs.GCSFileSystem",
+        "root_path": "gs://leap-scratch/data-library/cache"
+    }
+}
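
The `{job_name}` placeholder in `TargetStorage.root_path` suggests that each job writes to its own prefix. The sketch below shows how such a template could be expanded. It assumes plain `str.format`-style substitution, which is an assumption about the runner's behaviour rather than something this commit shows. The helper `resolve_target_root` is hypothetical.

```python
import json

# Excerpt of the Dataflow config from this commit (TargetStorage only).
CONFIG_TEXT = """
{
    "TargetStorage": {
        "fsspec_class": "gcsfs.GCSFileSystem",
        "root_path": "gs://leap-persistent-ro/data-library/cmip6-testing/{job_name}"
    }
}
"""


def resolve_target_root(config: dict, job_name: str) -> str:
    """Fill the {job_name} placeholder in the target root path.

    Assumption: the placeholder is expanded with str.format-style substitution.
    """
    return config["TargetStorage"]["root_path"].format(job_name=job_name)


config = json.loads(CONFIG_TEXT)
print(resolve_target_root(config, "cmip6test"))
# prints gs://leap-persistent-ro/data-library/cmip6-testing/cmip6test
```

Under this assumption, two jobs with different names never share a target prefix, which keeps test runs from overwriting each other.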
17 changes: 17 additions & 0 deletions configs/config_local.json
@@ -0,0 +1,17 @@
+{
+    "Bake": {
+        "prune": true,
+        "bakery_class": "pangeo_forge_runner.bakery.local.LocalDirectBakery"
+    },
+    "LocalDirectBakery": {
+        "num_workers": 0
+    },
+    "TargetStorage": {
+        "fsspec_class": "fsspec.implementations.local.LocalFileSystem",
+        "root_path": "local_storage/target/"
+    },
+    "InputCacheStorage": {
+        "fsspec_class": "fsspec.implementations.local.LocalFileSystem",
+        "root_path": "local_storage/cache/"
+    }
+}
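
Both `root_path` values in the local config are relative, which is presumably why the README asks for a scratch directory: with a local filesystem, relative paths end up under the directory the command is run from. The sketch below illustrates where the output would land under that assumption. The helper `resolve_storage_paths` is hypothetical and not part of pangeo-forge-runner.

```python
import tempfile
from pathlib import Path

# The local config from this commit, as a Python dict.
LOCAL_CONFIG = {
    "Bake": {
        "prune": True,
        "bakery_class": "pangeo_forge_runner.bakery.local.LocalDirectBakery",
    },
    "LocalDirectBakery": {"num_workers": 0},
    "TargetStorage": {
        "fsspec_class": "fsspec.implementations.local.LocalFileSystem",
        "root_path": "local_storage/target/",
    },
    "InputCacheStorage": {
        "fsspec_class": "fsspec.implementations.local.LocalFileSystem",
        "root_path": "local_storage/cache/",
    },
}


def resolve_storage_paths(config: dict, workdir: Path) -> dict[str, Path]:
    """Map each storage section to the directory it would use under workdir.

    Assumption: relative root paths are interpreted against the working directory.
    """
    return {
        section: workdir / settings["root_path"]
        for section, settings in config.items()
        if "root_path" in settings
    }


# Use a throwaway scratch directory, outside of any repository.
with tempfile.TemporaryDirectory() as scratch:
    for section, path in resolve_storage_paths(LOCAL_CONFIG, Path(scratch)).items():
        print(f"{section}: {path}")
```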
7 changes: 3 additions & 4 deletions environment.yml
@@ -3,13 +3,12 @@ name: cmip6-leap-feedstock
 channels:
   - conda-forge
 dependencies:
-  - python=3.9
+  - python=3.9.13 # see https://github.com/pangeo-forge/pangeo-forge-runner/issues/78
   - ipykernel
   - gcsfs
   - fsspec
   - apache-beam-with-gcp==2.42.0
   - pip:
-    # - apache_beam[interactive, gcp, dataframe]==2.42.0
     - google-cloud-bigquery
-    # - git+https://github.com/pangeo-forge/pangeo-forge-runner.git@arbitrary-injections # need to install this with no-deps
-    # - -r feedstock/requirements.txt
+    - git+https://github.com/pangeo-forge/pangeo-forge-runner.git@main
+    - git+https://github.com/pangeo-forge/pangeo-forge-recipes.git@main
12 changes: 7 additions & 5 deletions feedstock/meta.yaml
@@ -1,11 +1,13 @@
 title: "CMIP6"
-description: "CMIP6 datasets converted to zarr stores from ESGF files"
+description: "CMIP6 datasets converted to zarr stores from ESGF files"
 recipes:
-  - id: cmip6-template
-    object: "recipe:transforms"
+  - id: cmip6-template-a
+    object: "recipe:template_a"
+  - id: cmip6-template-b
+    object: "recipe:template_b"
 provenance:
   providers: #TODO: Ask in the cloud call...
-    - name: "ESGF"
+    - name: "ESGF"
       description: "Earth System Grid Federation"
       roles:
         - producer
@@ -20,4 +22,4 @@ maintainers:
     orcid: "0000-0002-4078-0852"
     github: cisaacstern
 # bakery:
-# id: "pangeo-ldeo-nsf-earthcube"
+# id: "pangeo-ldeo-nsf-earthcube"
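
The `object` entries in `meta.yaml` now point at two recipes. A minimal sketch of how such a reference could be split is shown below, assuming the `module:attribute` reading (for example, `template_a` defined in `recipe.py`). That interpretation and the helper `split_object_reference` are assumptions for illustration and are not confirmed by this commit.

```python
# Recipe entries as they appear in feedstock/meta.yaml after this commit.
RECIPES = [
    {"id": "cmip6-template-a", "object": "recipe:template_a"},
    {"id": "cmip6-template-b", "object": "recipe:template_b"},
]


def split_object_reference(reference: str) -> tuple[str, str]:
    """Split 'module:attribute' into its two parts.

    Assumption: the text before the colon names a module in the feedstock
    directory and the text after it names an object defined in that module.
    """
    module_name, _, attribute = reference.partition(":")
    if not module_name or not attribute:
        raise ValueError(f"expected 'module:attribute', got {reference!r}")
    return module_name, attribute


for entry in RECIPES:
    module_name, attribute = split_object_reference(entry["object"])
    print(f"{entry['id']}: module={module_name}, attribute={attribute}")
```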