Updated example #3

abarciauskas-bgse · 2021-02-15T01:33:53Z

No description provided.

abarciauskas-bgse · 2021-02-15T01:35:05Z

@TomAugspurger @CiaranEvans Could you review this PR? Apparently I can't assign reviewers in this repo.

ciaransweet

Mainly small change suggestions @abarciauskas-bgse

I wonder if we need to look at abstracting more of this behind pangeo-forge provided functionality; there's still a fair bit of knowledge needed for this kind of Python work.

ciaransweet · 2021-02-15T10:52:54Z

README.md

+```bash
+conda env create -f=environment.yml
+conda activate pangeo-pipeline
+python recipes/pipeline.py


Suggested change

-python recipes/pipeline.py
+python recipe/pipeline.py

ciaransweet · 2021-02-15T10:56:18Z

README.md

+Configure S3 Access:
+
+```bash
+cp recipe/config.yml.example config.yml


I can't see recipe/config.yml.example in this branch

ciaransweet · 2021-02-15T10:57:33Z

README.md

+
+## Github Workflows
+
+NOTE: The scripts and workflows in `.github/` are out of date with the current example code.


Putting on my pragmatic hat, if they're out of date, why don't we just remove them?

I thought about that but I also thought github workflows are something that should be a part of this repo (if we decide to keep this repo)

That's okay, but we can put them back in if we need them. I'd argue the benefit of this statement (and of including stuff that's not being used) is minimal.

We shouldn't be afraid of deleting stuff, that's what git's for at the end of the day 🥳

ciaransweet · 2021-02-15T11:01:07Z

recipe/pipeline.py

+import argparse
+from fsspec.implementations.local import LocalFileSystem
+import os
 import pandas as pd
-import pangeo_forge
-import pangeo_forge.utils
-from pangeo_forge.tasks.http import download
-from pangeo_forge.tasks.xarray import combine_and_write
-from pangeo_forge.tasks.zarr import consolidate_metadata
-from prefect import Flow, Parameter, task, unmapped
-
-# We use Prefect to manage pipelines. In this pipeline we'll see
-# * Tasks: https://docs.prefect.io/core/concepts/tasks.html
-# * Flows: https://docs.prefect.io/core/concepts/flows.html
-# * Parameters: https://docs.prefect.io/core/concepts/parameters.html
-
-# A Task is one step in your pipeline. The `source_url` takes a day
-# like '2020-01-01' and returns the URL of the raw data.
-
-
-@task
+from pangeo_forge.recipe import NetCDFtoZarrSequentialRecipe
+from pangeo_forge.storage import CacheFSSpecTarget, FSSpecTarget
+from pangeo_forge.executors import PythonPipelineExecutor, PrefectPipelineExecutor
+import prefect
+from prefect import task, Flow
+import shutil
+import s3fs
+import tempfile
+import yaml


What are we going with on linting/codestyles? AFAIK this isn't PEP8 compliant (which is being really picky, but I think we should just use something opinionated like flake8, black and isort)

PEP8 imports

Imports should be grouped in the following order:

1. Standard library imports.
2. Related third party imports.
3. Local application/library specific imports.

You should put a blank line between each group of imports.

The recipe is linted when it goes through staged-recipes at https://github.com/pangeo-forge/staged-recipes/blob/0d6fa51a1189a7bdb1927e7987c7a74216d13ab2/.pre-commit-config.yaml.

I'd recommend copying that (or a similar) pre-commit config over when creating a new repository.

I'd argue linting at that point is a bit too late. I'd rather we upheld formatting across all our repos as they are, rather than transforming them when they're merged into others

It just makes everything look uniform then

Just to clarify: recipes go through staged-recipes first, and then the GitHub repository is created for that recipe. So the linting will happen there first.

Sure, that's fine, but if we're using this as an example, we should probably still lint/format this.

Right, as I said above: "I'd recommend copying that (or a similar) pre-commit config over when creating a new repository."

ciaransweet · 2021-02-15T11:07:11Z

recipe/pipeline.py

+this_dir = os.path.dirname(os.path.abspath(__file__))
+
+if args.storage == 's3':
+    # Read the config file


Usually if there's a comment like this saying <Verb> <something> I prefer to take out the following block into a function called that, but this is an open comment haha
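A minimal sketch of that refactor, assuming the commented block does nothing more than open and parse a config file. The name `read_config`, the `load` parameter and the `bucket` key are all hypothetical; `json.load` is only a stand-in parser so that the sketch runs with the standard library.

```python
import json
import os
import tempfile


def read_config(path, load=json.load):
    """Read the config file at `path` and return its parsed contents.

    `load` is any callable that parses an open text file. json.load is a
    stand-in default chosen only to keep this sketch dependency-free.
    """
    with open(path) as f:
        return load(f)


# Demo with a throwaway file; the "bucket" key is made up for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "config.json")
    with open(config_path, "w") as f:
        json.dump({"bucket": "my-bucket"}, f)
    config = read_config(config_path)

print(config)
```

For the `config.yml` in this PR the call would presumably become `read_config(path, load=yaml.safe_load)`, assuming PyYAML is available, and the `# Read the config file` comment would no longer be needed.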

ciaransweet · 2021-02-15T11:09:25Z

recipe/pipeline.py

+
+if args.execution_env == 'prefect':
+    executor = PrefectPipelineExecutor()
+    print(executor)


Is this print needed?

ciaransweet · 2021-02-15T11:09:47Z

recipe/pipeline.py

+    executor = PrefectPipelineExecutor()
+    print(executor)
+    plan = executor.pipelines_to_plan(pipeline)
+    # The 'plan' is a prefect.Flow


IMO it's helpful to know that while the plan is returned from a pangeo_forge module, the returned object is a prefect object (and so you can operate on it with prefect functions).

Won't most editors/IDEs give you those prompts though?

ciaransweet · 2021-02-15T11:11:27Z

README.md

+
+Pre-requisites:
+* [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/)
+


Do they need prefect installed? I know that this section is for local running but this will require prefect if the user passes that argument

ciaransweet · 2021-02-15T11:16:05Z

recipe/pipeline.py

+parser.add_argument('--storage', help='Optional argument to store data remotely, such as on AWS')
+args = parser.parse_args()
+
+# NOAA SST Specific Functions for generating a list of URLs
 def source_url(day: str) -> str:


I'd prefer this to be refactored to be named get_source_url
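A sketch of what the renamed function might look like, keeping the `day: str -> str` signature from the hunk above. The base URL and file-name pattern are placeholders, since the real NOAA endpoint is not shown in this review, and the standard-library `datetime` module is used here instead of the `pandas` import in the diff.

```python
from datetime import date

# Hypothetical base URL; the real NOAA endpoint is not shown in this review.
BASE_URL = "https://example.com/oisst"


def get_source_url(day: str) -> str:
    """Return the URL of the raw file for an ISO day such as '2020-01-01'."""
    parsed = date.fromisoformat(day)
    # The directory and file-name layout below is made up for illustration.
    return f"{BASE_URL}/{parsed:%Y%m}/sst.{parsed:%Y%m%d}.nc"


url = get_source_url("2020-01-01")
print(url)
```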

rabernat · 2021-02-15T12:19:09Z

It's great to see an update to this! Thanks @abarciauskas-bgse! I really appreciate your efforts. 😄

IMO, it may be a tiny bit premature to be updating the recipe. We need to decide what ingredients a recipe repo should contain (see pangeo-forge/pangeo-forge-recipes#71). In my [very possibly wrong] opinion, the recipe should just contain a bare minimum. It should not

* assign storage targets (this will be done by the bakery)
* contain execution code (this should be done by a github action we have yet to write)

My draft PR for a new recipe (see pangeo-forge/staged-recipes#20) tries to follow this bare-bones approach.

In any case, let's use this as a discussion case at our meeting today.

TomAugspurger

Thanks for working on this. A few comments.

TomAugspurger · 2021-02-15T03:08:15Z

recipe/pipeline.py

+import tempfile
+import yaml
+
+parser = argparse.ArgumentParser()


IMO, this type of stuff will go in the pangeo-forge CLI. So something like pangeo-forge execute recipe --execution-env=... --storage=....

pangeo-forge/pangeo-forge-recipes#43 is the tracking issue for that I think.
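For reference, the argument handling that this comment proposes moving out of the recipe could be sketched as below. This is not the pangeo-forge CLI, which did not exist at the time of this review. The program name, the `python`/`prefect` choices and the default are assumptions inferred from the `args.execution_env` and `args.storage` checks in the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two options the example script reads."""
    # "pipeline-sketch" is a made-up program name, not a real command.
    parser = argparse.ArgumentParser(prog="pipeline-sketch")
    parser.add_argument(
        "--execution-env",
        choices=["python", "prefect"],  # assumed from the PR's two executors
        default="python",
        help="Executor used to run the recipe",
    )
    parser.add_argument(
        "--storage",
        default=None,
        help="Optional argument to store data remotely, such as on AWS",
    )
    return parser


# argparse exposes --execution-env as args.execution_env.
args = build_parser().parse_args(["--execution-env", "prefect", "--storage", "s3"])
defaults = build_parser().parse_args([])
print(args.execution_env, args.storage)
```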

TomAugspurger · 2021-02-15T03:11:51Z

recipe/pipeline.py

+)
+this_dir = os.path.dirname(os.path.abspath(__file__))
+
+if args.storage == 's3':


My hope is that we can rely on fsspec to completely abstract away the file system: https://pangeo-forge.readthedocs.io/en/latest/recipes.html?highlight=fsspec#storage.

So this could be something like

# CLI
$ pangeo-forge execute recipe --storage=file:///my-path
$ pangeo-forge execute recipe --storage=s3://my-bucket/my-path

And then in the recipe we do

fs = fsspec.get_filesystem_class(args.storage)()
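One caveat on the snippet above: `fsspec.get_filesystem_class` takes a protocol name such as `s3` or `file`, not a full URL, so a `--storage` value like `s3://my-bucket/my-path` would need its protocol split off first. A minimal sketch of that step using only the standard library follows. `storage_protocol` is a hypothetical helper, and treating a bare path as `file` is an assumption.

```python
from urllib.parse import urlsplit


def storage_protocol(storage: str) -> str:
    """Return the protocol of a storage URL, defaulting to 'file'.

    Hypothetical helper. Bare POSIX paths have no scheme and are treated
    as local files; Windows drive paths are not handled here.
    """
    scheme = urlsplit(storage).scheme
    return scheme or "file"


protocols = [
    storage_protocol(s)
    for s in ("file:///my-path", "s3://my-bucket/my-path", "/my-path")
]
print(protocols)
```

The result could then be handed on, for example as `fsspec.get_filesystem_class(storage_protocol(args.storage))()`. fsspec also provides `fsspec.core.url_to_fs`, which returns a filesystem and path directly from a URL and may be the more direct route.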

TomAugspurger · 2021-02-15T03:12:53Z

README.md

+## Run the example on Prefect Cloud
+
+Pre-requisites:
+* Create an account on cloud.prefect.io


Do you know if a prefect cloud account is required for executing this locally with the prefect executor?

I don't think so

TomAugspurger · 2021-02-15T16:42:33Z

Another open question: do we want to modify this repo in place? Or should we delete it and regenerate it with the tooling that generates repositories when recipes are merged into staged-recipes? (If we do delete it, I think we should continue reviewing here to figure out what we want the output repository to look like.)

abarciauskas-bgse · 2021-02-15T17:42:49Z

Thanks all for reviewing, changes were very helpful.

I'm happy to discuss our approach to examples during the meeting today.

abarciauskas-bgse · 2021-02-16T18:46:21Z

@rabernat @TomAugspurger given the discussion yesterday and pangeo-forge/staged-recipes#20, do we still want an example pipeline or example recipe repository?

TomAugspurger · 2021-02-16T20:13:09Z

IMO, the order should be:

1. Make "Draft of noaa-oisst recipe" (staged-recipes#20) runnable with the new base pangeo_forge classes & bakery infrastructure.
2. Update https://github.com/pangeo-forge/staged-recipes/tree/master/recipes/example to use the new base classes.
3. Delete and regenerate this repository based on https://github.com/pangeo-forge/staged-recipes/blob/master/.github/workflows/create-repository.yaml.

I think that gets us closest to an end-to-end workflow from new recipe to something running on the bakeries.

So, I'd say hold off on further effort here for now.

abarciauskas-bgse · 2022-10-25T16:40:45Z

closing as stale

abarciauskas-bgse added 4 commits February 13, 2021 13:56

Update example, just for python / local run

d0847d0

Working on updated example

c9a0155

Configurable example (python or prefect)

121abea

Update README and run flow

8764b45

ciaransweet suggested changes Feb 15, 2021


abarciauskas-bgse and others added 3 commits February 15, 2021 08:36

Update recipe/pipeline.py

fe7137a

Co-authored-by: Ciaran Evans <9111975+ciaranevans@users.noreply.github.com>

Update README.md

b30cba4

Co-authored-by: Ciaran Evans <9111975+ciaranevans@users.noreply.github.com>

Add link

1de630c

Co-authored-by: Ciaran Evans <9111975+ciaranevans@users.noreply.github.com>

TomAugspurger reviewed Feb 15, 2021


abarciauskas-bgse added 4 commits February 15, 2021 08:50

Fix command and add config.yml.example

2305353

Require prefect in environment.yml

4e667f1

Remove comment/print and update function name

f983e3e

Group imports

ab8354e

abarciauskas-bgse marked this pull request as draft February 15, 2021 16:55

abarciauskas-bgse added 2 commits February 15, 2021 09:37

Updates to use fsspec.get_filesystem_class

b8c271c

Remove unnecessary comment

d9cd44a

abarciauskas-bgse closed this Oct 25, 2022
