Skip to content
This repository has been archived by the owner on Nov 21, 2023. It is now read-only.

Conversation

abarciauskas-bgse
Copy link

No description provided.

@abarciauskas-bgse
Copy link
Author

@TomAugspurger @CiaranEvans Could you review this PR? I can't apparently assign reviewers in this repo.

Copy link

@ciaransweet ciaransweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly small change suggestions @abarciauskas-bgse

I wonder if we need to look at further abstracting some stuff in here behind some more pangeo-forge provided functionality, there's still a fair bit of knowledge needed for this kind of Python work..

README.md Outdated Show resolved Hide resolved
README.md Outdated
```bash
conda env create -f=envirnoment.yml
conda activate pangeo-pipeline
python recipes/pipeline.py

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
python recipes/pipeline.py
python recipe/pipeline.py

README.md Outdated Show resolved Hide resolved
README.md Outdated
Configure S3 Access:

```bash
cp recipe/config.yml.example config.yml

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see recipe/config.yml.example in this branch


## Github Workflows

NOTE: The scripts and workflows in `.github/` are out of date with the current example code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting on my pragmatic hat, if they're out of date, why don't we just remove them?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that but I also thought github workflows are something that should be a part of this repo (if we decide to keep this repo)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's okay, but we can put them in if we need them, the benefit of this statement is minimal I'd argue (and the inclusion of stuff that's not being used).

We shouldn't be afraid of deleting stuff, that's what git's for in the end of the day 🥳

Comment on lines 1 to 13
import argparse
from fsspec.implementations.local import LocalFileSystem
import os
import pandas as pd
import pangeo_forge
import pangeo_forge.utils
from pangeo_forge.tasks.http import download
from pangeo_forge.tasks.xarray import combine_and_write
from pangeo_forge.tasks.zarr import consolidate_metadata
from prefect import Flow, Parameter, task, unmapped

# We use Prefect to manage pipelines. In this pipeline we'll see
# * Tasks: https://docs.prefect.io/core/concepts/tasks.html
# * Flows: https://docs.prefect.io/core/concepts/flows.html
# * Parameters: https://docs.prefect.io/core/concepts/parameters.html

# A Task is one step in your pipeline. The `source_url` takes a day
# like '2020-01-01' and returns the URL of the raw data.


@task
from pangeo_forge.recipe import NetCDFtoZarrSequentialRecipe
from pangeo_forge.storage import CacheFSSpecTarget, FSSpecTarget
from pangeo_forge.executors import PythonPipelineExecutor, PrefectPipelineExecutor
import prefect
from prefect import task, Flow
import shutil
import s3fs
import tempfile
import yaml

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we going with on linting/codestyles? AFAIK this isn't PEP8 compliant (which is being really picky, but I think we should just use something opinionated like flake8, black and isort)

PEP8 imports

Imports should be grouped in the following order:

Standard library imports.
Related third party imports.
Local application/library specific imports.

You should put a blank line between each group of imports.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recipe is linted when it goes through staged-recipes at https://github.com/pangeo-forge/staged-recipes/blob/0d6fa51a1189a7bdb1927e7987c7a74216d13ab2/.pre-commit-config.yaml.

I'd recommend copying that (or a similar) pre-commit config over when creating a new repository.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue linting then is a bit too late, I'd rather we upheld formatting across all our repos as they are, rather than transforming them when merged into others

It just makes everything look uniform then

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify: recipes go through staged-recipes first, and then the GitHub repository is created for that recipe. So the linting will happen there first.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's fine, but if we're using this as an example, we should probably still lint/format this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right:

I'd recommend copying that (or a similar) pre-commit config over when creating a new repository.

this_dir = os.path.dirname(os.path.abspath(__file__))

if args.storage == 's3':
# Read the config file

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually if there's a comment like this saying <Verb> <something> I prefer to take out the following block into a function called that, but this is an open comment haha


if args.execution_env == 'prefect':
executor = PrefectPipelineExecutor()
print(executor)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this print needed?

executor = PrefectPipelineExecutor()
print(executor)
plan = executor.pipelines_to_plan(pipeline)
# The 'plan' is a prefect.Flow

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's helpful to know that while the plan is a return from a pangeo_forge module, the return object is a prefect module (and so you can operate on it with prefect functions).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wont most editors/IDEs give you those prompts though?


Pre-requisites:
* [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do they need prefect installed? I know that this section is for local running but this will require prefect if the user passes that argument

parser.add_argument('--storage', help='Optional argument to store data remotely, such as on AWS')
args = parser.parse_args()

# NOAA SST Specific Functions for generating a list of URLs
def source_url(day: str) -> str:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer this to be refactored to be named get_source_url

recipe/pipeline.py Outdated Show resolved Hide resolved
@rabernat
Copy link

It's great to see an update to this! Thanks @abarciauskas-bgse! I really appreciate your efforts. 😄

IMO, it may be a tiny bit premature early to be updating the recipe. We need to decide what ingredients a recipe repo should contain (see pangeo-forge/pangeo-forge-recipes#71). In my [very possibly wrong] opinion, the recipe should just contain a bare minimum. It should not

  • assign storage targets (this will be done by the bakery)
  • contain execution code (this should be done by a github action we have yet to write)

My draft PR for a new recipe (see pangeo-forge/staged-recipes#20) tries to follow this bare-bones approach.

In any case, let's use this as a discussion case at our meeting today.

abarciauskas-bgse and others added 3 commits February 15, 2021 08:36
Co-authored-by: Ciaran Evans <9111975+ciaranevans@users.noreply.github.com>
Co-authored-by: Ciaran Evans <9111975+ciaranevans@users.noreply.github.com>
Co-authored-by: Ciaran Evans <9111975+ciaranevans@users.noreply.github.com>
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. A few comments.

import tempfile
import yaml

parser = argparse.ArgumentParser()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this type of stuff will go in the pangeo-forge CLI. So something like pangeo-forge execute recipe --execution-env=... --storage=....

pangeo-forge/pangeo-forge-recipes#43 is the tracking issue for that I think.

)
this_dir = os.path.dirname(os.path.abspath(__file__))

if args.storage == 's3':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My hope is that we can rely on fsspec to completely abstract away the file system: https://pangeo-forge.readthedocs.io/en/latest/recipes.html?highlight=fsspec#storage.

So this could be something like

# CLI
$ pangeo-forge execute recipe --storage=file:///my-path
$ pangeo-forge execute recipe --storage=s3://my-bucket/my-path

And then in the recipe we do

fs = fsspec.get_filesystem_class(args.storage)()

README.md Outdated
## Run the example on Prefect Cloud

Pre-requisites:
* Create an account on cloud.prefect.io
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if a prefect cloud account is required for executing this locally with the prefect executor?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so

@TomAugspurger
Copy link
Contributor

Another open question: do we want to modify this repo in place? Or should we delete it and regenerate it with the tooling to generate repositories when merged into staged-recipes? (If we do delete it, I think we should continue reviewing here to figure out what we want the output repository look like)

@abarciauskas-bgse abarciauskas-bgse marked this pull request as draft February 15, 2021 16:55
@abarciauskas-bgse
Copy link
Author

Thanks all for reviewing, changes were very helpful.

I'm happy to discuss our approach to examples during the meeting today.

@abarciauskas-bgse
Copy link
Author

@rabernat @TomAugspurger given the discussion yesterday and pangeo-forge/staged-recipes#20, do we still want an example pipeline or example recipe repository?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Feb 16, 2021

IMO, the order should be:

  1. Make Draft of noaa-oisst recipe staged-recipes#20 runnable with the new base pangeo_forge classes & bakery infrastructure.
  2. Update https://github.com/pangeo-forge/staged-recipes/tree/master/recipes/example to use the new base classes.
  3. Delete and regenerate this repository based on https://github.com/pangeo-forge/staged-recipes/blob/master/.github/workflows/create-repository.yaml.

I think that gets us closes to and end-to-end workflow from new recipe to something running on the bakeries.

So, I'd say hold off on further effort here for now.

@abarciauskas-bgse
Copy link
Author

closing as stale

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants