NASA SMAP SSS recipe #31
This looks cool (I am not really able to comment on the pangeo-forge specific stuff though)! Had some small nits for the description. Very excited for this data!
recipes/nasa-smap-sss/meta.yaml
@@ -0,0 +1,21 @@
title: "NASA SMAP Sea Surface Salinity (SSS)"
description: "Analysis-ready Zarr datasets derived from NASA SMAP Sea Surface Salinity (SSS) NetCDF"
I think it might be good to add that these are L3 (Level 3 gridded) data (there are also more complicated swath
products). I don't think we need the NetCDF in the description?
Yes to all of the above.
We are sort of moving gradually from collecting hypothetical use cases to actual recipes. I would update this label to be "proposed recipe"
Yes, it's fine. The current CI workflow (#28) will search for Going forward, I think we want to make the repo as simple, bare-bones, and self explanatory as possible. Feel free to propose changes in this direction.
👍 |
Why don't we open a new issue to track the improvements needed to the contributor workflow? |
The recipe-dict in As I move now into the (manual, notebook-based) execution phase, I will echo that the feature(s) discussed in pangeo-forge/pangeo-forge-recipes#97 and pangeo-forge/pangeo-forge-recipes#136 would presumably be useful even in manual execution settings. My workaround was to estimate the source sizes as follows:
import numpy as np
import xarray as xr

for store in urls:  # `urls` maps each store name to a list of source URLs
    ds = xr.open_dataset(urls[store][10])  # an arbitrary source file from each dataset
    gbs = ds.nbytes / 1e9  # in-memory size of one file, in GB
    total_gbs = len(urls[store]) * gbs  # extrapolate to every file in the store
    print(f"{store} contains approx. {np.trunc(total_gbs)} GBs.")
which returns:
Based on this information, I decided to start by trying to execute the (considerably smaller) monthly recipes only, using as reference the notebook Ryan has used to manually execute an eNATL60 recipe (see #24 (comment)). The notebook is not currently linkable in full as it contains secrets. On the execution cell
for recipe_key, r in recipes.items():
    if 'monthly' in recipe_key:
        try:
            # if the target already exists, skip this recipe
            r.open_target()
            print(f"found {recipe_key}")
        except Exception:
            # otherwise build it
            print(f"RUNNING {recipe_key}")
            pl = r.to_pipelines()
            plan = executor.pipelines_to_plan(pl)
            executor.execute_plan(plan)
    else:
        pass
I encountered the following errors:
I do not expect these issues will be diagnosable without the full notebook context, but I'm logging this in outline form here as a touchpoint nonetheless. Ryan and I will be discussing synchronously on Monday, after which I will follow up on this thread with any generalizable takeaways. |
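As a side note on the execution cell above: catching every exception treats any failure of `open_target()` as "target not built yet", which can hide unrelated problems. A minimal sketch of the same skip-or-run pattern follows, using a stand-in class rather than a real pangeo-forge recipe. `FakeRecipe`, `run`, and `run_missing` are hypothetical names, and the assumption that a missing target raises `FileNotFoundError` is mine, not something confirmed in this thread.

```python
from dataclasses import dataclass


@dataclass
class FakeRecipe:
    """Stand-in for a recipe object; not part of pangeo-forge-recipes."""

    built: bool = False

    def open_target(self) -> str:
        # Assumption: a missing target surfaces as FileNotFoundError.
        if not self.built:
            raise FileNotFoundError("target store does not exist yet")
        return "target"

    def run(self) -> None:
        # Placeholder for the to_pipelines() + executor calls in the notebook.
        self.built = True


def run_missing(recipes: dict, keyword: str = "monthly") -> list:
    """Run only recipes whose key contains `keyword` and whose target is absent."""
    executed = []
    for key, recipe in recipes.items():
        if keyword not in key:
            continue
        try:
            recipe.open_target()
            print(f"found {key}")
        except FileNotFoundError as err:
            # Only the "not built yet" case triggers a run; other errors propagate.
            print(f"RUNNING {key} ({err})")
            recipe.run()
            executed.append(key)
    return executed


recipes = {
    "jpl/monthly": FakeRecipe(built=True),
    "rss/monthly": FakeRecipe(),
    "rss/8day": FakeRecipe(),
}
executed = run_missing(recipes)
```

Narrowing the `except` clause means an authentication or network failure would stop the loop instead of silently triggering a rebuild.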
Charles, yesterday we boiled this error down to a specific issue with fsspec. Would you mind sharing that code snippet here? |
Yes, the error was being thrown by line 40 in As suggested in fsspec/filesystem_spec#160 (comment), I was able to resolve this error by setting @martindurant, my lingering questions are:
from contextlib import contextmanager
from typing import Any, Iterator

import fsspec

# fsspec doesn't provide type hints, so I'm not sure what the right type is for open files
OpenFileType = Any


@contextmanager
def _fsspec_safe_open(fname: str, **kwargs) -> Iterator[OpenFileType]:
    # workaround for inconsistent behavior of fsspec.open
    # https://github.com/intake/filesystem_spec/issues/579
    with fsspec.open(fname, **kwargs) as fp:
        with fp as fp2:
            yield fp2


base = 'https://podaac-opendap.jpl.nasa.gov/opendap/allData/'
fname = base + 'smap/L3/JPL/V5.0/8day_running/2015/120/SMAP_L3_SSS_20150504_8DAYS_V5.0.nc'
# open_kwargs = {'block_size': 0}
input_opener = _fsspec_safe_open(fname, mode="rb")  # , **open_kwargs)
BLOCK_SIZE = 10_000_000
with input_opener as source:
    data = source.read(BLOCK_SIZE)
Noting that the PR referenced in the last commit is actually pangeo-forge/roadmap#22, not the one linked in the commit message. |
@sharkinsspatial, this is ready to be test-run through the bakery. I've already manually executed the Will there soon be a slash command that allows us to do a "test-bake" on the pruned subsets? (Apologies if the timeline on this was obvious from our other threads, still wrapping my head around all the layers here.) cc @jbusecke, getting close! |
This is saying "I want to view the whole file as a block" and will work fine. Really, the code is doing
Yes, probably. It is marginally possible (but not likely) that the server is not respecting the content encoding. The response header would have more information.
I'm afraid not. The HTTP response to HEAD or GET (before starting to download) might have useful markers, but this already depends on the server being well-behaved. Essentially, none of the header info keys are strictly required.
There have certainly been ongoing conversations around this kind of thing, and the range of circumstances that fsspec can handle has steadily grown. |
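Inspecting those response headers can be sketched with the standard library alone. This is only an illustration: `fetch_head_headers` and `summarize_headers` are hypothetical helper names, and since none of these headers is guaranteed to be present, every field falls back to `None` or `False` when the server omits it.

```python
import urllib.request
from typing import Mapping, Optional


def fetch_head_headers(url: str, timeout: float = 30.0) -> dict:
    """Issue a HEAD request and return the response headers as a plain dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


def summarize_headers(headers: Mapping[str, str]) -> dict:
    """Pull out the headers relevant to file size and range requests, if any."""
    lowered = {key.lower(): value for key, value in headers.items()}
    length: Optional[str] = lowered.get("content-length")
    return {
        "content_length": int(length) if length is not None and length.isdigit() else None,
        "content_encoding": lowered.get("content-encoding"),
        "accepts_byte_ranges": lowered.get("accept-ranges", "").lower() == "bytes",
    }


# Offline demonstration with a hand-written header mapping (no network access).
example = summarize_headers({"Content-Length": "1048576", "Content-Encoding": "gzip"})
```

In practice one would call `summarize_headers(fetch_head_headers(url))` on a source URL; a missing `content_length` or a non-`None` `content_encoding` would be a hint that size-based block reads may not behave as expected.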
The OSN bucket is public for read only access. You can access it over s3 protocol with |
Then let's try to explicitly catch this error in Pangeo forge and raise a detailed error message with the suggested workaround. |
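One possible shape for that, sketched with stand-in objects rather than real pangeo-forge or fsspec code: wrap the opener in a context manager that re-raises read failures with a hint. `SourceReadError`, `open_with_hint`, and `FlakyFile` are hypothetical names; the exception type to catch and the `block_size=0` hint (taken from the commented-out `open_kwargs` line in the snippet above) are assumptions.

```python
import io
from contextlib import contextmanager
from typing import Callable, ContextManager, Iterator, Tuple, Type


class SourceReadError(RuntimeError):
    """Hypothetical error type that carries a workaround hint."""


@contextmanager
def open_with_hint(
    opener: Callable[[], ContextManager],
    fname: str,
    catch: Tuple[Type[BaseException], ...] = (ValueError,),
) -> Iterator:
    """Open a source file and re-raise selected errors with a suggested workaround."""
    try:
        with opener() as fp:
            yield fp
    except catch as err:
        raise SourceReadError(
            f"Could not read {fname!r}: {err}. "
            "A workaround that has helped with some HTTP servers is passing "
            "open kwargs {'block_size': 0}."
        ) from err


class FlakyFile(io.BytesIO):
    """Stand-in for a remote file whose read() fails."""

    def read(self, size=-1):
        raise ValueError("simulated read failure")


message = ""
try:
    with open_with_hint(lambda: FlakyFile(b"abc"), "example.nc") as source:
        source.read(10)
except SourceReadError as err:
    message = str(err)
```

Chaining with `from err` keeps the original traceback available for anyone who needs the low-level detail.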
/run-recipe-test |
@cisaacstern Can you include a |
@cisaacstern As a note. In the short interim while we wait for a release of |
/run-recipe-test |
/run-recipe-test |
/run-recipe-test |
/run-recipe-test |
@jbusecke, the first two timesteps of each of the four datasets (two time intervals for each of two algorithms) are available on OSN as follows:
import s3fs

endpoint_url = 'https://ncsa.osn.xsede.org'
fs_osn = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': endpoint_url})
fs_osn.ls("Pangeo/pangeo-forge/NASA-SMAP-SSS/JPL")
fs_osn.ls("Pangeo/pangeo-forge/NASA-SMAP-SSS/RSS")
@sharkinsspatial, were the complete time series ever built by the bakery, and if so are they publicly accessible somewhere?
Could we try re-running this recipe in our latest infrastructure? |
Yes I'll change the bakery in |
It looks like your 4 validation errors for MetaYaml
recipes -> 0 -> id
field required (type=value_error.missing)
recipes -> 0 -> object
field required (type=value_error.missing)
recipes
value is not a valid dict (type=type_error.dict)
maintainers -> 0 -> orcid
field required (type=value_error.missing)
Please correct your |
The bot doesn't understand |
A-ha. So it remains true that the bot does not understand
recipes:
  - dict_object: "recipe:recipes"
when it should be a simple mapping (rather than a list), i.e.
recipes:
  dict_object: "recipe:recipes"
I've noted this issue in pangeo-forge/roadmap#49
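To make the two accepted shapes concrete, here is a rough check written in plain Python. It is inferred only from the validation errors and the YAML shown above, not from the actual MetaYaml model; `validate_recipes_section` is a hypothetical name and the example `id` and `object` values are made up.

```python
from typing import Any


def validate_recipes_section(recipes: Any) -> str:
    """Return 'list' or 'dict_object' for a well-formed section, else raise ValueError."""
    if isinstance(recipes, dict):
        # Mapping form: a single `dict_object` entry pointing at a dict of recipes.
        if set(recipes) != {"dict_object"}:
            raise ValueError("mapping form must contain only the key 'dict_object'")
        return "dict_object"
    if isinstance(recipes, list):
        # List form: every entry needs both `id` and `object`.
        for i, entry in enumerate(recipes):
            present = set(entry) if isinstance(entry, dict) else set()
            missing = {"id", "object"} - present
            if missing:
                raise ValueError(f"recipes -> {i} is missing {sorted(missing)}")
        return "list"
    raise ValueError("recipes must be a list or a mapping")


list_form = validate_recipes_section([{"id": "example-id", "object": "recipe:recipe"}])
mapping_form = validate_recipes_section({"dict_object": "recipe:recipes"})

# The shape that tripped up the bot: `dict_object` nested inside a list.
bad_message = ""
try:
    validate_recipes_section([{"dict_object": "recipe:recipes"}])
except ValueError as err:
    bad_message = str(err)
```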
Force-pushed 3eaee88 to 906c7f4
pre-commit.ci autofix (for more information, see https://pre-commit.ci)
Thanks for working on this @andersy005. Let me know if you need any input from my side! |
recipes = {
    list(patterns)[i]: (
        XarrayZarrRecipe(
            patterns[list(patterns)[i]],
        )
    )
    for i in range(4)
}
@jbusecke, i could use your help here. i'm not very familiar with recipes that define multiple recipes within a dict, and for reasons i don't understand yet, the backend seems to think this recipe contains errors that result in an infinite loop when synchronizing this recipe with the database. unfortunately, the error message returned is opaque and
can you take a look at this, and let me know if anything stands out? is this dict well defined? thank you
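On the narrow question of whether the dict is well defined: the comprehension in the diff builds one entry per key of `patterns`, so it should be equivalent to iterating over `patterns.items()` directly. The sketch below checks that with a stand-in class so it runs without pangeo-forge-recipes; `StandInRecipe`, the four keys, and the placeholder pattern values are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class StandInRecipe:
    """Placeholder for XarrayZarrRecipe; only stores the pattern it was given."""

    file_pattern: str


# Hypothetical keys and values; the real `patterns` dict holds FilePattern objects.
patterns = {
    "NASA-SMAP-SSS/JPL/8day": "jpl-8day-pattern",
    "NASA-SMAP-SSS/JPL/monthly": "jpl-monthly-pattern",
    "NASA-SMAP-SSS/RSS/8day": "rss-8day-pattern",
    "NASA-SMAP-SSS/RSS/monthly": "rss-monthly-pattern",
}

# Form used in the diff: index into the key list, hard-coding the length.
indexed = {
    list(patterns)[i]: StandInRecipe(patterns[list(patterns)[i]])
    for i in range(4)
}

# Equivalent form that does not depend on the number of entries.
direct = {key: StandInRecipe(pattern) for key, pattern in patterns.items()}
```

Both forms yield the same mapping here, which suggests the dict itself is fine and the failure lies in how the backend reads the recipe metadata.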
i'm planning to post the error message here later today
@jbusecke, here's the traceback, which seems to hint at a missing key from the recipes dict. is the multi recipes approach within the same feedstock documented somewhere? I couldn't find anything in the documentation.
Oct 27 15:07:57 pangeo-forge-api-prod app/web.1
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.9/dist-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/dist-packages/fastapi/applications.py", line 208, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.9/dist-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.9/dist-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc
  File "/usr/local/lib/python3.9/dist-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.9/dist-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/dist-packages/starlette/exceptions.py", line 82, in __call__
    raise exc
  File "/usr/local/lib/python3.9/dist-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.9/dist-packages/starlette/routing.py", line 656, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.9/dist-packages/starlette/routing.py", line 259, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/dist-packages/starlette/routing.py", line 64, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.9/dist-packages/starlette/responses.py", line 159, in __call__
    await self.background()
  File "/usr/local/lib/python3.9/dist-packages/starlette/background.py", line 35, in __call__
    await task()
  File "/usr/local/lib/python3.9/dist-packages/starlette/background.py", line 18, in __call__
    await self.func(*self.args, **self.kwargs)
  File "/opt/app/pangeo_forge_orchestrator/routers/github_app.py", line 783, in synchronize
    new_models = [
  File "/opt/app/pangeo_forge_orchestrator/routers/github_app.py", line 785, in <listcomp>
    recipe_id=recipe["id"],
KeyError: 'id'
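The final frames show `recipe["id"]` being read for every entry of the recipes metadata, which would raise `KeyError` for an entry that carries `dict_object` instead of `id`. A defensive version of that lookup might look like the sketch below. This is not the orchestrator's code: `recipe_ids` is a hypothetical helper, and passing the dict keys in as `dict_keys` is an assumption about where they would come from.

```python
from typing import Iterable, List


def recipe_ids(meta_recipes, dict_keys: Iterable[str] = ()) -> List[str]:
    """Resolve recipe ids from either the list form or the `dict_object` mapping form."""
    if isinstance(meta_recipes, dict) and "dict_object" in meta_recipes:
        # Mapping form: ids are the keys of the referenced recipe dict,
        # which the caller has to supply after importing the recipe module.
        return list(dict_keys)
    ids = []
    for i, entry in enumerate(meta_recipes):
        if not isinstance(entry, dict) or "id" not in entry:
            # Fail with a readable message instead of a bare KeyError.
            raise ValueError(f"recipes entry {i} has no 'id': {entry!r}")
        ids.append(entry["id"])
    return ids


from_mapping = recipe_ids(
    {"dict_object": "recipe:recipes"},
    dict_keys=["NASA-SMAP-SSS/RSS/monthly", "NASA-SMAP-SSS/JPL/8day"],
)

error_text = ""
try:
    recipe_ids([{"dict_object": "recipe:recipes"}])
except ValueError as err:
    error_text = str(err)
```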
Co-authored-by: Julius Busecke <julius@ldeo.columbia.edu>
/run NASA-SMAP-SSS/RSS/monthly |
🎉 The test run of
import xarray as xr

store = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1366/NASA-SMAP-SSS/RSS/monthly.zarr"
ds = xr.open_dataset(store, engine='zarr', chunks={})
ds
/run NASA-SMAP-SSS/JPL/8day |
Draft PR which will close #30 when complete.
@rabernat, @jbusecke, and @hscannell: submitting this (very) rough first pass as a point for conversation around some structural questions (and a suggestion) that I've encountered so far. Interested in feedback regarding any of the below.
The word "pipeline" appears in a lot of places, including the README for the staged-recipes repo, and in the title of this issue (Example pipeline for SMAP Seasurface Salinity #30).
Prefect layer, but rather contributing recipe.pys and meta.yamls only? If so, should we open an issue to re-write the README and associated docs?
Why do we have so many things labeled example in issues? What's the difference between an example and just, a recipe staged by a maintainer?
recipes/ rather than within recipes/examples/.
Suggestion: I've opted to pip install jupytext (https://jupytext.readthedocs.io/en/latest/index.html) into my staged-recipe development environment, so that I can execute my recipe.py text file line-by-line in Jupyter during development. (Without this dependency, in order to debug the recipe in Jupyter, I would've had to create a separate recipe-dev.ipynb file for development, and then copy-and-paste the relevant bits into a .py file for the PR.) What do we think about incorporating this dependency as part of the recommended contribution/development workflow?
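For readers unfamiliar with jupytext, a `recipe.py` written in its "percent" format is ordinary Python in which `# %%` comments mark cell boundaries, so the same file can be opened as a notebook or imported as a module. The cell contents below are placeholders for illustration, not the actual recipe; only the URL pieces are taken from this thread.

```python
# %% [markdown]
# # NASA SMAP SSS recipe (development scratch)

# %%
# Build one source URL; a real recipe would generate one per time step.
base = "https://podaac-opendap.jpl.nasa.gov/opendap/allData/"
path = "smap/L3/JPL/V5.0/8day_running/2015/120/SMAP_L3_SSS_20150504_8DAYS_V5.0.nc"
url = base + path

# %%
# Each cell can be executed on its own in Jupyter while debugging.
print(url)
```

Because the markers are plain comments, the file that is debugged cell-by-cell is the same file that goes into the PR.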