Exploring modern RESTful services for gridded data #11

Closed
jonmjoyce opened this issue Mar 29, 2022 · 9 comments
Labels
2022 code sprint topic Proposed topic for a code sprint activity

Comments

@jonmjoyce
Member

Project Description:

This project will explore passing gridded geospatial data through microservices. We would like to develop some standard HTTP interfaces for data requests and responses. The guiding principles for this work should focus on performance, accessibility, and interoperability.

We should at a minimum explore the following use cases:

  1. Requesting many tiles of data, as one would against a WMTS
  2. Requesting a time-series of data for a particular location
  3. Requesting an entire grid of data
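As a rough illustration of what the first two use cases imply for a server, here is a minimal sketch using only the Python standard library. It assumes tiles are addressed with the common XYZ / Web Mercator scheme (row 0 at the top) and that the grid's coordinate axis is sorted ascending; `tile_bounds` and `nearest_index` are hypothetical helper names, not part of Xarray, Xpublish, or any WMTS library.

```python
import math
from bisect import bisect_left


def tile_bounds(z, x, y):
    """Return (west, south, east, north) in degrees for an XYZ tile.

    Assumes the Web Mercator tiling scheme with row 0 at the top.
    """
    n = 2 ** z  # number of tiles along each axis at zoom level z

    def lon(col):
        return col / n * 360.0 - 180.0

    def lat(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


def nearest_index(coords, value):
    """Return the index of the entry in ascending `coords` closest to `value`."""
    i = bisect_left(coords, value)
    if i == 0:
        return 0
    if i == len(coords):
        return len(coords) - 1
    # Pick whichever neighbour is closer; ties go to the lower index.
    return i if coords[i] - value < value - coords[i - 1] else i - 1
```

A tile request (use case 1) would slice the grid to the box from `tile_bounds`, a time-series request (use case 2) would fix the `nearest_index` on each spatial axis and return every time step, and a full-grid request (use case 3) needs no spatial selection at all.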

For the purposes of this project, we can assume the data sources will be in cloud-optimized formats such as Zarr, and we can use Xarray to efficiently access the data.
One possibility is that we extend the Xpublish library to support more data requests. “Xpublish lets you easily publish Xarray Datasets via a REST API.”
We can evaluate OPeNDAP as a possible protocol as long as it is performant and easy for clients to use.
Apache Arrow is another interesting library to explore, but its support for 2D data is limited so far. It limits the need for serialization and deserialization by manipulating data in-memory rather than passing it throughout the system.

Expected Outcomes:

Ideally this project will develop an API framework that can continue to be extended. This API should be easy to use and accessible from any standard programming language and not require managed dependencies.

Although a fully functional API is unrealistic for this code sprint alone, we will hopefully discuss, identify, and resolve the sticking points around access to scientific gridded data.

Skills required:

Python, understanding of gridded data, REST APIs, willingness to think outside the box

Difficulty:

Moderate-Difficult

Topic Lead(s):

Jonathan Joyce (jonmjoyce)

Relevant links:

https://docs.xarray.dev/en/stable/
https://xpublish.readthedocs.io/en/latest/
https://arrow.apache.org/

@jonmjoyce added the "code sprint topic" (Proposed topic for a code sprint activity) label Mar 29, 2022
@MathewBiddle
Contributor

MathewBiddle commented Mar 31, 2022

@jmunroe

jmunroe commented Apr 4, 2022

Are you familiar with OGC API - Coverages and Tiles? I think the goal of the OGC API is, in part, to be a modern RESTful API that could supplant WMS/WMTS/WFS/WCS. Pygeoapi is a reference server implementation written in Python.

I'll participate in this track.

@jonmjoyce
Member Author

I am not especially familiar with the OGC API but it looks like a good candidate! I'd like to explore that further with an xarray/zarr backend and assess the performance. I have started doing some work with STAC and I know there's talk of integrating with OGC.

Thanks for the input! One reason I proposed this topic was to get more input on standards worth following.

@abkfenris

Maybe take a look at OGC-EDR, especially for requesting a time series at a single location. PyGeoAPI has an implementation.
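For reference, an EDR position query for a time series at one point could be built roughly like this. This is only a sketch: the `/collections/{id}/position` path and the `coords`, `datetime`, and `parameter-name` query parameters come from the OGC API - EDR specification, while the function name, server URL, collection ID, and parameter name below are made up for illustration.

```python
from urllib.parse import quote, urlencode


def edr_position_url(base_url, collection_id, lon, lat, start, end, parameters):
    """Build an OGC API - EDR position query URL for a single point.

    `start` and `end` are ISO 8601 timestamps; `parameters` is a list of
    parameter names offered by the (hypothetical) collection.
    """
    query = {
        "coords": f"POINT({lon} {lat})",         # WKT point, longitude first
        "datetime": f"{start}/{end}",            # closed time interval
        "parameter-name": ",".join(parameters),  # comma-separated list
    }
    path = f"{base_url.rstrip('/')}/collections/{collection_id}/position"
    return f"{path}?{urlencode(query, quote_via=quote)}"


# Hypothetical server, collection, and parameter names.
url = edr_position_url(
    "https://example.org/edr",
    "sea_surface_temp",
    -70.25,
    43.65,
    "2022-04-01T00:00:00Z",
    "2022-04-08T00:00:00Z",
    ["sst"],
)
```

The response format depends on what the server offers, so it is left out of the query here.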

We've also been playing with STAC to provide a unified API to point towards both our EDR & THREDDS WMS endpoints in addition to links to ERDDAP and other info. I've been pondering building a JavaScript library to make interacting with STAC endpoints easier, taking a few ideas from PySTAC.

@MathewBiddle
Contributor

@jonmjoyce are you okay giving this sprint pitch on Monday? We are looking for a brief overview of what the topic will be, which Slack channel to use, and which breakout room to join for the effort (along with anything else that might be relevant).

@jonmjoyce
Member Author

@MathewBiddle Yes, I can give an overview. I'm not sure about the Slack channel and breakout room details though.

@MathewBiddle
Contributor

Awesome! I have you in the Cloud/IoT breakout room. I haven't assigned a Slack channel because I was unsure where this would fit. Does #cloud work for you? Otherwise we can create a new channel for this topic, just let me know what to call it.

@jonmjoyce
Member Author

#cloud is fine for now. I doubt we'll do much cloud-specific work but this should help us move in that direction.

BTW, I set up a public GitHub repo for this work here: https://github.com/asascience/restful-grids

@mpiannucci

Looking forward to working on this with everyone! Especially looking to explore APIs that help improve access for web apps and visualizations!


5 participants