Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xarray reader #530

Merged
merged 23 commits into from
Oct 18, 2022
Merged

Xarray reader #530

merged 23 commits into from
Oct 18, 2022

Conversation

vincentsarago
Copy link
Member

@vincentsarago vincentsarago changed the base branch from master to rio-tiler-v4 September 28, 2022 19:06
Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

rio_tiler/io/xarray.py Outdated Show resolved Hide resolved
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
@rabernat
Copy link
Contributor

I recommend using this dataset: https://pangeo-forge.org/dashboard/feedstock/3

import xarray as xr

store = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/noaa-coastwatch-geopolar-sst-feedstock/noaa-coastwatch-geopolar-sst.zarr'
ds = xr.open_dataset(store, engine='zarr', chunks={})
ds

@geospatial-jeff
Copy link
Member

geospatial-jeff commented Sep 29, 2022

This seems to work:

import mercantile
import rioxarray
import xarray as xr
import numpy as np


def clip_to_tile(tile: mercantile.Tile, ds: xr.DataArray):
    bounds = mercantile.xy_bounds(tile)
    wgs84_bounds = mercantile.bounds(tile)
    
    # Create source array by clipping the xarray dataset to extent of the tile.
    source_arr = ds.rio.clip_box(*wgs84_bounds)
    
    # Create the target array.
    x = np.linspace(bounds[0], bounds[2], 256)
    y = np.linspace(bounds[1], bounds[3], 256)
    target_arr = xr.DataArray(
        np.zeros((ds['time'].shape[0], 256, 256), ds.dtype),
        [
            ("time", ds['time'].data),
            ("y", y),
            ("x", x)
        ]
    )
    target_arr.rio.write_crs(3857, inplace=True)
    
    # Map source to target
    matched = source_arr.rio.reproject_match(target_arr)
    return matched
# Create a web mercator tile.
x = 0
y = 0
z = 10
tile = mercantile.tile(x, y, z)

# Open xarray dataset.
store = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/noaa-coastwatch-geopolar-sst-feedstock/noaa-coastwatch-geopolar-sst.zarr'
ds = xr.open_dataset(store, engine='zarr', chunks={})
data_array = ds['analysed_sst']

# Add CRS so rioxarray doens't complain.
data_array.rio.write_crs(4326, inplace=True)

# Only fetch one time slice.
data_array = data_array[:1]

arr = clip_to_tile(tile, data_array)

image

@rabernat
Copy link
Contributor

Here's how you would make a cached zarr store

import zarr
url = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr'
store = zarr.storage.FSStore(url)
cache = zarr.LRUStoreCache(store, max_size=2**28)
ds = xr.open_dataset(cache, engine='zarr')

@vincentsarago
Copy link
Member Author

🥳

import rioxarray 
import xarray
from matplotlib.pyplot import imshow
from rio_tiler.io import XarrayReader

with xarray.open_dataset(
    "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/noaa-coastwatch-geopolar-sst-feedstock/noaa-coastwatch-geopolar-sst.zarr",
    engine="zarr",
    decode_coords="all"
) as src:
    with XarrayReader(None, dataset=src["analysed_sst"][:1], crs="epsg:4326") as dst:
        img = dst.tile(1, 1, 2)
        imshow(img.data_as_image())

Screen Shot 2022-09-29 at 1 55 13 PM

@vincentsarago
Copy link
Member Author

with the latest commit

with xarray.open_dataset(
    "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/noaa-coastwatch-geopolar-sst-feedstock/noaa-coastwatch-geopolar-sst.zarr",
    engine="zarr",
    decode_coords="all"
) as src:
    ds = src["analysed_sst"][:1]
    ds.rio.write_crs("epsg:4326", inplace=True)
    with XarrayReader(ds) as dst:
        img = dst.tile(1, 1, 2)

@TomAugspurger
Copy link
Contributor

An example using Daymet on Azure:

import rioxarray 
import xarray
from matplotlib.pyplot import imshow
from rio_tiler.io import XarrayReader
import requests

credential = requests.get("https://planetarycomputer.microsoft.com/api/sas/v1/token/daymet-daily-pr").json()["token"]

url = "az://daymet-zarr/annual/pr.zarr"
storage_options = {
    "account_name": "daymeteuwest",
    "credential": credential
}
src = xarray.open_dataset(url, engine="zarr", decode_coords="all", storage_options=storage_options)

with XarrayReader(src["prcp"][:1]) as dst:
    img = dst.tile(1, 1, 2)
    imshow(img.data_as_image())

image

With decode_coords="all", rioxarray is able to correctly construct a CRS (I think using CF-conventions and the grid_mapping_name attribute).

@vincentsarago
Copy link
Member Author

with latest commit I've removed XarrayReader from rio_tiler.io.__init__ so you need to do from rio_tiler.io.xarray import XarrayReader

@vincentsarago
Copy link
Member Author

here is a simple tile server based on XarrayReader

import urllib
from enum import Enum
from typing import List, Optional, Tuple

import xarray
from fastapi import FastAPI, Query
from fastapi.responses import Response
from pydantic import BaseModel, Field, root_validator
from rio_tiler.io.xarray import XarrayReader
from starlette.requests import Request


class SchemeEnum(str, Enum):
    """TileJSON scheme choice."""

    xyz = "xyz"
    tms = "tms"


class TileJSON(BaseModel):
    """
    TileJSON model.

    Based on https://github.com/mapbox/tilejson-spec/tree/master/2.2.0

    """

    tilejson: str = "2.2.0"
    name: Optional[str]
    description: Optional[str]
    version: str = "1.0.0"
    attribution: Optional[str]
    template: Optional[str]
    legend: Optional[str]
    scheme: SchemeEnum = SchemeEnum.xyz
    tiles: List[str]
    grids: Optional[List[str]]
    data: Optional[List[str]]
    minzoom: int = Field(0, ge=0, le=30)
    maxzoom: int = Field(30, ge=0, le=30)
    bounds: List[float] = [-180, -90, 180, 90]
    center: Optional[Tuple[float, float, int]]

    @root_validator
    def compute_center(cls, values):
        """Compute center if it does not exist."""
        bounds = values["bounds"]
        if not values.get("center"):
            values["center"] = (
                (bounds[0] + bounds[2]) / 2,
                (bounds[1] + bounds[3]) / 2,
                values["minzoom"],
            )
        return values

    class Config:
        """TileJSON model configuration."""

        use_enum_values = True


app = FastAPI()


@app.get("/tiles/{z}/{x}/{y}", response_class=Response)
def tile(
    z: int,
    x: int,
    y: int,
    url: str = Query(description="Zarr URL"),
    variable: str = Query(description="Zarr Variable"),
):
    with xarray.open_dataset(url, engine="zarr", decode_coords="all") as src:
        ds = src[variable][:1]

        # Make sure we are a CRS
        crs = ds.rio.crs or "epsg:4326"
        ds.rio.write_crs(crs, inplace=True)

        with XarrayReader(ds) as dst:
            img = dst.tile(x, y, z)

        content = img.render(img_format="PNG")

    return Response(content, media_type="image/png")


@app.get(
    "/tilejson.json",
    response_model=TileJSON,
    responses={200: {"description": "Return a tilejson"}},
    response_model_exclude_none=True,
    tags=["API"],
)
def tilejson(
    request: Request,
    url: str = Query(description="Zarr URL"),
    variable: str = Query(description="Zarr Variable"),
):
    """Handle /tilejson.json requests."""
    kwargs: Dict[str, Any] = {"z": "{z}", "x": "{x}", "y": "{y}"}
    tile_url = request.url_for("tile", **kwargs)

    qs = [
        (key, value)
        for (key, value) in request.query_params._list
        if key not in ["tile_format"]
    ]
    if qs:
        tile_url += f"?{urllib.parse.urlencode(qs)}"

    with xarray.open_dataset(url, engine="zarr", decode_coords="all") as src:
        ds = src[variable][:1]

        # Make sure we are a CRS
        crs = ds.rio.crs or "epsg:4326"
        ds.rio.write_crs(crs, inplace=True)

        with XarrayReader(ds) as dst:
            return dict(
                bounds=dst.geographic_bounds,
                minzoom=dst.minzoom,
                maxzoom=dst.maxzoom,
                name="xarray",
                tilejson="2.1.0",
                tiles=[tile_url],
            )
curl http://127.0.0.1:8081/tilejson.json?url=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/noaa-coastwatch-geopolar-sst-feedstock/noaa-coastwatch-geopolar-sst.zarr&variable=analysed_sst | jq

{
  "tilejson": "2.1.0",
  "name": "xarray",
  "version": "1.0.0",
  "scheme": "xyz",
  "tiles": [
    "[http://127.0.0.1:8081/tiles/{z}/{x}/{y}?url=https%3A%2F%2Fncsa.osn.xsede.org%2FPangeo%2Fpangeo-forge%2Fnoaa-coastwatch-geopolar-sst-feedstock%2Fnoaa-coastwatch-geopolar-sst.zarr&variable=analysed_sst](http://127.0.0.1:8081/tiles/%7Bz%7D/%7Bx%7D/%7By%7D?url=https%3A%2F%2Fncsa.osn.xsede.org%2FPangeo%2Fpangeo-forge%2Fnoaa-coastwatch-geopolar-sst-feedstock%2Fnoaa-coastwatch-geopolar-sst.zarr&variable=analysed_sst)"
  ],
  "minzoom": 0,
  "maxzoom": 2,
  "bounds": [
    -180.00000610436345,
    -89.99999847369712,
    180.00000610436345,
    89.99999847369712
  ],
  "center": [
    0,
    0,
    0
  ]
}

@alexgleith
Copy link

This is fantastic, I'm going to try to find some time to test it.

🚀

**kwargs: Any,
) -> Dict[str, BandStatistics]:
"""Return bands statistics from a dataset."""
raise NotImplementedError
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to have statistics and preview implemented because it will be really inefficient for now

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a ton for working on this @vincentsarago!

rio_tiler/io/xarray.py Outdated Show resolved Hide resolved
rio_tiler/io/xarray.py Outdated Show resolved Hide resolved
rio_tiler/io/xarray.py Outdated Show resolved Hide resolved
"""Xarray Reader.

Attributes:
dataset (rasterio.io.DatasetReader or rasterio.io.DatasetWriter or rasterio.vrt.WarpedVRT, optional): Rasterio dataset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be helpful (for me at least) to list out all the restrictions we have on this DataArray:

  1. Can be >2d, but requires that the last two dimensions represent longitude / latitude?
  2. Do we care what the names of the lat / lon / x / y dimensions are?
  3. Must have a CRS as discovered by .rio.crs
  4. ...

Any others we know of?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care what the names of the lat / lon / x / y dimensions are?

reading https://github.com/corteva/rioxarray/blob/dcb26dd0af034dbffd1952bd61d8f62e16b0ee4d/rioxarray/rioxarray.py#L251-L276 it seems that only x,y,longitude,latitude are valid 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or CF coordinates with appropriate standard names.

Co-authored-by: Tom Augspurger <tom.augspurger88@gmail.com>
@TomAugspurger
Copy link
Contributor

In case it's helpful for testing, I put a Zarr store up at "https://pangeo.blob.core.windows.net/pangeo-public/daymet-rio-tiler/na-wgs84.zarr/" that's been reprojected to WGS84 and chunked into 256x256 pixel tiles. There's a single time slice, and one variable tmax.

It's Azure's West Europe region.

import xarray
import matplotlib.pyplot as plt
from rio_tiler.io.xarray import XarrayReader

ds = xarray.open_dataset(
    "https://pangeo.blob.core.windows.net/pangeo-public/daymet-rio-tiler/na-wgs84.zarr/",
    engine="zarr",
    decode_coords="all",
    consolidated=True,
)
da = ds["tmax"]

with XarrayReader(da) as dst:
    img = dst.tile(1, 1, 2)

plt.imshow(img.data_as_image());

@vincentsarago vincentsarago requested review from rabernat, TomAugspurger and geospatial-jeff and removed request for rabernat and TomAugspurger October 17, 2022 15:02
@vincentsarago vincentsarago marked this pull request as ready for review October 17, 2022 20:16
Copy link
Member

@geospatial-jeff geospatial-jeff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@vincentsarago vincentsarago merged commit f8e1071 into rio-tiler-v4 Oct 18, 2022
@vincentsarago vincentsarago deleted the XarrayReader branch October 18, 2022 15:46
@vincentsarago
Copy link
Member Author

🙏 thanks everyone

I'm merging this to the main PR and will work on updating the documentation before publishing a new version of rio-tiler with the new XarrayReader.

vincentsarago added a commit that referenced this pull request Oct 19, 2022
* update rasterio requirement (#517)

* update rasterio requirement

* remove python 3.7

* update changelog

* reader functions returns ImageData object (#515)

* reader functions returns ImageData object

* update changelog

* remove async base reader (#520)

* remove gcp reader (#521)

Co-authored-by: vincentsarago <vincent.sarago@gmail.com>

* avoid warning in tests

* generalize buffer option (#519)

* generalize buffer option

* refactor buffer calculation

* better tests

* remove min/max zoom from baseclass attribute (#522)

* remove min/max zoom from baseclass attribute

* fix benchmark

* split zooms methods and add tests

* rename function

* move apply_expression into ImageData class and use `b{ix}` for band names (#523)

* move apply_expression into ImageData class and use `b{ix}` for band names

* fix test

* make sure we have the good number of bands

* Apply expression part2 (#525)

* ImageData.apply_expression return a new ImageData object

* refactor Points method and deprecate asset_expression

* remove unused option

* Update rio_tiler/models.py

* fix regex

* add PointData class (#526)

* add PointData class

* fix test

* remove deprecated

* fix test 2

* better expression parsing for stac

* docstring fixes

* only use WarpedVRT when doing reprojection or nodata overwride (#529)

* update baseclass to remove useless kwargs

* update changelog

* forward dataset statistics to ImageClass (#531)

* forward dataset statistics to ImageClass

* better type and autorescale

* one more test

* rename `COGReader` to `Reader` and remove relative import (#534)

* rename COGReader to Reader and remove relative import

* rasterio

* debug

* merge from branch

* add note for backwards compatibility of COGReader

* remove individual options attributes (#535)

* remove individual options attributes

* add typeDict for options

* allow empty options

* make sure to not change the input array

* add ImageReader for non-geo images (#536)

* add ImageReader for non-geo images

* update changelog

* Xarray reader (#530)

* sketch out Xarray support

* Update rio_tiler/io/xarray.py

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>

* read tile

* assume crs is in the dataset

* upside down

* dependencies

* cleanup

* add more methods and cleanup

* Update rio_tiler/io/xarray.py

Co-authored-by: Tom Augspurger <tom.augspurger88@gmail.com>

* enumerate from 1

* fix

* add dataset_statistics and notebook

* add XarrayReader in top level import

* add tests

* update notebook

* update changelog

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Tom Augspurger <tom.augspurger88@gmail.com>

* add migration docs

* update readme

* update notebooks

* update docs

Co-authored-by: Jeff Albrecht <geospatialjeff@gmail.com>
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Tom Augspurger <tom.augspurger88@gmail.com>
@TomAugspurger
Copy link
Contributor

Thanks Vincent!

@vietnguyengit
Copy link

Hi @TomAugspurger , I had my Conda env setup (Python 3.9.13) with the latest rio-tiler but not sure why I'm unable to import it to the notebook while importing to Python CLI has no issues. Screenshots below:

Successfully imported:

image

Same Conda env but unable to import to the notebook:

image

Thanks Tom.

@kylebarron
Copy link
Member

This isn't in a released package yet. You can try 4.0.0a0 if you want

@vietnguyengit
Copy link

Perfect! Thanks @kylebarron

image

@vietnguyengit
Copy link

Hi @kylebarron @TomAugspurger again,

I wonder why the axes are always 0-250 (my Zarr store is SST ocean data around Australia's region)

image

and also for tile settings, I wonder if there is somewhere document on how to pick the numbers. I can only get it works with (1,1,1)

When I ran my Zarr store with the XarrayReader fastapi server, the json returned looks like this :

{
"tilejson": "2.1.0",
"name": "xarray",
"version": "1.0.0",
"scheme": "xyz",
"tiles": [
"[http://127.0.0.1:8000/tiles/{z}/{x}/{y}?url=s3%3A%2F%2Fimos-data-pixeldrill%2Fvhnguyen%2Fplayground%2Fmulti-years&variable=sea_surface_temperature](http://127.0.0.1:8000/tiles/%7Bz%7D/%7Bx%7D/%7By%7D?url=s3%3A%2F%2Fimos-data-pixeldrill%2Fvhnguyen%2Fplayground%2Fmulti-years&variable=sea_surface_temperature)"
],
"minzoom": 1,
"maxzoom": 5,
"bounds": [
70.00000213595068,
-69.99999786350668,
190.00000549344387,
19.999999770855315
],
"center": [
130.00000381469727,
-24.999999046325684,
1
]
}

Thank you.

@vincentsarago
Copy link
Member Author

@vietnguyengit is it possible that your X coordinates goes from 0 to 360?

@vietnguyengit
Copy link

Thanks @vincentsarago

Actually, I understood what those numbers are after reading the docs here: https://cogeotiff.github.io/rio-tiler/api/rio_tiler/io/xarray/

And the results so far are sweet! 🎉

Sample results:

with random XY coordinates around Australia's coastal regions

zoom level 2
image

zoom level 4
image

zoom level 5
image

Ping @alexgleith

@alexgleith
Copy link

Hi @vietnguyengit, nice work! I'll come find you today, let's see if we can get this showing on a webmap... maybe with a timeslider too!

@yoninachmany
Copy link

In case it's helpful for testing, I put a Zarr store up at "https://pangeo.blob.core.windows.net/pangeo-public/daymet-rio-tiler/na-wgs84.zarr/" that's been reprojected to WGS84 and chunked into 256x256 pixel tiles. There's a single time slice, and one variable tmax.

It's Azure's West Europe region.

import xarray
import matplotlib.pyplot as plt
from rio_tiler.io.xarray import XarrayReader

ds = xarray.open_dataset(
    "https://pangeo.blob.core.windows.net/pangeo-public/daymet-rio-tiler/na-wgs84.zarr/",
    engine="zarr",
    decode_coords="all",
    consolidated=True,
)
da = ds["tmax"]

The DataArray has nan values – can they be masked out in tile endpoints or does nodata handling need to be added to XarrayReader.tile?

da
<xarray.DataArray 'tmax' (time: 1, y: 3728, x: 17268)>
array([[[nan, nan, ..., nan, nan],
        [nan, nan, ..., nan, nan],
        ...,
        [nan, nan, ..., nan, nan],
        [nan, nan, ..., nan, nan]]], dtype=float32)

Thanks!

@vincentsarago
Copy link
Member Author

vincentsarago commented Apr 26, 2023

@yoninachmany we didn't implement mask in https://github.com/cogeotiff/rio-tiler/blob/main/rio_tiler/io/xarray.py#L239
I'm not familiar enough with rio-xarray but I feel that it should be possible to create a mask and add it to the ImageData output

Would you be able to open an issue about this ?🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants