
A read-only TileDB backend #4987

Closed
jp-dark opened this issue Mar 2, 2021 · 14 comments

Comments

@jp-dark
Contributor

jp-dark commented Mar 2, 2021

This is a feature request for a read-only TileDB backend for reading a dense TileDB array into an xarray Dataset.

@jhamman
Member

jhamman commented Mar 3, 2021

Hi @jp-dark! Thanks for opening this issue and the draft pull request in #4988. As you probably know, we're in the process of completing a major refactor of our storage backends system (see #4989 and #4810 for the current state of that work). One of the main feature additions in this work is the new entrypoints functionality, which will allow backends (like TileDB) to declare backend support without including the code in Xarray itself.

In light of this new functionality, we'd like to see if we can put the TileDB backend in TileDB itself (or in another adjacent package). The end-user functionality would be the same, since the entrypoint would be registered at install time. We'd be happy to document the TileDB backend in the Xarray documentation as well.
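A minimal sketch of what install-time registration could look like, assuming a separate plugin package: its `setup.py` declares the engine under the `xarray.backends` entry-point group. The package name `xarray-tiledb`, the module `xarray_tiledb.backend`, and the class `TileDBBackendEntrypoint` are hypothetical placeholders, and a real `setup.py` would finish by calling `setuptools.setup(**SETUP_KWARGS)`.

```python
# Fragment of a hypothetical plugin's setup.py; all names are placeholders.
# A real file would end with: setuptools.setup(**SETUP_KWARGS)
SETUP_KWARGS = dict(
    name="xarray-tiledb",  # hypothetical package name
    packages=["xarray_tiledb"],
    install_requires=["tiledb"],  # note: xarray itself is not listed here
    entry_points={
        # Group that xarray scans to discover third-party engines.
        "xarray.backends": [
            # "<engine name> = <module path>:<entrypoint class>"
            "tiledb = xarray_tiledb.backend:TileDBBackendEntrypoint",
        ],
    },
)
```

Once such a package is installed, the engine name on the left of `=` is what a user would pass as `engine=...`.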

This is the development pattern we are headed to with most of our backends, including some of the current backends. We'd be happy to help work with you to sort out the details as I'm sure there will be one or two early adopter bumps to work through.

cc @alexamici @aurghs @shoyer

@jp-dark
Contributor Author

jp-dark commented Mar 3, 2021

@jhamman - thanks for the quick response! As I work through bumps, would it be best to comment here or on one of the other currently open issues for the backend refactor?

@jhamman
Member

jhamman commented Mar 3, 2021

This is a great spot @jp-dark! Looking forward to seeing your progress.

@jp-dark
Contributor Author

jp-dark commented Mar 3, 2021

As a provider of a third-party backend, I would love to be able to integrate with xarray without including xarray as a dependency in my library, and xarray is actually really close to a place where that would be possible.

The main development effort on the xarray side would be creating a generic LazyLoadingBackendStore that accepts duck-typed backends (xarray can define them using Protocols for mypy goodness).
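To make the duck-typing idea concrete, here is a rough sketch using `typing.Protocol`. The protocol names (`ReadOnlyVariable`, `ReadOnlyBackend`) and their methods are hypothetical, not an xarray API; the point is only that a library can satisfy such an interface structurally, without importing xarray or subclassing anything.

```python
from typing import Any, Dict, Tuple, Protocol, runtime_checkable


@runtime_checkable
class ReadOnlyVariable(Protocol):
    """Hypothetical interface for one stored array."""

    dims: Tuple[str, ...]
    shape: Tuple[int, ...]

    def read(self, key: Tuple[slice, ...]) -> Any:
        """Return the data selected by ``key``."""
        ...


@runtime_checkable
class ReadOnlyBackend(Protocol):
    """Hypothetical interface a third-party library would implement."""

    def variables(self) -> Dict[str, ReadOnlyVariable]: ...

    def attributes(self) -> Dict[str, Any]: ...


# A third-party implementation: no xarray import, no base class.
class InMemoryVariable:
    def __init__(self, dims, rows):
        self.dims = tuple(dims)
        self.rows = rows
        self.shape = (len(rows), len(rows[0]) if rows else 0)

    def read(self, key):
        row_slice, col_slice = key
        return [row[col_slice] for row in self.rows[row_slice]]


class InMemoryBackend:
    def __init__(self, variables, attrs=None):
        self._variables = variables
        self._attrs = attrs or {}

    def variables(self):
        return dict(self._variables)

    def attributes(self):
        return dict(self._attrs)


backend = InMemoryBackend({"t": InMemoryVariable(("x", "y"), [[1, 2, 3], [4, 5, 6]])})
```

Note that `runtime_checkable` only checks that the named attributes exist; signatures are verified statically by mypy, not at runtime.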

If there is interest in pursuing this, I can help with developing a prototype to test feasibility.

@shoyer
Member

shoyer commented Mar 3, 2021

@jp-dark How would you feel about writing another small library, e.g., "xarray-tiledb" that can explicitly depend on both xarray and tiledb?

We can potentially do some of xarray's backend stuff with protocols, but there are some aspects (especially for more advanced features like lazy loading) that will likely need the hard xarray dependency.

@alexamici
Collaborator

alexamici commented Mar 4, 2021

@jp-dark it is in fact possible to write an xarray backend without explicitly depending on xarray in your setup.py if you put all your backend glue code in a separate module not imported by the main __init__.py.

We use the setuptools entrypoints infrastructure that triggers a module load only from within xarray itself.
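The reason declaring an entry point does not pull in xarray can be seen with the standard library alone: an entry point is just a `module:attr` string until `.load()` is called. The sketch below lists entry points in a group without importing any plugin; the helper name `list_backend_entrypoints` and the example plugin path are hypothetical, and xarray's own discovery code differs in detail.

```python
from importlib.metadata import EntryPoint, entry_points


def list_backend_entrypoints(group: str = "xarray.backends") -> dict:
    """Map engine name -> EntryPoint without importing any plugin module."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a plain dict keyed by group
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


# Constructing an EntryPoint only stores strings; nothing is imported
# until ep.load() is called (which is what xarray does on demand).
example = EntryPoint(
    name="tiledb",
    value="xarray_tiledb.backend:TileDBBackendEntrypoint",  # hypothetical
    group="xarray.backends",
)
module_name, _, attr_name = example.value.partition(":")
```

This is why the glue module must not be imported by the package's main `__init__.py`: only the loader on the xarray side should trigger that import.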

This is still work in progress, but we are implementing this strategy in cfgrib with success. You can get inspiration from the following PR by @aurghs:

ecmwf/cfgrib#203

@jp-dark
Contributor Author

jp-dark commented Mar 4, 2021

@alexamici @shoyer To be clear, my short-term plan is absolutely to move the TileDB backend from my draft pull request here to a small plugin library (thanks for linking the cfgrib PR, @alexamici!).

I only bring up the protocol thing because the backend is really close to a place where a lot of the boilerplate for the lazy loading, etc. could be provided on the xarray side with a simple API requirement on the third-party library side. I'll push up a small proof-of-concept with a read-only netCDF4 example shortly.
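As a rough illustration of the kind of lazy-loading boilerplate that could live on the xarray side, the sketch below wraps a backend-supplied read function and only calls it when the array is indexed, fetching just the requested window. `LazyVariable` and `read_from_store` are hypothetical names, and xarray's real lazy-indexing machinery is considerably more involved.

```python
from typing import Any, Callable, List, Tuple


class LazyVariable:
    """Hypothetical wrapper that defers reads until the data is indexed."""

    def __init__(self, shape: Tuple[int, int], read: Callable[[slice, slice], Any]):
        self.shape = shape
        self._read = read  # backend-specific reader, called only on demand

    def __getitem__(self, key: Tuple[slice, slice]) -> Any:
        rows, cols = key
        return self._read(rows, cols)


calls: List[Tuple[slice, slice]] = []


def read_from_store(rows: slice, cols: slice) -> List[List[int]]:
    # Stand-in for a real storage read (e.g. a TileDB query); logs each call.
    calls.append((rows, cols))
    data = [[10 * r + c for c in range(4)] for r in range(3)]
    return [row[cols] for row in data[rows]]


var = LazyVariable((3, 4), read_from_store)
reads_before = len(calls)  # wrapping the store has not read anything
window = var[0:2, 1:3]     # only this 2x2 window is fetched
```

The only requirement placed on the third-party library here is a function that can read a rectangular selection, which is the "simple API requirement" idea in miniature.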

@jp-dark
Contributor Author

jp-dark commented Mar 4, 2021

See PR #4998 for the example.

edit: This example should be able to read a NetCDF dataset using the protocol engine:

xr.open_dataset("example.nc", engine="protocol")

@jp-dark
Contributor Author

jp-dark commented Mar 5, 2021

Is there a branch of xarray that currently supports loading backend engines from third-party libraries?

@jhamman
Member

jhamman commented Mar 5, 2021

#4989 includes the full refactor. The plan is to merge this to xarray/master on Monday.

@jp-dark
Contributor Author

jp-dark commented Mar 9, 2021

I was able to use this backend from an external code base with the entry point procedure as described in the new docs, and it was completely painless. Great job with the backend refactor!

jp-dark closed this as completed Mar 9, 2021
@jp-dark
Contributor Author

jp-dark commented Mar 11, 2021

@jhamman Is there a planned date for releasing the backend updates to PyPI?

@jhamman
Member

jhamman commented Mar 11, 2021

Not at this point. We just released 0.17 so I would think we're at least a week or two away from 0.18.

@max-sixty
Collaborator

> Not at this point. We just released 0.17 so I would think we're at least a week or two away from 0.18.

Just to set expectations: I hadn't thought we were releasing that soon, though I'd be happy to, and it's getting easier and easier to do releases.


5 participants