I'm having trouble figuring out what steps I'm missing when implementing a custom backend so that the returned dataset displays all non-indexed variables as expected. I already have methods to collect and transform the data into an xarray Dataset; the problem is that the dataset returned by my custom backend is displayed differently from the same dataset created manually. The examples below show the difference. The dataset includes two data variables on one dimension, a dimension coordinate, and a non-dimension coordinate.
Example 1: Simple dataset

```python
import xarray as xr

xr.Dataset(
    data_vars={
        "x": xr.DataArray(data=[1, 1, 1, 1], dims="a"),
        "y": xr.DataArray(data=[2, 2, 2, 2], dims="a"),
    },
    coords={
        "a": [1, 2, 3, 4],
        "b": ("a", [1, 2, 3, 4]),
    },
)
```

If the same dataset is returned via a custom backend, the non-indexed variables are displayed differently:

Example 2: "Opening" the dataset using a custom backend

```python
import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint
# Simple backend that returns a fixed dataset with two data variables,
# one dimension coordinate, and one non-dimension coordinate
class SimpleBackend(BackendEntrypoint):
    description = "Simple backend"

    def open_dataset(
        self,
        filename_or_obj,
        *,
        drop_variables=None,
    ):
        return xr.Dataset(
            data_vars={
                "x": xr.DataArray(data=[1, 1, 1, 1], dims="a"),
                "y": xr.DataArray(data=[2, 2, 2, 2], dims="a"),
            },
            coords={
                "a": [1, 2, 3, 4],
                "b": ("a", [1, 2, 3, 4]),
            },
        )
xr.open_dataset("test", engine=SimpleBackend) What am I missing? |
This is normal, you need to call `ds.load()` or `ds.compute()` to get the same result: `open_dataset` only reads the metadata of a file into memory, which allows opening files that are too big to fit into memory. As a second step you can (implicitly or explicitly) load the data into memory (or use `dask` to allow computing with larger-than-memory arrays).
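A minimal sketch of what that looks like in practice, assuming the `SimpleBackend` class from the question is already defined:

```python
import xarray as xr

# Open via the custom backend; the repr shows the non-indexed variables
# as not-yet-loaded (lazy) arrays
ds = xr.open_dataset("test", engine=SimpleBackend)

# Load all variables into memory in place; the repr now matches the
# manually created dataset from Example 1
ds.load()

# Alternatively, ds.compute() returns a new, fully loaded dataset and
# leaves the original untouched
loaded = ds.compute()

# For larger-than-memory data, the same backend can be opened with dask
# instead, e.g. xr.open_dataset("test", engine=SimpleBackend, chunks={})
```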