
Commit

Append to icechunk stores (#272)
* Initial attempt at appending

* Working on tests for generate chunk key function

* Linting

* Refactor gen virtual dataset method

* Fix spelling

* Linting

* Linting

* Linting

* Passing compression test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* linting

* Fix test failing due to incorrect dtype

* linting

* Linting

* Remove obsolete test file for appending

* Create netcdf4 files factory in conftest

* Linting

* Refactor to use combineable zarr arrays

* linting

* Implement no append dim test

* Add test for when append dim is not in dims

* Fix mypy errors

* type ignore import untyped zarr

* Use Union type for check_combineable_zarr_arrays arg

* Fix import

* Fix imports for get_codecs

* use new factory in test

* Remove need for dask in fixture

* Fix for when zarr is not installed

* Address test failures

* Add get_codecs file

* Add dask to upstream

* Remove dependency on dask and h5netcdf engine

* Remove obsolete comment

* Remove duplicate zarr array type check

* Move codecs module and type output

* Actually add codecs file

* Fix merge mistake

* Ignore import untyped

* Add tests for codecs

* Resolve mypy errors

* Fix test

* Import zarr in function

* Use existing importorskip function

* Modify comments

* Comment updates and spelling of combinable

* Revert change to check compatible encoding

* Ignore zarr untyped import errors

* Implement a manifest.utils module

* pass the array into resize_array

Co-authored-by: Tom Nicholas <tom@cworthy.org>

* Refactor resize_array

* Remove unnecessary zarr imports

* Add pinned version of icechunk as an optional dependency

* Add append_dim in docstring

* Kludgy solution to v2 v3 codecs difference

* Add normalize to v3 parameter

* Add more info to docstring

* Fix typing issues

* Add decorator for zarr python v3 test

* Fix mypy and ruff errors

* Only append if append_dim in dims

* Add example notebook

* Add a runtime

* Add failing test

* Fix multiple appends

* Fix test error message

* Add new cell to notebook to display original time chunk

* Upgrade icechunk to 0.1.0a5

* Upgrade icechunk in upstream.yml

* Updated notebook with kerchunk comment and upgraded icechunk version

* Modify test so it fails without updated icechunk

* Update icechunk dependency

* Fix mypy errors

* update icechunk version in pyproject

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove obsolete comment

* Use icechunk 0.1.0a7

* Updated notebook

* Updated notebook

* print store

* Update notebook (#327)

Co-authored-by: Aimee Barciauskas <aimee@developmentseed.org>

* Add append to examples

* Add to releases.rst

* Revert change to .gitignore

* Update ci/upstream.yml

Co-authored-by: Tom Nicholas <tom@cworthy.org>

* Update pyproject.toml

Co-authored-by: Tom Nicholas <tom@cworthy.org>

* Update virtualizarr/tests/test_writers/test_icechunk.py

Co-authored-by: Tom Nicholas <tom@cworthy.org>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update virtualizarr/accessor.py

Co-authored-by: Tom Nicholas <tom@cworthy.org>

* Separate out multiple arrays test

---------

Co-authored-by: Tom Nicholas <tom@cworthy.org>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Matthew Iannucci <matthew@earthmover.io>
4 people authored Dec 5, 2024
1 parent 20dd9dc commit 4d85a03
Showing 15 changed files with 2,412 additions and 178 deletions.
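
Several commits in the message above concern generating chunk keys and resizing arrays for appends. As a rough illustration only, and not the code from this commit, appending along a dimension can be thought of as shifting each incoming chunk's index along the append axis by the number of chunks already stored there. The helper name `offset_chunk_key` and its parameters are hypothetical, and the slash-separated key layout is an assumption:

```python
from typing import Optional


def offset_chunk_key(
    index: tuple[int, ...],
    append_axis: Optional[int] = None,
    existing_num_chunks: int = 0,
) -> str:
    """Build a slash-separated chunk key for a chunk index.

    Hypothetical helper: when appending, the index along ``append_axis`` is
    shifted by the number of chunks already stored along that axis.
    """
    if append_axis is not None and not 0 <= append_axis < len(index):
        raise ValueError(
            f"append_axis {append_axis} is out of bounds for index {index}"
        )

    shifted = [
        i + existing_num_chunks if axis == append_axis else i
        for axis, i in enumerate(index)
    ]
    return "/".join(str(i) for i in shifted)


# Without an append axis, the key is just the index joined by "/".
print(offset_chunk_key((0, 1, 2)))
# Appending along axis 0 to an array that already holds 2 chunks on that axis.
print(offset_chunk_key((0, 1, 2), append_axis=0, existing_num_chunks=2))
```

Chunks on the other axes keep their positions, which is why an append only makes sense when the incoming array matches the existing one in every dimension except the append dimension.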
6 changes: 3 additions & 3 deletions ci/upstream.yml
@@ -28,6 +28,6 @@ dependencies:
   - fsspec
   - pip
   - pip:
-    - icechunk # Installs zarr v3 as dependency
-    # - git+https://github.com/fsspec/kerchunk@main # kerchunk is currently incompatible with zarr-python v3 (https://github.com/fsspec/kerchunk/pull/516)
-    - imagecodecs-numcodecs==2024.6.1
+    - icechunk>=0.1.0a7 # Installs zarr v3 as dependency
+    # - git+https://github.com/fsspec/kerchunk@main # kerchunk is currently incompatible with zarr-python v3 (https://github.com/fsspec/kerchunk/pull/516)
+    - imagecodecs-numcodecs==2024.6.1
48 changes: 28 additions & 20 deletions conftest.py
@@ -1,3 +1,5 @@
+from typing import Any, Dict, Optional
+
 import h5py
 import numpy as np
 import pytest
@@ -35,6 +37,32 @@ def netcdf4_file(tmpdir):
     return filepath


+@pytest.fixture
+def netcdf4_files_factory(tmpdir) -> callable:
+    def create_netcdf4_files(
+        encoding: Optional[Dict[str, Dict[str, Any]]] = None,
+    ) -> tuple[str, str]:
+        ds = xr.tutorial.open_dataset("air_temperature")
+
+        # Split dataset into two parts
+        ds1 = ds.isel(time=slice(None, 1460))
+        ds2 = ds.isel(time=slice(1460, None))
+
+        # Save datasets to disk as NetCDF in the temporary directory with the provided encoding
+        filepath1 = f"{tmpdir}/air1.nc"
+        filepath2 = f"{tmpdir}/air2.nc"
+        ds1.to_netcdf(filepath1, encoding=encoding)
+        ds2.to_netcdf(filepath2, encoding=encoding)
+
+        # Close datasets
+        ds1.close()
+        ds2.close()
+
+        return filepath1, filepath2
+
+    return create_netcdf4_files
+
+
 @pytest.fixture
 def netcdf4_file_with_2d_coords(tmpdir):
     ds = xr.tutorial.open_dataset("ROMS_example")
@@ -71,26 +99,6 @@ def hdf5_groups_file(tmpdir):
     return filepath


-@pytest.fixture
-def netcdf4_files(tmpdir):
-    # Set up example xarray dataset
-    ds = xr.tutorial.open_dataset("air_temperature")
-
-    # split into equal chunks so we can concatenate them back together later
-    ds1 = ds.isel(time=slice(None, 1460))
-    ds2 = ds.isel(time=slice(1460, None))
-
-    # Save it to disk as netCDF (in temporary directory)
-    filepath1 = f"{tmpdir}/air1.nc"
-    filepath2 = f"{tmpdir}/air2.nc"
-    ds1.to_netcdf(filepath1)
-    ds2.to_netcdf(filepath2)
-    ds1.close()
-    ds2.close()
-
-    return filepath1, filepath2
-
-
 @pytest.fixture
 def hdf5_empty(tmpdir):
     filepath = f"{tmpdir}/empty.nc"
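
The conftest.py change replaces the fixed `netcdf4_files` fixture with a factory that accepts an optional `encoding`, so each test can choose how its files are written. The pattern can be sketched in plain Python, without pytest or xarray. `make_files_factory` and the JSON files it writes are stand-ins invented for illustration and are not part of the commit:

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


def make_files_factory(tmpdir: str) -> Callable[..., tuple[str, str]]:
    """Return a function that writes two files, parameterised per call."""

    def create_files(encoding: Optional[dict[str, Any]] = None) -> tuple[str, str]:
        paths = (
            str(Path(tmpdir) / "part1.json"),
            str(Path(tmpdir) / "part2.json"),
        )
        for path in paths:
            # Stand-in for ds.to_netcdf(path, encoding=encoding)
            Path(path).write_text(json.dumps({"encoding": encoding}))
        return paths

    return create_files


with tempfile.TemporaryDirectory() as tmp:
    factory = make_files_factory(tmp)
    # Each caller picks its own encoding; the default call passes None.
    plain1, plain2 = factory()
    packed1, _ = factory(encoding={"air": {"dtype": "int16"}})
    print(json.loads(Path(packed1).read_text()))
```

A plain fixture would fix the encoding once for every test; returning a closure defers that choice to the call site while still sharing the temporary directory.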
1 change: 1 addition & 0 deletions docs/releases.rst
@@ -11,6 +11,7 @@ New Features

 - Add a ``virtual_backend_kwargs`` keyword argument to file readers and to ``open_virtual_dataset``, to allow reader-specific options to be passed down.
   (:pull:`315`) By `Tom Nicholas <https://github.com/TomNicholas>`_.
+- Added append functionality to `to_icechunk` (:pull:`272`) By `Aimee Barciauskas <https://github.com/abarciauskas-bgse>`_.

 Breaking changes
 ~~~~~~~~~~~~~~~~