Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(feat): fall back to pure python indexing in case of unhandled rust indexing for read #30

Merged
merged 27 commits into from
Nov 14, 2024

Conversation

ilan-gold
Copy link
Owner

There are a lot of failures here, and none of them come from numpy, which supports all of these indexing operations. This may be a bit out of scope....

Base automatically changed from ig/refactoring to ld/codec_pipeline November 8, 2024 02:56
@LDeakin
Copy link
Collaborator

LDeakin commented Nov 8, 2024

@ilan-gold, you are a testing wizard.

There are a few extra failures here compared to zarr-python, but at least they are being picked up as errors. I am inclined to just add a fallback at some point for the more obscure indexing, where the entire chunk is decoded then indexed on the Python side.

@ilan-gold
Copy link
Owner Author

There are a few extra failures here compared to zarr-python, but at least they are being picked up as errors. I am inclined to just add a fallback at some point for the more obscure indexing, where the entire chunk is decoded then indexed on the Python side.

100%

@ilan-gold
Copy link
Owner Author

Will take care of that here then @LDeakin

@ilan-gold ilan-gold changed the title (chore): parametrize tests to get full scope of possibilities (feat): fall back to pure python in case of unhandled rust indexing Nov 8, 2024
@ilan-gold ilan-gold changed the title (feat): fall back to pure python in case of unhandled rust indexing (feat): fall back to pure python indexing in case of unhandled rust indexing Nov 8, 2024
@ilan-gold ilan-gold changed the title (feat): fall back to pure python indexing in case of unhandled rust indexing (feat): fall back to pure python indexing in case of unhandled rust indexing for read Nov 8, 2024
@ilan-gold
Copy link
Owner Author

Aside from the things that zarr-python doesn't appear to support, we are now down to discontinuous writes with this PR.
https://github.com/ilan-gold/zarrs-python/pull/30/files#diff-e52e4ddd58b7ef887ab03c04116e676f6280b824ab7469d5d3080e5cba4f2128R66-R106 should make it clear what we don't support and why.

I think writing can come in a separate PR. Ideally the solution for both of these is integer indexing in zarrs but for a first pass I think this is fine.

src/lib.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@LDeakin LDeakin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one!

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
@ilan-gold
Copy link
Owner Author

The testing here is a bit broken, some stuff got lost in the refactor.

@ilan-gold
Copy link
Owner Author

ilan-gold commented Nov 12, 2024

Ok the issue is that we need to use https://numpy.org/doc/stable/reference/generated/numpy.broadcast_shapes.html to distinguish between cases we can handle and not i.e., input and output must have the same dimensionality upon selection.

@ilan-gold
Copy link
Owner Author

ilan-gold commented Nov 12, 2024

I tried extending our use-case here to 3d (hence the bigger changes than just xfailing the one previous failing test: deba69e) to ensure that things really work and that my checks are functional, but that really only exacerbated the "do things work" in zarr problem: zarr-developers/zarr-python#2485.

I am tempted to just say "2D works for everything except for the explcitly skipped tests, and >2D only works with slicing" or something to this effect.

tests/conftest.py Outdated Show resolved Hide resolved
python/zarrs_python/utils.py Outdated Show resolved Hide resolved
python/zarrs_python/utils.py Outdated Show resolved Hide resolved
python/zarrs_python/utils.py Outdated Show resolved Hide resolved
python/zarrs_python/utils.py Outdated Show resolved Hide resolved
python/zarrs_python/utils.py Outdated Show resolved Hide resolved
python/zarrs_python/utils.py Outdated Show resolved Hide resolved
python/zarrs_python/utils.py Show resolved Hide resolved
@LDeakin LDeakin merged commit 3eedcaf into ld/codec_pipeline Nov 14, 2024
1 check passed
@LDeakin LDeakin deleted the ig/test_full_indexing branch November 14, 2024 20:23
LDeakin added a commit that referenced this pull request Nov 15, 2024
…ndexing for read (#30)

* (chore): file structure

* (chore): parametrize tests to get full scope of possibilities

* (chore): xfail tests that fail on zarr-python default pipeline

* (fix) singular

* (fix): check for contiguous index arrays

* (fix): contiguous numpy arrays converted to slices

* (feat): add reading for non-contiguous buffers

* (chore): remove unused imports

* (fix): cleanup unwraps in `retrieve_chunks`

* Refactor full indexing (#34)

* (chore): `make_chunk_info_for_rust` cleanup

* (fix): all tests working except "tests/test_pipeline.py::test_roundtrip[vindex-contiguous_in_chunk_array-contiguous_in_chunk_array]"

* (fix): skip read in `store_chunk_subset_bytes` for full chunks

* (fix): improve dropped index detection + disallow integer write case

* (chore): message more specific

* (fix): use `Exception`

* (chore): erroneous comment

* (chore): `drop_axes` default

* (chore): `drop_axes` param

* (chore): apply review

* (chore): `else` branch

* (chore): add basic nd tests (#35)

* (chore): add basic 3d tests

* (refactor): use `pytest_generate_tests`

* (fix): clarify collapsed dimension behavior

* (chore): clean ups

---------

Co-authored-by: Lachlan Deakin <ljdgit@gmail.com>
Co-authored-by: Philipp A. <flying-sheep@web.de>
ilan-gold added a commit that referenced this pull request Nov 16, 2024
* Add a `CodecPipeline` stub

* Pass chunk spec

* (fix): return bytes as numpy array (#18)

* (fix): return bytes as numpy array

* (fix): reshape after making view

* (fix): remove unused code

* (fix): error handling in `get_store_and_path`

* (fix): change `example/simple.py` to a 2D array

* (fix): pass value to `store_chunk_subset`

* (fix): clippy warnings

* (fix): handle missing chunks in `retrieve_chunk_subset` and cleanup

* (fix): panics to errors in `get_store_and_path`

* (fix): add partial reads/writes to `simply.py`

* (fix): minimum working codec pipeline

- Add internal `get_chunk_representation`, `retrieve_chunk_bytes`, and `store_chunk_bytes`
- Separate `retrieve_chunk` and `retrieve_chunk_subset`
- Separate `store_chunk` and `store_chunk_subset`
- Add assertions to simple.py

* (fix): handle relative filesystem paths

* (fix): minimal error handling and fix clippy warnings

* (fix): handle relative filesystem paths take 2

* (fix): constant handling

* (fix): add `retrieve_chunks` for parallel read

- Add config options
- CodecPipelineImpl interior mutability

* (fix): add `store_chunks` for parallel write

* (fix) convert write value to contiguous array if needed

* (fix): bump zarrs to 21e86cb9

* CI for codec pipeline (#20)

* (fix): ci for running tests

* (fix): no need to extract tests

* (Fix): remove duplicate name

* (fix): use a submodule...

* (chore): remove memory store + port zarr codec tests

* (chore): remove `dlpark`

* (fix): getattr for itemsize

* (chore): remove runtime F ordering

* (chore): skip vlen

* (feat): parse int

---------

Co-authored-by: Lachlan Deakin <ljdgit@gmail.com>

* (fix): support writing arrays with non-native endianness

* (fix): disable bad/unsupported invalid metadata tests

* (fix): do not store empty chunks

* (fix) remove dead code in codec pipeline

* (fix): move some selection logic from Rust to Python

* (feat): `chunks_desc` handling + concurrency via `zarr.config` (#21)

* (fix): minimum working codec pipeline

- Add internal `get_chunk_representation`, `retrieve_chunk_bytes`, and `store_chunk_bytes`
- Separate `retrieve_chunk` and `retrieve_chunk_subset`
- Separate `store_chunk` and `store_chunk_subset`
- Add assertions to simple.py

* (fix): handle relative filesystem paths

* (fix): minimal error handling and fix clippy warnings

* (fix): handle relative filesystem paths take 2

* (fix): constant handling

* (fix): add `retrieve_chunks` for parallel read

- Add config options
- CodecPipelineImpl interior mutability

* (fix): add `store_chunks` for parallel write

* (fix) convert write value to contiguous array if needed

* (fix): bump zarrs to 21e86cb9

* CI for codec pipeline (#20)

* (fix): ci for running tests

* (fix): no need to extract tests

* (Fix): remove duplicate name

* (fix): use a submodule...

* (chore): remove memory store + port zarr codec tests

* (chore): remove `dlpark`

* (fix): getattr for itemsize

* (chore): remove runtime F ordering

* (chore): skip vlen

* (feat): parse int

---------

Co-authored-by: Lachlan Deakin <ljdgit@gmail.com>

* (fix): support writing arrays with non-native endianness

* (fix): disable bad/unsupported invalid metadata tests

* (fix): do not store empty chunks

* (fix) remove dead code in codec pipeline

* (fix): move some selection logic from Rust to Python

* (chore): `chunks_desc` cleanup

* (feat): adding concurrency via zarr config

* (chore): remove extra comment

* (fix): refactor chunk info creation + threads->threading + ruff

* (fix): use `or` for `threading.max_workers` getting

---------

Co-authored-by: Lachlan Deakin <ljdgit@gmail.com>

* (chore): pipeline-specific tests (#23)

* (chore): add pipeline-specific tests

* (chore): add more tests, including weird zarr one

* (fix): arr shape used in test

* (chore): change names

* (chore): remove unnecessary fill_value checking

* (chore): make roundtrip_full_array shorter

* (fix): handle singleton axis indexing

Fixes `test_roundtrip_singleton_axis` test

* (fix): deprecated numpy array to scalar in `make_slice_selection`

Fixes the `test_roundtrip_partial` test

---------

Co-authored-by: Lachlan Deakin <ljdgit@gmail.com>

* (chore): add orthogonal indexing tests (#24)

* (chore): add orthogonal indexing tests

* (fix): oindex roundtrip

* (fix): dont skip test because it works actually

* (fix): clean up returning

* Delete tests/__init__.py

* (chore): add ellipsis indexing tests (#25)

* Format (#26)

* Pyo3-0.22 (#29)

* Refactor and deduplicate ChunksItem (#31)

* Improve error mapping code (#28)

Co-authored-by: ilan-gold <ilanbassgold@gmail.com>

* (chore): add `pyarray_itemsize()`

* (chore): add safety comments to `ndarray_to_*` methods

Also add `py_untyped_array_to_array_object`

* (chore): label fields in `ChunksItemRaw`

* (fix): fix store/retrieve with scalars + refactor

- Fixed fill value bytes being larger than needed when storing
- Handle reduced dimensionality inputs/outputs
  - Uses LDeakin/zarrs@8c1391f
  - This inconsistency was picked up by `zarrs` in debug build

* (fix): address several unwraps

* (chore): cleanup `slice_to_range`

* (chore): change `retrieve_chunk_bytes` to return `ArrayBytes`

* (chore): add SAFETY docs and `store_chunk_subset_bytes` input validation

* (chore): file structure (#27)

Co-authored-by: Lachlan Deakin <ljdgit@gmail.com>

* (feat): refactor `CodecPipelineStore` to a trait

- Move filesystem store implementation to a submodule

* (fix): bring back filesystem relative path support

Broken with 4c1ce39

* (fix): bump `requires-python` to 3.11

This is the minimum set by `zarr` 3.0.0b1

* Rust tests (#33)

* Simplify shape calculation

* (chore): add rust-cache to CI and use the standard uv action (#32)

* (fix): use the standard uv GH action

* cache with pyproject.toml

* ci: remove install rust

* (chore): add rust-cache to CI

* (fix): add note in CI about rust-toolchain action

* (feat): fall back to pure python indexing in case of unhandled rust indexing for read (#30)

* (chore): file structure

* (chore): parametrize tests to get full scope of possibilities

* (chore): xfail tests that fail on zarr-python default pipeline

* (fix) singular

* (fix): check for contiguous index arrays

* (fix): contiguous numpy arrays converted to slices

* (feat): add reading for non-contiguous buffers

* (chore): remove unused imports

* (fix): cleanup unwraps in `retrieve_chunks`

* Refactor full indexing (#34)

* (chore): `make_chunk_info_for_rust` cleanup

* (fix): all tests working except "tests/test_pipeline.py::test_roundtrip[vindex-contiguous_in_chunk_array-contiguous_in_chunk_array]"

* (fix): skip read in `store_chunk_subset_bytes` for full chunks

* (fix): improve dropped index detection + disallow integer write case

* (chore): message more specific

* (fix): use `Exception`

* (chore): erroneous comment

* (chore): `drop_axes` default

* (chore): `drop_axes` param

* (chore): apply review

* (chore): `else` branch

* (chore): add basic nd tests (#35)

* (chore): add basic 3d tests

* (refactor): use `pytest_generate_tests`

* (fix): clarify collapsed dimension behavior

* (chore): clean ups

---------

Co-authored-by: Lachlan Deakin <ljdgit@gmail.com>
Co-authored-by: Philipp A. <flying-sheep@web.de>

* (fix): support zarr 3.0.0b2 + bump zarrs to 0.18.0-beta.0 (#36)

* (fix): support zarr 3.0.0b2

* (fix): open store read_only in test_roundtrip_read_only_zarrs

* (fix): pin zarrs revision

I think an old version is cached?

* (chore): add Cargo.toml to uv dependency glob

* (fix): unquote uv dependency glob

* (chore): bump `zarrs` to 0.18.0-beta.0

* (fix): remove unused rust features and `paste`

* (chore): first pass at docs

* (chore): add `yml`

* (chore): small cleanup in `README.md`

* (chore): add some little examples + integer indexing note

* (chore): more examples

* (chore): link

* Update hatch.toml

Co-authored-by: Philipp A. <flying-sheep@web.de>

* (chore): update package name

* merge

* (fix): add zarrs import to example

* (fix): fix typo in docs

* (fix): remove python/zarrs_python directory

* (fix): clarify what is not supported

* (chore): futher clarify

---------

Co-authored-by: Lachlan Deakin <ljdgit@gmail.com>
Co-authored-by: Philipp A. <flying-sheep@web.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants