Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StoreTest has some odd parameterizations #2698

Closed
y4n9squared opened this issue Jan 13, 2025 · 10 comments
Closed

StoreTest has some odd parameterizations #2698

y4n9squared opened this issue Jan 13, 2025 · 10 comments
Labels
bug Potential issues with the zarr-python library

Comments

@y4n9squared
Copy link

Zarr version

3.0.0

Numcodecs version

0.14.1

Python Version

3.13

Operating System

Linux

Installation

Using pip

Description

These parameterizations for test_set and these combinations in test_set_many end up testing for patterns that won't occur: A zarr.json whose contents are arbitrary bytes or empty or contain the string "zarr.json".

I was adapting a custom store from v2 land and hit a few test case failures due to this (during JSON deserialization of the zarr.json file). Any interest in making the pytest parameterizations a bit more realistic? I can take a stab at it if so.

Steps to reproduce

Write a custom store implementation that actually performs JSON deserialization of zarr.json.

Additional output

No response

@y4n9squared y4n9squared added the bug Potential issues with the zarr-python library label Jan 13, 2025
@d-v-b
Copy link
Contributor

d-v-b commented Jan 13, 2025

currently the Store classes are designed to be agnostic to the zarr format -- the stores don't know that a key called zarr.json will encode a zarr metadata document. Given that fact, those test cases make sense.

Out of curiosity, what was the custom store you were trying to adapt? Because the 2.x store base classes were designed in the same way as 3.0 -- they just mapped strings to bytes, without applying any zarr semantics.

@y4n9squared
Copy link
Author

y4n9squared commented Jan 13, 2025

We are using an in-house store implementation that loosely targets some goals of Icechunk (predates Icechunk). I only came across the StoreTest class after going through the release notes. It's possible our class wouldn't have passed the original but we ran a different set of tests (driven by Xarray's to_zarr() implementation).

The implementation is sensitive to whether the key is a chunk or metadata key because the metadata updates are handled specially to ensure read consistency. If you'd like to keep StoreTest agnostic to Zarr format, we can simply override the problematic test cases.

@d-v-b
Copy link
Contributor

d-v-b commented Jan 13, 2025

The implementation is sensitive to whether the key is a chunk or metadata key because the metadata updates are handled specially to ensure read consistency.

Right, in this case our store classes are at a lower abstraction level than this, so we should expect their tests to fail for something higher level. For what it's worth, we have considered making the stores less general and more zarr-format-specific, in part because chunks files and metadata might benefit from different treatment (e.g., separate caches). But I don't think there are active proposals for this right now.

@d-v-b
Copy link
Contributor

d-v-b commented Jan 13, 2025

If you'd like to keep StoreTest agnostic to Zarr format, we can simply override the problematic test cases.

I think overriding those cases is the right course of action here -- since modelling the zarr format is not part of the store design, even if we changed the relevant test cases to use "real" zarr keys and "real" zarr metadata documents, it's possible that we might revert those changes that one day. It's probably safer if you explicitly ensure on your end that the store tests model your store class properly.

@y4n9squared
Copy link
Author

Sounds good. Closing.

@jhamman
Copy link
Member

jhamman commented Jan 13, 2025

I would like to see us update these tests to use valid zarr metadata when writing to these stores. Alternatively, we could move the offending tests to specific stores (e.g. ZipStore).

@d-v-b
Copy link
Contributor

d-v-b commented Jan 13, 2025

doesn't the rest of the zarr test suite essentially test the ability of the store to read and write valid zarr metadata documents?

@d-v-b
Copy link
Contributor

d-v-b commented Jan 13, 2025

We have a lot of tests like this:

await store.set("foo/zarr.json", self.buffer_cls.from_bytes(b"bar"))

I think it's important that tests are short and readable. Using b"bar" as the payload, instead of writing out a valid zarr.json document, helps with this. That being said, if it's possible to use valid zarr metadata, but have a comparably concise and readable test case, then I don't see a problem with the idea.

@y4n9squared
Copy link
Author

y4n9squared commented Jan 13, 2025

What I have essentially done on my side is create a constant that is

"""{
  "attributes": {},
  "zarr_format": 3,
  "consolidated_metadata": null,
  "node_type": "group"
}""".encode()

and used that as payloads for zarr.json.

@d-v-b
Copy link
Contributor

d-v-b commented Jan 14, 2025

we will need multiple pieces of data, to ensure that our store routines don't inadvertantly associate multiple keys with the same payload. These could be generated with a function that takes a dict D as a parameter, and it returns encoded zarr v3 group metadata but with the attributes field set to D. I think this would be easy enough to invoke without bloating the tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
None yet
Development

No branches or pull requests

3 participants