Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle fill_value for bytes dtype #2208

Open
wants to merge 1 commit into
base: v3
Choose a base branch
from

Conversation

TomAugspurger
Copy link
Contributor

This partly fixes how we handle bytes dtype arrays. Pushing up early for a bit of a discussion around the spec:

AFAICT, bytes aren't a "required"(?) dtype for zarr v3. From https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#fill-value:

Extensions to the spec that define new data types must also define the JSON fill value representation.

So the proper way to do this would be to

  1. Define a zarr extension for bytes dtype (in this case, fixed-width null terminated bytes). Probably need to do the same for unicode
  2. In the extension, define how the fill value should be encoded in JSON. Presumably as base64 encoded.

@@ -29,6 +30,8 @@ def to_dict(self) -> dict[str, JSON]:
value = getattr(self, key)
if isinstance(value, Metadata):
out_dict[field.name] = getattr(self, field.name).to_dict()
elif isinstance(value, bytes):
out_dict[key] = base64.b64encode(value)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we can't assume bytes are valid JSON so they must be transformed somehow. base64 encoding seems as good as anything else.

We'll need to update the from_dict to do the decoding. We probably need to see that we're decoding to a bytes dtype and use that when parsing the fill value.

def test_fill_value_bytes(fill_value: Any) -> None:
md = ArrayV3Metadata(
shape=(4,),
data_type=np.dtype("S6"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 - we still don't have a string dtype in the spec! In #2209, I implement a dtype validator that would raise an error if this was passed in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants