Handle fill_value for bytes dtype #2208

Closed
3 changes: 3 additions & 0 deletions src/zarr/abc/metadata.py
@@ -1,5 +1,6 @@
from __future__ import annotations

import base64
from collections.abc import Sequence
from typing import TYPE_CHECKING

@@ -29,6 +30,8 @@ def to_dict(self) -> dict[str, JSON]:
value = getattr(self, key)
if isinstance(value, Metadata):
out_dict[field.name] = getattr(self, field.name).to_dict()
elif isinstance(value, bytes):
out_dict[key] = base64.b64encode(value)
Contributor Author

In general, we can't assume bytes are valid JSON, so they must be transformed somehow. base64 encoding seems as good as anything else.

We'll need to update from_dict to do the decoding. We probably need to detect that we're decoding to a bytes dtype and use that when parsing the fill value.
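
A minimal sketch of the round trip discussed in this comment, using only the standard library. One detail worth noting: `base64.b64encode` returns `bytes`, so the result still needs an ASCII decode before `json.dumps` will accept it. The helper names `encode_fill_value` and `decode_fill_value` are hypothetical and are not part of zarr.

```python
import base64
import json


def encode_fill_value(value: bytes) -> str:
    # b64encode returns bytes; decode to str so the result is JSON-serializable.
    return base64.b64encode(value).decode("ascii")


def decode_fill_value(encoded: str) -> bytes:
    # Inverse of encode_fill_value (hypothetical helper, not zarr API).
    return base64.b64decode(encoded)


raw = b"\x00\xffabc"  # arbitrary bytes that are not valid UTF-8 text
document = json.dumps({"fill_value": encode_fill_value(raw)})
restored = decode_fill_value(json.loads(document)["fill_value"])
assert restored == raw
```

On the decoding side, the caller would still need to know from the dtype that the stored string is base64 rather than a literal value, which is the open question raised above.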

elif isinstance(value, str):
out_dict[key] = value
elif isinstance(value, Sequence):
2 changes: 1 addition & 1 deletion src/zarr/core/metadata/v3.py
@@ -313,7 +313,7 @@ def parse_fill_value(
"""
if fill_value is None:
return dtype.type(0)
- if isinstance(fill_value, Sequence) and not isinstance(fill_value, str):
+ if isinstance(fill_value, Sequence) and not isinstance(fill_value, str | bytes):
if dtype in (np.complex64, np.complex128):
dtype = cast(COMPLEX_DTYPE, dtype)
if len(fill_value) == 2:
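
The reason for excluding `bytes` here can be checked directly: `bytes` is registered as a `collections.abc.Sequence`, so without the exclusion a bytes fill value would be routed into the sequence branch meant for complex-number pairs. A small sketch follows; `is_pair_like` is a hypothetical helper name, and the tuple form of `isinstance` is used in place of `str | bytes` only so that it also runs on older Python versions.

```python
from collections.abc import Sequence


def is_pair_like(value: object) -> bool:
    # Mirrors the updated check: any Sequence other than str or bytes.
    return isinstance(value, Sequence) and not isinstance(value, (str, bytes))


# bytes counts as a Sequence, which is why it must be excluded explicitly.
assert isinstance(b"", Sequence)
assert not is_pair_like(b"abc")
assert not is_pair_like("abc")
assert is_pair_like([1.0, 2.0])
```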
27 changes: 27 additions & 0 deletions tests/v3/test_metadata/test_v3.py
@@ -1,11 +1,13 @@
from __future__ import annotations

import dataclasses
import json
import re
from typing import TYPE_CHECKING, Literal

from zarr.codecs.bytes import BytesCodec
from zarr.core.buffer import default_buffer_prototype
from zarr.core.chunk_grids import RegularChunkGrid
from zarr.core.chunk_key_encodings import DefaultChunkKeyEncoding, V2ChunkKeyEncoding
from zarr.core.metadata.v3 import ArrayV3Metadata

@@ -165,6 +167,31 @@ def test_parse_fill_value_invalid_type_sequence(fill_value: Any, dtype_str: str)
parse_fill_value(fill_value, dtype)


def test_parse_fill_value_bytes():
result = parse_fill_value("", dtype=np.dtype("S6"))
assert result == np.bytes_("")


@pytest.mark.parametrize("fill_value", [None, np.bytes_(b"")])
def test_fill_value_bytes(fill_value: Any) -> None:
md = ArrayV3Metadata(
shape=(4,),
data_type=np.dtype("S6"),
Member

🤦 - we still don't have a string dtype in the spec! In #2209, I implement a dtype validator that would raise an error if this was passed in.

fill_value=fill_value,
chunk_grid=RegularChunkGrid(chunk_shape=(2,)),
chunk_key_encoding=DefaultChunkKeyEncoding(),
codecs=(),
attributes={},
dimension_names=("a",),
)
assert md.fill_value == np.bytes_(b"")
assert md.dtype == np.dtype("S6")
# regression test for creating a new metadata from default values
dataclasses.replace(md)
serialized = md.to_dict()
assert serialized


@pytest.mark.parametrize("chunk_grid", ["regular"])
@pytest.mark.parametrize("attributes", [None, {"foo": "bar"}])
@pytest.mark.parametrize("codecs", [[BytesCodec()]])