-
-
Notifications
You must be signed in to change notification settings - Fork 379
Description
When trying to register a codec with the same name as a numcodecs codec (e.g., to override the behavior of that codec), the new codec cannot be retrieved from the registry.
here's a reproducer. The expected behavior is that, after registering my codec under a key, I should be able to pull it out under the same key. That fails, and the (broken) numcodecs codec is retrieved instead, which kind of defeats the entire purpose of a dynamic registry.
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues
from numcodecs.zarr3 import Delta
import numpy as np
import zarr
from zarr.core.array_spec import ArraySpec
from zarr.dtype import parse_data_type
from zarr.registry import register_codec, get_codec_class
from dataclasses import replace
class PatchedDelta(Delta):
def resolve_metadata(self, chunk_spec: ArraySpec) -> ArraySpec:
if astype := self.codec_config.get("astype"):
return replace(chunk_spec, dtype=parse_data_type(astype, zarr_format=3)) # type: ignore[call-overload]
return chunk_spec
key = 'numcodecs.delta'
value = PatchedDelta
register_codec(key, value)
print(get_codec_class(key))
# <class 'numcodecs.zarr3.Delta'>
assert get_codec_class(key) == valueThe underlying problem stems from the design of the codec registry: the same codec name can be associated with multiple codecs, and in this case, a codec is retrieved in FILO order? The end result is that a numcodecs codec always wins, and the user-defined codec can never be retrieved:
zarr-python/src/zarr/registry.py
Lines 156 to 164 in 9ae874c
| config_entry = config.get("codecs", {}).get(key) | |
| if config_entry is None: | |
| if len(codec_classes) == 1: | |
| return next(iter(codec_classes.values())) | |
| warnings.warn( | |
| f"Codec '{key}' not configured in config. Selecting any implementation.", | |
| stacklevel=2, | |
| ) | |
| return list(codec_classes.values())[-1] |
I think the logic here was to support e.g. CPU and GPU implementations for the same codec, but our current solution is bad. The registry should be a 1:1 mapping between identifiers and codecs. If we need to support CPU codecs and GPU codecs, then we have 2 registries, not a single broken registry.