WIP: Make DictStore the default Array store #351
Something to consider here is what we do when a different in-memory
MutableMapping is provided (e.g. dict or OrderedDict). One thought is
that we construct a new DictStore and copy the data over. This should
catch any non-conforming cases and ensure that everything conforms to the
spec afterwards.
Not sure what to do in that case, need to mull it over. Might be a bit odd
for the user if we ended up copying data across and using a different
object as the store, i.e., user would be expecting to see data in the
object they provided as store, but we ended up using something else.
The alternative would be to just try and use whatever object is provided as
a store. If it doesn't fully conform to the spec (e.g., plain dict), so be
it, the user may get unexpected behaviour.
Another thing to consider is a copy was introduced in commit ( 3c00d52 )
to ensure the underlying data of an in-memory store is not mutated.
However if we always use a DictStore for in-memory data and that store
always guarantees to contain immutable blobs of data ( e.g. bytes as in
PR #350 ), then we can drop this copy as it will be taken care of for us.
Agree we could probably drop that copy if we know that DictStore will be
ensuring bytes.
This latter thought re-raises the point ( #348 (comment) ) that we should
allow bytes passed through by filters/compressors to remain untouched so
as to avoid a copy later.
+1 to restore bytes output from compression codecs.
Good point. Some random thoughts.
My 2c...
I'm not sure it's necessary. E.g., we already normalise storage paths within the core and hierarchy modules before we ever interact with the storage layer, so we can be sure we'll never send an invalid key to a store.
I'm not sure it's necessary to be so strict. Although I suppose that we could validate that whatever value comes out the end of the chunk encoding pipeline is an object that supports the buffer protocol, and raise some kind of encoding error if not, prior to sending the value to the storage layer.
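To make the buffer-protocol idea above concrete, here is a rough sketch of such a check. The names `ChunkEncodingError` and `ensure_buffer` are made up for illustration and are not part of Zarr; the only real API used is the built-in `memoryview`, which raises `TypeError` for objects that do not export a buffer.

```python
class ChunkEncodingError(TypeError):
    """Hypothetical error type for this sketch; not part of Zarr."""


def ensure_buffer(value):
    """Return `value` unchanged if it supports the buffer protocol.

    Raises ChunkEncodingError otherwise. No data is copied.
    """
    try:
        # memoryview() only succeeds for buffer-exporting objects;
        # the `with` block releases the view again immediately.
        with memoryview(value):
            pass
    except TypeError as exc:
        raise ChunkEncodingError(
            "encoded chunk of type %r does not support the buffer protocol"
            % type(value).__name__
        ) from exc
    return value
```

Something like this would run once at the end of the encoding pipeline, just before the value is handed to the store, so the store itself would not need to be strict.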
Personally I think it would be good to provide some support for developers to test their storage class complies with the spec. But at runtime, let people provide whatever type of object they choose to provide as a store, and hope it quacks like the right kind of duck.
One other thought, I wondered if it might be worth renaming |
+1 Could also do Raised as issue ( #356 ). |
This should be ready for a closer look now that PR ( #350 ) is in.
There are still some test cases that use Though I'm not sure if we shouldn't just go for those two changes alone to fix this issue. Have pulled them out in PR ( #359 ) just in case. Edit: To be clear, the |
We can certainly go ahead with this as discussed. Though we may also want to consider adding a workaround for |
Added to the v2.3 milestone just to track it. Happy to change this as needed.
Instead of using a Python `dict` as the `default` store for a Zarr `Array`, use the `DictStore`. This ensures that all blobs will be represented as `bytes` regardless of what the user provided as data. Thus things like comparisons of stores will work well in the default case.
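As a rough illustration of why normalising values to `bytes` helps with comparisons, here is a minimal sketch of a bytes-only in-memory store. `BytesOnlyStore` is a hypothetical stand-in written for this example; it is not Zarr's `DictStore` and makes no claim about how `DictStore` is implemented.

```python
from collections.abc import MutableMapping


class BytesOnlyStore(MutableMapping):
    """Hypothetical in-memory store that keeps every value as bytes."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # Copy any buffer-exporting object into an immutable bytes blob.
        # Objects without buffer support (e.g. str) raise TypeError here.
        self._data[key] = memoryview(value).tobytes()

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


a = BytesOnlyStore()
b = BytesOnlyStore()
source = bytearray(b"\x01\x02\x03")
a["0.0"] = source                # stored as an independent bytes copy
b["0.0"] = b"\x01\x02\x03"
source[0] = 0xFF                 # mutating the input does not touch the store
```

Because both stores hold plain `bytes`, the equality inherited from `collections.abc.Mapping` compares them by content, and the stored blob is unaffected by later changes to the object the user passed in.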
As we are now using `DictStore` to back the `Array`, we can correctly measure how much memory it is using. So update the examples in `info` and the tutorial to show how much memory is being used. Also update the store type listed in info as well.
As we prefer to use the better behaved `DictStore`, raise an error if `dict` is used. Should also help us smoke out where in our tests `dict` is used and change it.
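A sketch of what such a guard could look like. The function name, the choice of `TypeError`, and the decision to reject only exact `dict` instances (not subclasses such as `OrderedDict`) are all assumptions for illustration; the commit message does not specify them.

```python
def reject_plain_dict(store):
    """Hypothetical guard: refuse a plain dict as a store.

    Only exact `dict` instances are rejected in this sketch; how
    subclasses should be treated is left open.
    """
    if type(store) is dict:
        raise TypeError(
            "plain dict stores are not supported; use DictStore instead"
        )
    return store
```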
Have pushed some changes that may be considered breaking. So have removed it from the milestone for now.
As `dict` stores are not supported in this changeset, there is no need for this specific workaround for them. Given this, go ahead and drop the workaround.
@jakirkham @grlee77 : is it fair to say this has been superseded?
Yeah I think so. We can always revisit if needed (likely we would need a new PR at this point).

Edit: For context PR ( #789 ) addressed this.
Based on discussion in issue ( #349 ), it sounds like we are ok moving to `DictStore` to back `Array`s. This makes that change.

As the change depends somewhat on PR ( #350 ), this has been marked as WIP until we resolve that one. Can revisit afterwards.
TODO:
tox -e docs
)