WIP: Make `DictStore` the default `Array` store #351

jakirkham · 2018-12-03T16:48:00Z

Based on discussion in issue ( #349 ), it sounds like we are ok moving to DictStore to back Arrays. This makes that change.

As the change depends somewhat on PR ( #350 ), this has been marked as WIP until we resolve that one. Can revisit afterwards.

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/tutorial.rst
Changes documented in docs/release.rst
Docs build locally (e.g., run tox -e docs)
AppVeyor and Travis CI passes
Test coverage is 100% (Coveralls passes)

jakirkham · 2018-12-03T17:02:46Z

Something to consider here is what we do when a different in-memory MutableMapping is provided (e.g. dict or OrderedDict). One thought is that we construct a new DictStore and copy the data over. This should catch any non-conforming cases and ensure that everything conforms to the spec afterwards.

Another thing to consider is that a copy was introduced in commit ( 3c00d52 ) to ensure the underlying data of an in-memory store is not mutated. However, if we always use a DictStore for in-memory data and that store always guarantees to contain immutable blobs of data ( e.g. bytes as in PR #350 ), then we can drop this copy, as the store will take care of it for us.

This latter thought re-raises the point that we should allow bytes passed through by filters/compressors to remain untouched so as to avoid a copy later.
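To make the mutation concern concrete, here is a minimal sketch in plain Python. It does not use Zarr at all; the two dicts and the key `"0.0"` are stand-ins chosen for illustration. It shows that a mapping which stores whatever object it is handed can have its contents changed behind its back, while one that stores `bytes` cannot.

```python
# Illustrative only: plain dicts standing in for in-memory stores.
chunk = bytearray(b"\x01\x02\x03")  # a mutable buffer, e.g. encoder output

plain_store = {}
plain_store["0.0"] = chunk          # keeps a reference to the same mutable object

frozen_store = {}
frozen_store["0.0"] = bytes(chunk)  # keeps an immutable snapshot (one copy, on write)

chunk[0] = 0xFF  # the caller reuses or mutates its buffer after the write

# The plain store now sees the mutation; the bytes snapshot does not.
plain_changed = bytes(plain_store["0.0"]) != b"\x01\x02\x03"
frozen_changed = frozen_store["0.0"] != b"\x01\x02\x03"
```

If the store already guarantees `bytes`, a defensive copy elsewhere would duplicate work the store has done, which is the case for dropping it.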

alimanfoo · 2018-12-03T18:13:36Z

> Something to consider here is what we do when a different in-memory MutableMapping is provided (e.g. dict or OrderedDict). One thought is that we construct a new DictStore and copy the data over. This should catch any non-conforming cases and ensure that everything conforms to the spec afterwards.

Not sure what to do in that case, need to mull it over. Might be a bit odd for the user if we ended up copying data across and using a different object as the store, i.e., user would be expecting to see data in the object they provided as store, but we ended up using something else. The alternative would be to just try and use whatever object is provided as a store. If it doesn't fully conform to the spec (e.g., plain dict), so be it, the user may get unexpected behaviour.

> Another thing to consider is a copy was introduced in commit ( 3c00d52 ) to ensure the underlying data of an in-memory store is not mutated. However if we always use a DictStore for in-memory data and that store always guarantees to contain immutable blobs of data ( e.g. bytes as in PR #350 ), then we can drop this copy as it will be taken care of for us.

Agree we could probably drop that copy if we know that DictStore will be ensuring bytes.

> This latter thought re-raises the point ( #348 (comment) ) that we should allow bytes passed through by filters/compressors to remain untouched so as to avoid a copy later.

+1 to restore bytes output from compression codecs.

jakirkham · 2018-12-03T22:00:02Z

Good point. Some random thoughts.

What if we had a generic wrapper class for MutableMappings that exposed a MutableMapping API, overriding a couple of functions to ensure they behave safely (e.g. __setitem__)?
What if we merely enforce that Array provides bytes to the store?
Should we validate input stores meet the spec somehow?
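As a rough sketch of the first idea, the wrapper below delegates everything to the user's mapping and only overrides `__setitem__` to coerce values to `bytes`. The class name `BytesCoercingStore` is hypothetical and not part of Zarr; the coercion rule (copy anything that is not already `bytes`) is an assumption for illustration.

```python
from collections.abc import MutableMapping


class BytesCoercingStore(MutableMapping):
    """Hypothetical wrapper (not a Zarr class) around any MutableMapping."""

    def __init__(self, inner):
        self._inner = inner  # the user's own mapping, e.g. a dict

    def __getitem__(self, key):
        return self._inner[key]

    def __setitem__(self, key, value):
        # Values that are already bytes are stored untouched (no copy);
        # any other buffer-protocol object is copied into immutable bytes.
        if not isinstance(value, bytes):
            value = memoryview(value).tobytes()
        self._inner[key] = value

    def __delitem__(self, key):
        del self._inner[key]

    def __iter__(self):
        return iter(self._inner)

    def __len__(self):
        return len(self._inner)


user_dict = {}
store = BytesCoercingStore(user_dict)

buf = bytearray(b"abc")
store["0"] = buf
buf[0] = ord("z")  # mutating the caller's buffer does not affect the stored value
```

Because the wrapper writes through to `user_dict`, the data stays visible in the object the user provided, which would sidestep the surprise of silently copying into a different store.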

alimanfoo · 2018-12-03T22:19:36Z

My 2c...

> What if we had a generic wrapper class for MutableMappings that exposed a MutableMapping API, overriding a couple of functions to ensure they behave safely (e.g. __setitem__)?

I'm not sure it's necessary. E.g., we already normalise storage paths within the core and hierarchy modules before we ever interact with the storage layer, so we can be sure we'll never send an invalid key to a store.

> What if we merely enforce that Array provides bytes to the store?

I'm not sure it's necessary to be so strict. Although I suppose that we could validate that whatever value comes out the end of the chunk encoding pipeline is an object that supports the buffer protocol, and raise some kind of encoding error if not, prior to sending the value to the storage layer.
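A check along those lines might look like the sketch below. Both `ensure_buffer` and `ChunkEncodingError` are hypothetical names invented here, not existing Zarr or Numcodecs API; the only real mechanism relied on is that `memoryview()` raises `TypeError` for objects that do not support the buffer protocol.

```python
class ChunkEncodingError(TypeError):
    """Hypothetical error type for this sketch; Zarr may use something else."""


def ensure_buffer(value):
    """Return value unchanged if it exposes the buffer protocol, else raise."""
    try:
        # memoryview() only succeeds for buffer-protocol objects; no data is copied.
        with memoryview(value):
            pass
    except TypeError as exc:
        raise ChunkEncodingError(
            "encoded chunk must support the buffer protocol, got %s"
            % type(value).__name__
        ) from exc
    return value


ok = ensure_buffer(bytearray(b"\x00\x01"))  # passes through untouched
```

This is looser than requiring `bytes`: `bytes`, `bytearray`, `memoryview` and NumPy arrays would all pass, while something like a `str` would be rejected before it reaches the storage layer.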

> Should we validate input stores meet the spec somehow?

Personally I think it would be good to provide some support for developers to test their storage class complies with the spec. But at runtime, let people provide whatever type of object they choose to provide as a store, and hope it quacks like the right kind of duck.
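For the developer-facing side, a test helper could be as small as the sketch below. The function name `check_store_basics` and the particular checks are assumptions made for illustration; they cover only a write/read/iterate/delete round trip, not the full set of requirements in the storage spec.

```python
def check_store_basics(store, key="conformance/0", value=b"\x00\x01"):
    """Hypothetical helper: return a list of problems (empty if all checks pass)."""
    problems = []

    store[key] = value
    if key not in store:
        problems.append("key not found after write")
    else:
        try:
            # The stored value should be readable as a buffer and round-trip exactly.
            if bytes(memoryview(store[key])) != value:
                problems.append("value did not round-trip")
        except TypeError:
            problems.append("stored value does not support the buffer protocol")

    if key not in list(store):
        problems.append("key missing from iteration")

    if key in store:
        del store[key]
    if key in store:
        problems.append("key still present after delete")

    return problems


report = check_store_basics({})  # a plain dict passes these basic checks
```

A store author could call this from their own test suite, while at runtime Zarr would still accept whatever object is passed in.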

alimanfoo · 2018-12-03T22:21:00Z

One other thought, I wondered if it might be worth renaming DictStore to MemoryStore so it was more obvious to new users what kind of storage this was (leaving DictStore as an alias for backwards-compatibility).

jakirkham · 2018-12-03T22:26:33Z

> One other thought, I wondered if it might be worth renaming DictStore to MemoryStore so it was more obvious to new users what kind of storage this was (leaving DictStore as an alias for backwards-compatibility).

+1 Could also do MemStore if we want to be succinct.

Raised as issue ( #356 ).
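The alias approach could be as simple as the sketch below. The class body here is a stand-in (a bare `dict` subclass) rather than the real store implementation; only the final assignment is the point.

```python
class MemoryStore(dict):
    """Stand-in for the in-memory store; the real class has more behaviour."""


# Keep the old name importable so existing user code does not break.
DictStore = MemoryStore

legacy = DictStore()
```

Existing code that constructs or type-checks against `DictStore` keeps working, since both names refer to the same class object.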

jakirkham · 2018-12-07T15:46:16Z

This should be ready for a closer look now that PR ( #350 ) is in.

jakirkham · 2018-12-07T19:50:57Z

There are still some test cases that use dict-based chunk storage for Zarr Arrays. So have added a workaround at the Zarr Array level to special case dict handling.

Though I'm not sure if we shouldn't just go for those two changes alone to fix this issue. Have pulled them out in PR ( #359 ) just in case.

Edit: To be clear, the dict-based chunk store workaround is no longer included in this PR.

jakirkham · 2018-12-15T23:29:00Z

We can certainly go ahead with this as discussed. Though we may also want to consider adding a workaround for dict to ensure immutable values as done in PR ( #359 ) either in addition to this PR or instead of it.

jakirkham · 2018-12-16T02:23:31Z

Added to the v2.3 milestone just to track it. Happy to change this as needed.

Instead of using a Python `dict` as the `default` store for a Zarr `Array`, use the `DictStore`. This ensures that all blobs will be represented as `bytes` regardless of what the user provided as data. Thus things like comparisons of stores will work well in the default case.

As we are now using `DictStore` to back the `Array`, we can correctly measure how much memory it is using. So update the examples in `info` and the tutorial to show how much memory is being used. Also update the store type listed in info as well.

As we prefer to use the better behaved `DictStore`, raise an error if `dict` is used. Should also help us smoke out where in our tests `dict` is used and change it.
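A guard of this kind could be sketched as follows. The helper name `reject_builtin_dict` and the error message are hypothetical, and the exact check used in the commits is not shown in this thread; the sketch assumes only the builtin `dict` type itself is rejected, so subclasses and other mappings still pass.

```python
from collections import OrderedDict


def reject_builtin_dict(store):
    """Hypothetical guard: refuse the builtin dict, accept any other mapping."""
    # `type(...) is dict` (rather than isinstance) is an assumption of this
    # sketch, so that dict subclasses such as OrderedDict are not rejected.
    if type(store) is dict:
        raise TypeError("builtin dict is not supported as a store; use DictStore")
    return store


accepted = reject_builtin_dict(OrderedDict())
```

Failing loudly like this is also what makes it useful for finding the remaining `dict`-based stores in the test suite.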

jakirkham · 2019-02-16T01:52:01Z

Have pushed some changes that may be considered breaking. So have removed it from the milestone for now.

As `dict` stores are not supported in this changeset, there is no need for the `dict`-specific workaround. Given this, go ahead and drop it.

joshmoore · 2021-11-23T13:57:07Z

@jakirkham @grlee77 : is it fair to say this has been superseded?

jakirkham · 2021-11-29T10:22:48Z

Yeah, I think so. We can always revisit if needed (likely we would need a new PR at this point).

Edit: For context PR ( #789 ) addressed this.

jakirkham mentioned this pull request Dec 3, 2018

Requirements of store data #349

Closed

jakirkham mentioned this pull request Dec 4, 2018

Bump Numcodecs requirement to 0.6.1 #347

Closed

jakirkham changed the title ~~WIP: Make DictStore the default Array store~~ Make DictStore the default Array store Dec 7, 2018

jakirkham mentioned this pull request Dec 15, 2018

Converting sparse matrices directly to persistent zarr arrays #152

Closed

jakirkham requested a review from alimanfoo December 15, 2018 23:27

jakirkham added this to the v2.3 milestone Dec 16, 2018

jakirkham added 3 commits February 15, 2019 20:14

Update DictStore docs to note Array uses it

1b0930f

Update Array's info examples

fdced9e

jakirkham changed the title ~~Make DictStore the default Array store~~ WIP: Make DictStore the default Array store Feb 16, 2019

jakirkham removed this from the v2.3 milestone Feb 16, 2019

jakirkham added 4 commits February 15, 2019 20:39

Raise if Array's store is the builtin Python dict

6e8e58b

Raise if Group's store is the builtin Python dict

3d7329f

Use DictStore in Array tests

e596c4f

Use DictStore in Group tests

c53a995

jakirkham added 4 commits February 16, 2019 18:02

Drop ensure_bytes line for dict stores

732427c

Use DictStore in synchronization tests

2fba546

Drop unsupported test

4dedeba

Test custom chunk store to determine size

819b9ad

jakirkham added 2 commits February 16, 2019 21:41

Update example in create to use DictStore

d83a982

Use DictStore in info test

87af9ca

jakirkham closed this Nov 29, 2021
