Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Make DictStore the default Array store #351

Closed
wants to merge 13 commits into from
Closed

WIP: Make DictStore the default Array store #351

wants to merge 13 commits into from

Conversation

jakirkham
Copy link
Member

@jakirkham jakirkham commented Dec 3, 2018

Based on discussion in issue ( #349 ), it sounds like we are ok moving to DictStore to back Arrays. This makes that change.

As the change depends somewhat on PR ( #350 ), this has been marked as WIP until we resolve that one. Can revisit afterwards.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • Docs build locally (e.g., run tox -e docs)
  • AppVeyor and Travis CI passes
  • Test coverage is 100% (Coveralls passes)

@jakirkham
Copy link
Member Author

Something to consider here is what we do when a different in-memory MutableMapping is provided (e.g. dict or OrderedDict). One thought is that we construct a new DictStore and copy the data over. This should catch any non-conforming cases and ensure that everything conforms to the spec afterwards.

Another thing to consider is a copy was introduced in commit ( 3c00d52 ) to ensure the underlying data of an in-memory store is not mutated. However if we always use a DictStore for in-memory data and that store always guarantees to contain immutable blobs of data ( e.g. bytes as in PR #350 ), then we can drop this copy as it will be taken care of for us.

This latter thought re-raises the point that we should allow bytes passed through by filters/compressors to remain untouched so as to avoid a copy later.

@alimanfoo
Copy link
Member

alimanfoo commented Dec 3, 2018 via email

@jakirkham
Copy link
Member Author

Good point. Some random thoughts.

  1. What if we had a generic wrapper class for MutableMappings that exposed a MutableMapping API overriding a couple function to ensure they behaved safely (e.g. __setitem__)?
  2. What if we merely enforce that Array provides bytes to the store?
  3. Should we validate input stores meet the spec somehow?

@alimanfoo
Copy link
Member

My 2c...

  1. What if we had a generic wrapper class for MutableMappings that exposed a MutableMapping API overriding a couple function to ensure they behaved safely (e.g. __setitem__)?

I'm not sure it's necessary. E.g., we already normalise storage paths within the core and hierarchy modules before we ever interact with the storage layer, so we can be sure we'll never send an invalid key to a store.

  1. What if we merely enforce that Array provides bytes to the store?

I'm not sure it's necessary to be so strict. Although I suppose that we could validate that whatever value comes out the end of the chunk encoding pipeline is an object that supports the buffer protocol, and raise some kind of encoding error if not, prior to sending the value to the storage layer.

  1. Should we validate input stores meet the spec somehow?

Personally I think it would be good to provide some support for developers to test their storage class complies with the spec. But at runtime, let people provide whatever type of object they choose to provide as a store, and hope it quacks like the right kind of duck.

@alimanfoo
Copy link
Member

One other thought, I wondered if it might be worth renaming DictStore to MemoryStore so it was more obvious to new users what kind of storage this was (leaving DictStore as an alias for backwards-compatibility).

@jakirkham
Copy link
Member Author

jakirkham commented Dec 3, 2018

One other thought, I wondered if it might be worth renaming DictStore to MemoryStore so it was more obvious to new users what kind of storage this was (leaving DictStore as an alias for backwards-compatibility).

+1 Could also do MemStore if we want to be succinct.

Raised as issue ( #356 ).

@jakirkham jakirkham mentioned this pull request Dec 4, 2018
7 tasks
@jakirkham jakirkham changed the title WIP: Make DictStore the default Array store Make DictStore the default Array store Dec 7, 2018
@jakirkham
Copy link
Member Author

jakirkham commented Dec 7, 2018

This should be ready for a closer look now that PR ( #350 ) is in.

@jakirkham
Copy link
Member Author

jakirkham commented Dec 7, 2018

There are still some test cases that use dict-based chunk storage for Zarr Arrays. So have added a workaround at the Zarr Array level to special case dict handling.

Though I'm not sure if we shouldn't just go for those two changes alone to fix this issue. Have pulled them out in PR ( #359 ) just in case.

Edit: To be clear, the dict-based chunk store workaround is no longer included in this PR.

@jakirkham
Copy link
Member Author

We can certainly go ahead with this as discussed. Though we may also want to consider adding a workaround for dict to ensure immutable values as done in PR ( #359 ) either in addition to this PR or instead of it.

@jakirkham jakirkham added this to the v2.3 milestone Dec 16, 2018
@jakirkham
Copy link
Member Author

Added to the v2.3 milestone just to track it. Happy to change this as needed.

Instead of using a Python `dict` as the `default` store for a Zarr
`Array`, use the `DictStore`. This ensures that all blobs will be
represented as `bytes` regardless of what the user provided as data.
Thus things like comparisons of stores will work well in the default
case.
As we are now using `DictStore` to back the `Array`, we can correctly
measure how much memory it is using. So update the examples in `info`
and the tutorial to show how much memory is being used. Also update the
store type listed in info as well.
@jakirkham jakirkham changed the title Make DictStore the default Array store WIP: Make DictStore the default Array store Feb 16, 2019
@jakirkham jakirkham removed this from the v2.3 milestone Feb 16, 2019
As we prefer to use the better behaved `DictStore`, raise an error if
`dict` is used. Should also help us smoke out where in our tests `dict`
is used and change it.
As we prefer to use the better behaved `DictStore`, raise an error if
`dict` is used. Should also help us smoke out where in our tests `dict`
is used and change it.
@jakirkham
Copy link
Member Author

Have pushed some changes that may be considered breaking. So have removed it from the milestone for now.

As `dict` stores are not supported in this changeset, there is no need
for this specific workaround for them. Given this go ahead drop this
workaround.
@joshmoore
Copy link
Member

@jakirkham @grlee77 : is it fair to say this has been superceded?

@jakirkham
Copy link
Member Author

jakirkham commented Nov 29, 2021

Yeah I think so. We can always revisit if needed (likely we would need a new PR at this point)

Edit: For context PR ( #789 ) addressed this.

@jakirkham jakirkham closed this Nov 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants