New Datastore abstraction for Metaflow #580
Merged
Conversation
This commit contains only the new datastore code and none of the backend implementations. The datastore is now split into different files:
- flow_datastore.py contains the top-level FlowDataStore implementation
- task_datastore.py contains the task-level datastore implementation
- content_addressed_store.py contains the underlying content-addressed store used by both of the previous datastores
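As a rough illustration of the three-layer split, here is a minimal sketch. The class names match the files listed above, but the method names, the hashing scheme, and the in-memory storage are assumptions for illustration, not Metaflow's actual API.

```python
# Hypothetical sketch of the three-layer datastore split; method names
# and the in-memory blob dict are illustrative assumptions.
import hashlib


class ContentAddressedStore:
    """Stores blobs keyed by a hash of their content (shared layer)."""

    def __init__(self):
        self._blobs = {}  # in-memory stand-in for a storage backend

    def save_blob(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self._blobs[key] = data  # identical content maps to the same key
        return key

    def load_blob(self, key: str) -> bytes:
        return self._blobs[key]


class TaskDataStore:
    """Task-level view: maps artifact names to content-addressed keys."""

    def __init__(self, ca_store: ContentAddressedStore):
        self._ca = ca_store
        self._artifacts = {}

    def save(self, name: str, data: bytes) -> None:
        self._artifacts[name] = self._ca.save_blob(data)

    def load(self, name: str) -> bytes:
        return self._ca.load_blob(self._artifacts[name])


class FlowDataStore:
    """Top-level entry point: hands out per-task datastores."""

    def __init__(self):
        self._ca = ContentAddressedStore()  # shared across all tasks

    def get_task_datastore(self) -> TaskDataStore:
        return TaskDataStore(self._ca)
```

One consequence of sharing the content-addressed layer is that two tasks saving identical artifact bytes end up storing a single blob.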
The local backend is used to save and load local files.
Datatools will now cache the S3 client it uses for single operations, resulting in faster operation times. Another (unadvertised) optimization is that datatools can now take an IOBase object directly, avoiding an additional copy. Finally, __del__ now performs a close, so the S3 datatool can be used as a regular object rather than only within a context manager.
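The client-caching and close-on-delete behaviors described above follow a generic pattern, sketched below. This is not Metaflow's code; the class and its methods are stand-ins to show the mechanics.

```python
# Illustrative sketch: cache an expensive client across instances, and
# close on garbage collection so a `with` block is optional.
class Datatool:
    _cached_client = None  # shared so one-off operations reuse the client

    def __init__(self):
        if Datatool._cached_client is None:
            # stand-in for expensive S3 client construction
            Datatool._cached_client = object()
        self._client = Datatool._cached_client
        self._closed = False

    def close(self):
        self._closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

    def __del__(self):
        # closing here lets the object be used outside a `with` block
        if not self._closed:
            self.close()
```

With this shape, `with Datatool() as s3: ...` and a bare `s3 = Datatool()` both end up closed, and repeated one-off instantiations pay the client-construction cost only once.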
This backend allows the datastore to interface with S3.
One tiny semantic change in the way a tar file is read (using the recommended tarfile.open function instead of constructing a TarFile object directly).
- support for range queries (in get and get_many)
- support for content-type and user metadata in put, put_many, and put_files (metadata can also be retrieved using any of the get calls)
- support for info and info_many to retrieve information about a file without fetching it
Instead of encoding the needed information directly in the file, we now encode it as file metadata, leveraging the metadata support in the S3 datatools and implementing equivalent support in the local filesystem (by creating a separate file).
…-datastore-with-meta
* Added preliminary documentation on the datastore
* Update datastore.md: update with the new name for the backend
* convergence fixes
* fixes
* gzip ts changes
* fix logs subcommand
* typo
* Address comments; add support for var_transform in bash_capture_logs
* Fix local metadata sync
* Forgot to remove duplicate sync_metadata

Co-authored-by: Romain Cledat <rcledat@netflix.com>
Co-authored-by: Romain <romain-intel@users.noreply.github.com>
This was referenced Sep 29, 2021
This was referenced Mar 28, 2022
Testing in progress...
Detailed notes to follow.