-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
File storage interface improvements #256
Draft
alexdutton
wants to merge
41
commits into
master
Choose a base branch
from
rfc-37
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
Show all changes
41 commits
Select commit
Hold shift + click to select a range
7dd691d
Entry points for storage backends
alexdutton 0bf2176
Add a class for computing checksums, counting bytes and reporting pro…
alexdutton 7096e97
Add a new extensible storage factory
alexdutton 9715111
_FilesRESTState.storage_factory deprecates old storage factory, can i…
alexdutton 7774b7d
Add some type annotations
alexdutton bca7ee2
Deprecate passing **kwargs to the storage factory
alexdutton f5244b3
Use the passthrough checksummer
alexdutton c39871a
Add a metaclass for FileStorage to return the backend name
alexdutton 37bd798
Simplify the pyfs storage backend implementation
alexdutton 568cf75
fixup! Add a new extensible storage factory
alexdutton 9bcd900
fixup! Add a new extensible storage factory
alexdutton 6e94128
intermediate
alexdutton 5899f79
Lots of things. Needs pulling apart before merging.
alexdutton 2e1a82a
Remove debugging print statements
alexdutton 282e618
Fix mypy errors
alexdutton c62cfa0
Remove StorageBackend metaclass
alexdutton 88e5fa8
Fix typo
alexdutton 705b95e
Reinstate _write_stream, and use context managers
alexdutton 93bd542
Improve checksumming
alexdutton a8fbd15
Add the beginning of documentation for writing storage backends
alexdutton 8a25500
Opening via the open() method should only be for reading
alexdutton 7820f84
Add delete() to list of methods in docs that need defining
alexdutton d4e660e
Simplify by removing now-unused PassthroughChecksum
alexdutton a508e3e
No longer need to get class object to look up storage name
alexdutton 429504f
docstrings for the storage factory
alexdutton 0aeefc4
pydocstyle fixes
alexdutton 8a55a0b
Remove unused method on base StorageBackend
alexdutton f03154a
list to tuple for __all__, to placate pydocstyle
alexdutton 501f35c
docstring fixes for pydocstyle
alexdutton 5ac96fe
Remove unused imports and unused Py3.7+ annotations future
alexdutton 5546899
PEP-8 style fixes. Mostly line-length-related.
alexdutton 6b33fd8
More PEP-8 fixes
alexdutton da397f8
The last PEP-8 warnings
alexdutton f56a7c3
Run isort, for CI happiness
alexdutton 3b0868f
Remove more extraneous code
alexdutton 985d64d
Add migration for backend_name
alexdutton 55d059f
Coalesce on `storage_backend`, not `backend_name` for model attr
alexdutton 0fede69
fixup! Add migration for backend_name
alexdutton dd1a557
More style fixes in the alembic migration
alexdutton 142660d
global: copyright headers update
lnielsen 4879f59
models: remove python 2 support
lnielsen File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,6 +22,7 @@ Invenio-Files-REST. | |
installation | ||
configuration | ||
usage | ||
new-storage-backend | ||
|
||
|
||
API Reference | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
.. | ||
This file is part of Invenio. | ||
Copyright (C) 2020 Cottage Labs LLP | ||
|
||
Invenio is free software; you can redistribute it and/or modify it | ||
under the terms of the MIT License; see LICENSE file for more details. | ||
|
||
|
||
Developing a new storage backend | ||
================================ | ||
|
||
A storage backend should subclass ``invenio_files_rest.storage.StorageBackend`` and should minimally implement the | ||
``open()``, ``delete()``, ``_initialize(size)``, ``get_save_stream()`` and ``update_stream(seek)`` methods. You should | ||
register the backend with an entry point in your ``setup.py``: | ||
|
||
.. code-block:: python | ||
|
||
setup( | ||
..., | ||
entry_points={ | ||
..., | ||
'invenio_files_rest.storage': [ | ||
'mybackend = mypackage.storage:MyStorageBackend', | ||
], | ||
... | ||
}, | ||
... | ||
) | ||
|
||
Implementation | ||
-------------- | ||
|
||
The base class handles reporting progress, file size limits and checksumming. | ||
|
||
Here's an example implementation of a storage backend that stores files remotely over HTTP with no authentication. | ||
|
||
.. code-block:: python | ||
|
||
import contextlib | ||
import httplib | ||
import urllib | ||
import urllib.parse | ||
|
||
from invenio_files_rest.storage import StorageBackend | ||
|
||
|
||
class StorageBackend(StorageBackend): | ||
def open(self): | ||
return contextlib.closing( | ||
urllib.urlopen('GET', self.uri) | ||
) | ||
|
||
def _initialize(self, size=0): | ||
# Allocate space for the file. You can use `self.uri` as a suggested location, or return | ||
# a new location as e.g. `{"uri": the_new_uri}`. | ||
urllib.urlopen('POST', self.uri, headers={'X-Expected-Size': str(size)}) | ||
return {} | ||
|
||
@contextlib.contextmanager | ||
def get_save_stream(self): | ||
# This should be a context manager (i.e. something that can be used in a `with` statement) | ||
# which closes the file when exiting the context manager and performs any clear-up if | ||
# an error occurs. | ||
parsed_uri = urllib.parse.urlparse(self.uri) | ||
# Assume the URI is HTTP, not HTTPS, and doesn't have a port or querystring | ||
conn = httplib.HTTPConnection(parsed_uri.netloc) | ||
|
||
conn.putrequest('PUT', parsed_uri.path) | ||
conn.endheaders() | ||
|
||
# HTTPConnections have a `send` method, whereas a file-like object should have `write` | ||
conn.write = conn.send | ||
|
||
try: | ||
yield conn.send | ||
response = conn.getresponse() | ||
if not 200 <= response.status < 300: | ||
raise IOError("HTTP error") | ||
finally: | ||
conn.close() | ||
|
||
|
||
Checksumming | ||
------------ | ||
|
||
The base class performs checksumming by default, using the ``checksum_hash_name`` class or instance attribute as | ||
the hashlib hashing function to use. If your underlying storage system provides checksumming functionality you can set | ||
this to ``None`` and override ``checksum()``: | ||
|
||
.. code-block:: python | ||
|
||
class RemoteChecksumStorageBackend(StorageBackend): | ||
checksum_hash_name = None | ||
|
||
def checksum(self, chunk_size=None): | ||
checksum = urllib.urlopen(self.uri + '?checksum=sha256').read() | ||
return f'sha256:{checksum}' | ||
|
41 changes: 41 additions & 0 deletions
41
invenio_files_rest/alembic/0999e27defd5_add_backend_name.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
# | ||
# This file is part of Invenio. | ||
# Copyright (C) 2020 Cottage Labs LLP | ||
# | ||
# Invenio is free software; you can redistribute it and/or modify it | ||
# under the terms of the MIT License; see LICENSE file for more details. | ||
|
||
"""Add backend_name.""" | ||
|
||
import sqlalchemy as sa | ||
from alembic import op | ||
|
||
# revision identifiers, used by Alembic. | ||
revision = '0999e27defd5' | ||
down_revision = '8ae99b034410' | ||
branch_labels = () | ||
depends_on = None | ||
|
||
|
||
def upgrade(): | ||
"""Upgrade database.""" | ||
# ### commands auto generated by Alembic - please adjust! ### | ||
op.add_column( | ||
'files_files', | ||
sa.Column('storage_backend', | ||
sa.String(length=32), | ||
nullable=True)) | ||
op.add_column( | ||
'files_location', | ||
sa.Column('storage_backend', | ||
sa.String(length=32), | ||
nullable=True)) | ||
# ### end Alembic commands ### | ||
|
||
|
||
def downgrade(): | ||
"""Downgrade database.""" | ||
# ### commands auto generated by Alembic - please adjust! ### | ||
op.drop_column('files_location', 'storage_backend') | ||
op.drop_column('files_files', 'storage_backend') | ||
# ### end Alembic commands ### |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I see some potential troubles here.
How are users expected to provide storage backend?
FILES_REST_STORAGE_BACKENDS
is overwritten, then below code will still execute causing all storage backends to be loaded, even if we don't want them.Thus in both cases, I think it makes sense to move the entry point iteration to the Flask extension to avoid unnecessarily loading storage backends that we don't need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was supporting both so that there was an obvious default behaviour (entry points), but that they could override this if desired. The "overriding" use case was that they wanted to change the behaviour of a backend they'd already used, but which they don't maintain (e.g. one in invenio-files-rest, or invenio-s3). In this case defining a new entrypoint with the same name wouldn't work, so they'd have to define the name→backend mapping explicitly as config.
However, yes, it's not great to do the entrypoint dereferencing here regardless of whether it's overridden. If you're happy with the rationale above I can move this to the extension loading code.
I should also write documentation for this.