Support cloud storage #2620
I'm really interested in this feature. Do you have an idea of when it will be integrated? By the way, I'm using MinIO, which is an S3-compatible storage, so I imagine it could be used the same way as AWS S3?
I will continue to develop this functionality in the near future.
5d8be0a to 8af9a39
ecd0a0a to 91c0e42
cvat/apps/engine/views.py
Outdated
tmp_manifest = NamedTemporaryFile(mode='w+b', suffix='cvat', prefix='manifest')
storage.download_file(manifest_path, tmp_manifest.name)
manifest = ImageManifestManager(tmp_manifest.name)
manifest.init_index()
manifest_files = manifest.data
tmp_manifest.close()
please use context manager here
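A minimal sketch of what the requested change could look like, wrapping the temporary file in a `with` block so it is closed and removed even if the download or indexing raises. The helper name `read_manifest_entries` and the `FakeStorage` / `FakeManifest` stand-ins are hypothetical and exist only to make the example self-contained; in the PR the real objects are `storage` and `ImageManifestManager`. It also assumes that `manifest.data` is fully materialized before the file is deleted.

```python
from tempfile import NamedTemporaryFile


def read_manifest_entries(storage, manifest_path, manifest_factory):
    """Download a manifest into a temp file and return its parsed entries."""
    # The context manager closes (and deletes) the file on exit,
    # including when an exception is raised inside the block.
    with NamedTemporaryFile(mode='w+b', suffix='cvat', prefix='manifest') as tmp_manifest:
        storage.download_file(manifest_path, tmp_manifest.name)
        manifest = manifest_factory(tmp_manifest.name)
        manifest.init_index()
        return manifest.data


class FakeStorage:
    """Hypothetical stand-in for the cloud storage wrapper."""

    def download_file(self, key, path):
        with open(path, 'wb') as f:
            f.write(b'image_1.jpg\nimage_2.jpg\n')


class FakeManifest:
    """Hypothetical stand-in for ImageManifestManager."""

    def __init__(self, path):
        self._path = path
        self.data = []

    def init_index(self):
        # Read everything eagerly so the data outlives the temp file.
        with open(self._path, 'rb') as f:
            self.data = f.read().decode().split()


entries = read_manifest_entries(FakeStorage(), 'manifest.jsonl', FakeManifest)
```

Note that reopening a `NamedTemporaryFile` by name while it is still open works on POSIX systems but not on Windows, so this pattern assumes a Linux deployment.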
cvat/apps/engine/cloud_provider.py
Outdated
def is_exist(self):
    try:
        self._container_client.create_container()
@Marishka17 , is it the only way to check that? Looks unusual.
ContainerClient in version 12.6.0 did not have an exists method, and I have seen this practice in other implementations. It has since appeared in version 12.8.1, so I changed this.
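As a rough illustration of the change described above, the check can delegate to the client's `exists()` method instead of attempting to create the container. The wrapper class name `AzureContainer` and the `FakeContainerClient` stub are hypothetical; the stub only mimics the single method used here so the sketch runs without the Azure SDK installed.

```python
class AzureContainer:
    """Hypothetical wrapper around an Azure ContainerClient-like object."""

    def __init__(self, container_client):
        self._container_client = container_client

    def is_exist(self):
        # Ask the service directly rather than calling create_container()
        # and interpreting the resulting error.
        return self._container_client.exists()


class FakeContainerClient:
    """Stub exposing only exists(), for illustration."""

    def __init__(self, present):
        self._present = present

    def exists(self):
        return self._present


found = AzureContainer(FakeContainerClient(present=True)).is_exist()
missing = AzureContainer(FakeContainerClient(present=False)).is_exist()
```

Besides reading more naturally, this avoids the side effect of creating a container when the intent is only to check for one.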
@Marishka17 , in general it looks very good. We need to check that we don't copy all the data from a cloud drive to the upload directory. That is the only major question I have; the other comments are minor.
@Marishka17 , could you please add documentation on how to add cloud storages? There are also some problems with the error messages: from my perspective they provide zero information.
The request works, but
Produce
The request produces:
If I try to run
Also, I believe that a manifest file should be specified when we create a cloud storage. What do you think? I don't think it is right to specify it when you only want to list the content of the attached cloud drive.
@Marishka17 ,
To create an S3 storage, you need to specify:
@nmanovic , is that better?
@Marishka17 , let's discuss what it is going to look like for users. There are multiple ways to implement the feature. If you can provide good UX in the UI, I'm totally fine with any approach. In my proposed approach we can clone a connection and change something (e.g. the manifest file) during the cloning process. Thus you can name the connection Cityscapes and it will correspond to one Cityscapes dataset.
It is much better.
@Marishka17 could you please add a note about that into swagger docs?
cvat/apps/engine/models.py
Outdated
# The typical token size is less than 4096 bytes, but that can vary.
provider_type = models.CharField(max_length=20, choices=CloudProviderChoice.choices())
resource = models.CharField(max_length=63)
display_name = models.CharField(max_length=63, unique=True)
@Marishka17 @nmanovic why make this field unique? I think it could be unique within a user's space, but it should not be globally unique. For example, on GitHub I can create a repo named cvat in my user space, but on cvat.org I won't be able to create a cloud storage named aws if someone else has already created one. Can cloud resources be shared between users?
Can cloud resources be shared between users?
The created storage will be available to the user who created it and to the administrator.
specific_attributes should contain a structure like key1=value1&key2=value2.
In a real case, e.g. "specific_attributes": "range=eu-west-2"
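Since the `key1=value1&key2=value2` structure matches URL query-string syntax, one way to turn it into a dictionary is the standard library's `urllib.parse.parse_qsl`. This is only a sketch: the helper name `parse_specific_attributes` is hypothetical, and the PR may parse the field differently.

```python
from urllib.parse import parse_qsl


def parse_specific_attributes(raw):
    """Parse a 'key1=value1&key2=value2' string into a dict."""
    if not raw:
        return {}
    # strict_parsing=True raises ValueError on malformed pairs
    # (for example a key without '=') instead of silently dropping them.
    return dict(parse_qsl(raw, strict_parsing=True))


attrs = parse_specific_attributes('range=eu-west-2')
```

Because this reuses query-string rules, values are percent-decoded and `+` is read as a space, which may or may not be desirable for this field.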
@Marishka17 could you please add a note about that into swagger docs?
Possible solutions:
- Use swagger_schema_fields in the CloudStorageSerializer Meta class:
    swagger_schema_fields = {
        'type': openapi.TYPE_OBJECT,
        'title': 'Cloud Storage',
        'properties': {
            'specific_attributes': openapi.Schema(
                title='Specific attributes',
                type=openapi.TYPE_STRING,
                description='structure like key1=value1&key2=value2\n'
                            'supported: range=aws_range',
            ),
        },
        # "required": [...],
    }
  With this approach, all the descriptions that would otherwise be generated automatically need to be redefined.
- Move specific_attributes into a separate class:
    class SpecificAttributes(serializers.Field):
        def to_representation(self, value):
            pass
        def to_internal_value(self, value):
            pass
        class Meta:
            swagger_schema_fields = {
                'type': openapi.TYPE_STRING,
                'title': 'Specific attributes',
                'description': 'structure like key1=value1&key2=value2\n'
                               'supported: range=aws_range',
                'maxLength': '50',
            }
  IMHO, it should not be moved into a separate class just for the sake of the documentation.
- Use a FieldInspector. I chose this solution.
This reverts commit 142dea7.
New PR without the multiple dummy changes from the prettifier: #3326
Motivation and context
Support work with remote cloud storage without copying data to CVAT
Related issue: #863
How to generate temporary credentials (S3) e.g.
How has this been tested?
Manually with swagger
REST API
GET /api/v1/cloudstorages
POST /api/v1/cloudstorages
GET /api/v1/cloudstorages/{id}
GET /api/v1/cloudstorages/{id}/content
PATCH /api/v1/cloudstorages/{id}
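As a hedged illustration of how a client might call the create endpoint listed above, the sketch below builds (but does not send) a `POST /api/v1/cloudstorages` request with the standard library. The field names `provider_type`, `resource`, `display_name`, and `specific_attributes` come from the model and discussion in this PR; the host, the placeholder values, and the helper name `build_create_request` are assumptions, and credential fields and authentication are deliberately left out because their exact shape is not shown here.

```python
import json
from urllib.request import Request


def build_create_request(base_url, payload):
    """Build a POST request for creating a cloud storage (not sent here)."""
    body = json.dumps(payload).encode('utf-8')
    return Request(
        url=base_url.rstrip('/') + '/api/v1/cloudstorages',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


payload = {
    'provider_type': '<provider type>',       # placeholder: use a supported choice
    'resource': 'my-bucket',                  # hypothetical bucket/container name
    'display_name': 'Cityscapes',
    'specific_attributes': 'range=eu-west-2',
}

# Hypothetical local CVAT address; pass the result to urllib.request.urlopen to send it.
request = build_create_request('http://localhost:8080', payload)
```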
Supported cloud providers:
Iterations:
Checklist
develop branch
- [ ] I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)
License
Feel free to contact the maintainers if that's a concern.