Images uploaded via the SDK are twice as large as those uploaded via the web interface #6878

Closed
2 tasks done
jcorrochanoj opened this issue Sep 19, 2023 · 13 comments · Fixed by #6952
Labels
bug Something isn't working

Comments

@jcorrochanoj

My actions before raising this issue

Steps to Reproduce (for bugs)

  1. Select some images to be uploaded via web interface and also SDK
  2. Using the web interface, create a task with those images
  3. Using the SDK, create a task and upload those images
  4. Compare the size of the images inside the docker volume

Current Behaviour

I am creating tasks and uploading images programmatically, but I realized that a project I expected to be 20 GB in size ended up at 40 GB. Initially, I thought it might be caused by parameters like zip_chunks or cache, but it is not.

I selected three images of 77.6 MB, 56.9 MB, and 42.8 MB, created a CVAT task from the web interface, and uploaded these images. I then repeated the same process with the SDK, and the images uploaded with the SDK occupy twice as much space in the docker volume.

This is the configuration used in the web interface:
[screenshot of the web interface task configuration; not preserved]

This is the configuration used in SDK:

data_params = {
    "image_quality": 70,
    "use_zip_chunks": True,
    "use_cache": True,
}

This is the size comparison between the task created in the web interface (253) and the task created using the SDK (255):
[screenshot comparing the sizes of tasks 253 and 255; not preserved]

Is it a bug or am I missing a parameter?

Your Environment

  • CVAT VERSION: 2.7.0
  • CVAT SDK: 2.7.0
  • Docker version: 20.10.25
  • Operating System and version: Amazon Linux
@shinbehavior

Try to turn off the Prefer zip chunks toggle.

@jcorrochanoj
Author

Try to turn off the Prefer zip chunks toggle.

My problem (double image size) occurs when uploading images with the SDK, not with the web interface.

I have tried uploading the images (via the SDK and the API) without the zip chunks option, and the problem persists.

@dm0

dm0 commented Sep 20, 2023

Also observing this issue but with manifest files.

I found that this happens if the file size is above 2.5 MB. In that case Django uses TemporaryUploadedFile instead of InMemoryUploadedFile, and the file is written twice (in append mode).

It appears that the file is written to disk the first time by the ClientFile model and the second time by append_files (UploadMixin).

When the size is below 2.5 MB (the FILE_UPLOAD_MAX_MEMORY_SIZE Django option), append_files actually writes 0 bytes to the already created file. If the size is bigger, append_files writes the full file size. Probably, for an in-memory file, the file pointer stays at the end of the "file" (which has already been read by that moment), whereas for a TemporaryUploadedFile it is reset to the beginning.

The workaround I'm going to use for now is to increase FILE_UPLOAD_MAX_MEMORY_SIZE (I haven't tried it yet).
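
The suspected file-pointer behavior can be illustrated with a small, self-contained simulation. This is only a sketch of the hypothesis in this comment, not the actual Django or CVAT code: `append_upload` and `simulate` are hypothetical helpers, and the `rewind` flag stands in for the assumed difference between an in-memory upload (pointer left at the end after the first read) and a temporary-file upload (pointer reset to the start).

```python
import io
import os
import tempfile


def append_upload(dest_path, uploaded, rewind):
    """Append whatever is left to read in `uploaded` to dest_path.

    Returns the number of bytes written. `rewind=True` models the assumed
    case where the source is read again from the beginning.
    """
    if rewind:
        uploaded.seek(0)
    written = 0
    with open(dest_path, "ab") as dest:
        while True:
            chunk = uploaded.read(64 * 1024)
            if not chunk:
                break
            dest.write(chunk)
            written += len(chunk)
    return written


def simulate(payload, rewind_on_second_write):
    """Write the same upload twice in append mode; report both writes and the final size."""
    with tempfile.TemporaryDirectory() as tmp:
        dest_path = os.path.join(tmp, "raw.bin")
        uploaded = io.BytesIO(payload)
        first = append_upload(dest_path, uploaded, rewind=False)
        second = append_upload(dest_path, uploaded, rewind=rewind_on_second_write)
        return first, second, os.path.getsize(dest_path)


payload = b"x" * 1000
in_memory_like = simulate(payload, rewind_on_second_write=False)
temp_file_like = simulate(payload, rewind_on_second_write=True)
print(in_memory_like)  # (1000, 0, 1000)
print(temp_file_like)  # (1000, 1000, 2000)
```

If the hypothesis holds, the second case matches the doubled file sizes reported in this issue: the second append writes the whole payload again instead of 0 bytes.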

@dm0

dm0 commented Sep 20, 2023

Increasing the FILE_UPLOAD_MAX_MEMORY_SIZE value works for me.

@jcorrochanoj
Author

jcorrochanoj commented Sep 20, 2023

Increasing the FILE_UPLOAD_MAX_MEMORY_SIZE value works for me.

Thank you very much for the answer, but where can I find this variable?

@dm0

dm0 commented Sep 20, 2023

This variable is one of Django's settings: https://docs.djangoproject.com/en/4.2/ref/settings/. I used the approach described here to override it: https://opencv.github.io/cvat/docs/administration/advanced/ldap/

But I used a settings.py with these contents:

from cvat.settings.production import *

FILE_UPLOAD_MAX_MEMORY_SIZE = 100 * 1024 * 1024

The docker-compose.override.yml is the same as in the documentation referenced above:

services:
  cvat_server:
    environment:
      DJANGO_SETTINGS_MODULE: settings
    volumes:
      - ./settings.py:/home/django/settings.py:ro

@jcorrochanoj
Author

jcorrochanoj commented Sep 20, 2023

Would it be the same to add it to the current DJANGO settings file in CVAT?
This file is cvat/settings/production.py, and I have tried to add this line:

FILE_UPLOAD_MAX_MEMORY_SIZE = 200 * 1024 * 1024

But I still have the same problem. Anyway, I don't understand why it would be something about this configuration, as I only see the problem if I upload images through the SDK, but not through the web interface.

@dm0

dm0 commented Sep 20, 2023

Would it be the same to add it to the current DJANGO settings file in CVAT?
This file is cvat/settings/production.py, and I have tried to add this line:

Yes, should be the same.

I'm not sure if the issue is the same with images. I observed double file size for manifest.json uploaded via SDK (via cvat-cli to be precise).

@jcorrochanoj
Author

OK, so that solution didn't work for me. Thank you very much for your help!
I hope I can solve it because it's quite a serious problem.

@jcorrochanoj
Author

More information and experiments:

  • I created a task with the SDK using a 2.4 MB image. When checking the cvat data volume, the image size is still 2.4 MB.
    [screenshot of the image size in the CVAT data volume; not preserved]

  • I created a task with the SDK using the initial test images plus this new 2.4 MB image. Now, the image size is doubled.
    [screenshot of the doubled image size in the CVAT data volume; not preserved]

This suggests that the problem is related not to the size of an individual file, but to the total size of the set of uploaded images.

I have been doing a lot of tests and, as @dm0 said, this happens when the 2.5 MB limit is exceeded. That is, if the file or set of files is smaller than 2.5 MB, there is no problem. If it exceeds 2.5 MB, all the files double in size.
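
This observation is consistent with how Django's in-memory upload handler appears to decide: it compares the Content-Length of the whole request, not each file, against FILE_UPLOAD_MAX_MEMORY_SIZE (default 2621440 bytes, i.e. 2.5 MiB). The sketch below is a rough estimate under stated assumptions: `spills_to_disk` and `mb` are hypothetical helpers, all files are assumed to travel in a single request, and multipart overhead is ignored.

```python
# Django's documented default for FILE_UPLOAD_MAX_MEMORY_SIZE (2.5 MiB).
DEFAULT_LIMIT = 2621440


def mb(value):
    """Convert a size in decimal megabytes to bytes (assumes 1 MB = 10**6 bytes)."""
    return round(value * 1000 * 1000)


def spills_to_disk(file_sizes, limit=DEFAULT_LIMIT):
    """Estimate whether a request carrying these files exceeds the in-memory limit.

    Assumption: all files are sent in one request; multipart framing
    overhead is not counted, so results near the limit are unreliable.
    """
    return sum(file_sizes) > limit


single_small = [mb(2.4)]
mixed = [mb(77.6), mb(56.9), mb(42.8), mb(2.4)]

print(spills_to_disk(single_small))  # False
print(spills_to_disk(mixed))         # True
```

Under these assumptions the 2.4 MB image alone stays in memory, while the same image sent together with the larger ones goes through the temporary-file path, which matches the two experiments above.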

By the way, @dm0 maybe your solution isn't working for me because I'm not doing it right. After editing the file, what commands should I run? Thank you very much for your help.

@dm0

dm0 commented Sep 21, 2023

@jcorrochanoj,

By the way, @dm0 maybe your solution isn't working for me because I'm not doing it right. After editing the file, what commands should I run? Thank you very much for your help.

It depends on how you run CVAT and edit the file. If you're using docker compose and edit the file inside the container, then I believe it should be enough to restart the cvat_server container.

If you just edit the settings file in your local copy of the CVAT repository and then run docker compose up, I think this will not work. You will need to build your own Docker image (or edit the file inside the container).

By the way, I checked with a single 23 MB image this way:

cvat-cli create "test" --project_id 1 local 23m.png

The file size appears to be doubled (with CVAT 2.7.1 and no additional setup):

# ls -lh /home/django/data/data/56/raw/
total 45M
-rw-r--r-- 1 django django 45M Sep 21 08:45 23m.png

Assuming CVAT is not running (docker compose down), if I do the following in the local CVAT repository root:

  1. Create settings.py with the contents shown above.
  2. Create docker-compose.override.yml with the contents shown above.
  3. Run CVAT_VERSION=v2.7.1 docker compose up -d.
  4. Create a new task using cvat-cli, as before.

then the file size is as expected:

# ls -lh /home/django/data/data/57/raw/
total 23M
-rw-r--r-- 1 django django 23M Sep 21 08:51 23m.png
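
The manual `ls -lh` comparison above can also be scripted. The following is a minimal sketch, not part of CVAT: `size_ratio` and `looks_doubled` are hypothetical helpers, and the demo uses throwaway files rather than real paths under /home/django/data.

```python
import os
import tempfile


def size_ratio(original_path, stored_path):
    """Return the stored file size divided by the original file size."""
    original_size = os.path.getsize(original_path)
    if original_size == 0:
        raise ValueError("original file is empty; the ratio is undefined")
    return os.path.getsize(stored_path) / original_size


def looks_doubled(original_path, stored_path, tolerance=0.01):
    """True if the stored file is about twice the size of the original."""
    return abs(size_ratio(original_path, stored_path) - 2.0) <= tolerance


# Demo with temporary files standing in for the local image and the
# copy stored in the task's raw/ directory.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.png")
    good_copy = os.path.join(tmp, "good.png")
    doubled_copy = os.path.join(tmp, "doubled.png")
    payload = b"\x89PNG" + b"\0" * 4096
    for path, data in (
        (original, payload),
        (good_copy, payload),
        (doubled_copy, payload * 2),
    ):
        with open(path, "wb") as handle:
            handle.write(data)
    good_result = looks_doubled(original, good_copy)
    doubled_result = looks_doubled(original, doubled_copy)

print(good_result)     # False
print(doubled_result)  # True
```

To use it against a real deployment, pass the path of the local source image and the path of the stored copy as seen from inside the container or the mounted volume.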

@jcorrochanoj
Author

@dm0 Following your steps solved my problem! Thank you very much for your help.

Anyway, I think this is a bug that should be fixed, so I'll leave the issue open until a developer sees it.

@azhavoro
Contributor

Thanks for the report, we will take a look.

@azhavoro azhavoro self-assigned this Sep 21, 2023
@azhavoro azhavoro added the bug Something isn't working label Sep 21, 2023
@azhavoro azhavoro assigned zhiltsov-max and unassigned azhavoro Oct 14, 2023
SpecLad pushed a commit that referenced this issue Oct 17, 2023
…ltiple requests (#6952)

Fixes #6878 

In the case of big files (>2.5 MB by default), the uploaded files could
be write-appended twice,
leading to bigger raw file sizes than expected. This PR fixes the
behavior by excluding repetitive
writes where it was not supposed.

- Fixed double append-writing of the uploaded files when Upload-Multiple
  requests are used
- Fixed potential DB - disk inconsistencies in the case of upload errors
- Added tests
mikhail-treskin pushed a commit to retailnext/cvat that referenced this issue Oct 25, 2023