Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[UI] Issues in uploading large files and inconsistent UI behavior #54

Closed
hazratisulton opened this issue Jul 30, 2023 · 12 comments
Closed
Assignees
Labels
bug Something isn't working

Comments

@hazratisulton
Copy link

Steps:
Add a big file to the list.
Wait for the upload progress to reach 100%.

When upload a large file and upload progress reaches 100%, the window may remain displayed for a few more minutes (3+ minutes for 2 Gb file). I would like to better understand what is happening and how much longer to wait for the end:
image

When upload a large file interface is not blocked. It's possible to hidden "Upload file" window from previous screen and then how to figure out what is going on:
image

If then open the "Upload file" window again while in list displayed .tmp file, there is no progress bar or working status:
image

If wait a few minutes, then eventually the file will be uploaded:
image

Also on all screenshots can see the error "No train files..." left from the previous time. The interface has not been updated.

@mitya52
Copy link
Member

mitya52 commented Aug 30, 2023

we should hide tmp files

@gardner
Copy link

gardner commented Sep 5, 2023

I wonder if this loop needs to be there:

contents = await file.read(1024)

gardner added a commit to gardner/refact that referenced this issue Sep 5, 2023
This change resolves smallcloudai/issues/54

The code in `tab_upload.py` reads uploaded files 1024 bytes at a time. This is not efficient for large files on modern systems. 

`UploadFile.read()` maps to `RawIOBase.read()` which handles multiple reads to a stream internally. 

This change delegates stream io to the lower levels of the runtime.
@gardner
Copy link

gardner commented Sep 5, 2023

@hazratisulton have you tried "Download from URL"?

@hazratisulton
Copy link
Author

@gardner Your pr didn't fix the problem. Still have tmp file. Just need bigger file. Used not 2 Gb, but 10 Gb zip to reproduce the problem.

@k3KAW8Pnf7mkmdSMPHz27
Copy link

@hazratisulton more than 5 min?

@olegklimov
Copy link
Contributor

we should hide tmp files

Ha ha then there will be no way for user to delete it :D

@hazratisulton
Copy link
Author

hazratisulton commented Sep 26, 2023

@k3KAW8Pnf7mkmdSMPHz27 more than 20 min.

@valaises
Copy link
Member

valaises commented Oct 3, 2023

@hazratisulton Implemented in #136

@valaises valaises moved this from In Progress to On Review in Self-hosted / Enterprise Oct 3, 2023
@k3KAW8Pnf7mkmdSMPHz27
Copy link

k3KAW8Pnf7mkmdSMPHz27 commented Oct 4, 2023

@valaises if I am not mistaken there are three issues here,

  1. You are using FastAPI's built-in file uploading which is not ideal in this circumstance, but probably sufficient. This is why uploading from a URL should be faster. I can also submit a PR addressing this as long as someone can explain the purpose of the temporary files and why not onboard a lightweight DB to keep track of training sources instead. I spent some ~20 hours failing to implement an upload directory feature 😎
  2. There is only a progress bar while uploading the file. I only skimmed the PR, but I don't think this part is addressed. In practice, what you are doing is
    1. Uploading a file (while showing the progress)
    2. Copying the file from a tempfile (roughly at the same time as the upload, so the user probably see the progress bar)
    3. Unpacking the files. This shows no progress to the user, and can take longer than uploading the file.
  3. If the two earlier points are addressed, will the "Error:No train files have been provided for filtering" still appear?

@valaises I believe the PR notifies the user once the file is unpacked?

@hazratisulton please confirm if this is correct, or if I am hijacking your issue (in which case I apologize 😳 )

@valaises
Copy link
Member

valaises commented Oct 4, 2023

Hi, @k3KAW8Pnf7mkmdSMPHz27

i. Uploading from a URL is faster indeed, FastAPI's built-in seems less perfect but it will do. Temp files are needed to avoid file conflicts when different entities are manipulating the same files.

ii. In my environment, in case 1.4G zip file it took roughly 20s to move-unpack it up. IMO, no extra progress bars or detailed statuses for each step are needed. Spinner is just fine 👌

iii. "Error:No train files have been provided for filtering" is not gonna appear anymore as this section has moved to Finetune Tab 😀

@hazratisulton
Copy link
Author

@hazratisulton Implemented in #136
Interface is now locked until the file download has finished and we have no problem with tmp file. Uploaded 12Gb file in 8 minuts.
image
image

@klink klink added the bug Something isn't working label Oct 17, 2023
@hazratisulton
Copy link
Author

fixed in v1.1.0

@github-project-automation github-project-automation bot moved this from On Review to Released in Docker Nightly in Self-hosted / Enterprise Oct 27, 2023
@hazratisulton hazratisulton moved this from Released in Docker Nightly to Released in Docker V1.1 in Self-hosted / Enterprise Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Released in Docker V1.1
Development

Successfully merging a pull request may close this issue.

7 participants