Inconsistent speeds during SFTP transfers #220

Open
nfebe opened this issue Aug 17, 2023 · 4 comments

Comments

@nfebe
Contributor

nfebe commented Aug 17, 2023

Originally reported by @danfuzz

I tried to upload another batch of files yesterday via SFTP and ran into some trouble that I think is worth reporting:

During the initial part of the upload, my upload speed was very slow (something like 20 KB/sec) and it stalled out repeatedly. After half an hour or so it seemed to stop having trouble (and accepted my traffic at more like 5 MB/sec). Based on other activity on my local network, the issue was likely on the Permanent side.
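
For anyone trying to reproduce this, a minimal sketch like the one below could turn progress samples into per-interval speeds and flag the stalled stretches. The sample format (elapsed seconds, cumulative bytes sent), the function names, and the 50 KB/sec stall threshold are all assumptions made for illustration; nothing here is emitted by the SFTP service or by any particular client in this form.

```python
from typing import List, Tuple

# Hypothetical sample format: (elapsed seconds, cumulative bytes sent).
Sample = Tuple[float, int]


def interval_speeds(samples: List[Sample]) -> List[float]:
    """Return the average speed in bytes/sec between consecutive samples."""
    speeds = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("sample times must be strictly increasing")
        speeds.append((b1 - b0) / elapsed)
    return speeds


def stalled_intervals(samples: List[Sample], threshold_bps: float = 50_000.0) -> List[int]:
    """Return indexes of intervals slower than threshold_bps (an assumed cutoff)."""
    return [i for i, speed in enumerate(interval_speeds(samples)) if speed < threshold_bps]


if __name__ == "__main__":
    # Made-up samples, not data from the reported upload.
    demo = [(0.0, 0), (10.0, 200_000), (20.0, 200_000), (30.0, 50_200_000)]
    print(interval_speeds(demo))
    print(stalled_intervals(demo))
```

Attaching output like this to a report would show when the slow period started and ended, which may help line it up with server-side processing logs.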

@slifty
Contributor

slifty commented Sep 6, 2023

Our current theory is that this is related to a compounding of two known issues:

  1. Permanent does not expose the originally uploaded file associated with an archive record until AFTER the archive record is fully processed. SFTP requires access to the original file to do things like return file statistics and file data.

Ultimately this means that SFTP will become much slower than bandwidth allows for any file that takes a long time to process. The plan that the Permanent team is currently exploring is to update the back end processing code so that the original file is populated much earlier in the pipeline.

  2. Permanent has a known issue where processing of .doc and .pdf files can deadlock / slow down significantly due to limitations with LibreOffice. It's been a while since this issue was reported but I would not be surprised if the very slow speeds were associated with a series of doc / pdf files being uploaded.

Both of these are issues in their own right, but I think the first one is the most direct issue for the SFTP service, and resolving it should improve performance for ALL file types (and obviate the second issue for SFTP use cases).
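
To make the first point concrete, here is a toy model of how a processing wait could drag down the speed a client observes. It assumes the client-visible time for a file is simply the transfer time plus a wait for the original file to become available; that is a simplification of the theory above, not a description of how the service is actually implemented. The function name and all numbers in the example are hypothetical.

```python
def effective_speed(size_bytes: float, link_bps: float, processing_wait_s: float) -> float:
    """Observed bytes/sec when a file's upload also includes a processing wait.

    Assumes (for illustration only) that total time = transfer time + wait.
    """
    if size_bytes <= 0 or link_bps <= 0:
        raise ValueError("size_bytes and link_bps must be positive")
    if processing_wait_s < 0:
        raise ValueError("processing_wait_s must not be negative")
    transfer_s = size_bytes / link_bps
    return size_bytes / (transfer_s + processing_wait_s)


if __name__ == "__main__":
    size = 20_000_000   # a 20 MB file (hypothetical)
    link = 5_000_000    # a 5 MB/sec link (hypothetical)
    # No wait: the client sees the full link speed.
    print(effective_speed(size, link, 0))
    # A made-up 36 second wait: 4 s transfer + 36 s wait.
    print(effective_speed(size, link, 36))
```

Under this model, any file whose processing takes much longer than its transfer will dominate the observed speed, regardless of file type.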

@danfuzz

danfuzz commented Sep 7, 2023

> processing of .doc and .pdf files can deadlock / slow down significantly

FWIW, at least in my case this wasn't part of the observed issue. (I'm pretty sure all the files I was uploading at the time were plain text files and .tar.gz files.)

@slifty
Contributor

slifty commented Sep 12, 2023

@danfuzz very good to know -- if you're OK sharing, do you have a sense of whether the .tar.gz files were particularly large?

I do still believe that the above processing sequence issue (even without the deadlock problem) is at least responsible for SOME of the slowness, even if there may also be other issues at play here!

Another thing to note is that unfortunately upload speeds will depend on API responsiveness in general and there is a path for optimization in that regard as well.

@kfogel mentioned this issue Sep 12, 2023
@danfuzz

danfuzz commented Sep 12, 2023

@slifty IIRC the tarfiles in question ranged from something like 20 to 300 megs apiece.
