Incomplete downloads #424
Queries: Is it returning an …
Yes, …
Hm… this is proving pretty hard to pin down; it's not obvious what might be going wrong. Clearly some number of trailing bytes are not being committed, and that can be seen plainly because the two mistaken file sizes in the examples fall exactly on 32768-byte boundaries…
Sounds right; other examples like 1343488, 1638400 and 786432 also fit that pattern.
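As a quick arithmetic check (a standalone sketch, not code from the thread), every truncated size reported in this issue divides evenly into 32768-byte (32 KiB) packets:

```go
package main

import "fmt"

func main() {
	// Sizes at which downloads have been reported to stop in this issue.
	sizes := []int64{65536, 98304, 786432, 1343488, 1638400, 2031616}
	for _, n := range sizes {
		fmt.Printf("%7d bytes = %2d * 32768 (remainder %d)\n", n, n/32768, n%32768)
	}
}
```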
I'm thinking we might want to try a cherry-pick that covers the unexpected channel close case. I'm entirely unsure how that could ever trigger, but it might be the problem, as that is the only other way this could happen. It would also explain the observed behavior: some number of 32 KiB writes succeed, then we receive on a closed channel and stop all transfers.
@puellanivis thank you. So if I want to try to replicate this issue, it should happen only for unexpected transfer errors, right?
@drakkan That's the thing: this really shouldn't happen for unexpected transfer errors either. Those should all percolate some non-EOF error up the line as well? 🤔
I tried to replicate this issue with no luck using something like this:
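A representative sketch of such a reproducer (the address, credentials, and paths below are placeholders, not values from this issue):

```go
package main

import (
	"io"
	"log"
	"os"

	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
)

func main() {
	// Placeholder connection details.
	conn, err := ssh.Dial("tcp", "sftp.example.com:22", &ssh.ClientConfig{
		User:            "user",
		Auth:            []ssh.AuthMethod{ssh.Password("password")},
		HostKeyCallback: ssh.InsecureIgnoreHostKey(),
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client, err := sftp.NewClient(conn)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	src, err := client.Open("/remote/testfile")
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	dst, err := os.Create("testfile")
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	// Download, then compare the transferred size against what the server reports.
	n, err := io.Copy(dst, src)
	if err != nil {
		log.Fatal(err)
	}
	fi, err := src.Stat()
	if err != nil {
		log.Fatal(err)
	}
	if n != fi.Size() {
		log.Fatalf("incomplete download: copied %d bytes, Stat reports %d", n, fi.Size())
	}
	log.Printf("downloaded %d bytes OK", n)
}
```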
@mrwonko, can you please confirm that #425 fixes this issue in your case? @puellanivis I tried this over a VPN connection; to trigger the error I disconnected the VPN, and in that case more than 15 minutes are needed before the connection-lost error is triggered.
What do you think? Should we add a context or something similar? Thank you
I will give #425 a try next week and let you know if it helps.
I deployed #425 on our preproduction system a couple of hours ago, and there has already been the first partial download: 98304 bytes instead of 240430. I'll keep it there until tomorrow to see if it happens again, but then I'll probably roll back; it doesn't seem to work.
There have been about a dozen failed downloads at this point; #425 doesn't seem to fix our issue.
@mrwonko can you please provide a reproducer? I tried the above code with no luck. I'm not very familiar with client-side code, but if I have an easy way to reproduce the issue I'll try to fix it, thank you
The only thing I can think of now is the possibility that there are writes and reads happening to the remote file at the same time. So, when the read tries to read a value at offset XYZ, it's met with an …
Going through with a fine-toothed comb again, I'm seeing numerous small issues here and there. I'll be making a PR to address them, and ensure that … Since proper EOFs in the SFTP library should only ever come via an …
@puellanivis thanks for your patch
I'll give that new patch a try. I've not yet been able to reproduce this locally due to unrelated issues, so instead I'm testing using our normal downloaders on our preproduction cluster, which reliably reproduces it. That said, the snippet you posted should theoretically be sufficient for a reproduction.
I still get partial downloads with #429.
🙃 wth?
What are the architectures and programs running on both sides of this? Like, clearly the client in this situation is from our library, but what kind of server is it connecting to? Is that server running off of our package as well?
I know this might be a long shot, but do you still get incomplete downloads with #435?
Our client runs in an x86_64 docker container. All I know about the server is that it runs on Windows, and I'm pretty sure it doesn't use this library. Since we don't run in 32-bit, I see no point in trying #435?
So, while the problem was manifest in 32-bit, there were still other possible edge conditions that could have produced the same behavior. I'm honestly not sure; I'm grasping at straws here.
As far as I understand, the server is running JSCAPE and accesses the files it serves via some network file system.
@carsten-luckmann @mrwonko we tried to reproduce the issue with no luck; you have two choices:
…
Thank you!
I might be seeing this problem too. Using …
There are no errors (so I assume …
😱 I think I might see what's going on with your situation @j0hnsmith. While using concurrent reads, if we request the end-of-file read, the end server might trigger its "download is complete" functionality, delete the file from its side, and maybe even … This would mean any later read of the file at any offset would end up with an EOF, since it is now beyond the length of the file, causing a premature and silent EOF at some odd 32k boundary less than the length of the whole file. I had considered needing to have the writes well-ordered, but I had not anticipated a need to ensure monotonically increasing read offsets in …
🤔 I'm thinking about whether I should work on a patch to get well-ordered, monotonically increasing reads into … That other PR doesn't really touch any of the …
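For anyone hitting a server that behaves this way, one hedged workaround sketch (assuming the UseConcurrentReads client option available in v1.13 and later) is to disable concurrent reads when constructing the client, so read requests are issued strictly in order:

```go
import (
	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
)

// newSequentialClient is a hypothetical helper: it constructs an SFTP client
// with concurrent reads disabled, so read requests are issued in ascending
// offset order rather than in parallel.
func newSequentialClient(conn *ssh.Client) (*sftp.Client, error) {
	return sftp.NewClient(conn, sftp.UseConcurrentReads(false))
}
```

This trades some download throughput for request ordering, which may matter for servers that treat a read at the end of the file as "transfer finished".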
#436: can people try this PR and see if it fixes the issue?
hopefully, fixed in v1.13.1 |
This seems to still be the case in v1.13.6.

fp, err := sftpClient.Open("somepath")
...
bs, err := io.ReadAll(fp)
// vs
var buf bytes.Buffer
_, err = io.Copy(&buf, fp)
Hm… trippy. So, … We'll probably want to do some sort of check that …
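For context, the two calls above can take very different code paths in the standard library: io.Copy first checks whether the source implements io.WriterTo and, if so, delegates the whole transfer to WriteTo (*sftp.File provides WriteTo), whereas io.ReadAll only ever calls Read in a loop and stops at the first io.EOF it sees. A small sketch of that dispatch (a standalone illustration, not code from this package):

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// copyPath reports whether io.Copy would hand the transfer to the source's
// WriteTo method, which is what happens for *sftp.File.
func copyPath(src io.Reader) string {
	if _, ok := src.(io.WriterTo); ok {
		return "WriteTo fast path (used by io.Copy)"
	}
	// io.ReadAll never makes this check; it always loops over Read until io.EOF.
	return "plain Read loop (the only path io.ReadAll uses)"
}

func main() {
	fmt.Println(copyPath(strings.NewReader("x")))                      // WriteTo fast path
	fmt.Println(copyPath(struct{ io.Reader }{strings.NewReader("x")})) // plain Read loop
}
```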
Actually reopening this issue. |
I tried to break on … The EOF "error" is generated by …
If the …
Ok, we just got hit by this issue when transitioning from a stock OpenSSH SFTP server to one from GoAnywhere. We have no issues with the stock server, but we do have issues that manifest exactly as described here when using the GoAnywhere server. We are going to try to set up something to trace the SFTP communication so that we can compare and contrast and maybe help solve the problem.
I also have the same issue that @Coffeeri was describing. Specifically, my use case includes reading the file twice:
1. calculating an md5sum of the file, and
2. uploading the file to S3.
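A sketch of that read-twice pattern, assuming the handle is rewound with File.Seek between passes (the helper name and destination writer are illustrative, not from the original code):

```go
import (
	"crypto/md5"
	"encoding/hex"
	"io"

	"github.com/pkg/sftp"
)

// hashThenStream is an illustrative helper: it reads the remote file once to
// compute its md5sum, rewinds, then reads it again to stream the contents to
// dst (for example, an S3 uploader's writer).
func hashThenStream(f *sftp.File, dst io.Writer) (string, error) {
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	// Rewind before the second pass; otherwise the next read starts at EOF.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return "", err
	}
	if _, err := io.Copy(dst, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}
```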
What version are you using? Your use case should work otherwise. Also, you can use …
Hi @puellanivis,
Yes, sorry, I meant to say that I was doing two separate reads because the file doesn't fit into memory, and that the first read (the md5sum calculation) works perfectly every time, while the second read (the S3 upload) returns different contents each time until it eventually succeeds after several retries. I guess I'm currently on version 1.3.1. Haven't tried the latest version. Do you see something that might explain it for the older version? Thanks.
Hi varunbpatil, we've fixed a few things in later versions that could possibly be the cause here. I would recommend trying the newest version (there isn't any reason not to update; it's safe) and seeing if the issue persists.
Adding in here that we ran into this, bumped to version v1.13.7, but had the same issue. It silently truncates the file at the 1638400-byte boundary. Ultimately, we had to change our ReadAll() calls to Copy(), and that solved the issue.
Since upgrading to 1.13, the library will often report EOF before a file opened with Client.Open() is downloaded completely. For example, a 634618-byte file has stopped downloading after just 65536 bytes, and a 2367599-byte file after 2031616. As a workaround I've added a call to File.Stat().Size(), which returns the correct file size, and compare that with the downloaded size to check for success.
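A minimal sketch of that size-verification workaround as a reusable helper (the function name is illustrative, not part of the library):

```go
import (
	"fmt"
	"io"

	"github.com/pkg/sftp"
)

// verifiedDownload is an illustrative helper: it copies the remote file to dst
// and fails if the number of bytes copied differs from the size reported by Stat.
func verifiedDownload(f *sftp.File, dst io.Writer) error {
	n, err := io.Copy(dst, f)
	if err != nil {
		return err
	}
	fi, err := f.Stat()
	if err != nil {
		return err
	}
	if n != fi.Size() {
		return fmt.Errorf("incomplete download: copied %d bytes, expected %d", n, fi.Size())
	}
	return nil
}
```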