Performance problems in tests (NIO2 and MINA, Linux only) #293
Partial problem cause: in the SftpTransferTest, the problem is the SFTP buffer size of 64kB in … I had pointed that out before in SSHD-1287. There is clearly room for improvement there. This test is run twice during … This does not explain yet:
This is also reproducible for me in a CentOS 7 virtual machine running locally. From the logs I see that with NIO2 and MINA, there is a 40ms delay for getting these extra 1kB chunks, which isn't present in the Netty logs. But I have no idea yet where these 40ms come from. Bisecting did not yield any insights.
If a client uses an SFTP buffer size greater than the maximum amount of data the server is willing to return for an individual read request, data will be delivered with "holes". For instance, if the server limit is 63kB and the client uses a buffer size of 64kB, it will issue requests to read 64kB at offset 0, then 64kB at offset 65536, and so on. But it will get only 63kB at offset 0 and 63kB at offset 65536. When it notices that, it'll request the missing 1kB at offset 64512 synchronously.

In a large file, this can lead to frequent jumping back and forth in the file if the client issues new requests after having read missing data, because the very next read will again be short, but the server may have already skipped forward again to satisfy the new requests. Avoid that by not issuing new requests until the client has caught up. That way, the server has to move backwards in the file far less often and can serve the data much faster and more smoothly.

(An alternative or additional improvement in this area might be to make the server-side implementation smarter about re-ordering read requests. The current implementation just serves them as they come, but the SFTP draft RFCs explicitly allow some re-ordering. The server could use a priority queue ordered by offset, trying on _its_ side to avoid excessive back and forth. Some care would have to be taken to deal with overlapping read and write requests.)

Bug: apache#293
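To make the arithmetic concrete, here is a small illustrative sketch of how a client can compute the "hole" left by a short reply. The class and method names are mine, not SSHD's actual API; the numbers assume the 64kB client buffer and 63kB server limit from the example above.

```java
// Illustrative sketch (not SSHD's real API): given a read request and a
// short server reply, compute the missing chunk the client must re-request.
public class SftpGapSketch {
    // Offset of the missing chunk: request offset plus bytes actually returned.
    static long missingOffset(long requestOffset, int bytesReturned) {
        return requestOffset + bytesReturned;
    }

    // Size of the missing chunk: requested length minus bytes returned.
    static int missingLength(int requestedLength, int bytesReturned) {
        return requestedLength - bytesReturned;
    }

    public static void main(String[] args) {
        int clientBuffer = 64 * 1024; // 65536: the client's read size
        int serverLimit = 63 * 1024;  // 64512: max the server returns per read
        // First request: 64kB at offset 0, but the server returns only 63kB.
        long hole = missingOffset(0, serverLimit);
        int holeLen = missingLength(clientBuffer, serverLimit);
        System.out.println("hole at " + hole + ", length " + holeLen);
        // -> hole at 64512, length 1024: the client must jump back for this
        //    1kB while the server may already be serving offset 65536.
    }
}
```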
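The server-side re-ordering idea from the parenthetical above could be sketched roughly as follows. This is a hypothetical illustration only, not part of the actual SSHD server: pending reads are kept in a priority queue ordered by offset so the file handle moves forward as much as possible.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch: instead of serving SFTP read requests strictly in
// arrival order, keep pending reads in a priority queue ordered by file
// offset, minimizing backwards seeks on the server side.
public class ReadReorderSketch {
    record Read(long offset, int length) {}

    private final PriorityQueue<Read> pending =
            new PriorityQueue<>(Comparator.comparingLong(Read::offset));

    void submit(Read r) { pending.add(r); }

    // Serve the pending read with the lowest offset next.
    Read next() { return pending.poll(); }

    public static void main(String[] args) {
        ReadReorderSketch q = new ReadReorderSketch();
        // Requests arrive out of order, e.g. after a client re-requested a hole.
        q.submit(new Read(65536, 65536));
        q.submit(new Read(64512, 1024)); // the 1kB hole, arriving late
        q.submit(new Read(131072, 65536));
        System.out.println(q.next().offset()); // prints 64512: lowest offset first
    }
}
```

As noted above, a real implementation would still need to handle overlapping read and write requests carefully before re-ordering anything.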
The problem in SFTP is caused by excessive jumping back and forth in the file on the server side. It manifests only with NIO2 and MINA but not with Netty, because the timing is different and the files end up being read more or less sequentially, or the backwards jumps stay within some OS buffer size. With NIO2 and MINA, more initial read requests appear to be handled before a backwards jump occurs, which then leads to more and larger backwards jumps if one keeps issuing asynchronous requests.

I did not analyze why the problem did not occur on Windows or locally on OS X; perhaps those OSes handle this better than Linux, or the request timings and their handling are more like in the Netty/Linux case.

I still have no idea why the problem appears from a particular GitHub CI build onwards. Perhaps something changed in the Linux container used to run the tests. In any case, I have not found any particular commit from which the performance dropped markedly on Linux; the problem was apparent in all runs in my local CentOS 7 VM, no matter how far back I went.
If a client uses an SFTP buffer size greater than the maximum amount of data the server is willing to return for an individual read request, data will be delivered with "holes". For instance, if the server limit is 63kB and the client uses a buffer size of 64kB, it will issue requests to read 64kB at offset 0, then 64kB at offset 65536, and so on. But it will get only 63kB at offset 0 and 63kB at offset 65536. When it notices that, it'll request the missing 1kB at offset 64512 synchronously.

In a large file, this can lead to frequent jumping back and forth in the file if the client issues new requests after having read missing data, because the very next read will again be short, but the server may have already skipped forward again to satisfy the new requests. Avoid that by not issuing new requests until the client has caught up. That way, the server has to move backwards in the file far less often and can serve the data much faster and more smoothly.

Additionally, if gaps are detected, reduce the buffer size to the maximum the server has returned so far. A gap means the server returned less data than the client requested, which can have three reasons:

* EOF was reached. Reducing the buffer size is fine.
* The server limit was exceeded. Reducing avoids future gaps.
* A (transient) server error.

Because of the third possibility, the buffer size is reduced only after the fourth gap.

(An alternative or additional improvement in this area might be to make the server-side implementation smarter about re-ordering read requests. The current implementation just serves them as they come, but the SFTP draft RFCs explicitly allow some re-ordering. The server could use a priority queue ordered by offset, trying on _its_ side to avoid excessive back and forth. Some care would have to be taken to deal with overlapping read and write requests.)

Bug: apache#293
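The gap-counting heuristic described above can be sketched like this. All names are illustrative only, not the actual SSHD implementation; the point is that shrinking is deferred until the fourth gap so that EOF or a transient server error does not trigger it prematurely.

```java
// Illustrative sketch of the client-side heuristic: count short replies
// ("gaps") and only shrink the buffer size after the fourth one, down to
// the largest reply the server has actually produced.
public class BufferShrinkSketch {
    static final int GAP_THRESHOLD = 4; // tolerate EOF / transient errors

    private int bufferSize;
    private int maxReturned;
    private int gapCount;

    BufferShrinkSketch(int initialBufferSize) {
        this.bufferSize = initialBufferSize;
    }

    // Called for every read reply; returns the buffer size to use next.
    int onReply(int requested, int returned) {
        maxReturned = Math.max(maxReturned, returned);
        if (returned < requested) {
            gapCount++;
            if (gapCount >= GAP_THRESHOLD && maxReturned < bufferSize) {
                bufferSize = maxReturned; // likely a hard server limit
            }
        }
        return bufferSize;
    }

    public static void main(String[] args) {
        BufferShrinkSketch c = new BufferShrinkSketch(64 * 1024);
        for (int i = 0; i < 4; i++) {
            c.onReply(64 * 1024, 63 * 1024); // four short replies in a row
        }
        // After the fourth gap, the buffer shrinks to the 63kB the server
        // actually delivers, so later requests produce no further holes.
        System.out.println(c.onReply(64 * 1024, 63 * 1024)); // prints 64512
    }
}
```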
Looking through the GitHub build logs, there seems to be a problem with two tests on Linux, with the NIO2 or MINA transport back-ends. They show very poor performance:
org.apache.sshd.sftp.client.SftpTransferTest
org.apache.sshd.scp.client.ScpTest
These are the only tests for which I see such discrepancies. It looks like the code is doing something that doesn't play well with NIO2 or MINA on Linux.
It's not a new problem either; I found the same pattern in the build logs of 2022-10-25 (commit 7ad3ead) and in even older builds. Though e.g. in a build from 2022-09-20 the differences were smaller (but still there: 120s instead of 300s, and 15s instead of 19s).
The Windows times and the Linux/Netty times appear to be about normal, and match what I also see for NIO2 or MINA when building locally on OS X: 12s for ScpTest and about a minute for the SftpTransferTest.
What is going on here?