Performance problems in tests (NIO2 and MINA, Linux only) #293

Closed
tomaswolf opened this issue Dec 17, 2022 · 3 comments · Fixed by #307
@tomaswolf
Member

Looking through the GitHub build logs, there seems to be a problem with two tests on Linux, with the NIO2 or MINA transport back-ends. They show very poor performance:

org.apache.sshd.sftp.client.SftpTransferTest

Build                NIO2     MINA    Netty

ubuntu, java 8      ~300s    ~300s     ~70s
ubuntu, java 11     ~300s    ~300s     ~50s
ubuntu, java 17     ~300s    ~300s     ~50s

windows, java 8      ~62s     ~67s     ~66s
windows, java 11     ~49s     ~49s     ~43s
windows, java 17     ~43s     ~44s     ~45s

org.apache.sshd.scp.client.ScpTest

Build                NIO2     MINA    Netty

ubuntu, java 8       ~19s     ~19s     ~11s
ubuntu, java 11      ~19s     ~19s     ~11s
ubuntu, java 17      ~19s     ~19s     ~11s

windows, java 8      ~12s     ~12s     ~12s
windows, java 11     ~12s     ~12s     ~12s
windows, java 17     ~12s     ~12s     ~12s

These are the only tests for which I see such discrepancies. It looks like the code is doing something that doesn't play well with NIO2 or MINA on Linux.

It's not a new problem either; the same pattern appears in the build logs of 2022-10-25 (commit 7ad3ead) and in even older builds, though in a build from 2022-09-20 the differences are smaller (but still present: 120s instead of 300s, and 15s instead of 19s).

The Windows times and the Linux/Netty times appear to be about normal and match what I also see for NIO2 or MINA when building locally on OS X: 12s for ScpTest and about a minute for SftpTransferTest.

What is going on here?

@tomaswolf
Member Author

Partial problem cause: in the SftpTransferTest, the problem is the SFTP buffer size of 64kB in testTransferIntegrityWithBufferLargerThanPacket. That is 1kB more than the maximum the server will ever return; its setting for the maximum chunk to return is 63kB. This means the client fetches the missing 1kB synchronously, which makes transfer times explode to 120s when running this test locally on a CentOS 7 virtual machine, as opposed to about 14 seconds with the default SFTP buffer size. With a buffer size of 62kB, this test also runs fast locally for me, in about 10 seconds.
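
To make the arithmetic concrete, here is a minimal sketch (hypothetical, not actual SSHD code) of the request pattern with a 64kB client buffer against a 63kB server limit: every 64kB request leaves a trailing 1kB hole that the client then has to fetch synchronously.

```java
// Minimal sketch of the offset arithmetic described above; illustrative
// only, not the actual Apache MINA SSHD client code.
public class ReadHoleDemo {
    static final int CLIENT_BUFFER = 64 * 1024; // 65536 bytes requested per read
    static final int SERVER_LIMIT = 63 * 1024;  // 64512 bytes max per response

    public static void main(String[] args) {
        long fileSize = 256 * 1024; // any large file
        for (long offset = 0; offset < fileSize; offset += CLIENT_BUFFER) {
            long returned = Math.min(SERVER_LIMIT, fileSize - offset);
            long holeStart = offset + returned;
            long nextOffset = Math.min(offset + CLIENT_BUFFER, fileSize);
            if (holeStart < nextOffset) {
                // This range is missing and must be re-read synchronously,
                // stalling the otherwise asynchronous read-ahead pipeline.
                System.out.printf("hole at %d..%d (%d bytes)%n",
                        holeStart, nextOffset - 1, nextOffset - holeStart);
            }
        }
    }
}
```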

I had pointed that out before in SSHD-1287. There is clearly room for improvements there.

This test is run twice during SftpTransferTest, and the overall run time of this test class is about 300s locally on CentOS as well.

This does not yet explain:

  • why there is no such problem on Windows, nor with Netty on Linux. (Or at least, why the time doesn't increase that much.)
  • why the overall time for that test increased from 120s in September to 300s in December. Some bisecting is needed.
  • where the far less pronounced but still noticeable difference in ScpTest comes from.

@tomaswolf tomaswolf changed the title Performance problems in tests (NIO2, MINA, ubuntu) Performance problems in tests (NIO2 and MINA, Linux only) Jan 2, 2023
@tomaswolf
Member Author

This is also reproducible for me in a CentOS 7 virtual machine running locally. From the logs I see that with NIO2 and MINA, there is a 40ms delay for getting these extra 1kB chunks, which isn't present in the Netty logs. But I have no idea yet where these 40ms come from.

Bisecting did not yield any insights.

tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jan 7, 2023
If a client uses an SFTP buffer size greater than the maximum amount
of data the server is willing to return for an individual read request,
data will be delivered with "holes". For instance, if the server limit
is 63kB, and the client uses a buffer size of 64kB, it will issue
requests to read 64kB at offset 0, then 64kB at offset 65536, and
so on. But it will get only 63kB at offset 0 and 63kB at offset 65536.
When it notices that, it'll request the missing 1kB at offset 64512
synchronously.

In a large file, this can lead to jumping back and forth in the file
frequently if the client issues new requests after having read missing
data, because the very next read will again be short, but the server
may have already skipped forward again to satisfy the new requests.

Avoid that by not issuing new requests until the client has caught up.
That way, the server has to move backwards in the file far less often
and can serve the data much faster and more smoothly.

(An alternative or additional improvement in this area might be to make
the server-side implementation smarter about re-ordering read requests.
The current implementation just serves them as they come, but the SFTP
draft RFCs explicitly allow some re-ordering. The server could use a
priority queue ordered by offset, trying on _its_ side to avoid
excessive back and forth. Some care would have to be taken to deal with
overlapping read and write requests.)

Bug: apache#293
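
A minimal sketch of the throttling idea in this commit (illustrative class and method names, assuming a windowed asynchronous read-ahead; not the actual org.apache.sshd internals): once a response comes back short, no new read-ahead requests are issued until all outstanding responses have been consumed and the hole has been re-read.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the client-side fix described in the commit
// message above; names are illustrative, not the real SSHD internals.
class ThrottledReadAhead {

    record PendingRead(long offset, int length) {}

    private final Deque<PendingRead> outstanding = new ArrayDeque<>();
    private long nextOffset;       // offset of the next read-ahead request
    private boolean shortReadSeen; // a response left a hole behind it

    void maybeIssueRequests(int windowSize, int bufferSize) {
        // Core of the fix: do not grow the read-ahead window while a hole is
        // pending; let the client catch up so the server can keep serving
        // the file sequentially instead of seeking backwards.
        while (!shortReadSeen && outstanding.size() < windowSize) {
            PendingRead req = new PendingRead(nextOffset, bufferSize);
            outstanding.add(req);
            sendReadRequest(req); // asynchronous SSH_FXP_READ
            nextOffset += bufferSize;
        }
    }

    void onResponse(PendingRead req, int bytesReturned) {
        outstanding.remove(req);
        if (bytesReturned < req.length()) {
            // Short read: the server returned less than requested, leaving a
            // hole that must be fetched synchronously before reading ahead.
            shortReadSeen = true;
            readMissingSynchronously(req.offset() + bytesReturned,
                    req.length() - bytesReturned);
        }
        if (outstanding.isEmpty()) {
            shortReadSeen = false; // caught up; resume read-ahead
        }
    }

    void sendReadRequest(PendingRead req) { /* transport-specific; omitted */ }

    void readMissingSynchronously(long offset, int length) { /* omitted */ }
}
```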
@tomaswolf tomaswolf self-assigned this Jan 7, 2023
@tomaswolf tomaswolf added this to the 2.9.3 milestone Jan 7, 2023
@tomaswolf
Member Author

The problem in SFTP is caused by excessive jumping back and forth in the file on the server side. It manifests only with NIO2 and MINA, but not with Netty, because the timing is different: the files end up being read more or less sequentially, or the backwards jumps stay within some OS buffer size. With NIO2 and MINA, more initial read requests appear to be handled before a backwards jump occurs, which then leads to more and larger backwards jumps if one keeps issuing asynchronous requests.

I did not analyze why the problem did not occur on Windows or locally on OS X; perhaps those OSes handle this better than Linux, or the request timings and their handling are closer to the Netty-on-Linux case.

I still have no idea why the problem shows up only from a particular GitHub CI build onwards. Perhaps something changed in the Linux container used to run the tests. In any case, I have not found any particular commit from which the performance dropped markedly on Linux; the problem was apparent in all runs in my local CentOS 7 VM, no matter how far back I went.

tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jan 8, 2023
If a client uses an SFTP buffer size greater than the maximum amount
of data the server is willing to return for an individual read request,
data will be delivered with "holes". For instance, if the server limit
is 63kB, and the client uses a buffer size of 64kB, it will issue
requests to read 64kB at offset 0, then 64kB at offset 65536, and
so on. But it will get only 63kB at offset 0 and 63kB at offset 65536.
When it notices that, it'll request the missing 1kB at offset 64512
synchronously.

In a large file, this can lead to jumping back and forth in the file
frequently if the client issues new requests after having read missing
data, because the very next read will again be short, but the server
may have already skipped forward again to satisfy the new requests.

Avoid that by not issuing new requests until the client has caught up.
That way, the server has to move backwards in the file far less often
and can serve the data much faster and more smoothly.

Additionally, if gaps are detected, reduce the buffer size to the maximum
the server has returned so far. A gap means the server returned less data
than the client requested, which can have three reasons:

* EOF was reached. Reducing the buffer size is fine.
* Server limit was exceeded. Reducing avoids future gaps.
* (Transient) server error.

Because of the third possibility, the buffer size is reduced only after
the fourth gap.

(An alternative or additional improvement in this area might be to make
the server-side implementation smarter about re-ordering read requests.
The current implementation just serves them as they come, but the SFTP
draft RFCs explicitly allow some re-ordering. The server could use a
priority queue ordered by offset, trying on _its_ side to avoid
excessive back and forth. Some care would have to be taken to deal with
overlapping read and write requests.)

Bug: apache#293
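
A sketch of the gap heuristic from this commit (hypothetical names; the real change lives in the SFTP client's read-ahead code): after several short reads, the client buffer is clamped to the largest chunk the server has actually returned, so subsequent requests no longer produce holes.

```java
// Hypothetical sketch of the buffer-size adaptation described in this
// commit; names are illustrative, not the actual org.apache.sshd code.
class AdaptiveBufferSize {

    private static final int GAP_TOLERANCE = 4; // tolerate transient errors

    private int bufferSize;  // current client read size
    private int maxReturned; // largest chunk the server has returned so far
    private int gapCount;    // number of short reads observed

    AdaptiveBufferSize(int initialBufferSize) {
        this.bufferSize = initialBufferSize;
    }

    int bufferSize() {
        return bufferSize;
    }

    void onResponse(int requested, int returned) {
        maxReturned = Math.max(maxReturned, returned);
        if (returned < requested) {
            gapCount++;
        }
        // Only after several gaps do we assume the short reads reflect a
        // server limit (rather than EOF or a transient error) and clamp the
        // buffer size so future requests no longer leave holes.
        if (gapCount >= GAP_TOLERANCE && maxReturned > 0 && maxReturned < bufferSize) {
            bufferSize = maxReturned;
        }
    }
}
```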