Performance problems in tests (NIO2 and MINA, Linux only) #293

Closed
tomaswolf opened this issue Dec 17, 2022 · 3 comments · Fixed by #307
@tomaswolf
Member

Looking through the GitHub build logs, there seems to be a problem with two tests on Linux, with the NIO2 or MINA transport back-ends. They show very poor performance:

org.apache.sshd.sftp.client.SftpTransferTest

Build                NIO2     MINA    Netty

ubuntu, java 8      ~300s    ~300s     ~70s
ubuntu, java 11     ~300s    ~300s     ~50s
ubuntu, java 17     ~300s    ~300s     ~50s

windows, java 8      ~62s     ~67s     ~66s
windows, java 11     ~49s     ~49s     ~43s
windows, java 17     ~43s     ~44s     ~45s

org.apache.sshd.scp.client.ScpTest

Build                NIO2     MINA    Netty

ubuntu, java 8       ~19s     ~19s     ~11s
ubuntu, java 11      ~19s     ~19s     ~11s
ubuntu, java 17      ~19s     ~19s     ~11s

windows, java 8      ~12s     ~12s     ~12s
windows, java 11     ~12s     ~12s     ~12s
windows, java 17     ~12s     ~12s     ~12s

These are the only tests for which I see such discrepancies. It looks like the code is doing something that doesn't play well with NIO2 or MINA on Linux.

It's not a new problem either; the same pattern appears in the build logs of 2022-10-25 (commit 7ad3ead) and in even older builds, though in a build from 2022-09-20 the differences are smaller (but still present: 120s instead of 300s, and 15s instead of 19s).

The Windows times and the Linux/Netty times appear to be about normal and match what I also see for NIO2 or MINA when building locally on OS X: 12s for ScpTest and about a minute for SftpTransferTest.

What is going on here?

@tomaswolf
Member Author

Partial problem cause: in the SftpTransferTest, the problem is the SFTP buffer size of 64kB in testTransferIntegrityWithBufferLargerThanPacket. That is 1kB more than the maximum the server will ever return; its setting for the maximum chunk to return is 63kB. This means the client fetches the missing 1kB synchronously, which makes transfer times explode to 120s when running this test locally on a CentOS 7 virtual machine, as opposed to about 14 seconds with the default SFTP buffer size. With a buffer size of 62kB, this test also runs fast locally for me, in about 10 seconds.
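
To make the arithmetic concrete, here is a minimal sketch (hypothetical, not actual SSHD code) of the request pattern with a 64kB client buffer against a 63kB server limit: every 64kB request leaves a trailing 1kB hole that the client then has to fetch synchronously.

```java
// Minimal sketch of the offset arithmetic described above; illustrative
// only, not the actual Apache MINA SSHD client code.
public class ReadHoleDemo {
    static final int CLIENT_BUFFER = 64 * 1024; // 65536 bytes requested per read
    static final int SERVER_LIMIT = 63 * 1024;  // 64512 bytes max per response

    public static void main(String[] args) {
        long fileSize = 256 * 1024; // any large file
        for (long offset = 0; offset < fileSize; offset += CLIENT_BUFFER) {
            long returned = Math.min(SERVER_LIMIT, fileSize - offset);
            long holeStart = offset + returned;
            long nextOffset = Math.min(offset + CLIENT_BUFFER, fileSize);
            if (holeStart < nextOffset) {
                // This range is missing and must be re-read synchronously,
                // stalling the otherwise asynchronous read-ahead pipeline.
                System.out.printf("hole at %d..%d (%d bytes)%n",
                        holeStart, nextOffset - 1, nextOffset - holeStart);
            }
        }
    }
}
```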

I had pointed that out before in SSHD-1287. There is clearly room for improvements there.

This test is run twice during SftpTransferTest, and the overall run time of this test class is about 300s locally on CentOS as well.

This does not yet explain:

  • why there is no such problem on Windows, nor with Netty on Linux. (Or at least, why the time doesn't increase that much.)
  • why the overall time for that test increased from 120s in September to 300s in December. Some bisecting is needed.
  • where the far less pronounced but still noticeable difference in ScpTest comes from.

@tomaswolf tomaswolf changed the title Performance problems in tests (NIO2, MINA, ubuntu) Performance problems in tests (NIO2 and MINA, Linux only) Jan 2, 2023
@tomaswolf
Member Author

This is also reproducible for me in a CentOS 7 virtual machine running locally. From the logs I see that with NIO2 and MINA, there is a 40ms delay for getting these extra 1kB chunks, which isn't present in the Netty logs. But I have no idea yet where these 40ms come from.

Bisecting did not yield any insights.

tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jan 7, 2023
If a client uses an SFTP buffer size greater than the maximum amount
of data the server is willing to return for an individual read request,
data will be delivered with "holes". For instance, if the server limit
is 63kB, and the client uses a buffer size of 64kB, it will issue
requests to read 64kB at offset 0, then 64kB at offset 65536, and
so on. But it will get only 63kB at offset 0 and 63kB at offset 65536.
When it notices that, it'll request the missing 1kB at offset 64512
synchronously.

In a large file, this can lead to jumping back and forth in the file
frequently if the client issues new requests after having read missing
data, because the very next read will again be short, but the server
may have already skipped forward again to satisfy the new requests.

Avoid that by not issuing new requests until the client has caught up.
That way, the server has to move backwards in the file far less often
and can serve the data much faster and more smoothly.

(An alternative or additional improvement in this area might be to make
the server-side implementation smarter about re-ordering read requests.
The current implementation just serves them as they come, but the SFTP
draft RFCs explicitly allow some re-ordering. The server could use a
priority queue ordered by offset, trying on _its_ side to avoid
excessive back and forth. Some care would have to be taken to deal with
overlapping read and write requests.)

Bug: apache#293
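
A minimal sketch of the throttling idea in this commit (illustrative class and method names, assuming a windowed asynchronous read-ahead; not the actual org.apache.sshd internals): once a response comes back short, no new read-ahead requests are issued until all outstanding responses have been consumed and the hole has been re-read.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the client-side fix described in the commit
// message above; names are illustrative, not the real SSHD internals.
class ThrottledReadAhead {

    record PendingRead(long offset, int length) {}

    private final Deque<PendingRead> outstanding = new ArrayDeque<>();
    private long nextOffset;       // offset of the next read-ahead request
    private boolean shortReadSeen; // a response left a hole behind it

    void maybeIssueRequests(int windowSize, int bufferSize) {
        // Core of the fix: do not grow the read-ahead window while a hole is
        // pending; let the client catch up so the server can keep serving
        // the file sequentially instead of seeking backwards.
        while (!shortReadSeen && outstanding.size() < windowSize) {
            PendingRead req = new PendingRead(nextOffset, bufferSize);
            outstanding.add(req);
            sendReadRequest(req); // asynchronous SSH_FXP_READ
            nextOffset += bufferSize;
        }
    }

    void onResponse(PendingRead req, int bytesReturned) {
        outstanding.remove(req);
        if (bytesReturned < req.length()) {
            // Short read: the server returned less than requested, leaving a
            // hole that must be fetched synchronously before reading ahead.
            shortReadSeen = true;
            readMissingSynchronously(req.offset() + bytesReturned,
                    req.length() - bytesReturned);
        }
        if (outstanding.isEmpty()) {
            shortReadSeen = false; // caught up; resume read-ahead
        }
    }

    void sendReadRequest(PendingRead req) { /* transport-specific; omitted */ }

    void readMissingSynchronously(long offset, int length) { /* omitted */ }
}
```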
@tomaswolf tomaswolf self-assigned this Jan 7, 2023
@tomaswolf tomaswolf added this to the 2.9.3 milestone Jan 7, 2023
@tomaswolf
Member Author

The problem in SFTP is caused by excessive jumping back and forth in the file on the server side. It manifests only with NIO2 and MINA, but not with Netty, because the timing is different: the files end up being read more or less sequentially, or the backwards jumps stay within some OS buffer size. With NIO2 and MINA, more initial read requests appear to be handled before a backwards jump occurs, which then leads to more and larger backwards jumps if one keeps issuing asynchronous requests.

I did not analyze why the problem did not occur on Windows or locally on OS X; perhaps those OSes handle this better than Linux, or the request timings and their handling are closer to the Netty-on-Linux case.

I still have no idea why the problem shows up only from a particular GitHub CI build onwards. Perhaps something changed in the Linux container used to run the tests. In any case, I have not found any particular commit from which the performance dropped markedly on Linux; the problem was apparent in all runs in my local CentOS 7 VM, no matter how far back I went.

tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jan 8, 2023
If a client uses an SFTP buffer size greater than the maximum amount
of data the server is willing to return for an individual read request,
data will be delivered with "holes". For instance, if the server limit
is 63kB, and the client uses a buffer size of 64kB, it will issue
requests to read 64kB at offset 0, then 64kB at offset 65536, and
so on. But it will get only 63kB at offset 0 and 63kB at offset 65536.
When it notices that, it'll request the missing 1kB at offset 64512
synchronously.

In a large file, this can lead to jumping back and forth in the file
frequently if the client issues new requests after having read missing
data, because the very next read will again be short, but the server
may have already skipped forward again to satisfy the new requests.

Avoid that by not issuing new requests until the client has caught up.
That way, the server has to move backwards in the file far less often
and can serve the data much faster and more smoothly.

Additionally, if gaps are detected, reduce the buffer size to the maximum
the server has returned so far. A gap means the server returned less data
than the client requested, which can have three reasons:

* EOF was reached. Reducing the buffer size is fine.
* Server limit was exceeded. Reducing avoids future gaps.
* (Transient) server error.

Because of the third possibility, the buffer size is reduced only after
the fourth gap.

(An alternative or additional improvement in this area might be to make
the server-side implementation smarter about re-ordering read requests.
The current implementation just serves them as they come, but the SFTP
draft RFCs explicitly allow some re-ordering. The server could use a
priority queue ordered by offset, trying on _its_ side to avoid
excessive back and forth. Some care would have to be taken to deal with
overlapping read and write requests.)

Bug: apache#293
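
A sketch of the gap heuristic from this commit (hypothetical names; the real change lives in the SFTP client's read-ahead code): after several short reads, the client buffer is clamped to the largest chunk the server has actually returned, so subsequent requests no longer produce holes.

```java
// Hypothetical sketch of the buffer-size adaptation described in this
// commit; names are illustrative, not the actual org.apache.sshd code.
class AdaptiveBufferSize {

    private static final int GAP_TOLERANCE = 4; // tolerate transient errors

    private int bufferSize;  // current client read size
    private int maxReturned; // largest chunk the server has returned so far
    private int gapCount;    // number of short reads observed

    AdaptiveBufferSize(int initialBufferSize) {
        this.bufferSize = initialBufferSize;
    }

    int bufferSize() {
        return bufferSize;
    }

    void onResponse(int requested, int returned) {
        maxReturned = Math.max(maxReturned, returned);
        if (returned < requested) {
            gapCount++;
        }
        // Only after several gaps do we assume the short reads reflect a
        // server limit (rather than EOF or a transient error) and clamp the
        // buffer size so future requests no longer leave holes.
        if (gapCount >= GAP_TOLERANCE && maxReturned > 0 && maxReturned < bufferSize) {
            bufferSize = maxReturned;
        }
    }
}
```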