
net: tcp: When closing the connection send FIN immediately #19747

Merged: 1 commit merged into zephyrproject-rtos:master from the issue-19678-tcp-slow-close branch on Oct 17, 2019

Conversation

@jukkar (Member) commented on Oct 10, 2019

The earlier code was queuing the FIN that is sent when the connection
is closed. This caused a long delay (200 ms) before the peer at the
other end noticed that the connection was actually closed.

Fixes #19678

Signed-off-by: Jukka Rissanen jukka.rissanen@linux.intel.com

@jukkar (Member, Author) commented on Oct 10, 2019

I will debug this a bit more, apache bench reports peer resets.

@jukkar added the DNM (Do Not Merge) label on Oct 10, 2019
@pfalcon (Contributor) commented on Oct 10, 2019

This caused a long delay (200 ms) before the peer at the
other end noticed that the connection was actually closed.

If you debugged and found that cause, kudos. In my list, that 200+ ms delay appeared after @tbursztyka's last refactor (7f2cb02 and friends) (note: not pointing fingers, just providing information/context). I should have reported it, though I can't find a reference (e.g., it's not in #13943, which is another issue that happened soon after the refactor).

But with that in mind, this:

net: tcp: When closing the connection send FIN immediately. The earlier code was queuing the FIN that is sent when the connection is closed.

is not correct. A FIN is just as much a TCP segment as any other (e.g. a data segment). Actually, all TCP segments are the same: they should be queued, they should be retransmitted if a response doesn't arrive in time, and they should be sent in the proper order. This starts with the SYN segment: it should be sent via the standard TCP means, in particular retransmitted if there is no reply. We handle it in an ad hoc way, and because of that we don't retransmit it, which is a known issue (#9944).

The same applies to FIN: it should be put in the TCP segment queue, which may contain TCP data segments that are not fully sent yet. The FIN should be sent only after those have gone out. But they may still need to be retransmitted, so there should be no hurry to send the FIN. Finally, the FIN itself might be retransmitted a few times.
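To make that ordering concrete, here is a minimal sketch in C (hypothetical types and names, not Zephyr's internals) of a send queue in which SYN, data and FIN are all ordinary entries, transmitted in order and retransmitted until acknowledged:

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical segment descriptor; the field names are illustrative only. */
struct seg {
	struct seg *next;
	bool is_fin;        /* true for the FIN segment */
	bool acked;         /* set once the peer has acknowledged it */
	unsigned int sends; /* how many times it has been (re)transmitted */
};

/* Pick the next segment to (re)transmit: the first unacknowledged one in
 * queue order.  Because the FIN is appended behind any pending data
 * segments, it is only reached once those have been acknowledged, and it
 * is retransmitted like any other segment if its own ACK never arrives.
 * (This is a stop-and-wait simplification; real TCP keeps a window of
 * in-flight segments, but the ordering argument is the same.)
 */
static struct seg *next_to_send(struct seg *queue)
{
	for (struct seg *s = queue; s != NULL; s = s->next) {
		if (!s->acked) {
			return s;
		}
	}
	return NULL; /* everything acknowledged; nothing left to send */
}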

Anyway, writing that based on the description here in the PR, let me look at the code.

@pfalcon (Contributor) left a comment

So yeah, based on the previous comment (#19747 (comment)), this is not a correct change and I have to give -1.

A FIN really should be queued, but it shouldn't linger there without a reason. Let's figure out what causes that 200 ms delay in sending it out (or whatever the actual underlying issue may be).

The earlier code was always queuing the FIN that is sent when the
connection is closed. This caused a long delay (200 ms) before the peer at
the other end noticed that the connection was actually closed.
Now, if there is nothing in the queue, the FIN is sent immediately.
If there is some data in the queue, it is flushed when a valid
ACK has been received.

Fixes zephyrproject-rtos#19678

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
@jukkar force-pushed the issue-19678-tcp-slow-close branch from dd5d282 to 6c4a515 on October 10, 2019 15:33
@jukkar (Member, Author) commented on Oct 10, 2019

If you debugged and found that cause, kudos. In my list, that 200+ ms delay appeared after @tbursztyka's last refactor (7f2cb02 and friends) (note: not pointing fingers, just providing information/context). I should have reported it, though I can't find a reference (e.g., it's not in #13943, which is another issue that happened soon after the refactor).

It is difficult to know the culprit commit. The last commit that changed things around the active close was your commit 338dc8a, a long time ago.

@jukkar removed the DNM (Do Not Merge) label on Oct 10, 2019
@@ -972,6 +972,11 @@ int net_tcp_send_pkt(struct net_pkt *pkt)
 	return net_send_data(pkt);
 }

+static void flush_queue(struct net_context *context)
+{
+	(void)net_tcp_send_data(context, NULL, NULL);
A Contributor commented inline that the comment on net_tcp_send_data() says:

        /* For now, just send all queued data synchronously.  Need to
         * add window handling and retry/ACK logic.
         */

So, it still would be a workaround, not a solution, and could lead to silent data loss.

@pfalcon (Contributor) commented on Oct 11, 2019

Tested this with the standard test method I have used for years (qemu_x86 + SLIP, dumb_http_server, ab -n1000):

Percentage of the requests served within a certain time (ms)
  50%     22
  66%     25
  75%     29
  80%     33
  90%   1026
  95%   1035
  98%   1047
  99%   1051
 100%   1069 (longest request)

That IMHO only deteriorates the situation: instead of a stable ~200 ms we get high delay dispersion of unknown origin. So, I'll keep my -1 and extend my call to understand where the 200 ms delay comes from (while it shouldn't be there at all), instead of throwing in a random workaround.

(Mostly a crosspost from #19678 (comment))

@pfalcon requested a review from dleach02 on October 11, 2019 22:36
@pfalcon (Contributor) commented on Oct 11, 2019

As was found in #19678 (comment), this effectively tries to undo the effect of 45df09c. And arguably, it does so in a roundabout way, which still raises questions about how correct it is (i.e. what the corner cases and unobvious implications of doing it this way are).

I'd suggest seriously considering reinstating the previous code as another option (while of course addressing @dleach02's concern as described in the above commit message). Or, alternatively, explain in more detail why the solution here is better. Note that such an explanation would need to account for the actual test data in #19678 (comment), which shows that with this patch there is a noticeable, high-dispersion distribution of connection handling delays of unknown nature, whereas simply reverting 45df09c would lead to a nice flat delay distribution with low dispersion.

@jukkar (Member, Author) commented on Oct 14, 2019

@pfalcon thanks for the thorough investigation of the issue. After looking at the code, I think we should not revert commit 45df09c, as the original code was doing some suspicious things. For example, it did not do the 6lo technology check that net_tcp_send_data() does when calling net_tcp_send_pkt().

Another suspicious thing the current code does is in net_tcp_send_data():

zephyr/subsys/net/ip/tcp.c, lines 1017 to 1024 at bb9c453:

	ret = net_tcp_send_pkt(pkt);
	if (ret < 0 && !is_6lo_technology(pkt)) {
		NET_DBG("[%p] pkt %p not sent (%d)",
			context->tcp, pkt, ret);
		net_pkt_unref(pkt);
	}

	net_pkt_set_queued(pkt, true);

The function does not return if the packet was not sent, but continues to call net_pkt_set_queued(pkt, true), which does not look right.
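For illustration, here is a sketch of the tighter control flow being hinted at here (this is not what the merged patch does; it only makes the "should return" point concrete):

	ret = net_tcp_send_pkt(pkt);
	if (ret < 0 && !is_6lo_technology(pkt)) {
		NET_DBG("[%p] pkt %p not sent (%d)",
			context->tcp, pkt, ret);
		net_pkt_unref(pkt);
		return ret; /* bail out instead of falling through to set_queued() */
	}

	net_pkt_set_queued(pkt, true);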

@jukkar (Member, Author) commented on Oct 14, 2019

I think what this PR proposes is more or less the correct thing to do (see the sketch below):

  • If there are no other TCP segments queued to be sent, the final FIN is sent immediately
  • If there are some data segments to be sent, the FIN is queued and sent after them
  • After receiving a valid ACK, we flush the queue in order to send any pending FIN
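A minimal sketch of that decision logic follows (the helper names are hypothetical, not the actual Zephyr functions):

#include <stdbool.h>

/* Hypothetical helpers standing in for the real TCP internals. */
struct conn;
bool send_queue_is_empty(struct conn *c);
int  send_fin_now(struct conn *c);
int  queue_fin(struct conn *c);
void flush_send_queue(struct conn *c);

/* Application closes the connection. */
int close_connection(struct conn *c)
{
	if (send_queue_is_empty(c)) {
		return send_fin_now(c); /* nothing pending: FIN goes out immediately */
	}
	return queue_fin(c);            /* data pending: FIN waits behind it */
}

/* Called from the receive path when a valid ACK arrives. */
void on_valid_ack(struct conn *c)
{
	flush_send_queue(c);            /* sends remaining data and any pending FIN */
}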

@pfalcon (Contributor) left a comment

@pfalcon wrote:

50% 22
90% 1026

That IMHO only deteriorates the situation: instead of a stable ~200 ms we get high delay dispersion of unknown origin.

@jukkar wrote in #19678 (comment):

Sample: dumb_http_server
Git: master + fix #19747
Results: The requests take about 1000 ms. Because we are no longer waiting 200 ms, ab gets its results quicker and can send a new request earlier. The dumb_http_server is a single-threaded server which waits for an incoming connection, serves it and then waits for another connection. At the moment we do not have any queuing for received SYN packets, so if the application has not called accept(), the incoming SYN is dropped. This is what is happening here, and I verified it from the debug logs. The sample app is a bit too slow to accept another connection attempt, and this forces Linux to re-send a new SYN packet after a timeout of around 1 s. This does not always happen; sometimes dumb_http_server manages to accept the connection in time, and then the ab-reported time is <1 s.
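For context, the rough shape of a single-threaded server such as dumb_http_server is shown below (simplified for illustration, not the actual sample code). While one request is being served there is no accept() pending, so with no SYN queuing a SYN arriving in that window is dropped and the Linux client retransmits it roughly a second later:

#include <sys/socket.h>
#include <unistd.h>

void serve_forever(int listen_fd)
{
	for (;;) {
		/* Blocks until a connection arrives. */
		int client = accept(listen_fd, NULL, NULL);

		if (client < 0) {
			continue;
		}

		/* ...read the request and send the response here...
		 * During this time no accept() is outstanding, so a new
		 * incoming SYN has nowhere to go and gets dropped.
		 */

		close(client);
	}
}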

To verify this, I took this patch and applied a revert of 1a6f4a6 (which required a trivial fixup).

After that, testing with mps2_an385 + overlay-smsc911x.conf (again, to avoid QEMU+SLIP, which are known to have issues) leads to:

$ time ab -n1000 http://192.0.2.1:8080/
...
Percentage of the requests served within a certain time (ms)
  50%     12
  66%     13
  75%     14
  80%     14
  90%     15
  95%     16
  98%     17
  99%     18
 100%     23 (longest request)

@jukkar wrote:

I think what this PR proposes is more or less the correct thing to do:

With all the explanations provided, and my testing above which agrees with those explanations, I can only agree with the above: this is definitely an improvement over what we had before, and it doesn't itself seem to introduce more issues (but it exposes issues we acquired independently some time ago).

And, for the first time in a long while, this is a case where there was actual thorough testing (with multiple targets, etc.) and the results of that testing were documented and discussed/explained. I very much appreciate that.

@jukkar merged commit 97b6588 into zephyrproject-rtos:master on Oct 17, 2019
@jukkar deleted the issue-19678-tcp-slow-close branch on October 17, 2019 14:02
Successfully merging this pull request may close these issues.

Noticeable delay between processing multiple client connection requests (200ms+)