Fixing memory leaks when a simultaneous close is happening #489
Conversation
This is a great find ! 🥇 I'm back from 🌴 next week and will have a look.
I too am away, till July 10; I will definitely look after I get back. But briefly, the problem looks real and the fix looks correct. I just want to look a bit more carefully to make sure it doesn't lead to an ACK storm where both sides keep sending unnecessary ACKs back and forth forever. This should probably also be checked by looking at traces where simultaneous closes can clearly be seen and the connection then goes on to close correctly without an ACK storm.
Thanks folks for looking into this, and thanks a lot @TheLortex for the great work so far!
Some context: this work resulted from a memory leak that we can only observe with a unikernel running as a Muen subject (= VM). When running as a KVM instance I can't reproduce the issue. It also only shows up when the requests (from curl in this case) are made directly on the network interface connected to the server hardware Muen is running on. If the request comes from another routed network (from a container, for example) it doesn't show up either. A detail that might be relevant: the clock provided by the Muen kernel has a reduced resolution for security reasons. Not sure this really matters, but just to keep it in mind.
Here is some info from our internal ticket: I created pcap files and debug logs for both cases. As you can see, the debug logs show that in the leaking case (right side) the pcb entry is not removed. At the same time the pcap traces show that in the leaking case only one RST packet is sent, while in the non-leaking case there are four, but I'm not sure if this is relevant. Here are the pcap files with 1000 requests for both cases (with and without leak).
EDIT: I can also move this to a new issue, if you prefer to merge this PR as is.
If I read this right (referring to line 124 in bb6e85d): it is always an error to transition to TIME_WAIT purely, without actually arming the timeout. Looking at Stevens I also don't see any transition to TIME_WAIT where the timeout shouldn't be armed.
Here is a partial review with some questions; I still need to understand the ACK dance, but I'll do that tomorrow.
Not sure if that detail is important, but please note that the last of several RSTs in the non-leaking case has a different Seq value. There is no RST with Seq=805 in the leaking case. Maybe it's broken in both cases, but the additional RST "rescues" the situation?
Yes, exactly. The RST with Seq=804 is supposed to trigger a challenge ACK, as it is not the next expected Seq number (the previous FIN also uses Seq=804). That's why only the RST with Seq=805 is handled.
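To spell the rule out, here is a minimal sketch of the behaviour described above (a hypothetical helper; plain integers stand in for `Sequence.t`, and this is not the actual `check_valid_segment` code):

type rst_action = Reset_connection | Challenge_ack

(* A RST whose sequence number is exactly the next expected one is honoured;
   any other otherwise-valid RST only earns a challenge ACK, so a stale or
   blind RST cannot tear the connection down on its own. *)
let classify_valid_rst ~rcv_nxt ~seq =
  if seq = rcv_nxt then Reset_connection else Challenge_ack

In the trace, the FIN already used Seq=804, so the next expected number is 805: the RST with Seq=804 gets a challenge ACK, and only a RST with Seq=805 resets the connection directly.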
I have not looked into the details of this PR, but @ansiwen reported there's still a leak: is the pcap the same as before this patch, or is there a difference?
Not sure about the pcap, I would need to create it, but at least the log output was 100% identical.
A short heads-up that this fix does indeed fix the leak we are observing. Will update with more details tomorrow.
src/tcp/segment.ml
Outdated
-    if Lwt_mvar.is_empty q.rx_data
-    then Lwt_mvar.put q.rx_data (Some [], Some Sequence.zero)
+    if Lwt_mvar.is_empty q.send_ack
+    then Lwt_mvar.put q.send_ack Sequence.zero
     else Lwt.return_unit
If I read this correctly, the original `Ack.t` stored in `Flow.flow` (aka `Flow.pcb`) is not reachable through `Segment.Rx.t`, which is a shame; maybe `ack` should have been stored deeper. So it begs the question why your `send_ack` is not an `Ack.t`, so that instead of pushing something directly on a mailbox you could do the same `Ack.pushack` dance that `Rx.thread` does, something like:
let send_challenge_ack q =
(* TODO: rfc5961 ACK Throttling *)
Ack.pushack q.ack Sequence.zero
I am not familiar with the `Ack` module, but indeed it looks like it's doing something similar ! I have added a commit that uses the `pushack` function as suggested. Now the `Segment.Rx` module is functorized against the `Ack.M` signature.
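For readers unfamiliar with the technique, a rough sketch of the shape being described (the signature and field names here are assumptions, not the actual `Ack.M` signature in mirage-tcpip):

module type ACK = sig
  type t
  val pushack : t -> int -> unit Lwt.t  (* int stands in for Sequence.t *)
end

module Rx (Ack : ACK) = struct
  type t = { ack : Ack.t }

  (* the challenge ACK now goes through the ACK module instead of a raw mailbox *)
  let send_challenge_ack q =
    (* TODO: rfc5961 ACK throttling *)
    Ack.pushack q.ack 0
end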
That looks good to me; I just hope my suggestion didn't break the fix somehow :).
Thanks for the work, this is a great find !
Can we get another OK for this ?
@hannesm or @balrajsingh, would one of you be available to review the patch ? We're now confident that it fixes Nitrokey's issue.
Code looks fine (though I'm still not an expert at this TCP/IP stack), and @ansiwen reports it solves the issue. Would you mind writing an entry for CHANGES and cutting a release, @TheLortex? (I can do the release as well if you like.)
Great, I'll take care of that. @ansiwen tested on 1bd6270, and the additional commit 1813264 should not change the behavior of the stack.
I mainly looked to see if the fix could lead to the ACK storm that I mentioned earlier. I think that the fix is good. It is specific to the challenge ACK situation only, when no data needs to be pushed up to the application. Your test is also convincing. I too agree that the change can be merged. And @TheLortex, yes, TCP is hard. So glad that you are looking at it in such detail. Many thanks!
I have added a changelog entry and I am ready to cut a patch release today when @ansiwen confirms that the latest commit still works !
(branch updated from 3e1828f to aac0e02)
I can confirm that 1813264 still fixes the issue.
Let's merge and cut a release so! Thanks all for your work.
CHANGES:
* TCP: fix memory leaks on connection close in three scenarios (mirage/mirage-tcpip#489 @TheLortex)
  - simultaneous close: set up the timewait timer in the `Closing(1) - Recv_ack(2) -> Time_wait` state transition
  - client sends a RST instead of a FIN: enable sending a challenge ACK even when the reception thread is stopped
  - client doesn't ACK server's FIN: enable the retransmit timer in the `Closing(_)` state
When the client and the server want to close a connection at the same time, they both send a `FIN`. I have identified situations where `mirage-tcpip` might keep the pcb forever, stuck in closing states, causing a memory leak. This might be linked to #488. This PR adds two tests exposing the behavior.
Test 1: `close_ack_scenario`
Both client and server send a FIN, then they ACK each other's FIN. The state transitions for the server are:
- `Established - Send_fin(1) -> Fin_wait_1(1)`: server FIN is sent
- `Fin_wait_1(1) - Recv_ack(1) -> Fin_wait_1(1)`: previous packet is acked
- `Fin_wait_1(1) - Recv_fin -> Closing(1)`: client FIN is received
- `Closing(1) - Recv_ack(2) -> Time_wait`: server FIN is acked
The problem is that the connection state would stay in `Time_wait` forever. This is fixed by commit "Add timewait timeout for Closing -> Time_wait transition"; a sketch of the idea follows below.
Test 2: `close_reset_scenario`
Now, we assume the client doesn't acknowledge the FIN. The connection is then in a `Closing(_)` state. I have encountered a situation where the client would instead send a `RST` with a valid but not expected sequence number. In that case, `check_valid_segment` returns `ChallengeAck` and writes into `q.rx_data`, but because we are in a closing state that mailbox variable is not read anymore (as in flow.ml `Rx.thread` would have returned). So the challenge ACK is not sent and the connection stays in this zombie state. The fix for this is to use the `send_ack` mailbox variable, which can be used to send an empty ACK even if the rx thread is stopped: "enable challenge ack even after FIN".
Untested: what if the client stops communicating?
In this `Closing(_)` state, we are waiting for the client to ACK the server's FIN. If nothing happens, the retransmission mechanism should re-send the FIN. Instead, the retransmission timer is currently stopped when the connection is in the `Closing(_)` state: "Add retransmit timer when in a Closing state"; a sketch of the intended behavior follows below.
Disclaimer
TCP is hard ! I would be very happy if someone could review these changes and make sure nothing is broken with these patches.