Fixing memory leaks when a simultaneous close is happening #489
Conversation
This is a great find ! 🥇 I'm back from 🌴 next week and will have a look.
I too am away, till July 10; I will definitely look after I get back. But briefly, the problem looks real and the fix looks correct. I just want to look a bit more carefully to make sure it doesn't lead to an ACK storm where both sides keep sending unnecessary ACKs back and forth forever. This should probably also be checked by looking at traces where simultaneous closes can clearly be seen and the connection then goes on to close correctly without an ACK storm.
Thanks folks for looking into this, and thanks a lot @TheLortex for the great work so far!
Some context: this work resulted from a memory leak that we can only observe with a unikernel running as a Muen subject (= VM). When running as a KVM instance I can't reproduce the issue. It also only shows up when the requests (from curl in this case) are made directly on the network interface connected to the server hardware Muen is running on. If the request comes from another routed network (from a container, for example) it doesn't show up either. A detail that might be relevant: the clock provided by the Muen kernel has a reduced resolution for security reasons. Not sure this really matters, but just to keep it in mind.
Here is some info from our internal ticket: I created pcap files and debug logs for both cases. As you can see, the debug logs show that in the leaking case (right side) the pcb entry is not removed. At the same time the pcap traces show that in the leaking case only one RST packet is sent, while in the non-leaking case there are four, but I'm not sure if this is relevant. Here are the pcap files with 1000 requests for both cases (with and without leak).
EDIT: I can also move this to a new issue, if you prefer to merge this PR as is.
If I read this right (referring to line 124 in bb6e85d): it is always an error to transition to TIME_WAIT purely, without actually arming the timeout. Looking at Stevens I also don't see any transition to TIME_WAIT where the timeout shouldn't be armed.
Here is a partial review with some questions; I still need to understand the ACK dance, but I'll do that tomorrow.
Not sure if that detail is important, but please note that the last of several RSTs in the non-leaking case has a different Seq value. There is no RST with Seq=805 in the leaking case. Maybe it's broken in both cases, but the additional RST "rescues" the situation?
Yes, exactly. The RST with Seq=804 is supposed to trigger a challenge ACK, as it is not the next expected Seq number (the previous FIN also uses Seq=804). That's why only the RST with Seq=805 is handled.
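To spell the rule out, here is a minimal sketch of the behaviour described above (a hypothetical helper; plain integers stand in for `Sequence.t`, and this is not the actual `check_valid_segment` code):

type rst_action = Reset_connection | Challenge_ack

(* A RST whose sequence number is exactly the next expected one is honoured;
   any other otherwise-valid RST only earns a challenge ACK, so a stale or
   blind RST cannot tear the connection down on its own. *)
let classify_valid_rst ~rcv_nxt ~seq =
  if seq = rcv_nxt then Reset_connection else Challenge_ack

In the trace, the FIN already used Seq=804, so the next expected number is 805: the RST with Seq=804 gets a challenge ACK, and only a RST with Seq=805 resets the connection directly.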
I have not looked into the details of this PR, but @ansiwen reported there's still a leak: is the pcap the same as before this patch, or is there a difference?
Not sure about the pcap, I would need to create it, but at least the log output was 100% identical.
A short heads-up that this fix does indeed fix the leak we are observing. Will update with more details tomorrow.
src/tcp/segment.ml
Outdated
-    if Lwt_mvar.is_empty q.rx_data
-    then Lwt_mvar.put q.rx_data (Some [], Some Sequence.zero)
+    if Lwt_mvar.is_empty q.send_ack
+    then Lwt_mvar.put q.send_ack Sequence.zero
     else Lwt.return_unit
If I read this correctly, the original `Ack.t` stored in `Flow.flow` (aka `Flow.pcb`) is not reachable through `Segment.Rx.t`, which is a shame; maybe `ack` should have been stored deeper. So it begs the question why your `send_ack` is not an `Ack.t`, so that instead of pushing something directly on a mailbox you could do the same `Ack.pushack` dance that `Rx.thread` does, something like:
let send_challenge_ack q =
(* TODO: rfc5961 ACK Throttling *)
Ack.pushack q.ack Sequence.zero
I am not familiar with the `Ack` module, but indeed it looks like it's doing something similar ! I have added a commit that uses the `pushack` function as suggested. Now the `Segment.Rx` module is functorized against the `Ack.M` signature.
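For readers unfamiliar with the technique, a rough sketch of the shape being described (the signature and field names here are assumptions, not the actual `Ack.M` signature in mirage-tcpip):

module type ACK = sig
  type t
  val pushack : t -> int -> unit Lwt.t  (* int stands in for Sequence.t *)
end

module Rx (Ack : ACK) = struct
  type t = { ack : Ack.t }

  (* the challenge ACK now goes through the ACK module instead of a raw mailbox *)
  let send_challenge_ack q =
    (* TODO: rfc5961 ACK throttling *)
    Ack.pushack q.ack 0
end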
That looks good to me; I just hope my suggestion didn't break the fix somehow :).
Thanks for the work, this is a great find !
Can we get another OK for this ?
@hannesm or @balrajsingh, would one of you be available to review the patch ? We're now confident that it fixes Nitrokey's issue.
Code looks fine (though I'm still not an expert at this TCP/IP stack), and @ansiwen reports it solves the issue. Would you mind writing an entry for CHANGES and cutting a release, @TheLortex? (I can do the release as well if you like.)
Great, I'll take care of that. @ansiwen tested on 1bd6270, and the additional commit 1813264 should not change the behavior of the stack.
I mainly looked to see if the fix could lead to the ACK storm that I mentioned earlier. I think that the fix is good. It is specific to the challenge ACK situation only, when no data needs to be pushed up to the application. Your test is also convincing. I too agree that the change can be merged. And @TheLortex, yes, TCP is hard. So glad that you are looking at it in such detail. Many thanks!
I have added a changelog entry and I am ready to cut a patch release today when @ansiwen confirms that the latest commit still works !
(branch updated from 3e1828f to aac0e02)
I can confirm that 1813264 still fixes the issue.
Let's merge and cut a release so! Thanks all for your work.
CHANGES:
* TCP: fix memory leaks on connection close in three scenarios (mirage/mirage-tcpip#489 @TheLortex)
  - simultaneous close: set up the timewait timer in the `Closing(1) - Recv_ack(2) -> Time_wait` state transition
  - client sends a RST instead of a FIN: enable sending a challenge ACK even when the reception thread is stopped
  - client doesn't ACK server's FIN: enable the retransmit timer in the `Closing(_)` state
When the client and the server want to close a connection at the same time, they both send a `FIN`. I have identified situations where `mirage-tcpip` might keep the pcb forever, stuck in closing states, causing a memory leak. This might be linked to #488. This PR adds two tests exposing the behavior.
Test 1: `close_ack_scenario`
Both client and server send a FIN, then they ACK each other's FIN. The state transitions for the server are:
- `Established - Send_fin(1) -> Fin_wait_1(1)`: server FIN is sent
- `Fin_wait_1(1) - Recv_ack(1) -> Fin_wait_1(1)`: previous packet is acked
- `Fin_wait_1(1) - Recv_fin -> Closing(1)`: client FIN is received
- `Closing(1) - Recv_ack(2) -> Time_wait`: server FIN is acked
The problem is that the connection state would stay in `Time_wait` forever. This is fixed by commit "Add timewait timeout for Closing -> Time_wait transition"; a sketch of the idea follows below.
Test 2: `close_reset_scenario`
Now, we assume the client doesn't acknowledge the FIN. The connection is then in a `Closing(_)` state. I have encountered a situation where the client would instead send a `RST` with a valid but not expected sequence number. In that case, `check_valid_segment` returns `ChallengeAck` and writes into `q.rx_data`, but because we are in a closing state that mailbox variable is not read anymore (as in flow.ml `Rx.thread` would have returned). So the challenge ACK is not sent and the connection stays in this zombie state. The fix for this is to use the `send_ack` mailbox variable, which can be used to send an empty ACK even if the rx thread is stopped: "enable challenge ack even after FIN".
Untested: what if the client stops communicating?
In this `Closing(_)` state, we are waiting for the client to ACK the server's FIN. If nothing happens, the retransmission mechanism should re-send the FIN. Instead, the retransmission timer is currently stopped when the connection is in the `Closing(_)` state: "Add retransmit timer when in a Closing state"; a sketch of the intended behavior follows below.
Disclaimer
TCP is hard ! I would be very happy if someone could review these changes and make sure nothing is broken with these patches.