Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A stress test on my SMTP server show me an unknow error #456

Closed
dinosaure opened this issue Oct 25, 2021 · 23 comments
Closed

A stress test on my SMTP server show me an unknow error #456

dinosaure opened this issue Oct 25, 2021 · 23 comments

Comments

@dinosaure
Copy link
Member

When I try to stress-test my SMTP server which mostly retransmit an incoming email to an other point, I got this error message:

console data 2021-10-25T17:05:54-00:00: 2021-10-25 17:05:54 -00:00: ERR [netif] [listen] error while handling Lwt_stream.Full, continuing. bt: Raised at Stdlib__hashtbl.find in file "hashtbl.ml", line 541, characters 17-32

I'm not sure about what such error refers, someone knows about such error? And in which context it appears? Currently, you can reproduce it with these unikernels: https://github.com/dinosaure/ptt/tree/master/unikernel

Btw, is it possible to test locally without Solo5 the mirage-tcpip stack (not the unix/socket one)?

@hannesm
Copy link
Member

hannesm commented Oct 25, 2021

(a) the (netif) listen implementation instantiates a try .. catch around the worker function f (see https://github.com/mirage/mirage-net-solo5/blob/v0.6.2/src/netif.ml#L124-L145), also discussed in mirage/mirage#1036
(b) you can configure with "mirage configure -t unix --net=direct" so that the mirage-tcpip stack (this repository here) is used with a tap device, i.e. not the socket stack.
(c) I fail to understand where Lwt_stream are involved -- I can't see any reference to that module in this repository (apart from tests)

@dinosaure
Copy link
Member Author

Thanks for (b), I'm currently try to stress test and use memtrace to see where we have a bug.
About (c), I implemented a derivation of Mirage_stackV4V6 which is more close to the Unix pattern (bind/accept/close instead of the callback). In that way, I used an Lwt_stream.t (a bounded one) to implement producer (the callback)/consumer (accept).

@dinosaure
Copy link
Member Author

Come back with few outcomes.

I tried to use memtrace and saw that mirage-crypto-ec allocates a lot. I was thinking that, at this stage, the GC was under a huge pressure between small Cstruct.t allocated by mirage-crypto-ec and the rest of live objects. So I proposed something which uses bytes instead of Cstruct.t here: mirage/mirage-crypto#146

However, with the help from @icristescu and the great tool from @zshipko (see memtrace-filters). I tried to check which module allocates and never free its buffers. It seems that mirage-crypto-ec was not in that case (but still continue to think that it put a huge pressure on the GC).

And it seems that we found a correlation between our Out_of_memory and Netif.listen. Indeed, we filtered the Netif module which seems to allocate a lot and it follows our graph - and so never de-allocates objects. memtrace tells to us that the Cstruct.t is the one allocated in the listen function:

https://github.com/mirage/mirage-net-unix/blob/3f8bb2235c904c1e2fa01cd7cd59d0468ff7be9f/src/netif.ml#L117-L133

The comment above the function seems clear that a dragon is here! At this stage, I don't really know what we need to do but it seems clear that the problem is systemic where mirage-net-solo5 follows the same code - and I can reproduce the same problem with hvt.

@hannesm
Copy link
Member

hannesm commented Oct 28, 2021

I'm not sure how memtrace adds sources to allocations. What is a thing is that Netif.listen for each received packet starts an Lwt task and performs the user-provided callback -- i.e. no real surprise it being the thing that allocates a lot.

@hannesm
Copy link
Member

hannesm commented Oct 28, 2021

So, thinking out loud - if the Lwt tasks are a bottleneck, eventually we should revise the design. I had in my head for some time (similar to what UNIX does), what about:

  • Netif.listen does not allocate a Lwt task, but instead up to the protocol layer delivers the packet
  • For TCP services they create a task (per flow?) and wait / are signalled if there's incoming data
    --> fewer tasks, also buffers could be freed earlier (TCP service would eventually need to copy)

@dinosaure
Copy link
Member Author

I'm not sure how memtrace adds sources to allocations. What is a thing is that Netif.listen for each received packet starts an Lwt task and performs the user-provided callback

It seems that in my context, Netif.listen does the Lwt.async and the user's function never finish (and keep a reference on the given Cstruct.t). It's probably why these objects are never de-allocated and it seems really difficult to follow the execution path at this level (may be mirage-tcpip - which gives the function to Netif - never terminate and so it keeps a reference on this Cstruct.t - and the GC does not have any opportunity to free it even if the connection was closed).

I need to investigate deeper because between my SMTP stack and Netif, many things happen and may be it's my fault.

@dinosaure
Copy link
Member Author

graph

Just to be clear, this graph comes from memtrace and:

  1. in blue, this is all of my objects
  2. in yellow, this is all objects from Netif module

It seems clear that allocations from Netif module are never de-allocated - for instance Mirage_crypto_ec (with mirage/mirage-crypto#146) has a stable memory consumption which does not converge with my blue line.

@dinosaure
Copy link
Member Author

Come back with some news, the problem about the Cstruct.t can appear in 2 contexts:

  • a bad ownership transfer between the given Cstruct.t by the Netif.listen and something inner mirage-tcpip without copy
  • a fiber which does not terminate and keep alive the given Cstruct.t

So I updated my workflow to count fibers and it seems that:

  • on the application level (for instance SMTP + TLS - see https://github.com/dinosaure/ptt), it seems that any Lwt.async in the server part terminate correctly
  • however, it seems that a Netif.listen launch some fibers and they never terminate. The evolution of the number of fibers which did not terminate correspond to our graph above

Now the question is to find the execution path where something never terminate and keep the given Cstruct.t alive.

@dinosaure
Copy link
Member Author

A get some outcomes, it seems that some fibers does not terminates and all of them correspond to a RST TCP/IP packet:

>>> LISTEN CALLBACK TERMINATES (remains 25 callbacks).
[000000000000ec0a:0001635515612.34]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 0019 7a1c e30a 304b 0000 0000 5004  ....z...0K....P.
                                     00000030: 0000 9eff 0000                           ......
                                     
[000000000000ec0b:0001635515612.34]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 0019 7a1c e30a 304c 0000 0000 5004  ....z...0L....P.
                                     00000030: 0000 9efe 0000                           ......
                                     
[0000000000010878:0001635515616.49]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 0019 7ac6 102b 4d59 0000 0000 5004  ....z..+MY....P.
                                     00000030: 0000 5427 0000                           ..T'..
                                     
[0000000000010879:0001635515616.49]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 0019 7ac6 102b 4d5a 0000 0000 5004  ....z..+MZ....P.
                                     00000030: 0000 5426 0000                           ..T&..
                                     
[00000000000307ef:0001635515695.62]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 0019 86e6 c50f 966f 0000 0000 5004  .........o....P.
                                     00000030: 0000 4a0c 0000                           ..J...
                                     
[00000000000504d5:0001635515762.27]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 0019 9322 162b ffbf 0000 0000 5004  .....".+......P.
                                     00000030: 0000 8364 0000                           ...d..
                                     
[000000000005a94e:0001635515785.02]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 0019 9725 fdea 195a 0000 0000 5004  .....%...Z....P.
                                     00000030: 0000 7e07 0000                           ..~...
                                     
[0000000000062e13:0001635515806.49]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 d2fe 01d1 4c9e 16d5 0000 0000 5004  ......L.......P.
                                     00000030: 0000 f447 0000                           ...G..
                                     
[0000000000062e19:0001635515806.49]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 d2fe 01d1 4c9e 16d6 0000 0000 5004  ......L.......P.
                                     00000030: 0000 f446 0000                           ...F..
                                     
[00000000000647d2:0001635515812.61]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 d444 01d1 c8a4 5b7c 0000 0000 5004  ...D....[|....P.
                                     00000030: 0000 3254 0000                           ..2T..
                                     
[00000000000647d8:0001635515812.62]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 d446 01d1 1a28 d2f7 0000 0000 5004  ...F...(......P.
                                     00000030: 0000 6953 0000                           ..iS..
                                     
[00000000000647e8:0001635515812.62]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 d448 01d1 3ad9 38a5 0000 0000 5004  ...H..:.8.....P.
                                     00000030: 0000 e2f2 0000                           ......
                                     
[00000000000647eb:0001635515812.62]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 d448 01d1 3ad9 38a6 0000 0000 5004  ...H..:.8.....P.
                                     00000030: 0000 e2f1 0000                           ......
                                     
[0000000000068a2a:0001635515832.67]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 d79e 01d1 a4ce 9f32 0000 0000 5004  .........2....P.
                                     00000030: 0000 0f1a 0000                           ......
                                     
[0000000000068a2f:0001635515832.67]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 d7a0 01d1 9b0b 9caa 0000 0000 5004  ..............P.
                                     00000030: 0000 1b63 0000                           ...c..
                                     
[000000000006c2ef:0001635515850.02]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 da80 01d1 add0 9f00 0000 0000 5004  ..............P.
                                     00000030: 0000 0368 0000                           ...h..
                                     
[000000000006c2f5:0001635515850.03]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 da82 01d1 9226 c7cf 0000 0000 5004  .......&......P.
                                     00000030: 0000 f640 0000                           ...@..
                                     
[000000000006c2f8:0001635515850.03]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 da84 01d1 e87a 1c88 0000 0000 5004  .......z......P.
                                     00000030: 0000 4b32 0000                           ..K2..
                                     
[000000000006c2fc:0001635515850.03]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 da7a 01d1 dc96 bbe3 0000 0000 5004  ...z..........P.
                                     00000030: 0000 b7c4 0000                           ......
                                     
[000000000006c306:0001635515850.04]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 da84 01d1 e87a 1c89 0000 0000 5004  .......z......P.
                                     00000030: 0000 4b31 0000                           ..K1..
                                     
[000000000006c310:0001635515850.06]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 da80 01d1 add0 9f01 0000 0000 5004  ..............P.
                                     00000030: 0000 0367 0000                           ...g..
                                     
[000000000006c314:0001635515850.06]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 da86 01d1 849f 3440 0000 0000 5004  ........4@....P.
                                     00000030: 0000 9753 0000                           ...S..
                                     
[000000000006c318:0001635515850.06]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 da7c 01d1 ae35 a04c 0000 0000 5004  ...|...5.L....P.
                                     00000030: 0000 01bb 0000                           ......
                                     
[000000000006c31a:0001635515850.06]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 da82 01d1 9226 c7d0 0000 0000 5004  .......&......P.
                                     00000030: 0000 f63f 0000                           ...?..
                                     
[000000000006c31d:0001635515850.06]: 00000000: b69c 7c3f f8cb 8e35 4374 8166 0800 4500  ..|?...5Ct.f..E.
                                     00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
                                     00000020: 0103 da7a 01d1 dc96 bbe4 0000 0000 5004  ...z..........P.
                                     00000030: 0000 b7c3 0000                           ......

So we can start to figure out which execution path we took from these packets and find our infinite loop!

@djs55
Copy link
Member

djs55 commented Oct 29, 2021

@dinosaure that's a very interesting clue! Do you think we're receiving RST packets and then getting stuck handling them? Perhaps the TCP stack is sending addtional data by mistake after the send direction has been officially closed with FIN, prompting the remote to send back a RST, causing us to get stuck? It might be worth looking at some packet traces with wireshark to look for anomalies. Maybe it's 2 bugs: 1 bug causing the remote to send a RST as an error, and bug 2 in our handling of RST?

@dinosaure
Copy link
Member Author

dinosaure commented Oct 29, 2021

This is finally the context where the fiber never terminates (note that the client use the Unix socket):
wireshark

The last packet is the packet where the fiber never terminates.

@dinosaure
Copy link
Member Author

With the really nice help of @djs55, we figure out that something bad happens when the connection want to be closed. We discovered some spurious RST packet which are not crafted by mirage-tcpip even if the destination of them was for the client.

A possible explanation exists (but I'm not confident with the reality). The fact is that my client did not close properly the connection and when it terminates, it seems that the kernel (it's a simple Unix socket) decides to craft the RST packet for us (from *:25 to the client) and mirage-tcpip does not know how to deal with it.

So I decided to fix my client and properly close the connection and the situation is worse. Now, the client wants to send an RST packet (from the client to *:25) and mirage-tcpip still don't know how to deal with it. Wireshark said to us that such packet is bad (in red) and finally, we fallback to our initial situation where you launch a fiber which never terminates with such packet.

I will try to figure out why such packet appears from the client now.

@dinosaure
Copy link
Member Author

I found the issue FINALLY 🎉 ! It seems that this specific line never terminates:

Lwt_mvar.put q.rx_data (None, Some Sequence.zero)

I don't really know what that means...

@dinosaure
Copy link
Member Author

It seems that this is not the only path where the fiber can block, I found some situation where it reaches this point:

| `ChallengeAck ->
send_challenge_ack q

And the fiber never terminates in this situation too.

@dinosaure
Copy link
Member Author

dinosaure commented Oct 29, 2021

So I can confirm that this is where we have our memory leak (/cc @hannesm) as far your application does not leak. An hot fix about that is:

diff --git a/src/tcp/segment.ml b/src/tcp/segment.ml
index 0394c5fa..466ec30a 100644
--- a/src/tcp/segment.ml
+++ b/src/tcp/segment.ml
@@ -204,7 +204,9 @@ module Rx(Time:Mirage_time.S) = struct
       in
       tx_ack <&> urx_inform
     | `ChallengeAck ->
-      send_challenge_ack q
+      if Lwt_mvar.is_empty q.rx_data
+      then send_challenge_ack q
+      else Lwt.return_unit
     | `Drop ->
       Lwt.return_unit
     | `Reset ->
@@ -218,7 +220,9 @@ module Rx(Time:Mirage_time.S) = struct
       in
       txalert (Window.ack_serviced q.wnd) >>= fun () ->
       (* Use the fin path to inform the application of end of stream *)
-      Lwt_mvar.put q.rx_data (None, Some Sequence.zero)
+      if Lwt_mvar.is_empty q.rx_data
+      then Lwt_mvar.put q.rx_data (None, Some Sequence.zero)
+      else Lwt.return_unit
 end
 
 (* Transmitted segments are sent in-order, and may also be marked

However, I still have some leaks in some context, at least I got one. If someone who knows better about mirage-tcpip can help me to fix this problem. It seems that in this path, we want to fill an already filled mvar so we just block infinitely.

@hannesm
Copy link
Member

hannesm commented Nov 2, 2021

Thanks for the investigations @dinosaure. I've to pass on the current TCP stack / what the segment module is doing. If your patch fixes the memory issue you encounter, I'd be in favour to push and release that.

@dinosaure
Copy link
Member Author

I need to investigate more (but I will do next week probably - vacation) because Netif.listen stills leak a memory even if you applied this patch.

@dinosaure
Copy link
Member Author

I can confirm that the current patch fix the memory leak for mirage-tcpip even if it seems that few TCP/IP packets remains for whatever reasons. We probably should integrate the patch indeed @hannesm - but it seems that I found an other memory leak on my SMTP server.

@dinosaure
Copy link
Member Author

dinosaure commented Nov 24, 2021

My last experiment show that we still have a leak on Netif.* functions:
capture003

module is how many objects which came from Netif.* are alive. The server Out_of_memory with few packets:

>>>> remaining packets (009) <<<<
>>> 824286: 00000000: a201 9ed0 ccd7 8e35 4374 8166 0800 4500  .......5Ct.f..E.
            00000010: 0028 abc6 4000 4006 0bb4 c0a8 0102 c0a8  .(..@.@.........
            00000020: 0103 0019 8c13 5eac 6ba4 8fb2 65e0 5010  ......^.k...e.P.
            00000030: 00a5 dfc9 0000                           .......
>>> 824310: 00000000: a201 9ed0 ccd7 8e35 4374 8166 0800 4500  .......5Ct.f..E.
            00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
            00000020: 0103 0019 8c13 5eac 6ba4 8fb2 65e0 5010  ......^.k...e.P.
            00000030: 00a5 dfc9 0000                           .......
>>> 824278: 00000000: a201 9ed0 ccd7 8e35 4374 8166 0800 4500  .......5Ct.f..E.
            00000010: 0040 e4da 4000 4006 d287 c0a8 0102 c0a8  .@..@.@.........
            00000020: 0103 0019 8c12 6e06 23f0 fbab d84b 5019  ......n.#....KP.
            00000030: 00a5 8727 0000 1703 0300 1300 d566 3c95  ...'.........f<.
            00000040: 5860 5590 3f4f 48bb 99e5 0286 a111       X`U.?OH........
>>> 824285: 00000000: a201 9ed0 ccd7 8e35 4374 8166 0800 4500  .......5Ct.f..E.
            00000010: 0040 abc5 4000 4006 0b9d c0a8 0102 c0a8  .@..@.@.........
            00000020: 0103 0019 8c13 5eac 6b8b 8fb2 65df 5019  ......^.k...e.P.
            00000030: 00a5 7b23 0000 1703 0300 133e edc8 a666  ..{#.......>...f
            00000040: 11b9 2283 cd54 1d03 567e fe31 2fea       .."..T..V~.1/..
>>> 824274: 00000000: a201 9ed0 ccd7 8e35 4374 8166 0800 4500  .......5Ct.f..E.
            00000010: 0028 6fae 4000 4006 47cc c0a8 0102 c0a8  .(o.@.@.G.......
            00000020: 0103 0019 8c10 3b66 d517 aca5 2920 5010  ......;f....) P.
            00000030: 00a5 b96c 0000                           ...l...
>>> 824304: 00000000: a201 9ed0 ccd7 8e35 4374 8166 0800 4500  .......5Ct.f..E.
            00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
            00000020: 0103 0019 8c10 3b66 d517 aca5 2920 5010  ......;f....) P.
            00000030: 00a5 b96c 0000                           ...l...
>>> 824279: 00000000: a201 9ed0 ccd7 8e35 4374 8166 0800 4500  .......5Ct.f..E.
            00000010: 0028 e4db 4000 4006 d29e c0a8 0102 c0a8  .(..@.@.........
            00000020: 0103 0019 8c12 6e06 2409 fbab d84c 5010  ......n.$....LP.
            00000030: 00a5 39a6 0000                           ..9....
>>> 824273: 00000000: a201 9ed0 ccd7 8e35 4374 8166 0800 4500  .......5Ct.f..E.
            00000010: 0040 6fad 4000 4006 47b5 c0a8 0102 c0a8  .@o.@.@.G.......
            00000020: 0103 0019 8c10 3b66 d4fe aca5 291f 5019  ......;f....).P.
            00000030: 00a5 ae51 0000 1703 0300 13dd 3d77 32c7  ...Q........=w2.
            00000040: e48b fe75 3bef aa5f 73bf eed1 4113       ...u;.._s...A..
>>> 824307: 00000000: a201 9ed0 ccd7 8e35 4374 8166 0800 4500  .......5Ct.f..E.
            00000010: 0028 0000 4000 4006 b77a c0a8 0102 c0a8  .(..@.@..z......
            00000020: 0103 0019 8c12 6e06 2409 fbab d84c 5010  ......n.$....LP.
            00000030: 00a5 39a6 0000                           ..9....

Some is about ACK, some about [FIN, PSH, ACK] [TCP segment of a reassembled PDU] and all of them come from mirage-tcpip (the server) to transmit something to the client. Note that even if this packet seems small, Netif allocates a buffer of MTU bytes (in my case, 1500 bytes. 9 remaining packets is 13,5 Kbytes. It's a shame to not have a vertical scale on these outputs but it seems clear that even if I applied my previous patch, we still have a memory leak.

On the application level, I can assert that every Lwt.async used finish correctly and memory are released. At this stage, I don't really know what I need to do.

EDIT: I generated few graph like below for my module and none of them leak a memory like Netif.
EDIT: If you want to introspect a bit what is going on, I can keep an access to the memtrace output here: http://145.40.96.7:8080/ (1 days)
EDIT: When I look about Hxd module which show remaining packets, something strange appear:
capture004

That mostly mean that at one point, Netif.* starts to really leak in the middle of the server life.

@balrajsingh
Copy link
Member

Wow! This is an extremely well done investigation.  Much appreciation to you @dinosaure for your tenaciousness.

It's been a long time since I saw this code so please pardon my slowness.  My first thought was that the RST path had been very under-tested so it's not surprising that there are problems.

There may be more than one leak here but the one you identified when multiple valid RST segments are received, is definitely a problem.  The suggested fix of checking if the rx_data mvar is empty before putting it will probably not always work.  It will only work if the RST segments are timed so that the second and later ones are prevented from causing the mvar put because the first has not yet been taken. If the RST segment arrivals are timed so the mvar put from the first RST gets processed, which means that the thread that takes the mvar terminates, and then after that if a valid RST arrives causing a put, the mvar would never get read and it would be a leak.

I think the fix should involve the state machine where once one valid RST segment is accepted no further RSTs should be accepted.  Would it be possible for you to test the following changes.  There are two changes below, could you try them separately too? Unfortunately I don't have a functioning environment for mirage right now.

diff --git a/src/tcp/segment.ml b/src/tcp/segment.ml
index 0394c5fa..d685ed3e 100644
--- a/src/tcp/segment.ml
+++ b/src/tcp/segment.ml
@@ -110,12 +110,17 @@ module Rx(Time:Mirage_time.S) = struct

   let check_valid_segment q seg =
     if seg.header.rst then
-      if Sequence.compare seg.header.sequence (Window.rx_nxt q.wnd) = 0 then
-        `Reset
-      else if Window.valid q.wnd seg.header.sequence then
-        `ChallengeAck
-      else
-        `Drop
+      begin match State.state q.state with
+        | State.Reset ->
+          `Drop
+        | _ ->
+          if Sequence.compare seg.header.sequence (Window.rx_nxt q.wnd) = 0 then
+            `Reset
+          else if Window.valid q.wnd seg.header.sequence then
+            `ChallengeAck
+          else
+            `Drop
+      end
     else if seg.header.syn then
       `ChallengeAck
     else if Window.valid q.wnd seg.header.sequence then
@@ -131,7 +136,9 @@ module Rx(Time:Mirage_time.S) = struct
   let send_challenge_ack q =
     (* TODO:  rfc5961 ACK Throttling *)
     (* Is this the correct way trigger an ack? *)
-    Lwt_mvar.put q.rx_data (Some [], Some Sequence.zero)
+    if Lwt_mvar.is_empty q.rx_data
+      then Lwt_mvar.put q.rx_data (Some [], Some Sequence.zero)
+      else Lwt.return_unit

   (* Given an input segment, the window information, and a receive
      queue, update the window, extract any ready segments into the

@dinosaure
Copy link
Member Author

Hi @balrajsingh,

Firstly, thank you for your time to help us to investigate this memory leak. It's not easy to go deeper in this codebase. From what you explain, I think you highlight the reason about why, even with my initial patch, I still continue to have the memory leak.

Then, I relaunched my huge test (which is: send ONE MILLION emails to the server) with your patch and it's seems that we don't have any memory leaks 🎉 ! I get an other problem (which is more about the time spent to try a connection which is long and put a pressure on the GC which is not allowed to release resource until we fail) but it's no more about mirage-crypto/mirage-tcpip 👍 .

So really thank you to help us about that because it unlock us about sustainability of many of our services (such as the SMTP service). Hope that this patch will solve the issue discovered by @hannesm before too. I will definitely propose a PR to integrate your patch (with you as an author) and cut a release. WDYT?

And again, big thanks!

@balrajsingh
Copy link
Member

Hooray! That is most excellent @dinosaure. It's really great to hear that it worked. Yes please do propose a PR, I feel quite comfortable with fixing the multiple valid RST problem in this way.

About memory use when lots of connections are being tried, and re-tried when some fail due to packet loss, that is just the way that this design works. It does try to delay creating all the required threads and data structures till a connection is almost ready to be in the Established state, but still the minimum that it allocates for each connection as it attempts to reach the Established state will add up. We may need some clever design improvement to somehow not use any memory at all till the connection is fully ready to be Established. At the same time, it could just be considered one of the capacity limits for this stack - as long as the limit is high enough it is probably ok.

@dinosaure
Copy link
Member Author

Thanks everybody for this fix 👍!

hannesm added a commit to hannesm/opam-repository that referenced this issue Dec 10, 2021
CHANGES:

* Fix memory leak in processing RST packets (mirage/mirage-tcpip#460 @balrajsingh, reported in
  mirage/mirage-tcpip#456 by @dinosaure)
* Move module types (IP, UDP, TCP, STACK, ICMP) into tcpip core library
  (mirage/mirage-tcpip#463 @hannesm)
* API breakage: Tcpip_checksum is now part of tcpip.checksum (used to be
  part of tcpip mirage/mirage-tcpip#463 @hannesm)
* API breakage: tcpip.unix has been removed (mirage/mirage-tcpip#463 @hannesm)
* Use Lwt.pause instead of deprecated Lwt_{unix,main}.yield (mirage/mirage-tcpip#461 @dinosaure)
hannesm added a commit to hannesm/opam-repository that referenced this issue Dec 14, 2021
CHANGES:

* Fix memory leak in processing RST packets (mirage/mirage-tcpip#460 @balrajsingh, reported in
  mirage/mirage-tcpip#456 by @dinosaure)
* Move module types (IP, UDP, TCP, STACK, ICMP) into tcpip core library
  (mirage/mirage-tcpip#463 @hannesm)
* API breakage: Tcpip_checksum is now part of tcpip.checksum (used to be
  part of tcpip mirage/mirage-tcpip#463 @hannesm)
* API breakage: tcpip.unix has been removed (mirage/mirage-tcpip#463 @hannesm)
* Use Lwt.pause instead of deprecated Lwt_{unix,main}.yield (mirage/mirage-tcpip#461 @dinosaure)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants