Skip to content

Commit

Permalink
tcp: refine tcp_prune_ofo_queue() to not drop all packets
Browse files Browse the repository at this point in the history
Over the years, TCP BDP has increased a lot, and is typically
in the order of ~10 Mbytes with help of clever Congestion Control
modules.

In presence of packet losses, TCP stores incoming packets into an out of
order queue, and number of skbs sitting there waiting for the missing
packets to be received can match the BDP (~10 Mbytes)

In some cases, TCP needs to make room for incoming skbs, and current
strategy can simply remove all skbs in the out of order queue as a last
resort, incurring a huge penalty, both for receiver and sender.

Unfortunately these 'last resort events' are quite frequent, forcing
sender to send all packets again, stalling the flow and wasting a lot of
resources.

This patch cleans only a part of the out of order queue in order
to meet the memory constraints.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: C. Stephen Gun <csg@google.com>
Cc: Van Jacobson <vanj@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
Eric Dumazet authored and davem330 committed Aug 19, 2016
1 parent e2d8f64 commit 36a6503
Showing 1 changed file with 28 additions and 19 deletions.
47 changes: 28 additions & 19 deletions net/ipv4/tcp_input.c
Original file line number Diff line number Diff line change
Expand Up @@ -4392,12 +4392,9 @@ static int tcp_try_rmem_schedule(struct sock *sk, struct sk_buff *skb,
if (tcp_prune_queue(sk) < 0)
return -1;

if (!sk_rmem_schedule(sk, skb, size)) {
while (!sk_rmem_schedule(sk, skb, size)) {
if (!tcp_prune_ofo_queue(sk))
return -1;

if (!sk_rmem_schedule(sk, skb, size))
return -1;
}
}
return 0;
Expand Down Expand Up @@ -4874,29 +4871,41 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
}

/*
* Purge the out-of-order queue.
* Return true if queue was pruned.
* Clean the out-of-order queue to make room.
* We drop high sequences packets to :
* 1) Let a chance for holes to be filled.
* 2) not add too big latencies if thousands of packets sit there.
* (But if application shrinks SO_RCVBUF, we could still end up
* freeing whole queue here)
*
* Return true if queue has shrunk.
*/
static bool tcp_prune_ofo_queue(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
bool res = false;
struct sk_buff *skb;

if (!skb_queue_empty(&tp->out_of_order_queue)) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
__skb_queue_purge(&tp->out_of_order_queue);
if (skb_queue_empty(&tp->out_of_order_queue))
return false;

/* Reset SACK state. A conforming SACK implementation will
* do the same at a timeout based retransmit. When a connection
* is in a sad state like this, we care only about integrity
* of the connection not performance.
*/
if (tp->rx_opt.sack_ok)
tcp_sack_reset(&tp->rx_opt);
NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);

while ((skb = __skb_dequeue_tail(&tp->out_of_order_queue)) != NULL) {
tcp_drop(sk, skb);
sk_mem_reclaim(sk);
res = true;
if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
!tcp_under_memory_pressure(sk))
break;
}
return res;

/* Reset SACK state. A conforming SACK implementation will
* do the same at a timeout based retransmit. When a connection
* is in a sad state like this, we care only about integrity
* of the connection not performance.
*/
if (tp->rx_opt.sack_ok)
tcp_sack_reset(&tp->rx_opt);
return true;
}

/* Reduce allocated memory if we can, trying to get
Expand Down

0 comments on commit 36a6503

Please sign in to comment.