Performance drop with increasing buffer size #949

Open
HKalbasi opened this issue Jul 8, 2024 · 8 comments

@HKalbasi
Contributor

HKalbasi commented Jul 8, 2024

I have a smoltcp device that needs to connect to targets with varying latency. To increase throughput I use large buffer sizes, but there seems to be an "ideal" buffer size: below it throughput scales linearly, it peaks around the ideal point, and it drops sharply if the buffer grows beyond that. Since targets have different latencies, there is no one-size-fits-all buffer size. My intuition was that a larger buffer can only help smoltcp and should not affect anything if the connection can't reach the large window sizes, so increasing the buffer size should always give better throughput (at the cost of more memory). That is not what I see in my experiments.
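
For reference, the usual back-of-the-envelope model behind the "scales linearly" part: a TCP sender can have at most one window of data in flight per round trip, so throughput is capped at roughly window / RTT, and the amount of data needed to fill a link is its bandwidth-delay product. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of smoltcp.

```rust
/// Upper bound on throughput (bits per second) when at most `window_bytes`
/// can be in flight per round trip of `rtt_ms` milliseconds.
fn max_throughput_bps(window_bytes: u64, rtt_ms: u64) -> u64 {
    window_bytes * 8 * 1000 / rtt_ms
}

/// Bandwidth-delay product: bytes that must be in flight to keep a link of
/// `link_bps` bits per second busy at a round-trip time of `rtt_ms`.
fn bdp_bytes(link_bps: u64, rtt_ms: u64) -> u64 {
    link_bps / 8 * rtt_ms / 1000
}
```

With a 100 ms round trip, a 65535-byte window tops out around 5.2 Mbit/s, while filling a 100 Mbit/s link needs about 1.25 MB in flight. This only explains the rising part of the curve, not the drop beyond the ideal point.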

I don't have a small repro at the moment, but I can try to make one if this behavior is not expected.

@Dirbaio
Member

Dirbaio commented Jul 9, 2024

This can happen if your network device can only queue a limited number of frames in its rx or tx queues, and drops excess frames instead of buffering them or telling smoltcp to slow down on tx (returning Exhausted). smoltcp (or the remote side) sends a bunch of frames because it sees the window is big, which then overflows the queue, causing packets to get lost.

There's a workaround, max_burst_size, which caps the amount of in-flight TCP data at a given time.
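
To make the overflow scenario concrete, here is a toy model of a device queue that silently drops whatever does not fit. It is a sketch under stated assumptions, not smoltcp code: the function names are hypothetical, the 1460-byte segment size is an assumed typical Ethernet MSS, and the sender is assumed to push a whole window before the device drains anything.

```rust
/// Toy model of a device queue holding at most `queue_cap` frames.
/// The sender pushes `burst` frames back to back before the device drains
/// any of them; frames that do not fit are dropped silently.
/// Returns (frames accepted, frames dropped).
fn push_burst(burst: usize, queue_cap: usize) -> (usize, usize) {
    let accepted = burst.min(queue_cap);
    (accepted, burst - accepted)
}

/// Number of full-size segments that a window of `window_bytes` allows
/// in flight, for a maximum segment size of `mss` bytes.
fn segments_in_window(window_bytes: usize, mss: usize) -> usize {
    window_bytes / mss
}
```

Under these assumptions a 65535-byte window is 44 segments and fits a 200-frame queue, while a 1 MiB window is 718 segments, of which 518 would be dropped. A cap on the burst size bounds the first argument of `push_burst`, which is why it helps in this situation.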

@HKalbasi
Contributor Author

HKalbasi commented Jul 9, 2024

I think packet loss was the original problem, but now that I have increased the buffer size there is no packet loss and I still see this slowness. I use libpcap for injecting packets, and originally I got a Resource temporarily unavailable error from the libpcap device. Then I increased the SO_SNDBUF of the socket, and now libpcap seems happy with my packets. Maybe some upstream device drops them now; I can try to match packets on the target to see if there is some packet loss.

Now I have three questions:

First, how can I tell smoltcp to slow down? Device transmit only returns Option. Does returning None from it suffice?

Second, how does the OS TCP stack handle that problem? I tried in mininet with 3 nodes, in this topology:

node1 ----------------|1Gbps link|------------------------- node 2 ------------------|100Mbps link|---------------- node 3

On node2 there is a small libpcap program that bridges the two interfaces with a small buffer (~200 packets). The links have 100ms latency, and an OS TCP socket is able to reach nearly 100Mbps within seconds, even though there is no mechanism that notifies the OS on node1 that the link capacity is full.

Third, I previously had max_burst_size = Some(1) in my device capabilities, which I probably just copied from some example. But it seems to have no effect: if it were limiting in-flight packets to one, changing the buffer size should not change anything, which is not the case. My device is Medium::Ip, if that is relevant. What should I set it to if my link has e.g. 100Mbit/s capacity?

@Dirbaio
Member

Dirbaio commented Jul 9, 2024

First, how can I tell smoltcp to slow down? Device transmit only returns Option. Does returning None from it suffice?

yes. if the phy can't transmit right now, return None. Later, when it's ready to transmit again, poll the interface again.
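
As an illustration of that pattern, here is a self-contained sketch of a device with a bounded tx queue that refuses to hand out a transmit slot while the queue is full. `BoundedTx` and `TxSlot` are hypothetical stand-ins written for this example; they mimic the shape of a transmit method returning Option but are not smoltcp's actual `Device` trait or token types.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for a phy with a bounded tx queue (hypothetical type,
/// not smoltcp's actual `Device` trait).
struct BoundedTx {
    queue: VecDeque<Vec<u8>>,
    capacity: usize,
}

/// Permission to enqueue exactly one frame.
struct TxSlot<'a> {
    queue: &'a mut VecDeque<Vec<u8>>,
}

impl TxSlot<'_> {
    /// Consumes the slot and queues the frame for transmission.
    fn send(self, frame: &[u8]) {
        self.queue.push_back(frame.to_vec());
    }
}

impl BoundedTx {
    fn new(capacity: usize) -> Self {
        Self { queue: VecDeque::new(), capacity }
    }

    /// Returns `None` while the queue is full, so the caller keeps the data
    /// and retries on a later poll instead of the frame being dropped.
    fn transmit(&mut self) -> Option<TxSlot<'_>> {
        if self.queue.len() >= self.capacity {
            None
        } else {
            Some(TxSlot { queue: &mut self.queue })
        }
    }

    /// The hardware finished sending one frame; a slot is free again.
    fn complete_one(&mut self) -> Option<Vec<u8>> {
        self.queue.pop_front()
    }
}
```

The point of the design is that a full queue becomes visible to the stack as "not ready yet" rather than as silent loss; once `complete_one` frees a slot, the next poll can transmit again.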

OS tcp socket is able to reach near 100Mbps speed within seconds but there is no mechanism that notifies OS in node1 that the link capacity is full.

This is done with "congestion control". If node1 sees packets are getting lost, it assumes it's because it exceeded the capacity of some link in the path, and slows down.
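
As a rough illustration of the idea, here is the classic textbook additive-increase / multiplicative-decrease scheme. It is only a sketch: the `Aimd` type is hypothetical, and it is not the Cubic or Reno logic of any real stack, smoltcp included.

```rust
/// Toy additive-increase / multiplicative-decrease congestion window,
/// counted in segments.
struct Aimd {
    cwnd: u32,
}

impl Aimd {
    /// Start with a window of a single segment.
    fn new() -> Self {
        Aimd { cwnd: 1 }
    }

    /// A full window was acknowledged without loss: grow by one segment.
    fn on_round_acked(&mut self) {
        self.cwnd += 1;
    }

    /// A loss was detected: halve the window, never below one segment.
    fn on_loss(&mut self) {
        self.cwnd = (self.cwnd / 2).max(1);
    }
}
```

The sender probes for capacity by growing the window slowly and backs off quickly when a loss suggests that some link in the path is full, so no explicit notification from the bottleneck is needed.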

Actually, this is something you could try that might help with slowness. The latest release 0.11 didn't have any congestion control at all, but it's been added recently: #907. Maybe try using smoltcp from git with congestion control enabled, see if it helps.

The "max burst size" thing is actually kind of a hack to work around the lack of congestion control, but it only takes into account the local buffer queue's size.

@HKalbasi
Contributor Author

HKalbasi commented Jul 10, 2024

I tried enabling the congestion control, with no success.
Screencast from 07-10-2024 01:28:23 PM.webm

The server at 10.0.0.1 is a yes | nc -l 0.0.0.0 8000 & which uses the OS tcp stack and the one in 10.0.0.5 is smoltcp listening on a raw socket.

My code is available here. I created a phy::RawSocket device, used 65535000 as buffer size, removed the panic at smoltcp/src/phy/raw_socket.rs:133:25 since it hits No buffer space available (os error 105) and replaced it with (), and used tcp1_socket.set_congestion_control(CongestionControl::Cubic) to set the congestion control. I tested various bandwidths and delays, and in all of them smoltcp is suboptimal.

@HKalbasi
Contributor Author

I found out that 1Gbit and 100Mbit were too high: my system can write to AF_PACKET sockets at 72Mbit/s. So I changed the rates to 10Mbit/s and 1Mbit/s. The OS TCP socket still sustains a constant 920Kbit/s, but smoltcp's rate fluctuates and sometimes drops to zero, as in the video I sent above.

I captured the traffic on both sides, and noticed that wireshark marks many packets red with labels Spurious Retransmission and Out of Order in the smoltcp capture but is happy with the os capture.

@HKalbasi
Contributor Author

I investigated a bit and (one part of) the problem seems to be here:

smoltcp/src/socket/tcp.rs

Lines 338 to 355 in 7b125ef

fn set_for_retransmit(&mut self, timestamp: Instant, delay: Duration) {
    match *self {
        Timer::Idle { .. } | Timer::FastRetransmit { .. } => {
            *self = Timer::Retransmit {
                expires_at: timestamp + delay,
                delay,
            }
        }
        Timer::Retransmit { expires_at, delay } if timestamp >= expires_at => {
            *self = Timer::Retransmit {
                expires_at: timestamp + delay,
                delay: delay * 2,
            }
        }
        Timer::Retransmit { .. } => (),
        Timer::Close { .. } => (),
    }
}

If the timer is already in the Retransmit state and has not expired, it is not updated, so a retransmission will inevitably happen no matter how many ACKs have already been received, and wireshark will mark it as a Spurious Retransmission since the data has already been acknowledged.

If I change the code above to:

match *self {
    Timer::Idle { .. } | Timer::FastRetransmit { .. } | Timer::Retransmit { .. } => {
        *self = Timer::Retransmit {
            expires_at: timestamp + delay,
            delay,
        }
    }
    Timer::Close { .. } => (),
}

This solves the problem, and smoltcp becomes able to use all of the bandwidth of a 1Mbit/s link. I'm not sure this is the right thing to do, but I think this part of the code needs some action.
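
The difference between the two versions can be reproduced with a reduced model of the timer. This is a sketch, not the smoltcp code: it keeps only the Idle and Retransmit states, uses plain millisecond integers instead of Instant and Duration, and assumes the socket calls the function a second time at t = 50 ms while the timer armed at t = 0 is still pending.

```rust
/// Reduced model of the retransmit timer; times are plain milliseconds.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Timer {
    Idle,
    Retransmit { expires_at: u64, delay: u64 },
}

impl Timer {
    /// Behaviour of the quoted code: an unexpired Retransmit is left alone.
    fn set_for_retransmit_original(&mut self, now: u64, delay: u64) {
        match *self {
            Timer::Idle => {
                *self = Timer::Retransmit { expires_at: now + delay, delay };
            }
            // As in the quoted code, the bound `delay` shadows the parameter.
            Timer::Retransmit { expires_at, delay } if now >= expires_at => {
                *self = Timer::Retransmit { expires_at: now + delay, delay: delay * 2 };
            }
            Timer::Retransmit { .. } => (),
        }
    }

    /// Behaviour of the proposed change: every call re-arms the timer.
    fn set_for_retransmit_rearm(&mut self, now: u64, delay: u64) {
        *self = Timer::Retransmit { expires_at: now + delay, delay };
    }
}
```

In the original version the second call leaves the deadline at 100 ms, so the retransmission still fires even if everything was acknowledged in the meantime; in the changed version the deadline moves to 150 ms. Note that the changed version also drops the `delay * 2` backoff of the expired case, which may be part of why it is unclear whether it is the right fix. For comparison, RFC 6298 (section 5.3) has the sender restart the retransmission timer when an ACK acknowledges new data.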

@tomDev5
Contributor

tomDev5 commented Dec 22, 2024

I think I also experience this bug.
@HKalbasi, in your case, what buffer size starts to be an issue?
(In my case 8k works well but 64k doesn't)

@HKalbasi
Contributor Author

I don't recall exactly, but we needed huge buffer sizes (on the order of megabytes) to reach the link-level capacities. It had a chart like this:
image

I would expect the exact numbers depend heavily on the network conditions and link properties (latency, bandwidth, ...)


3 participants