Performance drop with increasing buffer size #949
Comments
This can happen if your network device can only queue a limited number of frames in its rx or tx queues and drops excess frames instead of buffering them or telling smoltcp to slow down on tx (returning Exhausted). smoltcp (or the remote side) sends a burst of frames because it sees that the window is large, which overflows the queue and causes packets to be lost. There's a workaround, max_burst_size, which caps the amount of in-flight TCP data at any given time.
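To make the failure mode above concrete, here is a toy model, not smoltcp code. `DropTailQueue`, `capacity` and `frames_dropped` are made-up names for illustration; the only idea taken from the comment is that a burst larger than the device queue loses the excess frames, and that capping the burst at the queue depth avoids the loss.

```rust
use std::collections::VecDeque;

/// Toy model of a device TX queue that silently drops frames once it is full.
/// `capacity` is a hypothetical queue depth, not a smoltcp parameter.
struct DropTailQueue {
    frames: VecDeque<usize>,
    capacity: usize,
    dropped: usize,
}

impl DropTailQueue {
    fn new(capacity: usize) -> Self {
        Self { frames: VecDeque::new(), capacity, dropped: 0 }
    }

    /// Enqueue one frame, or count it as lost if the queue is already full.
    fn push(&mut self, frame_id: usize) {
        if self.frames.len() < self.capacity {
            self.frames.push_back(frame_id);
        } else {
            self.dropped += 1;
        }
    }
}

/// Send `window_frames` frames back to back, optionally capped at `max_burst`,
/// and report how many frames the queue dropped.
fn frames_dropped(window_frames: usize, max_burst: Option<usize>, capacity: usize) -> usize {
    let burst = match max_burst {
        Some(cap) => window_frames.min(cap),
        None => window_frames,
    };
    let mut queue = DropTailQueue::new(capacity);
    for id in 0..burst {
        queue.push(id);
    }
    queue.dropped
}
```

With a 16-frame queue, an uncapped 64-frame burst loses 48 frames in this model, while a burst capped at 16 loses none. The model ignores that the queue drains over time, so it only illustrates the worst case of a single back-to-back burst.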
I think this packet loss was the original problem, but now that I have increased the buffer size there is no packet loss and I still see this slowness. I use libpcap for injecting packets, and originally I had Now I have two questions: First, how can I tell smoltcp to slow down? Device transmit only returns Second, how does the OS TCP stack handle that problem? I tried in a mininet with 3 nodes, in this topology:
On node2 there is a small libpcap program that bridges the two interfaces with a small buffer of ~200 packets. The links have 100ms latency, and an OS TCP socket reaches nearly 100Mbps within seconds, yet there is no mechanism that notifies the OS on node1 that the link capacity is full. Third, I previously had
Yes. If the phy can't transmit right now, return None. Later, when it's ready to transmit again, poll the interface again.
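A rough sketch of that backpressure pattern, using a stand-in `Phy` type rather than smoltcp's real `Device` trait (whose exact signature depends on the smoltcp version). `free_slots`, `on_tx_complete` and `poll` are hypothetical names; the point is only that `transmit` hands out a token when a frame can be accepted and returns `None` otherwise, so unsent frames stay queued until the next poll.

```rust
use std::collections::VecDeque;

/// Hypothetical PHY with a fixed number of free TX slots.
struct Phy {
    free_slots: usize,
    sent: Vec<Vec<u8>>,
}

/// Token that is only handed out when the PHY can accept a frame right now.
struct TxToken<'a> {
    phy: &'a mut Phy,
}

impl Phy {
    /// Return `None` when the PHY cannot transmit at the moment.
    fn transmit(&mut self) -> Option<TxToken<'_>> {
        if self.free_slots == 0 {
            None
        } else {
            Some(TxToken { phy: self })
        }
    }

    /// Called when the hardware finishes sending a frame and frees a slot.
    fn on_tx_complete(&mut self) {
        self.free_slots += 1;
    }
}

impl<'a> TxToken<'a> {
    /// Hand one frame to the PHY, occupying one slot.
    fn consume(self, frame: &[u8]) {
        self.phy.free_slots -= 1;
        self.phy.sent.push(frame.to_vec());
    }
}

/// Send as many pending frames as the PHY accepts; the rest wait for the next poll.
fn poll(phy: &mut Phy, pending: &mut VecDeque<Vec<u8>>) -> usize {
    let mut sent = 0;
    while !pending.is_empty() {
        match phy.transmit() {
            Some(token) => {
                let frame = pending.pop_front().unwrap();
                token.consume(&frame);
                sent += 1;
            }
            None => break, // PHY is busy: stop and retry on a later poll
        }
    }
    sent
}
```

Nothing is dropped in this model: when `transmit` returns `None`, the frames simply remain in `pending` and go out on a later `poll` once `on_tx_complete` has freed a slot.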
This is done with "congestion control". If node1 sees packets are getting lost, it assumes it's because it exceeded the capacity of some link in the path, and slows down. Actually, this is something you could try that might help with slowness. The latest release 0.11 didn't have any congestion control at all, but it's been added recently: #907. Maybe try using smoltcp from git with congestion control enabled, see if it helps. The "max burst size" thing is actually kind of a hack to work around the lack of congestion control, but it only takes into account the local buffer queue's size.
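For readers unfamiliar with the idea, here is a minimal additive-increase/multiplicative-decrease (AIMD) window in the spirit of classic TCP congestion control. It is a generic illustration, not the algorithm added in #907; the `Aimd` type and its method names are invented for this sketch.

```rust
/// Minimal AIMD congestion window, in bytes. Generic illustration only.
struct Aimd {
    cwnd: usize,     // congestion window: how much the sender allows in flight
    ssthresh: usize, // threshold between slow start and congestion avoidance
    mss: usize,      // maximum segment size
}

impl Aimd {
    fn new(mss: usize) -> Self {
        Self { cwnd: mss, ssthresh: usize::MAX, mss }
    }

    /// Grow the window when new data is acknowledged.
    fn on_ack(&mut self, acked_bytes: usize) {
        if self.cwnd < self.ssthresh {
            // Slow start: grow by up to one MSS per ACK.
            self.cwnd += acked_bytes.min(self.mss);
        } else {
            // Congestion avoidance: roughly one MSS per round trip.
            self.cwnd += (self.mss * self.mss / self.cwnd).max(1);
        }
    }

    /// Treat a detected loss as a sign that some link on the path is full.
    fn on_loss(&mut self) {
        self.ssthresh = (self.cwnd / 2).max(2 * self.mss);
        self.cwnd = self.ssthresh;
    }

    /// The sender may keep at most this many bytes in flight.
    fn sendable(&self, peer_window: usize) -> usize {
        self.cwnd.min(peer_window)
    }
}
```

The key difference from a burst cap is that the limit adapts to loss anywhere on the path: `sendable` is bounded by `cwnd` even when the peer advertises a much larger window.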
I tried enabling the congestion control, with no success. The server at My code is available here. I created a
I found out that 1Gbit and 100Mbit were too high; my system is able to write on I captured the traffic on both sides, and noticed that Wireshark marks many packets red with labels
I investigated a bit and (one part of) the problem seems to be here: Lines 338 to 355 in 7b125ef
If the timer is already in the If I change the code above to:

    match *self {
        Timer::Idle { .. } | Timer::FastRetransmit { .. } | Timer::Retransmit { .. } => {
            *self = Timer::Retransmit {
                expires_at: timestamp + delay,
                delay,
            }
        }
        Timer::Close { .. } => (),
    }

It will solve the problem and the code becomes able to use all of the bandwidth of a 1Mbit/s link. I'm not sure this is the right thing to do, but I think this part of code needs some action.
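To see what the proposed match does in isolation, here is a self-contained model of it. The `Timer` enum below is a simplified stand-in with plain `u64` millisecond timestamps, not smoltcp's actual type, and the function name `set_for_retransmit` is an assumption; only the match arms mirror the snippet above.

```rust
/// Simplified stand-in for the socket timer discussed above
/// (timestamps and delays are plain milliseconds here).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Timer {
    Idle,
    Retransmit { expires_at: u64, delay: u64 },
    FastRetransmit,
    Close { expires_at: u64 },
}

impl Timer {
    /// Proposed behavior: re-arm the timer even when it is already in `Retransmit`.
    fn set_for_retransmit(&mut self, timestamp: u64, delay: u64) {
        match *self {
            Timer::Idle | Timer::FastRetransmit | Timer::Retransmit { .. } => {
                *self = Timer::Retransmit {
                    expires_at: timestamp + delay,
                    delay,
                };
            }
            // A closing socket keeps its own deadline.
            Timer::Close { .. } => (),
        }
    }
}
```

In this model, calling `set_for_retransmit(50, 100)` on a timer already armed for t=100 moves the deadline to t=150. Re-arming on every call keeps pushing the deadline forward while data continues to flow; whether that interacts correctly with retransmission backoff is the open question the comment raises.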
I think I'm also experiencing this bug.
I have a smoltcp device that needs to connect to targets with varying latency. To increase throughput I use large buffer sizes, but there seems to be an "ideal" buffer size: below it, throughput scales linearly with buffer size; it peaks around the ideal point; and it drops sharply if you increase the buffer size beyond that. Since targets have different latencies, there is no one-size-fits-all buffer size. My intuition was that a larger buffer can only help smoltcp and should have no effect if the connection can't reach the larger window sizes, so increasing the buffer size should always give better throughput (at the cost of more memory usage), but that is not the case in my experiments.
I don't have a small repro at the moment but can try to make one if this behavior is not expected.
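The linear region below the "ideal" point matches the usual window-limited bound: a connection can move at most one window per round trip, and the window needed to fill a link is its bandwidth-delay product. A small sketch of that arithmetic follows; the function names are made up for illustration, and it does not explain the drop above the ideal point.

```rust
/// Upper bound on throughput (bits per second) when at most `window_bytes`
/// can be in flight per round trip. `rtt_ms` must be non-zero.
fn window_limited_bps(window_bytes: u64, rtt_ms: u64) -> u64 {
    window_bytes * 8 * 1000 / rtt_ms
}

/// Bandwidth-delay product: bytes that must be in flight to keep the link full.
fn bdp_bytes(link_bps: u64, rtt_ms: u64) -> u64 {
    link_bps / 8 * rtt_ms / 1000
}
```

For example, a 100 Mbit/s link with a 200 ms round trip needs about 2.5 MB in flight (`bdp_bytes(100_000_000, 200)`), while a 65,535-byte window on the same path tops out at roughly 2.6 Mbit/s (`window_limited_bps(65_535, 200)`). Because the product depends on the round-trip time, each target latency has a different buffer size at which the link saturates.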