Luke Hsiao and Jervis Muindi
June 2017
- Tries to maximize throughput and minimize latency
- Does so by estimating the bottleneck bandwidth and round-trip propagation delay
- Deployed in Google's B4 datacenter-to-datacenter high-speed WAN, which uses commodity switches.
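The two estimates above can be sketched as a pair of windowed filters: a max filter over delivery-rate samples for the bottleneck bandwidth and a min filter over RTT samples for the propagation delay. This is a minimal illustration, not the kernel implementation; the class name, window lengths, and sample values are assumptions chosen for the example.

```python
from collections import deque


class WindowedFilter:
    """Tracks the best (max or min) sample seen within a sliding time window."""

    def __init__(self, window, better):
        self.window = window    # window length, same units as `now`
        self.better = better    # better(a, b) is True when a beats b
        self.samples = deque()  # (time, value) pairs, best value at the left

    def update(self, now, value):
        # Discard older samples that the new one matches or beats.
        while self.samples and not self.better(self.samples[-1][1], value):
            self.samples.pop()
        self.samples.append((now, value))
        # Expire samples that have aged out of the window.
        while self.samples[0][0] < now - self.window:
            self.samples.popleft()
        return self.samples[0][1]


# Hypothetical window lengths (seconds), for illustration only.
bw_filter = WindowedFilter(window=1.0, better=lambda a, b: a > b)
rtt_filter = WindowedFilter(window=10.0, better=lambda a, b: a < b)

# Made-up samples: (time in s, delivery rate in bit/s, RTT in s).
samples = [(0.0, 80e6, 0.120), (0.1, 100e6, 0.100), (0.2, 90e6, 0.110)]
for now, rate, rtt in samples:
    btl_bw = bw_filter.update(now, rate)
    rt_prop = rtt_filter.update(now, rtt)

# Bandwidth-delay product in bytes: the amount of data the path can hold.
bdp = btl_bw * rt_prop / 8
print(f"BtlBw={btl_bw / 1e6:.0f} Mbps, RTprop={rt_prop * 1e3:.0f} ms, BDP={bdp:.0f} bytes")
```

The product of the two estimates is the bandwidth-delay product, which is roughly how much data BBR aims to keep in flight.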
- We focus on the claim that BBR is better than CUBIC in networks with non-negligible loss rates.
- Illustrates one of the most obvious differences between a loss-based congestion control algorithm and a congestion-based algorithm.
- CUBIC's loss tolerance is a property of the algorithm. BBR's loss tolerance is a configuration parameter.
- 2-25x improvement in throughput on Google's B4.
Figure 8 of the original paper: BBR (green) vs. CUBIC (red) throughput for 60-second flows on a 100-Mbps/100-ms link with 0.001% to 50% random loss.
NOTE: As BBR's loss rate approaches the ProbeBW peak gain, the probability of measuring a delivery rate of the true BtlBw drops sharply, causing the max filter to underestimate.
- Can we replicate Figure 8?
- How does Figure 8 look with other congestion control algorithms?
- How does bottleneck bandwidth or RTT affect the comparison?
- How does BBR compare with CUBIC on a cellular link?
- Ubuntu 16.04 LTS VM with v4.11.1 of the Linux kernel
- Mahimahi Network Emulator
- Infinite buffer on bottleneck link
- Google Cloud instance
- 6.25MB maximum send and receive window sizes
- Python client process sending a stream of data to a Python server process over the emulated link.
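A minimal sketch of such a client/server pair, assuming both run in one process over loopback as a stand-in for the emulated link. The function names and chunk size are hypothetical and this is not the code from our repository. `socket.TCP_CONGESTION` is Linux-only, so the helper falls back to the system default when the requested algorithm is unavailable.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary choice)


def set_congestion_control(sock, name):
    """Try to select a congestion control algorithm; return True on success."""
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, name.encode())
        return True
    except (AttributeError, OSError):
        return False  # not Linux, or the algorithm is not loaded/allowed


def run_server(listener, result):
    """Accept one connection and count bytes received until the client closes."""
    conn, _ = listener.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def run_client(port, total_bytes, algorithm):
    """Send total_bytes to the server and return the elapsed time in seconds."""
    payload = b"x" * CHUNK
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        set_congestion_control(sock, algorithm)
        sock.connect(("127.0.0.1", port))
        start = time.monotonic()
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
    return time.monotonic() - start


def measure(total_bytes=1_000_000, algorithm="bbr"):
    """Run one transfer; return (bytes received, seconds spent sending)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    result = {}
    # The server runs in its own thread so the client never blocks on it.
    server = threading.Thread(target=run_server, args=(listener, result))
    server.start()
    elapsed = run_client(port, total_bytes, algorithm)
    server.join()
    listener.close()
    return result["bytes"], elapsed
```

Calling `measure(1_000_000, "cubic")` and dividing the received bytes by the elapsed time gives a rough throughput figure for that algorithm.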
- CUBIC achieves slightly better throughput than BBR for extremely low loss rates.
- BBR throughput does not drop until about 45% loss, unlike in the original paper.
- This is likely due to a difference in the implementation of the loss process in Mahimahi vs. the emulation used by the authors.
NOTE: Specifically, there are two possible factors: (1) the size of the initial congestion window can result in the current BBR code pacing packets one RTT later than CUBIC would (discussed in this developer thread), and (2) the current implementation of the ProbeRTT mechanism prioritizes simplicity over performance, which can result in a ~2% throughput penalty because BBR spends those portions of time with a minimal number of packets in flight. Differences in emulation may also matter: the level of correlation of drops, or the number of packets dropped at the same time (LRO/GRO settings).
NOTE: BIC was designed for "long fat networks", while Westwood was designed for high-BDP links with potential packet loss. Evaluated on 30-second flows.
How does BBR compare against CUBIC for different bottleneck bandwidths and different RTT values?
NOTE: Performed on 30-second flows. Some of the variance for the low-throughput lines could be addressed by using smaller maximum buffer sizes.
The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm. BW < (MSS/RTT) * 1 / sqrt(p)
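To get a feel for the bound, it can be evaluated at a few loss rates. This is a quick sketch: the function name is ours, and the 1460-byte MSS and 100-ms RTT are assumed values for illustration.

```python
import math


def mathis_bound_bps(mss_bytes, rtt_s, loss_rate):
    """Upper bound BW < (MSS / RTT) * 1 / sqrt(p), in bits per second."""
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) / math.sqrt(loss_rate)


# Assumed parameters: 1460-byte MSS, 100-ms RTT.
for p in (0.0001, 0.01, 0.1):
    bound = mathis_bound_bps(1460, 0.100, p)
    print(f"p={p:<7} bound={bound / 1e6:.3f} Mbps")
```

Under these assumptions the bound at 1% loss is roughly 1.17 Mbps, far below a 100-Mbps bottleneck, which matches the intuition that loss-based algorithms collapse under random loss.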
NOTE: Performed on 120-second flows. The decreasing bandwidths are mainly a result of the longer time spent in startup. Longer flows could address this.
All other experiments had fixed bandwidths and delays. How does BBR compare to CUBIC when these values change over time?
Experiment using the 140-second Verizon LTE trace provided with Mahimahi.
- Default send and receive buffer maximums in Ubuntu can skew results.
- Initially, deadlock could occur between our Python client and server processes.
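The buffer issue in the first bullet comes down to simple arithmetic: a flow cannot exceed one window of data per RTT. The sketch below assumes "6.25MB" means 6.25 MiB (6,553,600 bytes); the 212,992-byte figure is only an illustrative small buffer limit, not a value we measured.

```python
def window_limited_bps(window_bytes, rtt_s):
    """Maximum throughput when at most one window can be in flight per RTT."""
    return window_bytes * 8 / rtt_s


def min_window_bytes(bandwidth_bps, rtt_s):
    """Smallest window (the BDP) that can keep the link full."""
    return bandwidth_bps * rtt_s / 8


rtt = 0.100                       # 100-ms RTT
large = int(6.25 * 2**20)         # 6.25 MiB window (assumed meaning of 6.25MB)
small = 212_992                   # illustrative small buffer limit (assumption)

print(f"BDP at 100 Mbps: {min_window_bytes(100e6, rtt):.0f} bytes")
print(f"large window cap: {window_limited_bps(large, rtt) / 1e6:.1f} Mbps")
print(f"small window cap: {window_limited_bps(small, rtt) / 1e6:.1f} Mbps")
```

With the small limit, the window (not the congestion control algorithm) caps throughput well below 100 Mbps, which would distort any comparison.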
- Our experiments support the claim of the original paper: in general, BBR performs better than CUBIC for non-negligible loss rates.
- We find that this behavior holds true across varied bandwidths and RTTs.
- Reproduce our results using our GitHub repository.
- It takes about 8.5 hours to run all experiments.