# Throttling measurement methodology

|              |                                                |
|--------------|------------------------------------------------|
| Author       | [@bassosimone](https://github.com/bassosimone) |
| Last-Updated | 2024-04-09                                     |
| Reviewed-by  | N/A                                            |
| Status       | ready for review                               |

This document explains the throttling measurement methodology implemented inside
the [ooni/probe-cli](https://github.com/ooni/probe-cli) repository.

We are publishing this document as part of this repository for discussion. A future
version of this document may be moved into the [ooni/spec](https://github.com/ooni/spec)
repository.

## Problem statement

We are interested in detecting cases of _extreme throttling_. We say that throttling is
_extreme_ when the speed at which web resources can be accessed is _significantly reduced_
(10x or more) compared to what is _typically_ observed. We care about extreme throttling
because we are interested in cases where the performance impact is such that the website
is _unlikely_ to work as intended for web users in a country.

We, and other researchers, have documented such issues in the past. See, for example:

1. [our blog post documenting Twitter throttling in Russia](
https://ooni.org/post/2022-russia-blocks-amid-ru-ua-conflict/), which is the
first instance in which we tested this methodology;

2. ["Throttling Twitter: an emerging censorship technique in Russia" by Xue et al.](
https://censorbib.nymity.ch/#Xue2021a).

OONI Probe measures websites as part of the [Web Connectivity experiment](
https://github.com/ooni/spec/blob/master/nettests/ts-017-web-connectivity.md), and
these measurements contain performance metrics.

The next section explains which performance metrics we collect and how they can
be useful to document episodes of extreme throttling.

## Methodology

The overall idea of our methodology is that we are not concerned with _how_ throttling
is implemented; rather, we aim to show clearly degraded network performance.

We aim to detect such degradation by comparing metrics collected by OONI Probe instances
running in a country and network with measurements previously collected by users and/or with
concurrent measurements towards different targets.

### Network Events

Web Connectivity v0.5 collects the first 64 [network events](
https://github.com/ooni/spec/blob/master/data-formats/df-008-netevents.md). These events
include "read" and "write" events, which map directly to network I/O operations. The basic
structure of a "read" network event is the following:

```JSON
{
  "address": "1.1.1.1:443",
  "failure": null,
  "num_bytes": 4114,
  "operation": "read",
  "proto": "tcp",
  "t0": 1.001,
  "t": 1.174,
  "tags": [],
  "transaction_id": 1
}
```

Through these events, we know when "read" returned (`t`), for how much time it was blocked
(`t - t0`), and the number of bytes sent or received.

The slope of the integral of "read" events, therefore, provides information about the speed
at which we were receiving data from the network. Slowdowns in the stream correspond either
to reordering and retransmission events (where there is head-of-line blocking) or to
timeout events (where the TCP pipe is empty).

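To illustrate, here is a minimal Python sketch (helper name and sample data invented;
field names follow the network events data format) of estimating the receive speed
from "read" events:

```python
def receive_speed(events):
    """Average receive speed, in bytes/s, over the successful "read" events."""
    reads = [e for e in events if e["operation"] == "read" and e["failure"] is None]
    if not reads:
        return 0.0
    total_bytes = sum(e["num_bytes"] for e in reads)
    # Time spanned by the reads: from the earliest blocking start to the
    # latest return instant.
    elapsed = max(e["t"] for e in reads) - min(e["t0"] for e in reads)
    return total_bytes / elapsed if elapsed > 0 else 0.0

# Invented sample: two reads spanning half a second overall.
events = [
    {"operation": "read", "failure": None, "num_bytes": 4114, "t0": 1.001, "t": 1.174},
    {"operation": "read", "failure": None, "num_bytes": 8192, "t0": 1.174, "t": 1.501},
]
print(round(receive_speed(events)))
```

A real analysis would look at the shape of the cumulative-bytes curve over time rather
than a single average, but the slope idea is the same.
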
Additionally, network events contain events such as `"tls_handshake_start"` and
`"tls_handshake_done"`, which look like the following:

```JSON
{
  "address": "1.1.1.1:443",
  "failure": null,
  "num_bytes": 0,
  "operation": "tls_handshake_start",
  "proto": "tcp",
  "t0": 1.001,
  "t": 1.001,
  "tags": [],
  "transaction_id": 1
}
```

These events allow us to know when we started and stopped handshaking.

Now, considering that the number of bytes transferred by a TLS handshake with the
same server using similar client code is close to constant (i.e., it follows a relatively
narrow Gaussian with a small sigma), we can model the problem of TLS handshaking as
the problem of downloading a roughly fixed amount of data.

With many users measuring popular websites using OONI Probe in a given country
and network, we can therefore compare current performance metrics with
previously collected performance metrics. In case of extreme throttling, where the
download speed is reduced by at least 10x, we would notice a performance difference.
The _time_ required to complete the TLS handshake should be a sufficient metric (and,
in fact, _is_ a performance metric used by speed tests such as
[speed.cloudflare.com](https://speed.cloudflare.com/)).

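For example, a hypothetical Python sketch of this comparison (the helper names, the
sample data, and the exact 10x decision rule are illustrative, not part of the OONI
codebase):

```python
from statistics import median

def tls_handshake_time(events):
    """Seconds elapsed between tls_handshake_start and tls_handshake_done."""
    start = next(e["t"] for e in events if e["operation"] == "tls_handshake_start")
    done = next(e["t"] for e in events if e["operation"] == "tls_handshake_done")
    return done - start

def looks_extremely_throttled(current, baseline, factor=10.0):
    """Flag a handshake taking >= factor times the historical median."""
    return current >= factor * median(baseline)

# Invented data: a ~2.5 s handshake against a ~0.2 s historical baseline.
events = [
    {"operation": "tls_handshake_start", "t": 1.001},
    {"operation": "tls_handshake_done", "t": 3.507},
]
baseline = [0.21, 0.25, 0.19, 0.24]  # historical samples, in seconds
print(looks_extremely_throttled(tls_handshake_time(events), baseline))
```
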
### Download speed metrics

Additionally, Web Connectivity v0.5 collects download speed samples for the connections
used to access websites. We use the same methodology as [ndt7](
https://github.com/m-lab/ndt-server/blob/main/spec/ndt7-protocol.md): we measure
the cumulative number of bytes received by a connection, using a truncated exponential
distribution to decide when to collect samples. By not collecting samples at fixed
intervals, we [should have PASTA properties](https://en.wikipedia.org/wiki/Arrival_theorem#Theorem_for_arrivals_governed_by_a_Poisson_process).

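The sampling schedule can be sketched as follows (a Python sketch with illustrative
parameter values; ndt7 defines its own constants):

```python
import random

def sampling_instants(mean=0.25, cap=1.0, duration=10.0, seed=7):
    """Return instants (seconds) at which to snapshot the byte counter.

    Inter-sample gaps are exponentially distributed (Poisson arrivals,
    hence PASTA) but truncated at `cap` so samples are never too sparse.
    """
    rng = random.Random(seed)
    t, instants = 0.0, []
    while t < duration:
        t += min(rng.expovariate(1.0 / mean), cap)
        instants.append(t)
    return instants

instants = sampling_instants()
```
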
The total TLS handshaking, HTTP round trip, and body fetching time is bounded by a fixed
number of seconds (currently ten seconds for the handshake and ten additional seconds for
HTTP). Additionally, there is a cap on the maximum number of body bytes to download (`1<<19`).

The expected size of a downloaded webpage should be roughly constant for clients
attempting to fetch that webpage from the same country and network. Therefore, we
can model handshaking plus fetching the body as asking how much time it takes to
download `handshake_size + min(body_size, 1<<19)` bytes in roughly twenty seconds.

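To make the model concrete, a small Python sketch (names and sample numbers invented)
of the quantity being timed:

```python
MAX_BODY = 1 << 19  # 524288-byte cap on the fetched body

def implied_speed(handshake_bytes, body_bytes, elapsed_s):
    """Bytes per second implied by one handshake-plus-fetch operation."""
    transferred = handshake_bytes + min(body_bytes, MAX_BODY)
    return transferred / elapsed_s

# Invented numbers: a ~4.5 KB handshake and a body larger than the cap,
# completed right at the ~20 s overall deadline.
print(round(implied_speed(4500, 1 << 20, 20.0)))
```
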
If we assume that the server is not going to throttle downloads (which is still
a hypothesis worth considering), save for some (healthy) packet pacing, significant
changes in the _time_ required to perform the whole set of operations would be
an indication of extreme throttling. However, in using time as the metric, or any
other metric, we need to remember to classify measurements that time out (i.e., that are
not able to fetch the whole body) separately from the ones that complete successfully.

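The classification caveat can be sketched as follows (the timeout failure string used
here is an assumption about the data format, and the field names are invented):

```python
# Assumption: "generic_timeout_error" marks timed-out operations in the data.
TIMEOUT_FAILURES = {"generic_timeout_error"}

def partition_samples(samples):
    """Split measurements so that timed-out and completed runs are never
    compared directly on elapsed time."""
    complete = [s for s in samples if s["failure"] is None]
    timed_out = [s for s in samples if s["failure"] in TIMEOUT_FAILURES]
    return complete, timed_out

samples = [
    {"failure": None, "elapsed": 3.2},
    {"failure": "generic_timeout_error", "elapsed": 20.0},
]
complete, timed_out = partition_samples(samples)
```
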
## Discussion

This methodology leverages existing performance metrics inside Web Connectivity
v0.5 to passively detect extreme throttling. Because this methodology models
the TLS handshake and fetching the body as speed tests, it is, however, not possible
to provide users with a clear indication of throttling after a single run. We will,
instead, need to collect several samples over time and cross-compare them using
the [ooni/data](https://github.com/ooni/data) measurement pipeline.

Throttling could be caused by policers and shapers, as well as by forcing specific
users to pass through a congested path. When policers and shapers are used, we
expect the speed to converge to predictable values (e.g., 128 kbit/s). On the
contrary, when throttling is driven by congestion, we expect to see higher variance
in the results, possibly correlated with daily usage patterns.

## Digital Divide Implications

By collecting passive performance metrics, we are not only equipped to detect
extreme throttling, but we are also gathering information about the performance
achievable by clients in several world regions when reaching specific websites. The
availability of HTTP headers, and the practice of some CDNs of annotating
responses with headers indicating which specific cache is being used, could also
be exploited to make interesting network-performance statements.

## Future Work

With network events, we can also collect some rough baseline RTT samples. The `t - t0` time
of the TCP connect event provides an upper bound on the path RTT _unless_ there is a
retransmission of the `SYN` segment. The TLS handshake also involves sending TCP segments
back and forth in such a fashion that it is possible to extract RTT metrics. However, we
should be careful to exclude segments sent back to back.

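For instance, a Python sketch (sample data invented) of extracting the connect-time
RTT upper bound from network events:

```python
def connect_rtt_upper_bound(events):
    """`t - t0` for the "connect" event: an upper bound on the path RTT,
    unless the SYN was retransmitted (not detectable from this event alone)."""
    ev = next(e for e in events if e["operation"] == "connect")
    return ev["t"] - ev["t0"]

# Invented sample: connect completed ~53 ms after it started.
events = [{"operation": "connect", "t0": 0.100, "t": 0.153}]
print(round(connect_rtt_upper_bound(events), 3))
```
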
In general, detecting the characteristics of throttling more precisely requires
additional research aimed at classifying the stream of events emitted
by a receiving socket under specific throttling conditions. A possible starting
point for this research could be [Strengthening measurements from the edges:
application-level packet loss rate estimation by Basso et al.](
https://www.sigcomm.org/sites/default/files/ccr/papers/2013/July/2500098-2500104.pdf).

An alternative approach would require the possibility of providing OONI experiments
with "richer input" parameters, or dynamic experiments, aimed at answering more
specific research questions. For example, if there are reports that a website is
throttled by SNI, we could perform a download from a given test server with
certificate verification disabled, using both the offending SNI and an innocuous SNI.

Because HTTP/3 uses QUIC, and because QUIC operates in userspace, there is
also the possibility of instrumenting the QUIC library to periodically collect
snapshots of the receiver's state. However, in general, sender-side statistics are
much more useful for understanding QUIC performance. This fact implies that we could
instrument a QUIC library to observe the sender's state and gather information
about the throttling of uploads. (However, Web Connectivity is not designed to
upload resources, so we would first need to figure out whether
it is possible to overcome this fundamental limitation.)

In the same vein, our Web Connectivity methodology does not currently factor in
the possibility of measuring upload speed throttling for HTTP/1.1 and HTTP/2. However,
anecdotal evidence exists that some countries may throttle the upload path or just
have poor upstream connectivity towards interesting websites. A technique that
has sometimes been applied is that of including very large headers in the request,
even though servers may not necessarily accept such headers.