websteps vs webconnectivity comparison #1797

Closed
bassosimone opened this issue Oct 5, 2021 · 5 comments

Comments

@bassosimone
Contributor

We want to perform a comparison of websteps and webconnectivity across these dimensions:

  1. size of the JSON produced for each URL;
  2. bytes consumed per URL;
  3. runtime per URL (accounting for DNS, test helper, TCP connect, TLS handshake, fetching time).

We are not focusing on the ability to precisely flag known censorship cases for now, because we first need to design an algorithm for flagging failures in websteps. That said, if we collect a JSONL file for websteps, we can later simulate such an algorithm and thus also compare precision. But it seems better to track that specific comparison in a separate issue, otherwise this issue will become huge and long-running.

@bassosimone bassosimone self-assigned this Oct 5, 2021
@bassosimone bassosimone added this to the Sprint 49 - Humpback whale milestone Oct 5, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Oct 5, 2021
@bassosimone
Contributor Author

Instrumenting OONI to add bandwidth monitoring

To perform this task I used the bwmon branch. This branch contains diffs that optionally emit bandwidth usage snapshots to the file specified via miniooni's --bwmon command line flag.

The bandwidth snapshots look like this:

{
  "Timestamp": "2021-10-06T16:17:09.537899811Z",
  "Elapsed": 5009818636,
  "Read": 0,
  "ReadFrom": 4435,
  "Write": 283,
  "WriteTo": 8057
}

Here Read and Write refer to TCP, while ReadFrom and WriteTo refer to QUIC.
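
For reference, such a snapshot could be modeled and parsed in Go roughly as follows (a minimal sketch, not the code from the bwmon branch; the field names simply mirror the JSON keys above, and Elapsed being expressed in nanoseconds is inferred from the sample snapshot):

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Snapshot mirrors the JSON emitted with --bwmon: Read/Write count TCP
// bytes, ReadFrom/WriteTo count QUIC bytes, Elapsed is in nanoseconds.
type Snapshot struct {
	Timestamp time.Time
	Elapsed   time.Duration
	Read      int64
	ReadFrom  int64
	Write     int64
	WriteTo   int64
}

func main() {
	data := []byte(`{"Timestamp": "2021-10-06T16:17:09.537899811Z", "Elapsed": 5009818636,
		"Read": 0, "ReadFrom": 4435, "Write": 283, "WriteTo": 8057}`)
	var snap Snapshot
	if err := json.Unmarshal(data, &snap); err != nil {
		panic(err)
	}
	fmt.Printf("downloaded %d bytes in %s\n", snap.Read+snap.ReadFrom, snap.Elapsed)
}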

@bassosimone
Contributor Author

bassosimone commented Oct 8, 2021

Experimental setup

I sketched out Jafar2, which adds support for bandwidth throttling. There is a branch called jafar2 that contains some experimental code for doing that. The initial design is this:

[figure: experimental-setup]

I got feedback from @FedericoCeratto and @hellais on how this design could be simplified and improved.

This setup allowed me to choose netem and tbf parameters to simulate different network conditions that include extra delay, extra losses, and shaping of the incoming/outgoing traffic.

I calibrated this setup by verifying that I could introduce extra delays, losses, and shaping independently.

(I'll open a separate issue explaining how this design could be improved.)

Regarding TBF parameters, I used this rule of thumb:

  • rate is configured to the desired drain rate (e.g., 7 Mbit/s to emulate a 7 Mbit/s ADSL);

  • burst is the amount of data that the TBF is allowed to emit back-to-back. I always configured it to a number of bytes between 8 and 16 MTUs;

  • limit is the amount of buffering, which I configured to be roughly the number of bytes you could drain in around one second at the given rate (so draining may take more than one second once you account for latency); see the worked example below.
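
As a back-of-the-envelope check of this rule of thumb (my own sketch, not code from the jafar2 branch), the limit values used below follow from the configured rates:

package main

import "fmt"

// limitBytes applies the rule of thumb: buffer roughly the number of
// bytes that the token bucket can drain in about one second at rate.
func limitBytes(rateKbit int) int {
	return rateKbit * 1000 / 8 // kbit/s -> bytes drained per second
}

func main() {
	fmt.Println(limitBytes(7000)) // 875000 bytes, i.e. the 875kb limit for the 7000kbit ADSL download
	fmt.Println(limitBytes(700))  // 87500 bytes, i.e. roughly the 88kb limit used for 700kbit links
}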

With these rules, I emulated the following networks:

  • 3G:
{
  "Download": {
    "Netem": "delay 50ms",
    "TBF": "rate 700kbit burst 12kb limit 88kb"
  },
  "Upload": {
    "Netem": "delay 50ms",
    "TBF": "rate 700kbit burst 12kb limit 88kb"
  }
}
  • ADSL:
{
  "Download": {
    "Netem": "delay 15ms",
    "TBF": "rate 7000kbit burst 12kb limit 875kb"
  },
  "Upload": {
    "Netem": "delay 15ms",
    "TBF": "rate 700kbit burst 12kb limit 88kb"
  }
}
  • EDGE:
{
  "Download": {
    "Netem": "delay 400ms loss 25% 25%",
    "TBF": "rate 100kbit burst 12kb limit 20kb"
  },
  "Upload": {
    "Netem": "delay 400ms loss 25% 25%",
    "TBF": "rate 100kbit burst 12kb limit 20kb"
  }
}

For EDGE I also varied the extra losses across 0%, 0.1%, 10%, and 25%. I always used correlation 25% for losses to simulate the case where losses are correlated.

(I later learned that one could implement shaping directly with netem and that it's possible to ask netem to group packets together and simulate a fixed-slot mechanism like the one used by mobile networks.)
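
For concreteness, here is roughly how the 3G profile above could be applied with tc, shelling out from Go (a hedged sketch that only shapes the egress of a single placeholder interface, eth0, and requires root; the actual jafar2 branch may structure this differently):

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// run executes a command line (requires root for tc) and fails loudly.
func run(cmdline string) {
	args := strings.Fields(cmdline)
	out, err := exec.Command(args[0], args[1:]...).CombinedOutput()
	if err != nil {
		panic(fmt.Sprintf("%s: %v: %s", cmdline, err, out))
	}
}

func main() {
	const dev = "eth0" // placeholder: the interface whose egress we shape
	// netem adds the extra delay (and, for EDGE, the correlated losses)...
	run("tc qdisc add dev " + dev + " root handle 1:0 netem delay 50ms")
	// ...and tbf, attached below netem, enforces rate, burst, and limit.
	run("tc qdisc add dev " + dev + " parent 1:1 handle 10: tbf rate 700kbit burst 12kb limit 88kb")
}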

Each experiment consisted of a miniooni --random -n $nettestName where, of course, $nettestName was one of websteps and web_connectivity. I ran experiments using the above Jafar2 architecture as well as experiments using just my home connection (100 Mbit/s download, 20 Mbit/s upload, ping to 8.8.8.8 equal to 30 ms).

Note that -n implies not submitting, so this experiment is not measuring the impact of submitting reports.

The --random flag shuffles the input. Because I did not run all the experiments to completion, and often specified --max-runtime 600 (i.e., ten minutes), I wanted to randomly sample the test list.

I ran three experiments from my home network, three emulating 3G, two emulating ADSL, and four emulating EDGE using various configurations for the packet losses.

@bassosimone
Contributor Author

Trends

Not all runs were equal, but, with the exception of some outliers, I observed the following trends:

  1. webconnectivity uses more download bandwidth (and therefore downloads more data). This seems reasonable because webconnectivity fetches up to 1<<17 bytes of the response body, while websteps only fetches up to 1<<11 bytes (see the sketch after this list).

  2. websteps uses more upload bandwidth (and so sends more data). This also seems reasonable since websteps performs more TLS handshakes than webconnectivity and also performs QUIC handshakes.

  3. if we define measurement as "emits a JSON related to a URL", then websteps and webconnectivity emit measurements at more or less the same pace. Determining the pace robustly is tricky because both slow down when there is a timeout and when the test helper does not respond immediately; still, my sense from all the experiments I ran is that the two are generally comparable. However, the pace changes if we only consider URLs inside the test list: in that case, webconnectivity is faster. This occurs because websteps follows every endpoint of every redirection and calls the test helper for each of them, while webconnectivity only calls the test helper once and attributes all redirections to the original URL. Therefore, if you measure http://example.com and it redirects to https://www.example.xxx, webconnectivity emits a single measurement for the former, while websteps emits a measurement for each URL.

  4. websteps reports are not larger in bytes than webconnectivity reports. Websteps collects more network-level and DNS-level data, but this is compensated for by downloading a smaller body.

  5. the current implementation of websteps is slower in the DNS phase in the presence of many timeouts (e.g., when running on EDGE with 25% packet losses), because we perform many DNS resolutions, some of which are not in parallel. On the contrary, webconnectivity is generally faster here because it only uses the system resolver. While there is value in performing extra DNS resolutions, we concluded with @hellais that we don't want to run many of them. Because the blocking hypothesis is that there may be DNS injection, which occurs regardless of the resolver we use, it seems sufficient to perform just a single resolution. Also, it may be worth not querying for HTTPS records, since few websites are using them (I will create a separate issue about this topic).

  6. websteps is faster when fetching endpoints in a single measurement. There is no clear definition of fetching endpoints for webconnectivity, so I chose to consider as the "time to fetch endpoints" the time to perform the TCP connect and TLS handshake for all endpoints plus the time to do a single HTTP GET with redirects. This definition is a bit of a stretch, and indeed here we are somewhat comparing apples and oranges, but this breakdown helps explain why the single-URL measurement time is comparable between the two. Basically, while webconnectivity has an advantage in its DNS phase, and the test helper runtime is roughly similar, webconnectivity spends more time than websteps in the remainder of the measurement, because it needs to fetch whole bodies and follow redirects. (Since we know from test lists: investigate errors and redirections #1727 that generally only the last body is large, it follows that webconnectivity spends a lot of time fetching the final body.)
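
To make point 1 concrete, this is how limiting the body snapshot bounds the per-URL download cost (a sketch only: the 1<<17 and 1<<11 constants come from the discussion above, but readBodySnapshot is a hypothetical helper, not the actual probe code):

package main

import (
	"fmt"
	"io"
	"net/http"
)

const (
	webconnectivitySnapSize = 1 << 17 // ~128 KiB of body per response
	webstepsSnapSize        = 1 << 11 // ~2 KiB of body per response
)

// readBodySnapshot reads at most limit bytes of the response body, which
// bounds how much download bandwidth each fetched URL can cost.
func readBodySnapshot(resp *http.Response, limit int64) ([]byte, error) {
	defer resp.Body.Close()
	return io.ReadAll(io.LimitReader(resp.Body, limit))
}

func main() {
	resp, err := http.Get("https://www.example.com/")
	if err != nil {
		panic(err)
	}
	body, err := readBodySnapshot(resp, webstepsSnapSize)
	if err != nil {
		panic(err)
	}
	fmt.Printf("kept %d bytes of the body\n", len(body))
}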

These observations led to the following conclusions:

  1. we should not worry about websteps using more bandwidth

  2. we may want to worry about websteps being a bit slower overall

  3. the report size is not a concern (i.e., we're not making it worse) but we may still want to make it better

  4. there are ways to make both tests faster

  5. it seems we should not spend more time focusing on their accuracy in selected test cases

  6. we should clarify that with websteps we are going to measure more URLs than before, some of which are not currently part of the test list but derive from test list URLs (e.g., http://gmail.com -> https://mail.google.com -> https://accounts.google.com)

@bassosimone
Contributor Author

bassosimone commented Oct 8, 2021

Low-bandwidth environments

One concern when designing websteps has been ensuring that it is reasonably good in a low-bandwidth environment, where "reasonably good" is not precisely defined but broadly means that it remains accurate and still able to work without saturating the bandwidth. There is room for improvement in how we evaluate OONI in such environments. Also, designing with this case in mind means that, in environments with more bandwidth, we only use a fraction of what is available. This looks like an optimisation problem, and it may be tempting to solve it, but since we also rely heavily on background measurements it may be futile to try. Either way, it's clear that we want to continue measuring and learning here.

Though, in the interest of having some reference data for the future, let me include here some charts, with comments, taken from a run in the environment that we called 3G above. (In fairness, our emulated environment was not a good emulation of 3G because it did not feature packet batching, but we still have data points from a low-bandwidth environment, which is a good stepping stone for taking initial decisions and running better simulations in the future.)

Bandwidth usage

The following chart shows the empirical CDF of the download bandwidth usage from the --bwmon snapshots.

[figure: bw_down]

The following chart shows the empirical CDF of the upload bandwidth usage from the --bwmon snapshots.

[figure: bw_up]

The nice part about websteps here is that, in these 5s windows, we never get very close to the maximum configured download and upload bandwidth (700 kbit/s in this experimental setup).
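
For reference, each point of the CDFs above comes from turning one 5s snapshot into a rate and then sorting the rates; a minimal sketch of that computation (assuming, as the sample snapshot suggests, that each snapshot's counters cover its Elapsed window; the sample values below are made up):

package main

import (
	"fmt"
	"sort"
)

// window holds the downloaded bytes (Read+ReadFrom) and the elapsed
// seconds of a single --bwmon snapshot.
type window struct {
	bytes   int64
	seconds float64
}

func main() {
	// Hypothetical samples: in the real analysis these come from the --bwmon file.
	windows := []window{{4435, 5.01}, {210000, 5.0}, {350000, 5.02}, {90000, 4.99}}

	// Convert each window into a download rate in kbit/s.
	rates := make([]float64, len(windows))
	for i, w := range windows {
		rates[i] = float64(w.bytes) * 8 / w.seconds / 1000
	}

	// The empirical CDF is just the sorted rates plotted against i/n.
	sort.Float64s(rates)
	for i, r := range rates {
		fmt.Printf("P(rate <= %.1f kbit/s) = %.2f\n", r, float64(i+1)/float64(len(rates)))
	}
}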

Downloading and uploading bytes over time

The following charts are another way of viewing the bandwidth usage. This is downloaded data over time:

[figure: bytes_recv]

And this is uploaded data over time:

[figure: bytes_sent]

When the line flattens, the probe is idle, either because it is waiting for the test helper or because it is about to hit a timeout. (Given the bandwidth constraints, and given how much faster my home network was, I ran some of the Jafar2 measurements in parallel, with a delay between them varying from a few seconds to tens of seconds.)

Report file size

The following chart shows how the size of the report.jsonl file collected on disk grew during the measurement (which is in itself a proxy for how many bytes we would submit).

[figure: report_size]

In other experiments, the difference in report size was less pronounced. This run is one of the cases where websteps has the most advantage; in other cases the two were quite close.
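
The curve above can be reproduced by periodically recording the on-disk size of report.jsonl while the probe runs; here is a minimal sketch of such a sampler (a hypothetical helper, not part of miniooni):

package main

import (
	"fmt"
	"os"
	"time"
)

// sampleReportSize prints the report file size every interval, producing
// the (elapsed seconds, bytes) series behind a chart like the one above.
func sampleReportSize(path string, interval time.Duration) {
	start := time.Now()
	for range time.Tick(interval) {
		info, err := os.Stat(path)
		if err != nil {
			continue // the file may not exist yet
		}
		fmt.Printf("%.0f %d\n", time.Since(start).Seconds(), info.Size())
	}
}

func main() {
	sampleReportSize("report.jsonl", 5*time.Second)
}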

Measurement runtime

This chart shows the comparison of the overall measurement runtime:

[figure: measurement_runtime]

Remember that here "measurement" means "we measure any URL".

This is instead the time to perform the DNS step:

[figure: dns_runtime]

This is the time to query the respective TH:

[figure: th_runtime]

And this is the time to complete the measurement (i.e., HTTP-fetching the endpoints for websteps, and TCP+TLS+HTTP for webconnectivity):

[figure: epnts_runtime]

Here we see the trends highlighted above in action.

Number of measurements emitted over time

Let's start by showing the progress in the number of completed measurements for any URL:

[figure: measurement_start]

And now let's only consider the progress in emitting test-list URLs (which should be the same as above for webconnectivity, but different for websteps, which makes every redirection explicit in the measurements):

[figure: tlu_start]

As I mentioned, idle time really slows down our measuring pace. There are two sources of idle time: the timeouts or slowdowns we experience when doing DNS resolutions, TLS handshakes, and QUIC handshakes; and the timeouts or slowdowns while waiting for the test helper to return a response to the caller.

@bassosimone
Contributor Author

Closing!

bassosimone added a commit to ooni/probe-cli that referenced this issue Oct 20, 2021
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022