webconnectivity: write prototype of new experiment #1733

Closed
bassosimone opened this issue Aug 17, 2021 · 5 comments

Comments

@bassosimone
Contributor

This issue is about writing a prototype of the new webconnectivity experiment (tentatively called websteps) along with a brief specification of its behavior.

@bassosimone bassosimone added this to the Sprint 46 - Happy Oyster milestone Aug 17, 2021
bassosimone pushed a commit to ooni/probe-cli that referenced this issue Aug 17, 2021
This is the extension of #431, and my final deliverable for GSoC 2021.

The diff introduces:

1) The new `testhelper` which supports testing multiple IP endpoints per domain and introduces HTTP/3 control measurements. The specification of the `testhelper` can be found at ooni/spec#219. The `testhelper` algorithm consists of three main steps:

   * `InitialChecks` verifies that the input URL can be parsed, has an expected scheme, and contains a valid domain name.

   * `Explore` enumerates all the URLs that it discovers by redirection from the original URL, or by detecting h3 support at the target host.

   * `Generate` performs a step-by-step measurement of each discovered URL.

2) A prototype of the corresponding new experiment `websteps` which uses the control measurement of the `testhelper` to know which URLs to measure, and what to expect. The prototype does not yet have:

   * unit and integration tests,

   * an analysis tool to compare the control and the probe measurement.

This PR is my final deliverable, as it is the outcome of the trials, considerations and efforts of my GSoC weeks at OONI.
It fully integrates HTTP/3 (QUIC) support, which until now has only been used in the `urlgetter` experiment.

Related issues: ooni/probe#1729 and ooni/probe#1733.
bassosimone added a commit to ooni/probe-cli that referenced this issue Aug 17, 2021
We started doing this in #432.

This work is part of ooni/probe#1733.
bassosimone added a commit to ooni/probe-cli that referenced this issue Aug 17, 2021
We started doing this in #432.

This work is part of ooni/probe#1733.
bassosimone added a commit to ooni/probe-cli that referenced this issue Aug 17, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Aug 17, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Aug 17, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Aug 17, 2021
@bassosimone
Contributor Author

We're making progress. We cannot call this task done until we have a spec.

bassosimone pushed a commit to ooni/oohttp that referenced this issue Aug 17, 2021
This diff adds a wrapper for `Transport` that looks like an `http.Transport`.

Part of ooni/probe#1733
bassosimone pushed a commit to ooni/probe-cli that referenced this issue Aug 18, 2021
This diff enables `websteps` to use uTLS for TLS parroting. It integrates the `oohttp.StdlibTransport` wrapper which uses the `ooni/oohttp` fork. `oohttp` supports TLS-like connections like `utls.Conn`.
As a prototype, the testhelper and `websteps` code now uses the `utls.HelloChrome_Auto` fingerprint, i.e. the simulated TLS fingerprint of the Google Chrome browser.

It is a further contribution for my GSoC project.

Reference issue: ooni/probe#1733
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 9, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 9, 2021
@bassosimone
Contributor Author

bassosimone commented Sep 13, 2021

Here's a status update:

Changes to the current design

With @hellais we found a flaw in the current design. If we let the test-helper enumerate all possible redirects, we cannot be sure about whether the redirect chain would match the one occurring inside the country.

The fix is conceptually simple. We should instead let the probe choose the next redirect to follow based on its own and the test helper's results.

Therefore, the probe will need to contact the helper multiple times. Basically this will happen once for each redirect that the probe follows.

This change somewhat reduces the flexibility of the design: the test helper is now less able to influence what the probe does. However, the reason for changing the design seems quite valid and is difficult to argue against.

Changes to simplify backend processing

The biggest change here is that we need to produce a new measurement for each URL. We will put both the TCP and QUIC measurements within the same HTTPS URL because both TCP and QUIC are possible for such a URL.

Current design process

At this stage, we're discussing the design internally as a set of internal documents and we will not update the outstanding docs in ooni/spec for the time being. We still have lots of churn and it is easier to proceed in this way.

We will open the design for a more public review at a later stage. As part of this step of review, we will publish updated specs.

Desired architecture

[Image: Untitled-2021-07-09-1501 (architecture diagram)]

We want something like the above. It would be nice to use the same measurement technique for other experiments (see, for example, #1761). So far, this way of measuring seems easier to implement and maintain than using tracing.

bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 29, 2021
This commit introduces a measurement library that consists of
refactored code from earlier websteps experiments.

I am not going to add tests for the time being, because this library
is still a bit in flux, as we finalize websteps.

I will soon commit documentation explaining in detail how
to use it, though. It currently lives at #506
and adds a new directory to internal/tutorial.

The core idea of this measurement library is to allow two
measurement modes:

1. tracing, which is what we're currently doing, and the
tutorial shows how we can rewrite the measurement part of web
connectivity with measurex using less code. Under a tracing
approach, we construct a normal http.Client that however has
tracing configured, we gather events for resolve, connect, TLS
handshake, QUIC handshake, HTTP round trip, etc. and then we
try to make sense of what happened from the events stream;

2. step-by-step, which is what websteps does, and basically
means that after each operation you immediately write into
a Measurement structure its results and immediately draw the
conclusions on what seems odd (which later may become an
anomaly if we see what the test helper measured).

This library is also such that it produces a data format
compatible with the current OONI spec.

This work is part of ooni/probe#1733.
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 30, 2021
This diff adds the prototype websteps implementation that used
to live at #506.

The code is reasonably good already and it's pointing to a roaming
test helper that I've properly configured.

You can run websteps with:

```
./miniooni -n websteps
```

This will go over the test list for your country.

At this stage the mechanics of the experiment are set, but we
still need to have a conversation on the following topics:

1. whether we're okay with reusing the data format used by other
OONI experiments, or we would like to use a more compact data
format (which may either be a more compact JSON or we can choose
to always submit compressed measurements for websteps);

2. the extent to which we would like to keep the measurement as
a collection of "the experiment saw this" and "the test helper
saw that" and let the pipeline choose an overall score: this is
clearly an option, but there is also the opposite option to
build a summary of the measurement on the probe.

Compared to the previous prototype of websteps, the main
architectural change here is that we follow the point of
view of the probe, and the test helper is much simpler.
Basically, the probe chooses which redirection to follow
and, every time it discovers a new URL, asks the test
helper to measure it without following redirections.

Reference issue: ooni/probe#1733
@bassosimone
Contributor Author

This issue has now been completed with ooni/probe-cli#530. I will now create follow up issues.

@bassosimone
Contributor Author

I've also bumped the estimate from 21 to 40 because this has been a huge task.

ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
This diff has been extracted from ooni#506.

In it, we introduce wrapping constructors for types and we
update the docs. These new constructors are used by the code
in ooni#506.

In itself, this work is part of ooni/probe#1733.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
I have recently seen a data race related to our way of
mutating the outgoing request to set the host header.

Unfortunately, I've lost track of the race output,
because I rebooted my Linux box before saving it.

However, after inspecting why and where we're mutating
outgoing requests, I've found that:

1. we add the host header when logging to have it logged,
which is not a big deal since we already emit the URL
rather than just the URL path when logging a request, and
so we can safely zap this piece of code;

2. as a result, in measurements we may omit the host header
but again this is pretty much obvious from the URL itself
and so it should not be very important (nonetheless,
avoid surprises and keep the existing behavior);

3. when the User-Agent header is not set, we default to
a `miniooni/0.1.0-dev` user agent, which is probably not
very useful anyway, so we can actually remove it.

Part of ooni/probe#1733 (this diff
has been extracted from ooni#506).
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
While there, make sure netxlite has 100% coverage.

Part of ooni/probe#1733; this diff
has been extracted from ooni#506.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
This diff attempts to modify the errors reported by our custom
resolver by matching more strings from the stdlib.

Part of ooni/probe#1733; this diff has been
extracted from ooni#506.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
This new API call performs DNS lookups for HTTPS records.

Part of ooni/probe#1733; this diff has been
extracted from ooni#506.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
I need to run tests on Windows and I just discovered that:

1. the `errno_unix.go` filename does not mean anything because
`unix` is not a valid platform, so we need a filename for
each platform that we care about;

2. on Windows we need to use WSA prefixed names;

3. `i/e/session_psiphon.go` was not building because of the
migration from `netxlite/iox` to `netxlite`.

This diff attempts to fix all three issues.

The reference issue is ooni/probe#1733,
because I was working on such an issue.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
* feat: run ~always netxlite integration tests

This diff ensures that we check on windows, linux, macos that our
fundamental networking library (netxlite) works.

We combine unit and integration tests.

This work is part of ooni/probe#1733, where
I want to have stronger guarantees about the foundations.

* fix(filtering/tls_test.go): make portable on Windows

The trick here is to use the wrapped error so as to normalize the
different error messages we see on Windows.

* fix(netxlite/quic_test.go): make portable on windows

Rather than using the zero port, use the `x` port which fails
when the stdlib is parsing the address.

The zero port seems to work on Windows while it does not on Unix.

* fix(serialresolver_test.go): make error more timeout than before

This seems enough to convince Go on Windows about this error
being really a timeout timeouty timeouted thingie.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
At the moment ooapi is not used. It will eventually be used since
it's a better way of accessing the OONI backend API.

To fix these tests, we need to fix the swagger emitted by the
backend API, which is not a priority at the moment, since we are
working instead to integrate websteps in miniooni.

Issue ooni/probe#1790 tracks the work
required to re-enable the tests I'm skipping with this diff.

This work is part of ooni/probe#1733.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
The explanatory comment in the diff says it all.

Work done while I was converging with ooni/probe#1733.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
This is required to implement websteps, which is currently tracked
by ooni/probe#1733.

We introduce the concept of async runner. An async runner will
post measurements on a channel until it is done. When it is done,
it will close the channel to notify the reader about that.
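
A minimal sketch of this pattern is shown below. The `measurement` type and the `runAsync` function are hypothetical placeholders, not the actual probe-cli types: the runner posts measurements on a channel and closes it when done, so the reader can simply range over the channel.

```go
package main

import "fmt"

// measurement is a hypothetical stand-in for a real measurement type.
type measurement struct {
	Input string
}

// runAsync posts one measurement per input on the returned channel
// and closes the channel once every input has been processed.
func runAsync(inputs []string) <-chan *measurement {
	out := make(chan *measurement)
	go func() {
		defer close(out) // tells the reader that we are done
		for _, input := range inputs {
			out <- &measurement{Input: input}
		}
	}()
	return out
}

func main() {
	inputs := []string{"https://example.com/", "https://example.org/"}
	// The loop ends when the runner closes the channel.
	for m := range runAsync(inputs) {
		fmt.Println("got measurement for", m.Input)
	}
}
```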

This change causes sync experiments now to strictly return either
a non-nil measurement or a non-nil error.

While this is a pretty much obvious situation in golang, we had
some parts of the codebase that were not robust to this assumption
and attempted to submit a measurement after the measure call
returned an error.

Luckily, we had enough tests to catch this change in our assumption
and this is why there are extra docs and test changes.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022