This repository contains the third prototype of websteps (codename "winter 2022"). This prototype follows the "summer 2021" and the "fall 2021" prototypes.
The internal directory contains code derived from ooni/probe-cli v3.14.1 as well as new code. As a rule of thumb, most directories could be easily merged back, except measurex and websteps, which have been significantly rewritten and would require either more careful merging or a yolo-rewrite-everything approach.
The cmd directory contains commands using code in the internal library. The most important commands are:
- cmd/websteps: websteps client;
- cmd/thd: test helper.
The spec directory contains the current draft specification of websteps, which still needs to be discussed with my colleagues and other friends of OONI.
The python directory contains:
- python/websteps.py: minimal implementation of the websteps concept written in Python; it is not meant to become production ready, but is useful to see the algorithms in a smaller context and to show that websteps implementations not written in Go can interoperate with the test helper written in Go;
- python/analysis/dbsteps: Python script to analyze websteps measurements and view them in the browser;
- python/testcase/create: script to create integration test cases for websteps while running measurements;
- python/testcase/shell: script to manage the integration tests for websteps;
- python/ooni: comprehensive library to import and process websteps measurements using Python.
The testdata/testcase directory contains a few test cases collected using the create command and managed using the shell command.
The html directory contains support files for browsing websteps measurements and test cases using HTML.
NOTE: while I spent some time polishing this code, it is still experimental, with little unit testing and, for sure, a bunch of inconsistencies between the spec and the implementation. This is normal, given that for now websteps is still a bit of a moving target.
You need to use go1.17.13 to build this repository:

    go build -v ./cmd/websteps
    go build -v ./cmd/thd
These are the main changes since the fall 2021 edition (collection?! 😅):
- added support for PTR and NS queries as well as for opportunistically extracting the CNAME from replies;
- implemented a parallel DNSResolver using a custom DNSTransport;
- reworked the system resolver to fake a DNSTransport and more easily produce the OONI DNS data format;
- several reliability and correctness fixes in the DNS code;
- significantly reworked the conceptual model of measurex to more easily accommodate implementing websteps;
- around one month of experience running websteps code in several countries (including China, Italy, and Iran), which dramatically helped to improve the robustness of the implementation as well as to develop "scoring" algorithms;
- developed a set of algorithms to assign blocking flags to websteps measurements as well as heuristics to spot common classes of false positives and flag them correctly;
- implemented and integrated a dnsping extension for websteps that allows us to confirm cases of DNS blocking with more confidence as well as to retract DNS timeout claims when the timeouts are transient;
- integration testing framework based on caching the TH's and the probe's measurements and on replaying measurements collected in the field (thus being more true to real-world censorship than censorship simulated using jafar or similar tools);
- robust caching mechanism for the TH;
- started experimenting with using TLSH to classify webpages in addition to using the traditional Web Connectivity algorithm (but this effort is so far a bit inconclusive);
- TH protocol using WebSocket in addition to web APIs to increase robustness when middleboxes close connections that stay silent for a number of seconds;
- figured out ways in which the original, optimistic let's-measure-every-endpoint model breaks when coupled with the typical OONI constraints of timing and single-URL-at-a-time, and added reasonable settings to the algorithm to strike a balance between depth and breadth;
- learned that my effort estimate is usually off by a 5x factor 😬;
- the design incorporates future improvements in the check-in API that will allow us to customize how we measure URLs depending on the context (so, we will be able to say, for each URL in a given country and ASN, the amount of body bytes to download, whether to follow redirects, etc.).
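To make the dnsping idea from the list above more concrete, here is a minimal Python sketch of how follow-up DNS queries could be used to confirm or retract an earlier timeout claim. This is not the algorithm implemented in this repository: the names (PingReply, classify_dns_timeout), the "timeout" failure string, and the min_pings threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PingReply:
    """Outcome of one follow-up DNS query (hypothetical data model)."""

    addresses: List[str]           # resolved addresses, empty on failure
    failure: Optional[str] = None  # e.g. "timeout"; None means success


def classify_dns_timeout(pings: List[PingReply], min_pings: int = 3) -> str:
    """Classify an earlier DNS timeout using follow-up queries.

    Returns "confirmed" when every follow-up query also timed out,
    "retracted" when at least one query succeeded (so the original
    timeout was probably transient), and "inconclusive" otherwise.
    The min_pings threshold is an arbitrary assumption.
    """
    if len(pings) < min_pings:
        return "inconclusive"
    timeouts = sum(1 for p in pings if p.failure == "timeout")
    if timeouts == len(pings):
        return "confirmed"
    if any(p.failure is None and p.addresses for p in pings):
        return "retracted"
    return "inconclusive"
```

For example, three timeouts in a row would yield "confirmed", while a single successful reply among the follow-up queries would yield "retracted".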
This work addresses, in part or completely, the following issues:
issue | level of completion |
---|---|
probe#2034 | complete |
probe#1190 | complete |
probe#1806 | complete |
probe#1803 | now unnecessary |
probe#1516 | mostly(?) complete |
probe#1718 | complete |
Next steps:
- continue discussing the spec with OONI friends;
- prepare a short presentation for pitching websteps, since the spec is long and it may be beneficial to also provide people with short introductions;
- continue extensive data analysis and start preparing reports/blog posts based on this work;
- write the spec for extensions (including dnsping, already implemented, and sniblocking, which we need);
- collect more test cases and add support for automatically checking that we're still passing these test cases;
- figure out ways to auto-generate parts of the codebase if possible (especially Python data structs that depend on Go data structs: that would be nice);
- perform again a performance comparison with Web Connectivity and also a comparison in terms of accuracy;
- double check that our level of parallelism is adequate for testing in low bandwidth scenarios;
- start merging the easy parts back into probe-cli and generally aim to reduce the diff between this fork and the original codebase;
- sync up the OONI issue tracker with the work I have been doing here basically in sti mode;
- extend the underlying library to add support for as many raw errors as possible;
- decide how to adapt tutorials to changes in here.
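The item above about automatically checking the collected test cases could look roughly like the following Python sketch. It assumes, purely for illustration, that each test case can be reduced to a name, a cached measurement (here a plain dict), and the list of flags we expect the analysis to produce; the check_test_cases helper and the shape of a test case are hypothetical and do not reflect the actual format under testdata/testcase.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical: a cached measurement represented as a plain dict.
Measurement = Dict[str, object]

# Hypothetical: (test case name, cached measurement, expected flags).
TestCase = Tuple[str, Measurement, List[str]]


def check_test_cases(
    cases: Iterable[TestCase],
    analyze: Callable[[Measurement], List[str]],
) -> List[str]:
    """Replay each cached measurement through analyze and return the
    names of the test cases whose flags differ from the expected ones.

    Flags are compared as sorted lists, so ordering does not matter.
    """
    failing = []
    for name, measurement, expected in cases:
        if sorted(analyze(measurement)) != sorted(expected):
            failing.append(name)
    return failing
```

An empty return value would mean that all test cases still pass; the analyze callback stands in for whatever code assigns flags to a replayed measurement.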
If thd is running locally (and please remember to force it to drop root privileges), you can integrate it with an existing nginx setup by adding:

    location /websteps/v1/websocket {
        proxy_read_timeout 900;
        proxy_pass http://127.0.0.1:9876;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $host;
    }
    location /websteps/v1/http {
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 900;
        proxy_pass http://127.0.0.1:9876;
    }