feat(torsf): collect tor logs, select rendezvous method, count bytes #683

bassosimone · 2022-02-04T12:21:40Z

Checklist

I have read the contribution guidelines
reference issue for this pull request: qa(torsf): figure out proper configuration to help snowflake devs collecting useful data probe#2004, torsf: improvements over initial proof of concept probe#1686
if you changed anything related to how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: torsf: document persistent datadir and rendezvous method spec#231

Description

This diff contains significant improvements over the previous
implementation of the torsf experiment.

We add support for configuring different rendezvous methods after
the convo at ooni/probe#2004. In doing
that, I've tried to use a terminology that is consistent with the
names being actually used by tor developers.

In terms of what to do next, this diff basically instruments
torsf to always rendezvous using domain fronting. Yet, it's also
possible to change the rendezvous method from the command line
when using miniooni, which allows for a bit more experimentation.
In the same vein, by default we use a persistent tor datadir, but
it's also possible to use a temporary datadir from the command line.

Here's what a generic invocation of torsf looks like:

./miniooni -O DisablePersistentDatadir=true \
           -O RendezvousMethod=amp \
           -O DisableProgress=true \
           torsf

(The default is DisablePersistentDatadir=false and
RendezvousMethod=domain_fronting.)
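
To make the option handling concrete, here is a minimal sketch of how `KEY=VALUE` pairs passed with `-O` could be applied on top of the defaults stated above. The `Config` type, `NewConfig`, and `Set` are hypothetical names used for illustration only, and the assumption that `domain_fronting` and `amp` are the only accepted methods comes from this description alone; the actual torsf code may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Config is a hypothetical container for the three options shown above.
type Config struct {
	DisablePersistentDatadir bool
	DisableProgress          bool
	RendezvousMethod         string
}

// NewConfig returns the defaults stated in the description:
// persistent datadir enabled and domain fronting as rendezvous method.
func NewConfig() *Config {
	return &Config{RendezvousMethod: "domain_fronting"}
}

// Set applies a single KEY=VALUE pair, in the form passed after -O.
func (c *Config) Set(kv string) error {
	parts := strings.SplitN(kv, "=", 2)
	if len(parts) != 2 {
		return fmt.Errorf("expected KEY=VALUE, got %q", kv)
	}
	key, value := parts[0], parts[1]
	switch key {
	case "DisablePersistentDatadir":
		v, err := strconv.ParseBool(value)
		if err != nil {
			return err
		}
		c.DisablePersistentDatadir = v
	case "DisableProgress":
		v, err := strconv.ParseBool(value)
		if err != nil {
			return err
		}
		c.DisableProgress = v
	case "RendezvousMethod":
		// Assumption: only the two methods named in this PR are valid.
		if value != "domain_fronting" && value != "amp" {
			return fmt.Errorf("unsupported rendezvous method: %q", value)
		}
		c.RendezvousMethod = value
	default:
		return fmt.Errorf("unknown option: %q", key)
	}
	return nil
}

func main() {
	cfg := NewConfig()
	opts := []string{
		"DisablePersistentDatadir=true",
		"RendezvousMethod=amp",
		"DisableProgress=true",
	}
	for _, kv := range opts {
		if err := cfg.Set(kv); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Printf("%+v\n", *cfg)
}
```

Rejecting unknown keys and unknown methods early keeps a typo on the command line from silently falling back to the defaults.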

With this implementation, we can start measuring whether snowflake
and tor together can bootstrap, which seems the most important thing
to focus on at the beginning. Understanding why the bootstrap most
often does not converge with a temporary datadir on Android devices
remains an open problem for now. (I'll also update the
relevant issues or create new issues after committing this.)

We also address some methodology improvements that were proposed
in ooni/probe#1686. Namely:

- we record the tor version;
- we include the bootstrap percentage by reading the logs;
- we set the anomaly key correctly;
- we measure the bytes sent and received (by tor, not by snowflake, since doing it for snowflake seems more complex at this stage).

What remains to be done is the possibility of including Snowflake
events into the measurement, which is not possible until the new
improvements at common/event in snowflake.git are included into a
tagged version of snowflake itself. (I'll make sure to mention
this aspect to @cohosh in ooni/probe#2004.)

@cohosh

What also remains to be done is measuring the number of bytes sent and received during the bootstrap, which will probably be part of a follow-up diff (or even pull request). I also expect this diff to fail unit and integration tests, at least because of reduced coverage. This is fine because I plan to add the missing tests or fix the broken ones as part of a follow-up diff. If you're reviewing this diff, I'd recommend focusing on (1) whether we're collecting good enough data for analysis and (2) whether the data we collect is safe to collect, or whether we should collect less to err on the safe side.

The tutorial mentioned a default initialized structure, but now we want to use a constructor, hence mention it.

bassosimone

First pass of self review. I've highlighted when we should improve with testing and have added comments for the reviewers.

internal/cmd/ptxclient/ptxclient.go

internal/engine/experiment/torsf/torsf.go

internal/engine/session.go

internal/ptx/snowflake.go

internal/tunnel/tunnel.go

bassosimone · 2022-02-04T12:49:50Z

@hellais this pull request is not complete because the tests still need adjusting. However, it already contains ~finished code for the methodological improvements to torsf. Thus, it would be great if you could (1) take a look at the diff, with particular attention to the sections that I've marked as important, and (2) test from the command line using miniooni (build with go build -v ./internal/cmd/miniooni, see the initial comment for usage info) and report back on the usefulness of the data structures that I am currently collecting. Thanks a lot! 🙏

internal/engine/experiment/torsf/torsf.go

@hellais

The way I initially wrote torsf in this set of patches was broken because we could only get logs on success. What's more, when I tried to get logs on failure, torsf crashed. Luckily we have unit tests and we noticed. So, rewrite the code such that we can always access the logs. While there, rework the torsf unit tests to ensure they cover all the cases and are sorted logically within the file. While there, start addressing @hellais's suggestion that we should get the version of tor from the control port. After this diff, we still have some broken or missing tests, and I will work on that later today or next week.
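
For context on reading the version from the control port: in the tor control protocol, a `GETINFO version` request is answered with a `250-version=...` line followed by `250 OK`. The sketch below only parses such a reply once it has been read as text. `parseVersionReply` is a hypothetical helper and the version string in `main` is a made-up sample; the diff itself may well rely on a control port library instead.

```go
package main

import (
	"fmt"
	"strings"
)

// parseVersionReply extracts the tor version from the text of a
// "GETINFO version" reply. It returns false when the reply does
// not contain a version line.
func parseVersionReply(reply string) (string, bool) {
	const prefix = "250-version="
	for _, line := range strings.Split(reply, "\n") {
		// Control port lines are terminated by CRLF.
		line = strings.TrimRight(line, "\r")
		if strings.HasPrefix(line, prefix) {
			return strings.TrimPrefix(line, prefix), true
		}
	}
	return "", false
}

func main() {
	// Sample reply; the version number is illustrative only.
	reply := "250-version=0.4.6.9\r\n250 OK\r\n"
	if version, ok := parseVersionReply(reply); ok {
		fmt.Println("tor version:", version)
	}
}
```

Compared with scraping the version out of the logs, asking the control port does not depend on which log lines tor happens to emit.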

@hellais

Suggested by @hellais.

internal/bytecounter/context.go

bassosimone · 2022-02-04T19:25:29Z

This PR is approaching readiness, but a couple of comments still need to be addressed!

This diff documents new options we have added to the torsf experiment after the ooni/probe#2004 discussion. The related probe-cli PR is: ooni/probe-cli#683

internal/ptx/snowflake_test.go

internal/engine/experiment/torsf/testdata/tor.log

hellais · 2022-02-07T13:59:38Z

~~In doing some testing of this, I noticed that if I run the torsf test without any arguments in miniooni I get the following error:~~

miniooni torsf
[      0.000222] <info> Current time: 2022-02-07 14:57:03 CET
[      0.002505] <info> miniooni home directory: $HOME/.miniooni
[      0.002755] <info> Looking up OONI backends; please be patient...
[      0.733684] <info> session: using probe services: {Address:https://ps1.ooni.io Type:https Front:}
[      0.733706] <info> Looking up your location; please be patient...
[      4.567626] <info> - country: IT
[      4.567656] <info> - network: Vodafone Italia S.p.A. (AS30722)
[      4.567666] <info> - resolver's IP: 172.253.12.129
[      4.567670] <info> - resolver's network: Google LLC (AS15169)
[      4.567683] <warn> cannot create experiment builder: map[error:no such experiment: torsf]
[      4.568241] <info> sessionresolver: failure rate: primary: 4/4; fallback: 0/4
[      4.568447] <info> whole session: recv   7 Mbyte, sent   5 kbyte
cannot create experiment builder%

I guess it's fine to require specifying correct arguments as miniooni is an experimental client, but if it's not too hard we might want to still support running the test with default arguments or improve the returned error (the fact I read map[error:no such experiment: torsf], made me wonder if I was on the correct branch).

Ignore this, I was running the miniooni I had in the PATH and not the one I built from the branch. Ignore this noise.

bassosimone · 2022-02-07T14:05:32Z

> I guess it's fine to require specifying correct arguments as miniooni is an experimental client, but if it's not too hard we might want to still support running the test with default arguments or improve the returned error (the fact I read map[error:no such experiment: torsf], made me wonder if I was on the correct branch).

I find this error very surprising. ./miniooni torsf should work just fine. What is your current branch and what is its tip?


@hellais

Discussed with @hellais.

hellais

LGTM


bassosimone added 2 commits February 4, 2022 13:08

fix: ensure the tutorial wording still makes sense

a49320e

The tutorial mentioned a default initialized structure, but now we want to use a constructor, hence mention it.

bassosimone commented Feb 4, 2022

hellais reviewed Feb 4, 2022

internal/engine/experiment/torsf/torsf.go (outdated, resolved)

hellais reviewed Feb 4, 2022

internal/engine/experiment/torsf/torsf.go (outdated, resolved)

hellais reviewed Feb 4, 2022

internal/engine/experiment/torsf/torsf.go (resolved)

hellais reviewed Feb 4, 2022

internal/engine/experiment/torsf/torsf.go (resolved)

bassosimone added 5 commits February 4, 2022 16:22

fix(ptx): ensure we have coverage of newly added code

40a587a

fix(tunnel): ensure we have working tests

f2c1396

fix: ensure we gather tor version from the control port

9555ad2

Suggested by @hellais.

fix: also count the bytes consumed by torsf

84bfa84

bassosimone commented Feb 4, 2022

internal/bytecounter/context.go (outdated, resolved)

bassosimone changed the title ~~feat(torsf): collect tor logs, select rendezvous method~~ feat(torsf): collect tor logs, select rendezvous method, count bytes Feb 4, 2022

fix: rename a function that needed renaming

04758b5

bassosimone mentioned this pull request Feb 7, 2022

torsf: document persistent datadir and rendezvous method ooni/spec#231

Merged

4 tasks

bassosimone commented Feb 7, 2022

internal/ptx/snowflake_test.go (resolved)

fix: final fixes before marking PR as ready

4d57276

bassosimone marked this pull request as ready for review February 7, 2022 13:05

hellais reviewed Feb 7, 2022

internal/engine/experiment/torsf/testdata/tor.log (resolved)

bassosimone added 2 commits February 7, 2022 16:48

fix(torsf): use regexp to extract progress line

bacab49

Discussed with @hellais.
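
A minimal sketch of what extracting the progress with a regexp could look like, assuming tor's usual `Bootstrapped N%` notice lines. The function name, the exact pattern, and the sample log lines are illustrative assumptions, not the code from this commit.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// bootstrapRe matches the percentage in lines such as
// "... [notice] Bootstrapped 10% (conn_done): Connected to a relay".
var bootstrapRe = regexp.MustCompile(`Bootstrapped ([0-9]+)%`)

// bootstrapPercentage returns the highest bootstrap percentage
// found in the given log lines, or zero if there is none.
func bootstrapPercentage(logs []string) int {
	highest := 0
	for _, line := range logs {
		match := bootstrapRe.FindStringSubmatch(line)
		if match == nil {
			continue // not a progress line
		}
		pct, err := strconv.Atoi(match[1])
		if err != nil {
			continue // e.g. a number too large to fit an int
		}
		if pct > highest {
			highest = pct
		}
	}
	return highest
}

func main() {
	// Illustrative log lines, not taken from a real measurement.
	logs := []string{
		"Feb 04 15:04:29.000 [notice] Bootstrapped 0% (starting): Starting",
		"Feb 04 15:04:30.000 [notice] Bootstrapped 5% (conn): Connecting to a relay",
		"Feb 04 15:04:31.000 [notice] Bootstrapped 10% (conn_done): Connected to a relay",
		"Feb 04 15:04:32.000 [warn] some unrelated warning",
	}
	fmt.Println("bootstrap percentage:", bootstrapPercentage(logs))
}
```

Taking the maximum rather than the last match keeps the result stable even if unrelated lines follow the final progress line.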

fix(torsf): repair unit tests

1ffd6b8

hellais self-requested a review February 7, 2022 16:03

hellais approved these changes Feb 7, 2022

bassosimone merged commit 85664f1 into master Feb 7, 2022

bassosimone deleted the issue/2004 branch February 7, 2022 16:05

bassosimone mentioned this pull request Feb 8, 2022

qa(torsf): figure out proper configuration to help snowflake devs collecting useful data ooni/probe#2004

Closed
