torsf: testing on Android and Desktop #1917

bassosimone · 2021-12-13T18:23:48Z

This issue is about performing QA of the torsf implementation on Desktop and mobile devices. We will close this issue when we have performed enough QA of the implementation to move forward ~~and release. If there are significant issues, we can also consider releasing torsf only on some platforms rather than releasing it on all of them~~. The original plan was to do enough QA until release but it emerged a need to chat with Snowflake devs as it seems it may not be the best use of the next release to release a torsf that most often times out on mobile.

After the work done in #1717, we want to test how the new experiment is working on mobile devices, as well as on Desktop devices. To this end, we need to integrate together a cli that knows about torsf with a mobile app version and a Desktop version that knows about it.

Executive summary

Just see #2004

The text was updated successfully, but these errors were encountered:

Say you have built an experimental `oonimkall.aar` artifact from the `ooni/probe-cli` repo at whatever commit in whatever branch. Say you want to test such an artifact against the current `master` of the `ooni/probe-android` app. How do you achieve that? 1. copy `oonimkall.aar` inside `./engine-experimental`; 2. `./gradlew assembleExperimentalFullRelease` (remember to set `ANDROID_SDK_ROOT`); 3. locate the `apk` using `find` and then use `adb install` to install such an APK on your phone. I'm doing this work as part of ooni/probe#1917 to facilitate quickly testing of `torsf` and co-developing together the `probe-cli` and `probe-android` repositories.

This diff contains the following changes and enhancements: 1. upgrade snowflake to v2 2. observe that we were not changing defaults from outside of snowflake.go, so remove code allowing to do that; 3. bump the timeout to 600 seconds (it seems 300 was not always enough based on my testing); 4. add useful knob to disable `torsf` progress (it's really annoying on console, we should do something about this); 5. ptx.go: avoid printing an error when the connection has just been closed; 6. snowflake: test AMP cache, see that it's not working currently, so leave it disabled. Related issues: ooni/probe#1845, ooni/probe#1894, and ooni/probe#1917.

Reference issue: ooni/probe#1917

Reference issue: ooni/probe#1917. I needed to change the summary key type returned by `torsf` to be a value. It seems the DB layer assumes that. If we pass it a pointer, it panics because it's experiment a value rather than a pointer 🤷.

This commit message is the same across probe-cli, probe-desktop, and probe-android. With the changes contained in the enclosed diff, I'm starting to add support for torsf for android and for desktop. When smoke testing that torsf was WAI, I also noticed that its progress messages in output are too frequent. We may want to do better in a future version when we'll be able to read `tor`'s output. In the meanwhile, make the progress messages less frequent and indicated the maximum runtime inside of the messages themselves. This improved message, albeit not so nice from the UX PoV, should at least provide a clue that we're not stuck. Reference issue: ooni/probe#1917

This version of the cli already includes support for torsf and it's safe to test using this version knowing that the release will consist of this version plus minor patches in case we found any defect during the QA. Reference issue: ooni/probe#1917

This version of the cli already includes support for torsf and it's safe to test using this version knowing that the release will consist of this version plus minor patches in case we found any defect during the QA. With this change in, we're able to make test releases of the Android app for the purpose of testing torsf. Reference issue: ooni/probe#1917

bassosimone · 2022-01-25T15:27:31Z

So, here's an update regarding torsf testing. With @hellais, I have been testing torsf using v3.14.0-alpha.1, which currently is equivalent to master.

Bootstrap time issues

The main issue we've been facing so far has been the total time required to bootstrap tor. This has been, in particular, a problem on Android, where the default torsf config has timed out (after 600 s) in most runs.

We are confident that one of the main reasons why the bootstrap takes so much time is that we're bootstrapping tor from scratch (i.e., without a previously cached consensus). In everyday life, this condition should only arise the first time your bootstrap tor. Afterwards, tor will have cached microdescriptors and consensus.

Diff 1: don't bootstrap from scratch

Therefore, the first alternative configuration to try is one in which we use a fixed directory for torsf as opposed to a directory created in the operating-system's temporary directory (/tmp on Unix). The following diff implement this:

diff --git a/internal/engine/experiment/torsf/torsf.go b/internal/engine/experiment/torsf/torsf.go
index a96b466..6d1785b 100644
--- a/internal/engine/experiment/torsf/torsf.go
+++ b/internal/engine/experiment/torsf/torsf.go
@@ -122,7 +122,7 @@ func (m *Measurer) run(ctx context.Context,
 	tun, err := m.startTunnel()(ctx, &tunnel.Config{
 		Name:      "tor",
 		Session:   sess,
-		TunnelDir: path.Join(sess.TempDir(), "torsf"),
+		TunnelDir: path.Join(sess.TunnelDir(), "torsf"),
 		Logger:    sess.Logger(),
 		TorArgs: []string{
 			"UseBridges", "1",
diff --git a/internal/engine/mockable/mockable.go b/internal/engine/mockable/mockable.go
index 690a7c9..1893002 100644
--- a/internal/engine/mockable/mockable.go
+++ b/internal/engine/mockable/mockable.go
@@ -34,6 +34,7 @@ type Session struct {
 	MockableTempDir                  string
 	MockableTorArgs                  []string
 	MockableTorBinary                string
+	MockableTunnelDir                string
 	MockableUserAgent                string
 }
 
@@ -135,6 +136,11 @@ func (sess *Session) TorBinary() string {
 	return sess.MockableTorBinary
 }
 
+// TunnelDir implements ExperimentSession.TunnelDir.
+func (sess *Session) TunnelDir() string {
+	return sess.MockableTunnelDir
+}
+
 // UserAgent implements ExperimentSession.UserAgent
 func (sess *Session) UserAgent() string {
 	return sess.MockableUserAgent
diff --git a/internal/engine/session.go b/internal/engine/session.go
index 4efb4bf..eea5978 100644
--- a/internal/engine/session.go
+++ b/internal/engine/session.go
@@ -98,6 +98,9 @@ type Session struct {
 	// tunnel is the optional tunnel that we may be using. It is created
 	// by NewSession and it is cleaned up by Close.
 	tunnel tunnel.Tunnel
+
+	// tunnelDir is the directory where to create state for persistent tunnels.
+	tunnelDir string
 }
 
 // sessionProbeServicesClientForCheckIn returns the probe services
@@ -163,6 +166,7 @@ func NewSession(ctx context.Context, config SessionConfig) (*Session, error) {
 		tempDir:                 tempDir,
 		torArgs:                 config.TorArgs,
 		torBinary:               config.TorBinary,
+		tunnelDir:               config.TunnelDir,
 	}
 	proxyURL := config.ProxyURL
 	if proxyURL != nil {
@@ -204,6 +208,11 @@ func NewSession(ctx context.Context, config SessionConfig) (*Session, error) {
 	return sess, nil
 }
 
+// TunnelDir exports the directory for persistent tunnels.
+func (s *Session) TunnelDir() string {
+	return s.tunnelDir
+}
+
 // KibiBytesReceived accounts for the KibiBytes received by the HTTP clients
 // managed by this session so far, including experiments.
 func (s *Session) KibiBytesReceived() float64 {
diff --git a/internal/model/experiment.go b/internal/model/experiment.go
index 44b71ea..1e8a8a3 100644
--- a/internal/model/experiment.go
+++ b/internal/model/experiment.go
@@ -23,6 +23,7 @@ type ExperimentSession interface {
 	TempDir() string
 	TorArgs() []string
 	TorBinary() string
+	TunnelDir() string
 	UserAgent() string
 }

Diff 2: use the AMP cache

Another different configuration could be that of using the AMP cache as opposed to the normal rendezvous with the broker (henceforth, "rendezvous"). We implement this change in ooniprobe using:

diff --git a/internal/ptx/snowflake.go b/internal/ptx/snowflake.go
index bb6a913..190c4b2 100644
--- a/internal/ptx/snowflake.go
+++ b/internal/ptx/snowflake.go
@@ -83,18 +83,17 @@ func (d *SnowflakeDialer) ampCacheURL() string {
 	//
 	// So I disabled the AMP cache until we figure it out.
 	//
-	//return "https://cdn.ampproject.org/"
-	return ""
+	return "https://cdn.ampproject.org/"
 }
 
 // brokerURL returns a suitable broker URL.
 func (d *SnowflakeDialer) brokerURL() string {
-	return "https://snowflake-broker.torproject.net.global.prod.fastly.net/"
+	return "https://snowflake-broker.torproject.net/"
 }
 
 // frontDomain returns a suitable front domain.
 func (d *SnowflakeDialer) frontDomain() string {
-	return "cdn.sstatic.net"
+	return "www.google.com"
 }
 
 // iceAddresses returns suitable ICE addresses.

Diff 3: stop and retry during the test

No diff for now. This is another possible idea: if tor does not bootstrap after say 90 seconds, we can try another time within the same test. If the apps supported showing more than one measurement this would be even cooler. Though, we already have some support for emitting multiple measurements in probe-cli and perhaps we can thus use it.

Summary of tests so far

Here's the TL;DR regarding my tests so far. I repeated 40 runs of torsf with each possible configuration (there are four of them: rendezvous, rendezvous + reuse cache, AMP, AMP + reuse cache) from my macOS system.

Configuration	Median bootstrap time
rendezvous	57 s
rendezvous + cache	7.8 s
amp	95 s
amp + cache	9.5 s

I'm now running some tests on mobile. Here's what I learned so far. The default configuration ("rendezvous") has been working on and off with iOS, with 600 s of timeout. The default configuration has always timed out for me on Android, but the "rendezvous + cache" configuration completed the bootstrap in reasonable time on the first run (103 s - a number that seems comparable to desktop) and since then has been predictably bootstrapped in little time.

How to run similar tests

You should clone github.com/ooni/probe-cli and be sure your're at master. The basic test is repeating ./miniooni torsf a bunch of times. Then you can get the run times by extracting them from report.jsonl (the torsf experiment emits a message when it's finished). A 600 s runtime means timeout, smaller mean success.

To build miniooni use go build -v ./internal/cmd/miniooni.

It would be useful to test the current version of master as well as combinations of the above patches.

While there, ensure that we print a warning if we cannot find the correct tor binary. Work part of ooni/probe#1917.

skwowet · 2022-01-26T19:58:13Z

I repeated all the above-mentioned configurations multiple times (7-8) in my macOS system. Here's the TL;DR regarding my tests.

Configuration	Median bootstrap time
rendezvous	31s
rendezvous + cache	154s
amp	70s
amp + cache	65s

Also, before running the tests make sure tor is installed in the system.

bassosimone · 2022-01-28T13:40:49Z

Here are the results of Android runs with the rendezvous configuration and 600s of timeout: 10/10 timeouts.

Other three tests: I bumped the timeout to 900s. First time: success in 45 s. Second and this time: timeout. Here's the logcat from one of these tests.

I think the following log line possibly means that tor gives up bootstrapping now (to be confirmed reading the src):

01-28 15:19:22.401 20606 20948 E GoLog   : Jan 28 15:19:22.000 [notice] Delaying directory fetches: No running bridges

(Here's another logcat I collected earlier this week that shows the same "Delaying directory fetches" message.)

I'm now going to run some tests with the cache configuration where we do the normal rendezvous but we also use the diff above to ensure that we're reusing the same tor cache directory in each experiment.

Run	Bootstrap time
1	10.9 s
2	17.9 s
3	12.6 s
4	4.8 s
5	2.5 s
6	24.2 s
7	16.9 s
8	2.47 s

Here's a logcat of a working run in this configuration.

bassosimone · 2022-01-28T20:02:59Z

Here's a different attempt: what if we wipe the cache every time but we still put the cache inside the same directory in which the application stores persistent data as opposed to using the temporary directory?

Run	Bootstrap time	Note
1	12.8 s	we still had a cache here [*]
2	900 s timeout	surely no cache
3	900 s timeout	surely no cache
4	900 s timeout	surely no cache
5	900 s timeout	surely no cache
6	900 s timeout	surely no cache
7	900 s timeout	surely no cache

[*] The first test could probably already have a cache? I installed this build of the app just after removing the previous build that was keeping the cache, so I'm wondering if this could be the problem.

bassosimone · 2022-01-31T17:25:32Z

I'm going to close this issue now. I did my best to summarize the questions for Snowflake devs in #2004 and the original body of work related to QA has been completed anyway.

skwowet · 2022-01-31T18:12:39Z

Here are the results of Android tests that I ran from India, with the rendezvous configuration and 600s of timeout.

Run	Bootstrap time
1	27.5s
2	504
3	timeout_error
4	timeout_error
5	99.9s
6	timeout_error
7	timeout_error

This diff contains the following changes and enhancements: 1. upgrade snowflake to v2 2. observe that we were not changing defaults from outside of snowflake.go, so remove code allowing to do that; 3. bump the timeout to 600 seconds (it seems 300 was not always enough based on my testing); 4. add useful knob to disable `torsf` progress (it's really annoying on console, we should do something about this); 5. ptx.go: avoid printing an error when the connection has just been closed; 6. snowflake: test AMP cache, see that it's not working currently, so leave it disabled. Related issues: ooni/probe#1845, ooni/probe#1894, and ooni/probe#1917.

Reference issue: ooni/probe#1917. I needed to change the summary key type returned by `torsf` to be a value. It seems the DB layer assumes that. If we pass it a pointer, it panics because it's experiment a value rather than a pointer 🤷.

This commit message is the same across probe-cli, probe-desktop, and probe-android. With the changes contained in the enclosed diff, I'm starting to add support for torsf for android and for desktop. When smoke testing that torsf was WAI, I also noticed that its progress messages in output are too frequent. We may want to do better in a future version when we'll be able to read `tor`'s output. In the meanwhile, make the progress messages less frequent and indicated the maximum runtime inside of the messages themselves. This improved message, albeit not so nice from the UX PoV, should at least provide a clue that we're not stuck. Reference issue: ooni/probe#1917

While there, ensure that we print a warning if we cannot find the correct tor binary. Work part of ooni/probe#1917.

bassosimone added the testing label Dec 13, 2021

bassosimone self-assigned this Dec 13, 2021

bassosimone added the priority/high label Dec 13, 2021

bassosimone mentioned this issue Dec 13, 2021

feat: allow one to build and test a oonimkall.aar ooni/probe-android#466

Merged

bassosimone mentioned this issue Jan 19, 2022

feat(snowflake): upgrade to v2 (+ small tweaks) ooni/probe-cli#667

Merged

bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 21, 2022

feat(ooniprobe): add torsf to experimental group

320e409

Reference issue: ooni/probe#1917

bassosimone mentioned this issue Jan 21, 2022

feat(ooniprobe): add torsf to experimental group ooni/probe-cli#670

Merged

bassosimone mentioned this issue Jan 24, 2022

feat: start adding torsf to desktop and mobile ooni/probe-cli#671

Merged

bassosimone mentioned this issue Jan 24, 2022

feat: add support for torsf ooni/probe-desktop#271

Merged

bassosimone mentioned this issue Jan 24, 2022

feat: add dnscheck to experimental ooni/probe-android#477

Merged

bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 25, 2022

fix(i/t/tor.go): show correct command line

af17ea2

While there, ensure that we print a warning if we cannot find the correct tor binary. Work part of ooni/probe#1917.

bassosimone mentioned this issue Jan 25, 2022

fix(i/t/tor.go): show correct command line ooni/probe-cli#673

Merged

bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 25, 2022

fix(i/t/tor.go): show correct command line (#673)

4d50dd6

While there, ensure that we print a warning if we cannot find the correct tor binary. Work part of ooni/probe#1917.

bassosimone added ooni/probe-cli ooni/probe-desktop ooni/probe-mobile issues related to OONI Probe mobile labels Jan 27, 2022

bassosimone mentioned this issue Jan 31, 2022

qa(torsf): figure out proper configuration to help snowflake devs collecting useful data #2004

Closed

bassosimone closed this as completed Jan 31, 2022

ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022

fix(i/t/tor.go): show correct command line (ooni#673)

6e1d804

While there, ensure that we print a warning if we cannot find the correct tor binary. Work part of ooni/probe#1917.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

torsf: testing on Android and Desktop #1917

torsf: testing on Android and Desktop #1917

bassosimone commented Dec 13, 2021 •

edited

Loading

bassosimone commented Jan 25, 2022 •

edited

Loading

skwowet commented Jan 26, 2022

bassosimone commented Jan 28, 2022 •

edited

Loading

bassosimone commented Jan 28, 2022 •

edited

Loading

bassosimone commented Jan 31, 2022

skwowet commented Jan 31, 2022

torsf: testing on Android and Desktop #1917

torsf: testing on Android and Desktop #1917

Comments

bassosimone commented Dec 13, 2021 • edited Loading

bassosimone commented Jan 25, 2022 • edited Loading

Bootstrap time issues

Diff 1: don't bootstrap from scratch

Diff 2: use the AMP cache

Diff 3: stop and retry during the test

Summary of tests so far

How to run similar tests

skwowet commented Jan 26, 2022

bassosimone commented Jan 28, 2022 • edited Loading

bassosimone commented Jan 28, 2022 • edited Loading

bassosimone commented Jan 31, 2022

skwowet commented Jan 31, 2022

bassosimone commented Dec 13, 2021 •

edited

Loading

bassosimone commented Jan 25, 2022 •

edited

Loading

bassosimone commented Jan 28, 2022 •

edited

Loading

bassosimone commented Jan 28, 2022 •

edited

Loading