Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

torsf: testing on Android and Desktop #1917

Closed
bassosimone opened this issue Dec 13, 2021 · 6 comments
Closed

torsf: testing on Android and Desktop #1917

bassosimone opened this issue Dec 13, 2021 · 6 comments

Comments

@bassosimone
Copy link
Contributor

bassosimone commented Dec 13, 2021

This issue is about performing QA of the torsf implementation on Desktop and mobile devices. We will close this issue when we have performed enough QA of the implementation to move forward and release. If there are significant issues, we can also consider releasing torsf only on some platforms rather than releasing it on all of them. The original plan was to do enough QA until release but it emerged a need to chat with Snowflake devs as it seems it may not be the best use of the next release to release a torsf that most often times out on mobile.

After the work done in #1717, we want to test how the new experiment is working on mobile devices, as well as on Desktop devices. To this end, we need to integrate together a cli that knows about torsf with a mobile app version and a Desktop version that knows about it.

Executive summary

Just see #2004

@bassosimone bassosimone self-assigned this Dec 13, 2021
bassosimone added a commit to ooni/probe-android that referenced this issue Dec 13, 2021
Say you have built an experimental `oonimkall.aar` artifact
from the `ooni/probe-cli` repo at whatever commit in whatever
branch. Say you want to test such an artifact against the
current `master` of the `ooni/probe-android` app.

How do you achieve that?

1. copy `oonimkall.aar` inside `./engine-experimental`;

2. `./gradlew assembleExperimentalFullRelease` (remember to
set `ANDROID_SDK_ROOT`);

3. locate the `apk` using `find` and then use `adb install` to
install such an APK on your phone.

I'm doing this work as part of ooni/probe#1917
to facilitate quickly testing of `torsf` and co-developing together
the `probe-cli` and `probe-android` repositories.
bassosimone added a commit to ooni/probe-android that referenced this issue Dec 14, 2021
Say you have built an experimental `oonimkall.aar` artifact
from the `ooni/probe-cli` repo at whatever commit in whatever
branch. Say you want to test such an artifact against the
current `master` of the `ooni/probe-android` app.

How do you achieve that?

1. copy `oonimkall.aar` inside `./engine-experimental`;

2. `./gradlew assembleExperimentalFullRelease` (remember to
set `ANDROID_SDK_ROOT`);

3. locate the `apk` using `find` and then use `adb install` to
install such an APK on your phone.

I'm doing this work as part of ooni/probe#1917
to facilitate quickly testing of `torsf` and co-developing together
the `probe-cli` and `probe-android` repositories.
bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 19, 2022
This diff contains the following changes and enhancements:

1. upgrade snowflake to v2

2. observe that we were not changing defaults from outside of snowflake.go, so remove code allowing to do that;

3. bump the timeout to 600 seconds (it seems 300 was not always enough based on my testing);

4. add useful knob to disable `torsf` progress (it's really annoying on console, we should do something about this);

5. ptx.go: avoid printing an error when the connection has just been closed;

6. snowflake: test AMP cache, see that it's not working currently, so leave it disabled.

Related issues: ooni/probe#1845, ooni/probe#1894, and ooni/probe#1917.
bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 21, 2022
bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 21, 2022
Reference issue: ooni/probe#1917.

I needed to change the summary key type returned by `torsf` to be a value. It seems the DB layer assumes that. If we pass it a pointer, it panics because it's experiment a value rather than a pointer 🤷.
bassosimone added a commit to ooni/probe-android that referenced this issue Jan 24, 2022
This commit message is the same across probe-cli, probe-desktop,
and probe-android. With the changes contained in the enclosed
diff, I'm starting to add support for torsf for android and for
desktop.

When smoke testing that torsf was WAI, I also noticed that its
progress messages in output are too frequent. We may want to do
better in a future version when we'll be able to read `tor`'s
output. In the meanwhile, make the progress messages less
frequent and indicated the maximum runtime inside of the messages
themselves. This improved message, albeit not so nice from the
UX PoV, should at least provide a clue that we're not stuck.

Reference issue: ooni/probe#1917
bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 24, 2022
This commit message is the same across probe-cli, probe-desktop,
and probe-android. With the changes contained in the enclosed
diff, I'm starting to add support for torsf for android and for
desktop.

When smoke testing that torsf was WAI, I also noticed that its
progress messages in output are too frequent. We may want to do
better in a future version when we'll be able to read `tor`'s
output. In the meanwhile, make the progress messages less
frequent and indicated the maximum runtime inside of the messages
themselves. This improved message, albeit not so nice from the
UX PoV, should at least provide a clue that we're not stuck.

Reference issue: ooni/probe#1917
bassosimone added a commit to ooni/probe-desktop that referenced this issue Jan 24, 2022
This commit message is the same across probe-cli, probe-desktop,
and probe-android. With the changes contained in the enclosed
diff, I'm starting to add support for torsf for android and for
desktop.

When smoke testing that torsf was WAI, I also noticed that its
progress messages in output are too frequent. We may want to do
better in a future version when we'll be able to read `tor`'s
output. In the meanwhile, make the progress messages less
frequent and indicated the maximum runtime inside of the messages
themselves. This improved message, albeit not so nice from the
UX PoV, should at least provide a clue that we're not stuck.

Reference issue: ooni/probe#1917
bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 24, 2022
This commit message is the same across probe-cli, probe-desktop,
and probe-android. With the changes contained in the enclosed
diff, I'm starting to add support for torsf for android and for
desktop.

When smoke testing that torsf was WAI, I also noticed that its
progress messages in output are too frequent. We may want to do
better in a future version when we'll be able to read `tor`'s
output. In the meanwhile, make the progress messages less
frequent and indicated the maximum runtime inside of the messages
themselves. This improved message, albeit not so nice from the
UX PoV, should at least provide a clue that we're not stuck.

Reference issue: ooni/probe#1917
bassosimone added a commit to ooni/probe-desktop that referenced this issue Jan 24, 2022
This version of the cli already includes support for torsf
and it's safe to test using this version knowing that the
release will consist of this version plus minor patches in
case we found any defect during the QA.

Reference issue: ooni/probe#1917
bassosimone added a commit to ooni/probe-android that referenced this issue Jan 24, 2022
This version of the cli already includes support for torsf
and it's safe to test using this version knowing that the
release will consist of this version plus minor patches in
case we found any defect during the QA.

With this change in, we're able to make test releases of
the Android app for the purpose of testing torsf.

Reference issue: ooni/probe#1917
@bassosimone
Copy link
Contributor Author

bassosimone commented Jan 25, 2022

So, here's an update regarding torsf testing. With @hellais, I have been testing torsf using v3.14.0-alpha.1, which currently is equivalent to master.

Bootstrap time issues

The main issue we've been facing so far has been the total time required to bootstrap tor. This has been, in particular, a problem on Android, where the default torsf config has timed out (after 600 s) in most runs.

We are confident that one of the main reasons why the bootstrap takes so much time is that we're bootstrapping tor from scratch (i.e., without a previously cached consensus). In everyday life, this condition should only arise the first time your bootstrap tor. Afterwards, tor will have cached microdescriptors and consensus.

Diff 1: don't bootstrap from scratch

Therefore, the first alternative configuration to try is one in which we use a fixed directory for torsf as opposed to a directory created in the operating-system's temporary directory (/tmp on Unix). The following diff implement this:

diff --git a/internal/engine/experiment/torsf/torsf.go b/internal/engine/experiment/torsf/torsf.go
index a96b466..6d1785b 100644
--- a/internal/engine/experiment/torsf/torsf.go
+++ b/internal/engine/experiment/torsf/torsf.go
@@ -122,7 +122,7 @@ func (m *Measurer) run(ctx context.Context,
 	tun, err := m.startTunnel()(ctx, &tunnel.Config{
 		Name:      "tor",
 		Session:   sess,
-		TunnelDir: path.Join(sess.TempDir(), "torsf"),
+		TunnelDir: path.Join(sess.TunnelDir(), "torsf"),
 		Logger:    sess.Logger(),
 		TorArgs: []string{
 			"UseBridges", "1",
diff --git a/internal/engine/mockable/mockable.go b/internal/engine/mockable/mockable.go
index 690a7c9..1893002 100644
--- a/internal/engine/mockable/mockable.go
+++ b/internal/engine/mockable/mockable.go
@@ -34,6 +34,7 @@ type Session struct {
 	MockableTempDir                  string
 	MockableTorArgs                  []string
 	MockableTorBinary                string
+	MockableTunnelDir                string
 	MockableUserAgent                string
 }
 
@@ -135,6 +136,11 @@ func (sess *Session) TorBinary() string {
 	return sess.MockableTorBinary
 }
 
+// TunnelDir implements ExperimentSession.TunnelDir.
+func (sess *Session) TunnelDir() string {
+	return sess.MockableTunnelDir
+}
+
 // UserAgent implements ExperimentSession.UserAgent
 func (sess *Session) UserAgent() string {
 	return sess.MockableUserAgent
diff --git a/internal/engine/session.go b/internal/engine/session.go
index 4efb4bf..eea5978 100644
--- a/internal/engine/session.go
+++ b/internal/engine/session.go
@@ -98,6 +98,9 @@ type Session struct {
 	// tunnel is the optional tunnel that we may be using. It is created
 	// by NewSession and it is cleaned up by Close.
 	tunnel tunnel.Tunnel
+
+	// tunnelDir is the directory where to create state for persistent tunnels.
+	tunnelDir string
 }
 
 // sessionProbeServicesClientForCheckIn returns the probe services
@@ -163,6 +166,7 @@ func NewSession(ctx context.Context, config SessionConfig) (*Session, error) {
 		tempDir:                 tempDir,
 		torArgs:                 config.TorArgs,
 		torBinary:               config.TorBinary,
+		tunnelDir:               config.TunnelDir,
 	}
 	proxyURL := config.ProxyURL
 	if proxyURL != nil {
@@ -204,6 +208,11 @@ func NewSession(ctx context.Context, config SessionConfig) (*Session, error) {
 	return sess, nil
 }
 
+// TunnelDir exports the directory for persistent tunnels.
+func (s *Session) TunnelDir() string {
+	return s.tunnelDir
+}
+
 // KibiBytesReceived accounts for the KibiBytes received by the HTTP clients
 // managed by this session so far, including experiments.
 func (s *Session) KibiBytesReceived() float64 {
diff --git a/internal/model/experiment.go b/internal/model/experiment.go
index 44b71ea..1e8a8a3 100644
--- a/internal/model/experiment.go
+++ b/internal/model/experiment.go
@@ -23,6 +23,7 @@ type ExperimentSession interface {
 	TempDir() string
 	TorArgs() []string
 	TorBinary() string
+	TunnelDir() string
 	UserAgent() string
 }
 

Diff 2: use the AMP cache

Another different configuration could be that of using the AMP cache as opposed to the normal rendezvous with the broker (henceforth, "rendezvous"). We implement this change in ooniprobe using:

diff --git a/internal/ptx/snowflake.go b/internal/ptx/snowflake.go
index bb6a913..190c4b2 100644
--- a/internal/ptx/snowflake.go
+++ b/internal/ptx/snowflake.go
@@ -83,18 +83,17 @@ func (d *SnowflakeDialer) ampCacheURL() string {
 	//
 	// So I disabled the AMP cache until we figure it out.
 	//
-	//return "https://cdn.ampproject.org/"
-	return ""
+	return "https://cdn.ampproject.org/"
 }
 
 // brokerURL returns a suitable broker URL.
 func (d *SnowflakeDialer) brokerURL() string {
-	return "https://snowflake-broker.torproject.net.global.prod.fastly.net/"
+	return "https://snowflake-broker.torproject.net/"
 }
 
 // frontDomain returns a suitable front domain.
 func (d *SnowflakeDialer) frontDomain() string {
-	return "cdn.sstatic.net"
+	return "www.google.com"
 }
 
 // iceAddresses returns suitable ICE addresses.

Diff 3: stop and retry during the test

No diff for now. This is another possible idea: if tor does not bootstrap after say 90 seconds, we can try another time within the same test. If the apps supported showing more than one measurement this would be even cooler. Though, we already have some support for emitting multiple measurements in probe-cli and perhaps we can thus use it.

Summary of tests so far

Here's the TL;DR regarding my tests so far. I repeated 40 runs of torsf with each possible configuration (there are four of them: rendezvous, rendezvous + reuse cache, AMP, AMP + reuse cache) from my macOS system.

Configuration Median bootstrap time
rendezvous 57 s
rendezvous + cache 7.8 s
amp 95 s
amp + cache 9.5 s

I'm now running some tests on mobile. Here's what I learned so far. The default configuration ("rendezvous") has been working on and off with iOS, with 600 s of timeout. The default configuration has always timed out for me on Android, but the "rendezvous + cache" configuration completed the bootstrap in reasonable time on the first run (103 s - a number that seems comparable to desktop) and since then has been predictably bootstrapped in little time.

How to run similar tests

You should clone github.com/ooni/probe-cli and be sure your're at master. The basic test is repeating ./miniooni torsf a bunch of times. Then you can get the run times by extracting them from report.jsonl (the torsf experiment emits a message when it's finished). A 600 s runtime means timeout, smaller mean success.

To build miniooni use go build -v ./internal/cmd/miniooni.

It would be useful to test the current version of master as well as combinations of the above patches.

bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 25, 2022
While there, ensure that we print a warning if we cannot find
the correct tor binary.

Work part of ooni/probe#1917.
bassosimone added a commit to ooni/probe-cli that referenced this issue Jan 25, 2022
While there, ensure that we print a warning if we cannot find
the correct tor binary.

Work part of ooni/probe#1917.
@skwowet
Copy link
Contributor

skwowet commented Jan 26, 2022

I repeated all the above-mentioned configurations multiple times (7-8) in my macOS system. Here's the TL;DR regarding my tests.

Configuration Median bootstrap time
rendezvous 31s
rendezvous + cache 154s
amp 70s
amp + cache 65s

Also, before running the tests make sure tor is installed in the system.

@bassosimone
Copy link
Contributor Author

bassosimone commented Jan 28, 2022

Here are the results of Android runs with the rendezvous configuration and 600s of timeout: 10/10 timeouts.

Other three tests: I bumped the timeout to 900s. First time: success in 45 s. Second and this time: timeout. Here's the logcat from one of these tests.

I think the following log line possibly means that tor gives up bootstrapping now (to be confirmed reading the src):

01-28 15:19:22.401 20606 20948 E GoLog   : Jan 28 15:19:22.000 [notice] Delaying directory fetches: No running bridges

(Here's another logcat I collected earlier this week that shows the same "Delaying directory fetches" message.)

I'm now going to run some tests with the cache configuration where we do the normal rendezvous but we also use the diff above to ensure that we're reusing the same tor cache directory in each experiment.

Run Bootstrap time
1 10.9 s
2 17.9 s
3 12.6 s
4 4.8 s
5 2.5 s
6 24.2 s
7 16.9 s
8 2.47 s

Here's a logcat of a working run in this configuration.

@bassosimone
Copy link
Contributor Author

bassosimone commented Jan 28, 2022

Here's a different attempt: what if we wipe the cache every time but we still put the cache inside the same directory in which the application stores persistent data as opposed to using the temporary directory?

Run Bootstrap time Note
1 12.8 s we still had a cache here [*]
2 900 s timeout surely no cache
3 900 s timeout surely no cache
4 900 s timeout surely no cache
5 900 s timeout surely no cache
6 900 s timeout surely no cache
7 900 s timeout surely no cache

[*] The first test could probably already have a cache? I installed this build of the app just after removing the previous build that was keeping the cache, so I'm wondering if this could be the problem.

@bassosimone
Copy link
Contributor Author

I'm going to close this issue now. I did my best to summarize the questions for Snowflake devs in #2004 and the original body of work related to QA has been completed anyway.

@skwowet
Copy link
Contributor

skwowet commented Jan 31, 2022

Here are the results of Android tests that I ran from India, with the rendezvous configuration and 600s of timeout.

Run Bootstrap time
1 27.5s
2 504
3 timeout_error
4 timeout_error
5 99.9s
6 timeout_error
7 timeout_error

ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
This diff contains the following changes and enhancements:

1. upgrade snowflake to v2

2. observe that we were not changing defaults from outside of snowflake.go, so remove code allowing to do that;

3. bump the timeout to 600 seconds (it seems 300 was not always enough based on my testing);

4. add useful knob to disable `torsf` progress (it's really annoying on console, we should do something about this);

5. ptx.go: avoid printing an error when the connection has just been closed;

6. snowflake: test AMP cache, see that it's not working currently, so leave it disabled.

Related issues: ooni/probe#1845, ooni/probe#1894, and ooni/probe#1917.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
Reference issue: ooni/probe#1917.

I needed to change the summary key type returned by `torsf` to be a value. It seems the DB layer assumes that. If we pass it a pointer, it panics because it's experiment a value rather than a pointer 🤷.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
This commit message is the same across probe-cli, probe-desktop,
and probe-android. With the changes contained in the enclosed
diff, I'm starting to add support for torsf for android and for
desktop.

When smoke testing that torsf was WAI, I also noticed that its
progress messages in output are too frequent. We may want to do
better in a future version when we'll be able to read `tor`'s
output. In the meanwhile, make the progress messages less
frequent and indicated the maximum runtime inside of the messages
themselves. This improved message, albeit not so nice from the
UX PoV, should at least provide a clue that we're not stuck.

Reference issue: ooni/probe#1917
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
While there, ensure that we print a warning if we cannot find
the correct tor binary.

Work part of ooni/probe#1917.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants