qa(torsf): figure out proper configuration to help snowflake devs collecting useful data #2004
Comments
(FTR I've mentioned this issue in the Snowflake issue tracker: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40097)
Hey! This is really amazing work, thanks for the detailed writeup. I'm excited for when the test results eventually come in and what it can tell us about Snowflake reachability! I have several comments and a summary of answers to your 4 enumerated questions below.
This is just a slight nitpick on wording: We've been using rendezvous method to refer to different ways of contacting the broker. The current configuration in Tor Browser and the default configuration you're presenting here uses domain fronting as the rendezvous method. AMP cache is an alternative rendezvous method. So I'd recommend the following naming schemes for the different configurations you've shown here:
Just took a look at how you're using this. Your client configuration is essentially:
```go
ClientConfig{
	BrokerURL:   "https://snowflake-broker.torproject.net.global.prod.fastly.net/",
	FrontDomain: "cdn.sstatic.net",
}
```
This is correct and it's what Tor Browser is configured to use. The BrokerURL here isn't seen by the censor, it's included inside the TLS encrypted HTTP request to the front domain. The reason for using the fastly URL is to have traffic redirected to the right place. Our account at fastly has https://snowflake-broker.torproject.net.global.prod.fastly.net/ set up to forward traffic to https://snowflake-broker.torproject.net/.
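To make the split between what the censor sees and what travels inside TLS concrete, here is a minimal sketch of a domain-fronted request built with Go's `net/http`. The helper name `newFrontedRequest` is made up for illustration, and this is not claimed to be the exact code Snowflake or OONI use. The request is only constructed, never sent.

```go
package main

import (
	"fmt"
	"net/http"
)

// newFrontedRequest is a hypothetical helper. It builds a request that
// connects (DNS lookup, TLS SNI) to frontDomain while the Host header,
// which travels inside the TLS session, names the broker.
func newFrontedRequest(brokerURL, frontDomain string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, brokerURL, nil)
	if err != nil {
		return nil, err
	}
	// Keep the broker name in the Host header so the CDN can route the request.
	req.Host = req.URL.Host
	// Point the connection itself at the front domain.
	req.URL.Host = frontDomain
	return req, nil
}

func main() {
	req, err := newFrontedRequest(
		"https://snowflake-broker.torproject.net.global.prod.fastly.net/",
		"cdn.sstatic.net",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("connection target:", req.URL.Host)
	fmt.Println("Host header:", req.Host)
}
```

An observer on the network path would only see a TLS connection to the front domain; the broker name appears only in the encrypted Host header.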
This is a great question and you're right that caching on the client side will decrease the bootstrap time. It will also be hard to differentiate the measurements between first time clients and clients making cached connections. In my opinion, it's best to start small and eventually work our way up to more complex measurements if they are necessary. I would lean towards caching and focusing on learning about outright blocks of snowflake first before moving on to performance measurements. We have some ongoing work to improve snowflake performance and assess this using onionperf instances from various vantage points. While OONI would be a great resource in measuring Snowflake performance on mobile networks, we have some lower hanging fruit that we'd like to learn first and this can be learned without a full Tor bootstrap:
It's also the case that most users will be using cached tor states. So performance measurements with the cached state will still be interesting from that perspective.
Here's the line in question:
This is not actually an error and shouldn't be related to the bootstrap problem. This is a side effect of the firewalling that snowflake does on its OR port. The bridge directory authority does an OR port reachability test when bridges join the network, and if the OR port is reachable, it will assign a 'running' flag to it. We frequently firewall this port for bridges that we do not want to hand out over BridgeDB or to make them less susceptible to probing attacks. It shouldn't actually interfere with the functionality of the bridge, but it does cause core tor to print out these messages.
[snip]
The results of mobile clients with a full uncached tor bootstrap are surprising to me as well. I wouldn't have expected the difference between cached and uncached bootstraps to be this extreme. What version of tor are you using here? It's possible you're running into a bug where bootstraps will hang indefinitely if done without a bridge fingerprint. I'm not sure this is the issue but it's worth digging into a bit.
We have noticed a variation in performance due to geographic location and also due to the NAT/networking setup of the client. This is something we're still trying to understand and map out, but yes, we can expect there to be considerable variation between devices at the moment.
Now for the summary answers to your four questions:
1. All of the results look reasonable and expected to me except the mobile uncached results. I think it's worth doing some debugging and digging into that a bit more if you're willing.
2. No, see my comment above: this is an unrelated side effect of firewalling the OR port at the bridge.
3. It would be really useful to us to do bootstraps using both the domain fronting method and the AMP cache method. We might add more rendezvous methods in the future and at that point it would be useful to test those as well!
4. I would rank the usefulness of different measurements as follows:
Let me know if I can clarify anything more! It's exciting to see this all come together!
This diff contains significant improvements over the previous implementation of the torsf experiment.

We add support for configuring different rendezvous methods after the convo at ooni/probe#2004. In doing that, I've tried to use a terminology that is consistent with the names actually being used by tor developers.

In terms of what to do next, this diff basically instruments torsf to always rendezvous using domain fronting. Yet, it's also possible to change the rendezvous method from the command line when using miniooni, which allows us to experiment a bit more. In the same vein, by default we use a persistent tor datadir, but it's also possible to use a temporary datadir from the command line.

Here's what a generic invocation of `torsf` looks like:

```bash
./miniooni -ODisablePersistentDatadir=true \
  -ORendezvousMethod=amp \
  -ODisableProgress=true torsf
```

(The default is `DisablePersistentDatadir=false` and `RendezvousMethod=domain_fronting`.)

With this implementation, we can start measuring whether snowflake and tor together can bootstrap, which seems the most important thing to focus on at the beginning. Understanding why the bootstrap most often does not converge with a temporary datadir on Android devices remains an open problem for now. (I'll also update the relevant issues or create new issues after committing this.)

We also address some methodology improvements that were proposed in ooni/probe#1686. Namely:

- we record the tor version because we include _some_ tor logs;
- we include the bootstrap percentage because of the logs;
- we set the anomaly key correctly.

What remains to be done is the possibility of including Snowflake events into the measurement, which is not possible until the new improvements at common/event in snowflake.git are included into a tagged version of snowflake itself. (I'll make sure to mention this aspect to @cohosh in ooni/probe#2004.)

It also remains to be done to measure the amount of bytes sent and received during the bootstrap, which will also probably be part of a follow-up diff (or even pull request).

I also expect this diff to fail unit and integration tests, at least because of reduced coverage. This is fine because I plan to add missing tests or fix them as part of a follow-up diff.

If you're reviewing this diff, I'd recommend focusing on (1) whether we're collecting good enough data for analysis and (2) whether the data we collect is safe to collect, or whether we should collect less to err on the safe side.
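As an illustration of the "bytes sent and received" idea mentioned in this commit message, one generic technique in Go is to wrap a `net.Conn` and count what passes through `Read` and `Write`. This is only a sketch of that technique; the names `byteCounter`, `countingConn` and `demo` are made up here, and nothing in this thread says the follow-up diff works this way.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync/atomic"
)

// byteCounter accumulates totals; fields are updated atomically.
type byteCounter struct {
	sent     int64
	received int64
}

// countingConn wraps a net.Conn and records bytes read and written.
type countingConn struct {
	net.Conn
	counter *byteCounter
}

func (c *countingConn) Read(p []byte) (int, error) {
	n, err := c.Conn.Read(p)
	atomic.AddInt64(&c.counter.received, int64(n))
	return n, err
}

func (c *countingConn) Write(p []byte) (int, error) {
	n, err := c.Conn.Write(p)
	atomic.AddInt64(&c.counter.sent, int64(n))
	return n, err
}

// demo exchanges a few bytes over an in-memory pipe and returns the totals.
func demo() (sent, received int64) {
	client, server := net.Pipe()
	counter := &byteCounter{}
	conn := &countingConn{Conn: client, counter: counter}
	go func() {
		defer server.Close()
		buf := make([]byte, 5)
		if _, err := io.ReadFull(server, buf); err != nil {
			return
		}
		server.Write([]byte("pong!!"))
	}()
	conn.Write([]byte("hello"))
	io.ReadAll(conn) // returns once the peer closes its end
	return atomic.LoadInt64(&counter.sent), atomic.LoadInt64(&counter.received)
}

func main() {
	sent, received := demo()
	fmt.Printf("sent=%d received=%d\n", sent, received)
}
```

Embedding `net.Conn` keeps every other method (deadlines, addresses, `Close`) untouched, so such a wrapper can be dropped in wherever a connection is handed to other code.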
This diff adds dnscheck to experimental. Originally, in the related PR (#477) we also added support for torsf. But that has been backed out until we figure out exactly the correct configuration (ooni/probe#2004). Reference issue: ooni/probe#1973. Co-authored-by: Norbel Ambanumben <aanorbel@gmail.com>
This diff documents new options we have added to the torsf experiment after the ooni/probe#2004 discussion. The related probe-cli PR is: ooni/probe-cli#683
It seems, in the grand scheme of things, this is the log we need. So we just introduced a regexp to extract it in ooni/probe-cli@bacab49. Part of ooni/probe#2004 and ooni/probe#1686
…683)

This diff contains significant improvements over the previous implementation of the torsf experiment.

We add support for configuring different rendezvous methods after the convo at ooni/probe#2004. In doing that, I've tried to use a terminology that is consistent with the names actually being used by tor developers.

In terms of what to do next, this diff basically instruments torsf to always rendezvous using domain fronting. Yet, it's also possible to change the rendezvous method from the command line when using miniooni, which allows us to experiment a bit more. In the same vein, by default we use a persistent tor datadir, but it's also possible to use a temporary datadir from the command line.

Here's what a generic invocation of `torsf` looks like:

```bash
./miniooni -O DisablePersistentDatadir=true \
  -O RendezvousMethod=amp \
  -O DisableProgress=true \
  torsf
```

(The default is `DisablePersistentDatadir=false` and `RendezvousMethod=domain_fronting`.)

With this implementation, we can start measuring whether snowflake and tor together can bootstrap, which seems the most important thing to focus on at the beginning. Understanding why the bootstrap most often does not converge with a temporary datadir on Android devices remains an open problem for now. (I'll also update the relevant issues or create new issues after committing this.)

We also address some methodology improvements that were proposed in ooni/probe#1686. Namely:

1. we record the tor version;
2. we include the bootstrap percentage by reading the logs;
3. we set the anomaly key correctly;
4. we measure the bytes sent and received (by `tor` not by `snowflake`, since doing it for snowflake seems more complex at this stage).

What remains to be done is the possibility of including Snowflake events into the measurement, which is not possible until the new improvements at common/event in snowflake.git are included into a tagged version of snowflake itself.

(I'll make sure to mention this aspect to @cohosh in ooni/probe#2004.)
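For readers wondering what "include the bootstrap percentage by reading the logs" can look like, here is a minimal sketch that scans log lines for tor's `Bootstrapped N%` progress messages and keeps the highest value seen. The function name and the sample lines are illustrative assumptions, not the regexp or logs from the actual diff.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// bootstrapRe matches tor progress lines such as
// "[notice] Bootstrapped 10% (conn_done): Connected to a relay".
var bootstrapRe = regexp.MustCompile(`Bootstrapped (\d+)%`)

// highestBootstrapPercent is a hypothetical helper that returns the
// largest bootstrap percentage found in the given log lines, or 0 if
// no progress line is present.
func highestBootstrapPercent(lines []string) int {
	highest := 0
	for _, line := range lines {
		m := bootstrapRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		pct, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		if pct > highest {
			highest = pct
		}
	}
	return highest
}

func main() {
	// Sample lines written for this example, following tor's log format.
	logs := []string{
		"Nov 08 12:00:00.000 [notice] Bootstrapped 0% (starting): Starting",
		"Nov 08 12:00:01.000 [notice] Bootstrapped 10% (conn_done): Connected to a relay",
		"Nov 08 12:00:02.000 [notice] Delaying directory fetches: No running bridges",
		"Nov 08 12:00:03.000 [notice] Bootstrapped 14% (handshake): Handshaking with a relay",
	}
	fmt.Println("bootstrap percentage:", highestBootstrapPercent(logs))
}
```

Recording the highest percentage reached, rather than only success or failure, makes it possible to tell a bootstrap that never started from one that stalled late.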
Thanks a lot for your detailed reply! 🙂 I'll reply inline to your comment and explain what changes we implemented thanks to the insights it provided. Here's a quick summary of the most important points and still-open questions:
See below for more detailed answers.
Thanks for educating me about the correct terminology! I have updated my mental model, the implementation and the spec to use the suggested vocabulary. (Since what the experiment actually does is setting While there, to ease experimentation, I made it possible in ooni/probe-cli#683 to select whether to use "amp" or "domain_fronting" and whether to enable/disable a "persistent data dir". The defaults we're using now are, respectively, "domain fronting" and enabling a "persistent data dir", as you recommended. Though, the possibility of changing these values w/ settings opens up the possibility of running further experiments (among which, one to clarify why
Thanks for ensuring that our config is correct!
Got it, thanks for clarifying how this works!
Thanks for helping us to choose the right approach here!
Awesome to see the "event channel" being implemented! I have started sketching out how OONI could use this functionality to collect these events here: ooni/probe-cli#685. It's still incomplete, but I'll report back once I've finished coding and I am able to start using it. Thanks to you all for making this change! (I'd rather not release a version of OONI pinning to a commit, though, so I'd rather wait for the "event channel" to appear inside a release before including this functionality into an OONI release.)
Right, that's actually a good point in terms of capturing the average user experience! Didn't think about this!
Understood! Thank you for shedding light on the true meaning of such an error message!
AFAICT from https://github.com/ooni/go-libtor's README, we're using tor@d06bcf7672, authored on 2021-11-08. Judging from tor's tag history, this should be between tor 0.4.6.7 and tor 0.4.6.8.
Absolutely! I think it may be worth upgrading to the latest stable version of tor.
Understood, thank you!
Yes! Thanks a lot for confirming this was unexpected. I believe it's clear this is an oddity to look into.
🙏
On this note, would it be reasonable if we choose one bootstrap method at random, then? The improved implementation I added in ooni/probe-cli#683 uses "domain fronting" by default, but perhaps it seems more useful to you Snowflake developers if we randomize the bootstrap type?
Thanks a lot for clearly ranking all the measurements by their utility, that's super useful!
No, everything was super clear, thanks!
Actually, I would say that the domain fronting option is the most useful. Right now AMP cache is just a backup and isn't recommended as a configuration anywhere. So if we have to make a choice, I'd only test domain fronting and we'll update if needed.
Done! Should be in v2.1.0
Awesome, thanks for clarifying!
Thanks a lot! I've created a new issue for tracking this enhancement: #2017
The other remaining open issue is to figure out the long bootstrap time on mobile w/o persistent datadir, for which I opened a new issue at #2018.
This issue is about getting feedback from Snowflake developers on `torsf`. The super brief problem statement is that we're seeing tons of bootstrap issues when running on mobile. We'll consider this issue done when we have discussed the problem with Snowflake developers and figured out the best way to configure `torsf` in production.
The structure of this issue is the following:
(Sadly, I did not manage to compress the information further.)
Problem statement
The `torsf` experiment bootstraps `tor` using Snowflake (the logic is at torsf.go#107). We start `tor` with command line options telling it to use as pluggable transport the `ooniprobe` client itself listening for SOCKS5 connections on a port (the mechanism is at tor.go#58). The port will forward traffic using Snowflake (the mechanism is at ptx.go#210).
We adopted a `torsf` configuration where we use rendezvous with the broker and we use a new, temporary tor datadir every time, thus performing a cold bootstrap.
With this configuration, we're having significant bootstrap timeout issues on mobile.
We've seen that changing the configuration makes the bootstrap more likely to succeed. It is unclear whether changing this configuration is leading us to produce useful results, though.
Hence the need for input from Snowflake developers to understand how to proceed.
Configurations
Let us call rendezvous the current configuration because it performs a rendezvous with the broker endpoint URL (we tested both "https://snowflake-broker.torproject.net.global.prod.fastly.net/" and "https://snowflake-broker.torproject.net/", which is the correct one? I suppose the first one for circumvention reasons, but maybe I'm missing something here?). Three other configurations of `torsf` are possible.
The first alternative configuration is AMP. In this configuration we use the AMP cache instead of the rendezvous.
The second alternative configuration uses the rendezvous and uses a persistent directory for the `tor` data directory. This means that the first bootstrap is going to be cold. Subsequent bootstraps will have a (sometimes partial) cache of micro-descriptors already stored on the disk. As a result, `tor` would need to exchange significantly less information in order to bootstrap. Given Snowflake's bandwidth constraints this seems to converge faster (we'll see the data later).
The final alternative configuration uses AMP and a persistent `tor` data directory.
To recap:
Discussion
The choice of whether to use AMP or the rendezvous may have an impact on the bootstrap time (we'll see measurements soon) and certainly has an implication in terms of censorship circumvention. The cache is most likely, if not certainly, making the bootstrap faster because `tor` needs to fetch less data over the (bandwidth constrained?) Snowflake.
The key question however is what are we measuring? Do we want to measure the total time `tor` takes to bootstrap from scratch when using Snowflake? Do we want to measure whether `tor` would bootstrap with Snowflake given a cache?
When asking this question internally, we were conscious that choosing to use a cache will certainly be a problem in terms of making any statement regarding the bootstrap time.
Measurements
We tested `torsf` on Desktop and on mobile. The original issue describing the measurements is #1917. In this issue I'll try to just summarize the most relevant results of analyzing the measurements.
Our repeated desktop measurement results are summarized by the following table (40 repetitions):
So, I would conclude from this data that the cache really makes a significant difference (of course, once it's filled), while AMP may have slightly worse performance but is still in the domain of "comparable" results.
Mobile measurements, though, are far more problematic. Here's a table with results on Android:
What is interesting, if we read the logcat, is that `tor` says "Delaying directory fetches: No running bridges". I think this could mean that `tor` will try continuing the bootstrap at a later time. So, I think that after this message the bootstrap should be considered failed. Now, the obvious question to ask Snowflake developers is whether this assumption is true.
Interestingly, with caching enabled, I got these results:
(I also tried to put the temporary cache in the app-specific directory rather than in the temporary per-app space, under the assumption that the temporary area was too slow, but actually nothing really changed.)
As an extra data point: an OONI user who helped us test these patches, @yeganathan18, reported that the rendezvous configuration was bootstrapping more frequently than it did for us (3 times out of 7) in measurements he ran in India. This result was quite puzzling to me, since I did not expect to see variability depending on the geographic location and I would have expected this person to see mostly timeouts like I did. (Should I have expected it?)
Other measurements from other countries, though, confirmed that our default configuration does not often bootstrap with a 600 s timeout. OTOH, those measurements also show the cache helping a lot.
Questions for Snowflake developers
1. Do these mobile performance results with and without caching match your experience?
2. Is it correct to say that after `tor` says "Delaying directory fetches: No running bridges" it's basically game over and the bootstrap will not converge until `tor` decides to try handshaking again? (And this "until" is certainly longer than the maximum time we're willing to wait for an interactive OONI experiment?)
3. Do you think we should be measuring by default using AMP or using the rendezvous mechanism? That is, which data point would be more useful to you? Should we do both together? Should we choose at random? (Of course, I think we should also include data about the mechanism being used in the measurement, otherwise it's pointless.)
4. Assuming the answer to question 1 is that these results we see are expected, what is the most useful measurement we can implement for you? Is it more useful to know that the Snowflake-assisted bootstrap times out often or is it more useful to know that we could bootstrap using Snowflake although the cache makes the bootstrap time more difficult to compare?