-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RiseupVPN tests improvements #1125
base: master
Are you sure you want to change the base?
Conversation
Hm... the coverage CI action wasn't failing before, when I played with the code on my personal github repo and filed a MR to myself. From the above failing report - RiseupVPN's coverage: |
Hi @bassosimone @hellais, is there anything I could help with to support the review process? |
0312625
to
e3561a1
Compare
@bassosimone I've rebased the MR to that latest master branch. I'm sure you know that: to test the MR, you'll need to build mini-ooni. proble-cli tests will fail without an redirection in go.mod or until the changes landed on master. We would love to see this MR getting in soon, it will help us a lot with improving measurements for LEAP VPN / RiseupVPN. |
@cyBerta I will take care of doing a review of this MR! Thank you for updating riseupvpn! |
Thank you for your patience @cyBerta! I am currently finishing the review, running tests, and generally think about this experiment and how we can move it forward! I estimate I'll publish the review next week. 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for improving the riseupvpn experiment! I started a detailed code review and tested the old and the new implementations side by side for a few hours. I did not finish my review because I realized that there was a more fundamental issue that we needed to discuss.
While testing I noticed that fetching eip-service.json and the geo service JSON has been very unreliable across my testing. It happened to me several times that the connection timed out when trying to connect to api.black.riseup.net. Your diff calls these APIs at lines 249 and 255 respectively. You process the results by calling the UpdateProviderAPITestKeys method on line 264. This method sets the APIStatus as blocked on line 114 if and only if the URL was that of the eip-service.json. In turn, this causes the whole measurement to be flagged as anomaly on line 493. If I was a user and I had run and submitted those measurements, I would have believed that riseupvpn was blocked, while the underlying cause was the flaky service serving eip-service.json. If this event occurred so frequently that I could notice it, I am afraid it could have also happened for several other users. And we previously disabled riseupvpn because of too frequent false positives.
By reading the diff, I see how you did great work to try to reduce the frequency of false positives, which incidentally complicated the implementation and added extra complexity that we would need to maintain. At this point, I am convinced that we’re fighting an uphill battle and I want to instead propose a different approach that should make both my life (as the probe-cli maintainer) and your life (as the experiment maintainer) significantly easier. What’s more, my proposal would help expedite the review and reintegration of the riseupvpn experiment into ooniprobe.
Before explaining my proposal, I would like to explain what, in my opinion, the goal of this experiment should be, both now and in the next two years. We would like to collect data useful to understand whether (1) riseupvpn endpoints are accessible and (2) the riseupvpn app could bootstrap by fetching the data it needs to do so. We are much less concerned with providing immediate feedback to users than we are about collecting data that can be useful to you folks to plan and reason about the global VPN service you provide.
For this reason, here’s what I propose to do in the short term:
-
We drop all the top-level keys from the experiment and we only keep the results generated by urlgetter. At the same time, we simplify the code that generates the summary so that it always says that there was no anomaly. This change removes the possibility that individual users would be triggered by flaky behavior of api.black.riseup.net and shifts the focus on the experiment purely on collecting data that should only be analyzed at scale (like we do for dnscheck).
-
We change the logic of the experiment such that, if any of the APIs required to bootstrap fail, we stop and submit a smaller JSON to the backend. This change removes the extra complexity that was already there and the additional one you introduced in this commit. Arguably, this would also make the experiment less complete than the one you implemented in this pull request, but I also have a plan to address that. However, this plan is not for the short term, but luckily it intersects with my other work commitments. I expand on this plan in detail below.
-
We modify the code to avoid submitting the JSON returned by the geo service, given that it includes geolocation data that it would be better not to submit. Granted, there is a mechanism in place with ooniprobe that strips the user’s IP address from the measurement. But I can see a bunch of cases where different geolocation mechanisms (the one used by OONI and the one used by riseup) may not agree because the IP address is different for some reasons and I would not like for us to increase the likelihood of a leak.
With these three changes in place, the riseupvpn experiment would be smaller and easier to maintain. So, we can add it back to ooniprobe in the experimental suite. By running it again, we can collect more data at scale that is certainly useful to you folks. At the same time, we do not need to worry anymore about false positives because now we focus on collecting data.
In the medium term, I am going to propose the following:
-
We are developing a mechanism by which the OONI backend can serve to ooniprobe a DSL describing what a specific network experiment needs to do. We are planning on using this mechanism for selected experiments (surely dnscheck and the IM nettests). I am suggesting we apply this mechanism also to riseupvpn. You can see a complete prototype of how this would work implemented in ooni/2023-05-richer-input#12 and documented at probe#2502.
-
When riseupvpn is rewritten to use the DSL, the backend will periodically invoke the relevant riseupvpn APIs to fetch gateways to test. The backend will then serve to ooniprobes a DSL that contains instructions to measure the services required to bootstrap and the gateways. Because the backend generates the DSL, we have now removed the functional coupling between the first part of the riseupvpn experiment (bootstrap) and the second part (gateways). These two parts can now run independently of each other with moderate parallelism. Yay for simplicity!
-
The DSL will include a feature-flag mechanism allowing the selection of the best operation to perform depending on the probe capabilities. Once @ainghazal has landed minivpn support in ooniprobe, we will then be able to ask the ooniprobes that can do that, to perform the openvpn handshake. More generally, the DSL will decouple the definition of an experiment to the probe version out there, thus requiring fewer emergency releases.
In the long term (with all the usual caveats borrowed from J.M. Keynes), I suggest that we evaluate again the decision to score measurements locally based on the collected data. By then, incidentally, I suspect we would have migrated to a model where the OONI collector provides the data analysis and scoring for a measurement, which would further decouple releasing a new version of ooniprobe and tweaking data analysis.
What do you think?
Longitude float64 `json:"lon"` | ||
Gateways []string `json:"gateways"` | ||
SortedGateways []GatewayLoad `json:"sortedGateways"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not include this body into the measurement as explained in the whole review.
@@ -29,13 +29,17 @@ type EipService struct { | |||
Gateways []GatewayV3 | |||
} | |||
|
|||
// Capabilities is a list of transports a gateway supports | |||
type Capabilities struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that all the data structures belonging to EipServiceV3, so maybe add V3 everywhere?
Wow, Thanks for this extensive review! I appreciate a lot the time you took for it. I can definitely change the given MR according to your short term proposals, they sound reasonable to me. Also awesome that you apparently already started to work on the midterm proposals as I understand ooni/probe#2502. |
Thank you, @cyBerta ! |
f0b8ca8
to
7aec078
Compare
…r the ca was invalid. Instead of early returning the test, we can still gather the necessary configs to run the gateway tests. This change implies that there might occur multiple failures while fetching data from the API, which is why TestKeys.APIFailures becomes a string array.
…I was considered to be blocked
… longer skipped, API config data is fetched even if CA cert is invalid or missing
…sting, avoid those known to be overloaded.
…ducing the amount of tested gateways
…4 bridges, avoid hard-coding locations in the test
…sideration for APIStatus blocked
* never return IsAnomaly=true * early return in case the communication to the API fails * remove (most) top level TestKeys, including custom SummaryKeys and keep urlgetter keys * however keeps invalid CA cert handling, since unattended expired certs was a root cause for false positives in the past * removes snowflake + tor handling as part of the complexity reduction of the test * remove API response bodys from testkeys, addresses also removal of geolocation API reponse * adapts unit tests
005081e
to
c3264d8
Compare
ok, I think I've addressed your proposals.
I've left 2 TestKeys that helps within the test to use tls-no-verify in case fetching the CA cert fails or the CA is expired. Next to the slow response time of Riseup's API server it was a common case for false positives. Also we're using these keys in a fetch-and-parsing pipeline for prometheus, so keeping these keys would help us to change less code at other places as well.
I've removed the summary keys and the tests always return false for IsAnomaly
yey, should be the case now
I've changed the test so that it fails and returns early.
while I was at it I've removed all response bodies, since they aren't interesting for us and they unnecessarily leak data |
Removing the summary keys was discussed in #1125 and I'm extracting this diff form such a pull request. Co-authored-by: cyBerta <cyberta@riseup.net>
Removing the summary keys was discussed in #1125 and I'm extracting this diff form such a pull request. While there, bump the version number for this and other upcoming changes. --------- Co-authored-by: cyBerta <cyberta@riseup.net>
This diff renames some structs, changes the code emitting the progress, and bumps the experiment version. In my previous commit, I said I did bump the version number but that was not actually the case. This diff has been extracted from #1125. This diff is part of ooni/probe#1432
…1361) This diff renames some structs, changes the code emitting the progress, and bumps the experiment version. In my previous commit, I said I did bump the version number but that was not actually the case. This diff has been extracted from #1125. This diff is part of ooni/probe#1432 --------- Co-authored-by: cyBerta <cyberta@riseup.net>
6270793
to
c3264d8
Compare
I mistakenly pushed some of my commits in this branch, but I reverted it now. |
This diff incorporates part of what has been implemented by @cyBerta in #1125 in response to my review. I have omitted the work to use the geo service to figure out which are the correct gateways to test for now, which allows me to simplify the diff that I am committing to the master branch. The spirit of the changes is the following: 1. reduce the test keys to the minimum required by riseup to process the experiment results; 2. skip TLS verification if we cannot fetch the CA certificate and unconditionally fetch information about gateways. The major difference between this diff and the above-mentioned pull request is that I did not feel good to keep the `api_failure` name with a different type, which seems like a very breaking change, and instead I introduced a new type called `api_failures`. This work is part of ooni/probe#1432.
This diff incorporates part of what has been implemented by @cyBerta in #1125 in response to my review as well as additional changes based on my own feelings about what is correct to do here. Compared to the original diff, these are the changes that I implemented: 1. I have omitted the work to fetch from riseup geo service and figure out the correct gateways to test. The main reason for not including this body of work has been to reduce the size of the diff and the amount of code to deal with. 2. I modified the logic related to failures in fetching the CA and communicating with riseup services. The test fails immediately if we cannot fetch the proper CA or we cannot contact riseup services. I did not feel comfortable disabling the CA to access riseup services and connecting to the TCP endpoints discovered w/o CA verification. 3. In the test keys, I renamed `api_failure` to `api_failures` because I do not think it's optimal to keep the same name while the type has changed from `*string` to `[]string`. The spirit of the changes is not directly compatible with what we discussed with @cyBerta. The main difference is in my decision to fail early in case we miss the preconditions. As I wrote in #1125 (review), I think we should be using richer input (and start with its simplest form) to provision to probes the data they need to perform this experiment. By provisioning the data ourselves, we remove the coupling between getting the CA, accessing riseup services to get information on what gateways we should measure, and measure the gateways, which makes the experiment several orders of magnitude more robust. Unfortunately, I do not have time, in this cycle, to perform all this richer input work. We'll try again for 3.20. This work is part of ooni/probe#1432. While there, I forced null callbacks when performing the CA fetch and contacting riseup services, otherwise we end up printing a non-monotonic progress status. Admittedly, also omitting to provide progress about these two operations is bad, but I think we won't be able to provide monotonic progress until we know what we should fetch in advance. --------- Co-authored-by: cyBerta <cyberta@riseup.net>
@cyBerta I started merging code from this PR into the master branch. The related PRs are #1360, #1363, and #1364. #1363 (comment) in particular explains my rationale for pulling changes and what I chose not to include for the 3.19 release. It also explains how I think we can use some form of richer input, such as the DSL described above, or perhaps something simpler (and easier to implement), to decouple testing services useful to bootstrap riseupvpn and getting the endpoints we should be testing. I think this PR should probably remain open because it contains the work about finding out which subset of the gateways to test, which I think in the future should be an algorithm that runs in the backend to find out which gateways probes should test. If you have no objections, I'd like to go ahead and merge your changes with master, so it's more clear what has not been added into master and the algorithms about parsing the geo service results stand out more clearly. Thank you! |
… version This diff backports #1361 to the release/3.19 branch. This diff renames some structs, changes the code emitting the progress, and bumps the experiment version. In my previous commit, I said I did bump the version number but that was not actually the case. This diff has been extracted from #1125. This diff is part of ooni/probe#1432 --------- Co-authored-by: cyBerta <cyberta@riseup.net>
…keys This diff backports #1363 to the release/3.19 branch. This diff incorporates part of what has been implemented by @cyBerta in #1125 in response to my review as well as additional changes based on my own feelings about what is correct to do here. Compared to the original diff, these are the changes that I implemented: 1. I have omitted the work to fetch from riseup geo service and figure out the correct gateways to test. The main reason for not including this body of work has been to reduce the size of the diff and the amount of code to deal with. 2. I modified the logic related to failures in fetching the CA and communicating with riseup services. The test fails immediately if we cannot fetch the proper CA or we cannot contact riseup services. I did not feel comfortable disabling the CA to access riseup services and connecting to the TCP endpoints discovered w/o CA verification. 3. In the test keys, I renamed `api_failure` to `api_failures` because I do not think it's optimal to keep the same name while the type has changed from `*string` to `[]string`. The spirit of the changes is not directly compatible with what we discussed with @cyBerta. The main difference is in my decision to fail early in case we miss the preconditions. As I wrote in #1125 (review), I think we should be using richer input (and start with its simplest form) to provision to probes the data they need to perform this experiment. By provisioning the data ourselves, we remove the coupling between getting the CA, accessing riseup services to get information on what gateways we should measure, and measure the gateways, which makes the experiment several orders of magnitude more robust. Unfortunately, I do not have time, in this cycle, to perform all this richer input work. We'll try again for 3.20. This work is part of ooni/probe#1432. While there, I forced null callbacks when performing the CA fetch and contacting riseup services, otherwise we end up printing a non-monotonic progress status. Admittedly, also omitting to provide progress about these two operations is bad, but I think we won't be able to provide monotonic progress until we know what we should fetch in advance. --------- Co-authored-by: cyBerta <cyberta@riseup.net>
} | ||
testkeys.Requests = scrubbedRequests | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding scrubbing, I should mention here that since 7a6fd91 we should be unconditionally scrubbing HTTP headers and bodies by removing anything that looks like an IP address or endpoint.
This diff relaxes the riseupvpn scoring logic because we have determined in ooni/probe-cli#1125 (comment) that we want this experiment to just collect data useful to riseup. Closes #745.
This diff relaxes the riseupvpn scoring logic because we have determined in ooni/probe-cli#1125 (comment) that we want this experiment to just collect data useful to riseup. Note that I needed to change test_aggregation.py as a result of these changes because the number of ok and anomaly results changed. Closes #745.
This diff relaxes the riseupvpn scoring logic because we have determined in ooni/probe-cli#1125 (comment) that we want this experiment to just collect data useful to riseup. Note that I needed to change test_aggregation.py as a result of these changes because the number of ok and anomaly results changed. Closes #745.
hi @bassosimone |
Removing the summary keys was discussed in ooni#1125 and I'm extracting this diff form such a pull request. While there, bump the version number for this and other upcoming changes. --------- Co-authored-by: cyBerta <cyberta@riseup.net>
…oni#1361) This diff renames some structs, changes the code emitting the progress, and bumps the experiment version. In my previous commit, I said I did bump the version number but that was not actually the case. This diff has been extracted from ooni#1125. This diff is part of ooni/probe#1432 --------- Co-authored-by: cyBerta <cyberta@riseup.net>
…1363) This diff incorporates part of what has been implemented by @cyBerta in ooni#1125 in response to my review as well as additional changes based on my own feelings about what is correct to do here. Compared to the original diff, these are the changes that I implemented: 1. I have omitted the work to fetch from riseup geo service and figure out the correct gateways to test. The main reason for not including this body of work has been to reduce the size of the diff and the amount of code to deal with. 2. I modified the logic related to failures in fetching the CA and communicating with riseup services. The test fails immediately if we cannot fetch the proper CA or we cannot contact riseup services. I did not feel comfortable disabling the CA to access riseup services and connecting to the TCP endpoints discovered w/o CA verification. 3. In the test keys, I renamed `api_failure` to `api_failures` because I do not think it's optimal to keep the same name while the type has changed from `*string` to `[]string`. The spirit of the changes is not directly compatible with what we discussed with @cyBerta. The main difference is in my decision to fail early in case we miss the preconditions. As I wrote in ooni#1125 (review), I think we should be using richer input (and start with its simplest form) to provision to probes the data they need to perform this experiment. By provisioning the data ourselves, we remove the coupling between getting the CA, accessing riseup services to get information on what gateways we should measure, and measure the gateways, which makes the experiment several orders of magnitude more robust. Unfortunately, I do not have time, in this cycle, to perform all this richer input work. We'll try again for 3.20. This work is part of ooni/probe#1432. While there, I forced null callbacks when performing the CA fetch and contacting riseup services, otherwise we end up printing a non-monotonic progress status. Admittedly, also omitting to provide progress about these two operations is bad, but I think we won't be able to provide monotonic progress until we know what we should fetch in advance. --------- Co-authored-by: cyBerta <cyberta@riseup.net>
Checklist
Description
This MR aims to improve the data quality of RiseupVPN tests by reduce the amount of false positives due to server side issues.
Major changes are: