
wcth: compare old and new test helpers #1707

Closed
bassosimone opened this issue Jul 6, 2021 · 21 comments

@bassosimone
Contributor

bassosimone commented Jul 6, 2021

This issue is about fetching all the URLs in all the test lists with both helpers and comparing the results. We want to understand in which cases the new test helper differs from the old test helper and why.

You may want to jump straight to the conclusions: #1707 (comment).
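For orientation, here is a minimal sketch of the comparison loop this implies: query both test helpers for each URL and store the two JSON replies for later diffing. Endpoint addresses, field names and the overall shape are assumptions for illustration; the actual tooling lives in the ooni/wcthcheck repository.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// controlRequest loosely mirrors the Web Connectivity control request; the
// exact field set accepted by the THs may differ.
type controlRequest struct {
	HTTPRequest        string              `json:"http_request"`
	HTTPRequestHeaders map[string][]string `json:"http_request_headers"`
	TCPConnect         []string            `json:"tcp_connect"`
}

// queryTH posts the control request to a TH endpoint and returns the raw
// JSON reply, which the comparison stage later parses and diffs.
func queryTH(endpoint string, creq controlRequest) ([]byte, error) {
	data, err := json.Marshal(creq)
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	oldTH := "http://127.0.0.1:8080" // hypothetical locally deployed legacy TH
	newTH := "http://127.0.0.1:8081" // hypothetical locally deployed new TH (oohelperd)
	for _, url := range os.Args[1:] { // one test-list URL per argument
		creq := controlRequest{HTTPRequest: url, TCPConnect: []string{}}
		oldJSON, oldErr := queryTH(oldTH, creq)
		newJSON, newErr := queryTH(newTH, creq)
		fmt.Println(url, oldErr, newErr, len(oldJSON), len(newJSON))
	}
}
```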

@bassosimone bassosimone self-assigned this Jul 6, 2021
@bassosimone bassosimone changed the title from "wcth: compare old and new test helpers" to "wcth: compare old and new test helpers (1/n)" on Jul 6, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 4, 2021
When a probe gets a local DNS failure, it will continue and nonetheless
query the test helper without any IP address, just an empty list.

This diff fixes the behavior of cmd/oohelper to do the same.

Work part of ooni/probe#1707.
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 5, 2021
When a probe gets a local DNS failure, it will continue and nonetheless
query the test helper without any IP address, just an empty list.

This diff fixes the behavior of cmd/oohelper to do the same.

Work part of ooni/probe#1707.
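To make the behaviour described in the commit message above concrete, here is a hedged sketch (function and variable names are illustrative, not the actual probe-cli code): on a local DNS error the client keeps going and puts an empty endpoint list into the control request.

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// buildTCPConnectList returns the endpoints the client asks the TH to test.
// On a local DNS failure it returns an empty list instead of aborting, so the
// TH still performs its own DNS lookup and HTTP fetch.
func buildTCPConnectList(ctx context.Context, domain, port string) []string {
	addrs, err := net.DefaultResolver.LookupHost(ctx, domain)
	if err != nil {
		return []string{} // local DNS failed: continue with an empty list
	}
	endpoints := make([]string, 0, len(addrs))
	for _, addr := range addrs {
		endpoints = append(endpoints, net.JoinHostPort(addr, port))
	}
	return endpoints
}

func main() {
	// With an NXDOMAIN-like name this prints an empty list rather than failing.
	fmt.Println(buildTCPConnectList(context.Background(), "antani.nonexistent.example", "80"))
}
```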
@bassosimone
Contributor Author

bassosimone commented Sep 13, 2021

Here's a status update. In late July I ran a first-pass comparison of the whole content of the test lists between the old and the new test helper.

To this end, I used the code at https://github.com/ooni/wcthcheck.

The codename of this first comparison is config-0. Running the ./compare tool of the above-mentioned repository with config-0, we get:

{
  "dns/match/exactly": 19518,
  "dns/match/same_asn": 3362,
  "dns/match/same_org": 122,
  "dns/mismatch/cannot_map_to_asn": 6,
  "dns/mismatch/new_data_none_addrs": 47,
  "dns/mismatch/other": 431,
  "dns/total": 23486,

  "http_body_length/diff": 9333,
  "http_body_length/diff/new_th_larger": 6368,
  "http_body_length/diff/new_th_smaller": 2965,
  "http_body_length/total": 23480,

  "http_failure/match": 22363,
  "http_failure/mismatch": 1117,
  "http_failure/mismatch/new_is_none": 45,
  "http_failure/mismatch/old_is_none": 292,
  "http_failure/mismatch/other": 780,
  "http_failure/total": 23480,

  "http_headers/match/same_set": 19516,
  "http_headers/mismatch/new_is_none": 1037,
  "http_headers/mismatch/set_diff": 2927,
  "http_headers/total": 23480,

  "http_status_code/match": 21916,
  "http_status_code/mismatch": 1564,
  "http_status_code/total": 23480,

  "http_title/match/both_empty": 2502,
  "http_title/match/equal": 11180,
  "http_title/mismatch/new_empty": 749,
  "http_title/mismatch/old_empty": 935,
  "http_title/mismatch/old_th_smaller": 7683,
  "http_title/mismatch/other": 431,
  "http_title/mismatch/total": 8114,
  "http_title/total": 23480,

  "processing/match/has_both_measurements": 23486,
  "processing/mismatch/missing_newth_measurement": 7,
  "processing/mismatch/missing_oldth_measurement": 4478,
  "processing/succeeded": 27971,
  "processing/total": 27971,

  "tcp_connect/match": 23251,
  "tcp_connect/mismatch": 235,
  "tcp_connect/mismatch/added_ivp4": 60,
  "tcp_connect/mismatch/different_ivp4": 185,
  "tcp_connect/mismatch/removed_ivp4": 73,
  "tcp_connect/total": 23486
}
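For reference, the dns/* buckets above amount to a classification along these lines. This is a sketch under the assumption of a GeoIP-backed ASN lookup; the real logic lives in wcthcheck's ./compare tool and may differ in the details.

```go
package main

import "fmt"

// classifyDNS sketches how one old-TH/new-TH pair of DNS answers could be
// bucketed into the dns/* counters shown above. lookupASN stands in for a
// GeoIP/ASN lookup; the actual ./compare tool may differ in the details.
func classifyDNS(oldAddrs, newAddrs []string, lookupASN func(ip string) (uint, string)) string {
	toSet := func(xs []string) map[string]bool {
		s := make(map[string]bool)
		for _, x := range xs {
			s[x] = true
		}
		return s
	}
	oldSet, newSet := toSet(oldAddrs), toSet(newAddrs)
	exact := len(oldSet) == len(newSet)
	for a := range oldSet {
		exact = exact && newSet[a]
	}
	if exact {
		return "dns/match/exactly"
	}
	// Fall back to comparing ASNs, then organizations.
	asnsOf := func(set map[string]bool) (map[uint]bool, map[string]bool) {
		asns, orgs := make(map[uint]bool), make(map[string]bool)
		for ip := range set {
			if asn, org := lookupASN(ip); asn != 0 {
				asns[asn], orgs[org] = true, true
			}
		}
		return asns, orgs
	}
	oldASN, oldOrg := asnsOf(oldSet)
	newASN, newOrg := asnsOf(newSet)
	if len(oldASN) == 0 || len(newASN) == 0 {
		return "dns/mismatch/cannot_map_to_asn"
	}
	for asn := range oldASN {
		if newASN[asn] {
			return "dns/match/same_asn"
		}
	}
	for org := range oldOrg {
		if newOrg[org] {
			return "dns/match/same_org"
		}
	}
	return "dns/mismatch/other"
}

func main() {
	stub := func(ip string) (uint, string) { return 13335, "CLOUDFLARENET" } // fake GeoIP lookup
	fmt.Println(classifyDNS([]string{"1.1.1.1"}, []string{"1.0.0.1"}, stub))  // dns/match/same_asn
}
```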

After this initial scan, I started trying to figure out the reasons for these differences. Generally speaking, I looked into a subset of measurements to identify bugs.

The first issue I figured out is that the internal/cmd/oohelper command in ooni/probe-cli was stopping when the input URL's domain name was NXDOMAIN. This was fixed in ooni/probe-cli#450.

After that, I started looking into the subset of URLs that present tcp_connect mismatches. This subset of URLs has been used for the config-1, config-2, config-3 and config-4 runs.

For this new comparison, I deployed the legacy backend on my local machine inside an ubuntu:16.04 Docker container. To do that smoothly, I needed ooni/backend#533 to pin working dependencies.

While investigating, there was some serendipity. I assumed the legacy backend was using 8.8.8.8:53/udp as a resolver, but instead it was using the system resolver. This led me to #1780.

Subsequent changes to the new test helper landed at: ooni/probe-cli#504.

This brings us to today. Here's the next-two-weeks agenda:

  • explain all the differences and minimise them through code changes
  • compare the error strings emitted for specific error conditions

The latter will be done by censoring the network on the computer I am using via Jafar.

@bassosimone
Contributor Author

bassosimone commented Sep 13, 2021

Written status update. We can now move this issue to Sprint 48.

@bassosimone
Contributor Author

bassosimone commented Oct 15, 2021

Further investigating DNS differences (5/5)

I solved more DNS-related issues and branched off an issue I could not solve easily: #1823.

This is the current DNS situation for roughly 80% of the URLs in the test list:

{
  "dns/match/exactly": 25642,
  "dns/match/same_asn": 158,
  "dns/match/same_failure": 552,
  "dns/mismatch/different_failure": 1,
  "dns/mismatch/old_failure": 19,
  "dns/mismatch/other": 5,
[snip]

The different_failure case is the one for which I branched off the new issue. I still need to chase the old_failure and other cases to figure out what they mean.

So, most of them were transient. This is where we are now:

  "dns/known_bug/old_cannot_handle_ip": 17,
  "dns/match/exactly": 25824,
  "dns/match/same_asn": 160,
  "dns/match/same_failure": 552,
  "dns/mismatch/different_failure": 3,
  "dns/mismatch/new_failure": 4,
  "dns/mismatch/old_failure": 1,

There are 17 entries where the old TH cannot handle IPs in the URL, which I think is a behaviour that the new TH does not actually want to mirror. Then there are 8 failures that maybe we need to investigate further. Overall, though, the DNS part seems to be converging (I am continuing to scan the test lists, but I am now near cases where the domains are probably censored where I live, because every measurement is taking a very long time).

@bassosimone
Contributor Author

bassosimone commented Oct 15, 2021

HTTP headers

I also started looking into HTTP headers. I found more cases in which the sets are disjoint, so I added more headers as exceptions, but then I also measured the cases in which both sets ended up being empty. This is not looking good:

  "http_headers/failure/new": 883,
  "http_headers/failure/old": 5141,
  "http_headers/match/no_header_left": 9378,
  "http_headers/match/same_set": 9060,
  "http_headers/mismatch/set_diff": 925,
  "http_headers/total": 19363,

There are two issues here: (1) we have many HTTP failures for which we probably need to retry and (2) the more exception headers I add, the more I increase no_header_left. That is: set_diff and no_header_left cannot both be optimised at the same time. So perhaps the approach I am using for headers here is wrong. I think we should just eliminate the headers that the probe would not consider, without adding further exceptions. Doing that now.
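A sketch of the header comparison just described: drop the headers the probe would ignore anyway, then compare the remaining sets. The ignore list below is illustrative, not the probe's actual list.

```go
package main

import (
	"fmt"
	"strings"
)

// commonHeaders is an illustrative ignore list, not the probe's actual list
// of headers it skips when comparing with the control.
var commonHeaders = map[string]bool{
	"date": true, "content-type": true, "server": true, "cache-control": true,
	"vary": true, "set-cookie": true, "expires": true, "content-length": true,
}

// classifyHeaders buckets one old-TH/new-TH pair of header maps the way the
// http_headers/* counters above do: drop ignored headers, then compare sets.
func classifyHeaders(oldHdrs, newHdrs map[string]string) string {
	filter := func(hdrs map[string]string) map[string]bool {
		out := make(map[string]bool)
		for k := range hdrs {
			if k = strings.ToLower(k); !commonHeaders[k] {
				out[k] = true
			}
		}
		return out
	}
	oldSet, newSet := filter(oldHdrs), filter(newHdrs)
	if len(oldSet) == 0 && len(newSet) == 0 {
		return "http_headers/match/no_header_left"
	}
	if len(oldSet) != len(newSet) {
		return "http_headers/mismatch/set_diff"
	}
	for k := range oldSet {
		if !newSet[k] {
			return "http_headers/mismatch/set_diff"
		}
	}
	return "http_headers/match/same_set"
}

func main() {
	oldHdrs := map[string]string{"Date": "x", "X-Cache-Status": "HIT"}
	newHdrs := map[string]string{"Date": "y"}
	fmt.Println(classifyHeaders(oldHdrs, newHdrs)) // http_headers/mismatch/set_diff
}
```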

Okay, so this is what we have:

  "http_headers/failure/new": 885,
  "http_headers/failure/old": 5142,
  "http_headers/match/no_header_left": 4064,
  "http_headers/match/same_set": 4975,
  "http_headers/mismatch/set_diff": 10331,
  "http_headers/total": 19370,

This is, in itself, a data quality issue. In many cases, the two implementations return different headers among those that matter according to Web Connectivity's heuristics. This confuses new probes a bit (it is unclear to what extent). If we switch to the new TH, we confuse new probes less and old probes more (which is less of a concern).

So, okay, I think there is no need to dig into this headers topic further. It's clear we cannot do much about this. The two implementations we are using are too different and too far apart in time for further action to be possible.

@bassosimone
Contributor Author

bassosimone commented Oct 15, 2021

HTTP status code (1/2)

Regarding the status code, after correcting to avoid noise caused by failures, here's what I get:

  "http_status_code/failure/new": 885,
  "http_status_code/failure/old": 5142,
  "http_status_code/match": 19030,
  "http_status_code/mismatch": 343,
  "http_status_code/total": 25400,

It may be worth taking a look into the mismatch cases and see whether there's anything fundamental here.

So, the first thing I noticed is that with the old TH there are URLs that return 403 and this becomes 200 with the new TH. An example is http://www.geocities.ws/marchrecce/, for which we have:

// old TH

  "http_request": {
    "body_length": 11784,
    "failure": null,
    "title": "Please Wait... | Cloudflare",
    "headers": {
        // snip
    },
    "status_code": 403

// new TH

  "http_request": {
    "body_length": 6573,
    "failure": null,
    "title": "Rendezvous en Lower Myanmar - An Inner Odyssey",
    "headers": {
        // snip
    },
    "status_code": 200

I created rules for both the 403 that becomes 200 and for the sub-case of Cloudflare and now I have:

  "http_status_code/403_becomes_200": 213,
  "http_status_code/cf": 6,
  "http_status_code/failure/new": 924,
  "http_status_code/failure/old": 5349,
  "http_status_code/match": 19420,
  "http_status_code/mismatch": 132,
  "http_status_code/total": 26044,

So, clearly the Cloudflare check was not very selective and the 403 => 200 was much more selective.

Okay, this is a data quality issue we're fixing by using the new TH ✔️
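The rules just described boil down to keying each comparison on the (old, new) status-code pair, roughly as in this sketch. The looksLikeCloudflare flag stands in for whatever heuristic detects the Cloudflare sub-case (for instance the Server header), which is an assumption here.

```go
package main

import "fmt"

// classifyStatusCode sketches the bucketing behind the http_status_code/*
// counters: failures first, then named transitions such as 403_becomes_200,
// then a generic key for everything else.
func classifyStatusCode(oldCode, newCode int, oldFailed, newFailed, looksLikeCloudflare bool) string {
	switch {
	case newFailed:
		return "http_status_code/failure/new"
	case oldFailed:
		return "http_status_code/failure/old"
	case oldCode == newCode:
		return "http_status_code/match"
	case oldCode == 403 && newCode == 200 && looksLikeCloudflare:
		return "http_status_code/403_becomes_200/cf"
	case oldCode == 403 && newCode == 200:
		return "http_status_code/403_becomes_200/other"
	default:
		return fmt.Sprintf("http_status_code/%d_becomes_%d", oldCode, newCode)
	}
}

func main() {
	fmt.Println(classifyStatusCode(403, 200, false, false, true))  // http_status_code/403_becomes_200/cf
	fmt.Println(classifyStatusCode(308, 200, false, false, false)) // http_status_code/308_becomes_200
}
```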

Now, what's next? We need to look into the remaining cases for which we have a mismatch.

After more refinements, here's what I have:

  "http_status_code/200_becomes_503": 13,
  "http_status_code/308_becomes_200": 46,
  "http_status_code/403_becomes_200/cf": 6,
  "http_status_code/403_becomes_200/other": 217,
  "http_status_code/403_becomes_503/cf": 1,
  "http_status_code/403_becomes_503/other": 16,
  "http_status_code/503_becomes_200": 9,
  "http_status_code/failure/new": 935,
  "http_status_code/failure/old": 5474,
  "http_status_code/match": 19699,
  "http_status_code/mismatch": 49,
  "http_status_code/total": 26465,

Some insights based on the above results:

  1. we should retry all the cases in which 200 becomes 503 or 503 becomes 200, because this looks like a temporary failure that will go away if we retry and maybe our testing was a bit too aggressive

  2. in addition to 403 becoming 200 we also have 403 becoming 503, which are both data quality issues

  3. we still have 49 cases to dig into

One of the remaining cases (http://www.irna.ir/en/) is quite fascinating (edit: is a fundamental issue; see #1824):

// old TH

{
  "tcp_connect": {
    "178.216.249.78:80": {
      "status": true,
      "failure": null
    },
    "217.25.48.64:80": {
      "status": true,
      "failure": null
    }
  },
  "http_request": {
    "body_length": 35805,
    "failure": null,
    "title": "IRNA English",
    "headers": {
      "Cache-Control": "max-age=60",
      "Content-Type": "text/html;charset=UTF-8",
      "Date": "Fri, 15 Oct 2021 03:11:46 GMT",
      "Expires": "Fri, 15 Oct 2021 03:12:06 GMT",
      "Server": "nginx",
      "Vary": "Accept-Encoding",
      "X-Cache-Status": "HIT"
    },
    "status_code": 200
  },
  "dns": {
    "failure": null,
    "addrs": [
      "217.25.48.64",
      "178.216.249.78"
    ]
  }
}

// New TH

{
  "tcp_connect": {
    "178.216.249.78:80": {
      "status": true,
      "failure": null
    },
    "217.25.48.64:80": {
      "status": true,
      "failure": null
    }
  },
  "http_request": {
    "body_length": 1918,
    "failure": null,
    "title": "\u0635\u0641\u062d\u0647\u0654 \u062f\u0631\u062e\u0648\u0627\u0633\u062a\u06cc \u0634\u0645\u0627 \u06cc\u0627\u0641\u062a
 \u0646\u0634\u062f.",
    "headers": {
      "Content-Language": "en",
      "Content-Length": "1918",
      "Content-Type": "text/html;charset=UTF-8",
      "Date": "Fri, 15 Oct 2021 03:11:34 GMT",
      "Server": "nginx"
    },
    "status_code": 404
  },
  "dns": {
    "failure": null,
    "addrs": [
      "217.25.48.64",
      "178.216.249.78"
    ]
  }
}

I'm going to try and see whether there are more cases like this. (I didn't really see this coming!)

Okay, after some refactoring, this is the final analysis for status code. I'll add some comments inline:

  // These are cases where apparently using the new TH causes failures
  "http_status_code/200_becomes_403": 8,
  "http_status_code/200_becomes_404": 2,
  "http_status_code/200_becomes_500": 1,
  "http_status_code/200_becomes_502": 1,
  "http_status_code/200_becomes_503": 13,
  "http_status_code/200_becomes_504": 1,

  // This 203 is interesting
  "http_status_code/203_becomes_504": 1,

  // These are cases where the old TH does not handle 308
  "http_status_code/308_becomes_200": 46,
  "http_status_code/308_becomes_203": 1,
  "http_status_code/308_becomes_404": 2,
  "http_status_code/308_becomes_451": 1,

  // These seem to be cases where the old TH causes data quality issues for new
  // probes, probably because it is somehow classified as a "very old client"
  "http_status_code/400_becomes_200": 5,
  "http_status_code/400_becomes_404": 1,
  "http_status_code/403_becomes_200/cf": 6,
  "http_status_code/403_becomes_200/other": 217,
  "http_status_code/403_becomes_402": 1,
  "http_status_code/403_becomes_404": 8,
  "http_status_code/403_becomes_503/cf": 1,
  "http_status_code/403_becomes_503/other": 16,
  "http_status_code/404_becomes_200": 3,
  "http_status_code/404_becomes_503": 2,
  "http_status_code/429_becomes_200": 1,
  "http_status_code/500_becomes_200": 4,
  "http_status_code/502_becomes_200": 4,
  "http_status_code/503_becomes_200": 9,
  "http_status_code/523_becomes_522": 1,

  // This is a case where the old code sets the status code after reading the body
  // while the new code distinguishes between the round trip and reading the body.
  // Maybe I can fix this case. Maybe re-run? This is clearly a case in which
  // there is some error after the round trip...
  "http_status_code/error_reading_body": 1,

  // This tells me I need to re-run for a bunch of URLs
  "http_status_code/failure/new": 935,
  "http_status_code/failure/old": 5474,

  // These are the cases in which we get the same status code
  "http_status_code/match": 19699,
  "http_status_code/total": 26465,

All these cases are interesting and are, additionally, a source of confusion for new probes anyway. That is, looking into what is happening here was mostly out of curiosity and for the sake of data quality.

@bassosimone
Contributor Author

bassosimone commented Oct 18, 2021

HTTP status code (2/2)

One of the remaining cases (http://www.irna.ir/en/) is quite fascinating:

Well, actually, no, it isn't fascinating. It's just a data quality issue with the way in which webconnectivity works. I've explained why this is a systematic issue of webconnectivity in #1824.

@bassosimone
Contributor Author

bassosimone commented Oct 18, 2021

Review of the results and conclusions

The objective of this comparison was to verify that replacing the old webconnectivity test helper (TH) with the new one would not reduce the quality of the data produced by new OONI Probes (i.e., OONI Probes using the Go engine).

The rest of this report is organised as follows. We analyse the Go client to determine what matters. We explain how we ran the comparison. We document actions taken to reduce the diff in the JSONs produced by the two THs in varying conditions. We document some oddities we observed. We show the results and comment on each aspect. We draw the conclusions.

Analysis of the Go client

In this section, we read through the source code of the Go client. The objective here is to figure out what specific fields emitted by the TH matter the most to determine the blocking status of a measurement.

Because we've already discontinued the Python implementation of OONI Probe, we only focus on the Go codebase.

As of v3.10.1, the webconnectivity implementation:

  1. uses the string result from the control in dnsanalysis.go to determine DNS consistency

  2. performs HTTPAnalysis in httpanalysis.go

  3. aggregates a summary in summary.go

  4. only directly compares with the DNS flag inside the control in dnsanalysis.go and otherwise does not directly string match with the strings returned by the control

  5. incorrectly assumes that a status code equal to zero means failure, whereas the original TH returns -1 (see webconnectivity: Go implementation wrongly handles the TH's HTTP status code #1825)

  6. skips the body length check if the response body is truncated

  7. skips the status code check if the response is 500

  8. skips many "common" headers when comparing headers

  9. uses a 128-byte limit for extracting the <title>

  10. skips the <title> check if the body is truncated

  11. still, does not set accessible and blocking if the control HTTP request failed, which somehow protects from the above-mentioned issues regarding the truncated body and the incorrect handling of the status code

  12. does not seem to investigate whether a connect failure could also have happened in the control; this issue is again mitigated by the fact that it does not set accessible and blocking if the control request fails, but it still means there is a bunch of cases in which we could say there's consistency with respect to the control and we don't

  13. when there is no DNS, TCP/IP or TLS blocking, falls back to complicated heuristics that maybe could be made simpler by not processing ex post the full results but rather by using a model where we determine blocking right after each measurement step (this is also a good design hint for websteps, if we're able to do that)

Because of the above facts, it seems really important for us to make sure there are no discrepancies between what the old TH and the new TH do DNS-wise. Other aspects are comparatively less important to equalize.
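As an illustration of points 5, 6 and 7 above, here is a simplified sketch of those skip conditions. It is not the probe's actual code (that lives in httpanalysis.go); it just renders the behaviour described in the list.

```go
package main

import "fmt"

// httpChecksSketch renders points 5, 6 and 7 from the list above.
func httpChecksSketch(ctrlStatus, measStatus int, bodyTruncated bool) {
	// Point 5: a zero control status code is treated as a failed control
	// request, although the old TH actually reports -1 on failure.
	controlFailed := ctrlStatus == 0

	// Point 6: the body length comparison is skipped when the body is truncated.
	if controlFailed || bodyTruncated {
		fmt.Println("body length check: skipped")
	} else {
		fmt.Println("body length check: runs")
	}

	// Point 7: the status code comparison is skipped when the control saw a 500.
	if !controlFailed && ctrlStatus != 500 {
		fmt.Println("status codes match:", ctrlStatus == measStatus)
	} else {
		fmt.Println("status code check: skipped")
	}
}

func main() {
	httpChecksSketch(200, 403, false) // body check runs; status codes differ
	httpChecksSketch(500, 200, true)  // both checks skipped
}
```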

Methodology

I've repeatedly run the old and the new TH against a nearly full version of the test list. To this end, I used the code at https://github.com/ooni/wcthcheck. Many commits in that repository mention this issue. Also, there are comments in this issue explaining how I ran the whole experiment. In a nutshell: I ran the two THs side by side on a Linux box and accessed them using oohelper, the TH client included in github.com/ooni/probe-cli.

I've also tried to use Jafar to provoke specific errors and see how the two test helpers reacted to them: #1707 (comment).

Actions to reduce the diff between the new and the old TH

The experiments I've run led to the following PRs, which attempted to reduce the diff between the old and the new TH:

  1. fix(oohelperd): reduce differences with legacy helper probe-cli#504

  2. fix(oohelperd): return HTTP headers as empty map on error probe-cli#541

  3. fix(webconnectivity): gather longer HTML titles probe-cli#542

  4. fix(oohelperd): reduce errors to what the old TH would emit probe-cli#543

  5. fix(wcth): emit empty Addrs when input URL contains addr probe-cli#545

  6. fix(wcth): match legacy TH w/ empty DNS reply probe-cli#546

Oddities

Performing this analysis, I came across these interesting oddities that may have a data quality impact:

  1. DNS resolution poisoning: the case of 97dounai.top #1780

  2. testlists: semi-automatically detect and remove domains that are now for sale (aka parked domains) #1826

  3. wcth: new wcth cannot resolve www.stratcom.mil, old wcth can #1823

  4. legacyth: timeout with www.aiadmk.website wcthcheck#1

  5. webconnectivity: inaccurate www.irna.ir measurement #1824

Results

Here we classify by "phase" of the experiment: DNS, TCP/IP, and HTTP.

category           mismatches    total    percentage
dns                        49    27873    0.17%
http_body_length        10961    26405    41%
http_failure             6608    26405    25%
http_headers              ...      ...    ...
http_status              6766    26405    25%
http_title              13862    26405    52%
tcp_connect               ...      ...    ...

The total number of URLs was 27971. We're missing 16 URLs with both THs, 1 URL is missing only with the new TH, and 81 are missing only with the old TH. Because the results encompass DNS, TCP/IP, and HTTP, re-running a measurement could in some cases cause failures in one aspect (e.g., DNS) while fixing others (e.g., HTTP): as we already know, the test lists contain many not-so-reliable websites.

DNS

So, regarding DNS we are in a very good position and we only have very few differences. The full classification of the 49 DNS mismatches between the new and the old TH is the following:

  1. in one case the comparison failed because we could not figure out the IP's ASN, making comparison unfeasible. It turns out it's an issue of the GeoIP database we're using. The IPs in this result actually belong to Amazon's CloudFront.

  2. in 19 cases the new TH failed and in 9 cases the old TH failed, making comparison unfeasible. It's difficult to say whether the new TH is more likely to have DNS failures. However, wcth: new wcth cannot resolve www.stratcom.mil, old wcth can #1823 suggests that we should probably increase the timeout of the new codebase a bit to give the upstream server we're using time to reply with SERVFAIL.

  3. in 9 cases the failure is different between the old and the new TH and in 11 cases the failure is flagged as other. I have not fully investigated these cases yet and some of them could just be limitations of the classification where you end up flagging something as other having excluded all the other known possibilities.

It's also worth noting that we have 17 cases in which the old TH could not handle a URL containing an IP address. They are not flagged as failures; rather, this is a known bug, so they go into a specific, distinct bucket.

HTTP body length

The body length is complicated because we need to see the body but the THs generally do not return it. We have seen that the old TH sometimes encodes HTML tags, which may be a data quality issue because the new probe doesn't do it. See #1707 (comment) for more details on how I learned about this encoding issue.
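For context, the probe-side comparison works on lengths alone, roughly as in this sketch. The proportion threshold below is an assumption for illustration; the actual value lives in the probe's httpanalysis code.

```go
package main

import "fmt"

// bodyLengthsMatch sketches a proportion-based length comparison: the two
// lengths are compatible when the smaller one is at least a given fraction of
// the larger one. The 0.7 threshold is an illustrative assumption, not
// necessarily the probe's exact value.
func bodyLengthsMatch(control, measurement int64) bool {
	if control <= 0 || measurement <= 0 {
		return false // cannot compare without both lengths
	}
	smaller, larger := control, measurement
	if smaller > larger {
		smaller, larger = larger, smaller
	}
	return float64(smaller)/float64(larger) >= 0.7
}

func main() {
	// With the body lengths from the geocities example above: 6573/11784 ≈ 0.56.
	fmt.Println(bodyLengthsMatch(11784, 6573)) // false
}
```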

HTTP failure

5574 of the 6608 failure mismatches occur because the new TH failure is null and the old TH failure is not null. The converse happens in 935 cases out of 6608. This seems to suggest that the new TH is more reliable. (Empirically, I also observed that, while the measurements were progressing, there were more failures with the old TH, since the number of missing measurements with the old TH was generally larger than the ones with the new TH.)

(Ah, and I also showed in #1707 (comment) that the old TH does not validate TLS certificates, which is a potential data quality issue and source of differences: the Go probe will fail in this case.)

HTTP headers

I basically gave up comparing headers. We need to exclude the common set of headers that the probes ignore. The remaining headers depend heavily on the two implementations being very different (the old TH uses Twisted 13.2.0, software released in November 2013). I tried to add more headers to the ignore list, which often led me to an empty intersection between headers. I took this fact as a signal that investigating header differences was a waste of my time. That said, the Go probe uses the same codebase as the new TH, so using the new TH seems better here.

HTTP status code

The status code matched in 19699 cases out of 26465. This leaves us with 6766 mismatches. Of these, 6409 are caused by the old TH failing for HTTP (5474 cases) or the new TH failing (935 cases). The remaining cases are cases in which the status code changes by changing TH. I also showed a case, http://www.irna.ir/en/, in which the result is purely dependent on the IP address chosen by the specific TH to fetch the webpage: not following all endpoints has a data quality cost (#1825).

HTTP title

When discussing the title, it's important to remember that the Go probe and the new TH use a similar codebase, therefore they should have more consistent results in this respect. I learned that the old TH strips non-latin characters from the title, which leads it to produce a smaller (6451 cases) or empty (6309 cases) title. Summing these two cases we obtain 12760, which accounts for most of the 13862 mismatching cases. There are also a few cases where the new TH title is smaller or empty, which likely depend on the regexp used to extract the title (which is limited to a maximum number of characters in the Go codebase) or on the page size. Either way, doing something similar to what the probe does is better.
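For reference, point 9 of the Go-client analysis above mentions the character limit on title extraction; this sketch shows that style of bounded extraction. The exact regexp in the Go codebase may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// titleRegexp sketches a bounded, case-insensitive <title> extraction: at
// most 128 characters are captured. The exact regexp used by the Go codebase
// may differ.
var titleRegexp = regexp.MustCompile(`(?is)<title[^>]*>([^<]{1,128})`)

func extractTitle(body string) string {
	m := titleRegexp.FindStringSubmatch(body)
	if m == nil {
		return ""
	}
	return strings.TrimSpace(m[1])
}

func main() {
	fmt.Println(extractTitle("<html><head><title>IRNA English</title></head></html>")) // IRNA English
}
```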

TCP connect

I didn't spend much time looking into this topic. We don't directly string match strings from the TH in the probe and I had already gained confidence that we can handle the common cases using Jafar.

Conclusions

I think it's fine to switch to the new TH. We will have to fix some small bugs, chiefly that our DNS timeout seems too small in one specific case and we cannot properly get a SERVFAIL (#1823). We have spent lots of time trying to reduce the differences in the JSON structure, and the DNS now looks similar. We should fix the new Go clients to always perform the body length comparison (now both bodies should be truncated) as documented in #1827. We should fix the new Go clients to properly handle the status code value as documented in #1825. We should file issues for each possible data quality issue anyway and, on top of this, maybe we need a periodic process for monitoring the TH quality (#1829).

@bassosimone bassosimone changed the title from "wcth: compare old and new test helpers (1/n)" to "wcth: compare old and new test helpers" on Oct 18, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Oct 20, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Oct 20, 2021
bassosimone added a commit to ooni/probe-cli that referenced this issue Oct 20, 2021
Reducing the errors is not done in a perfect way.

We have documented the most striking differences inside
ooni/probe#1707 (comment) and
some attempts to improve the situation further inside
ooni/probe#1707 (comment).

A better strategy for the future would be to introduce more
specific timeout errors, such as dns_timeout_error, etc.

More testing may be needed to further validate and compare the
old and the new TH, but this requires Jafar improvements to
more precisely simulate more complex censorship.

Backport from master; see ooni/probe#1839
bassosimone added a commit to ooni/probe-cli that referenced this issue Oct 20, 2021
Matches the behavior that the legacy TH implements in this
situation and reduces slightly the differences.

See ooni/probe#1707 (comment)

Backport from master; see ooni/probe#1839
bassosimone added a commit to ooni/probe-cli that referenced this issue Oct 20, 2021
@bassosimone
Contributor Author

We concluded with @hellais that it's okay to replace the old TH with the new TH. The most immediate next step was to create a release of oohelperd that was suitable for doing that, which I did in #1841.

It remains to figure out what the remaining issues are (including addressing issues discovered as part of this one). But we can also safely close this issue.

ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
When a probe gets a local DNS failure, it will continue and nonetheless
query the test helper without any IP address, just an empty list.

This diff fixes the behavior of cmd/oohelper to do the same.

Work part of ooni/probe#1707.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
Reducing the errors is not done in a perfect way.

We have documented the most striking differences inside
ooni/probe#1707 (comment) and
some attempts to improve the situation further inside
ooni/probe#1707 (comment).

A better strategy for the future would be to introduce more
specific timeout errors, such as dns_timeout_error, etc.

More testing may be needed to further validate and compare the
old and the new TH, but this requires Jafar improvements to
more precisely simulate more complex censorship.
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022
Matches the behavior that the legacy TH implements in this
situation and reduces slightly the differences.

See ooni/probe#1707 (comment)
ainghazal pushed a commit to ainghazal/probe-cli that referenced this issue Mar 8, 2022