feat(webconnectivitylte): introduce classic analysis #1420

Merged
merged 114 commits into from
Nov 30, 2023
766d4d0
chore: start investigating LTE vs v0.4
bassosimone Nov 22, 2023
f3701a0
document why some QA tests with redirects are broken
bassosimone Nov 23, 2023
f1cc4bb
document more doubts about emmitting events
bassosimone Nov 23, 2023
0b2203d
document more caveats
bassosimone Nov 23, 2023
8eb1ba1
[ci skip] remember to update files in sync
bassosimone Nov 23, 2023
92eb7b7
doc: document more doubts that I have
bassosimone Nov 23, 2023
ea0d3bf
[ci skip] more documentation on what to do
bassosimone Nov 23, 2023
fd06406
feat: progress towards fixing some fundamental issues
bassosimone Nov 23, 2023
ad75714
resolve one more test case
bassosimone Nov 23, 2023
d6cbfd9
more fixes
bassosimone Nov 23, 2023
41fbd3f
doc: explain issues caused by adding HTTP response
bassosimone Nov 23, 2023
e5e4c37
try to sketch out an ooni/data-inspired pipeline
bassosimone Nov 23, 2023
6aff4f0
convert more of v0.5's analysis to the ooni/data-like style
bassosimone Nov 24, 2023
94f9fd7
some more progress
bassosimone Nov 24, 2023
ff42f3c
break the code in a different way
bassosimone Nov 24, 2023
5ad88d5
feat: rewrite the pipeline to match ooni/data more closely
bassosimone Nov 24, 2023
132ba4d
also implement the analysis
bassosimone Nov 24, 2023
be18947
work
bassosimone Nov 24, 2023
18da855
we're mostly done in terms of passing the existing QA tests
bassosimone Nov 25, 2023
e7764c8
tests now green
bassosimone Nov 25, 2023
dac4170
make more test cases work with LTE
bassosimone Nov 26, 2023
258d7fb
we now pass all tests
bassosimone Nov 27, 2023
c6a49ff
[ci skip] remove TODO
bassosimone Nov 27, 2023
8d25a65
fix tricky case with order of DNS processing
bassosimone Nov 27, 2023
44541ea
adjust test case where actually dns is consistent with lte
bassosimone Nov 27, 2023
183f524
make all lte tests pass consistently
bassosimone Nov 27, 2023
a4bedcc
x
bassosimone Nov 27, 2023
3ded283
start generating test cases for the minipipeline
bassosimone Nov 27, 2023
1eaaac0
start adding tests for the minipipeline
bassosimone Nov 27, 2023
df33632
add tests for the minipipeline command
bassosimone Nov 27, 2023
5ad8387
more testing
bassosimone Nov 27, 2023
9fc77fc
more minipipeline tests
bassosimone Nov 27, 2023
c7c310a
x
bassosimone Nov 27, 2023
281e38d
x
bassosimone Nov 27, 2023
0ea4803
add more test cases
bassosimone Nov 27, 2023
9ec20fc
x
bassosimone Nov 27, 2023
66364ed
x
bassosimone Nov 27, 2023
329b2c8
x
bassosimone Nov 27, 2023
72b2be9
x
bassosimone Nov 27, 2023
7f8c143
start documenting code and existing bugs
bassosimone Nov 27, 2023
14840ac
attempt to fix the model problems
bassosimone Nov 27, 2023
55c05ac
commit the measurements
bassosimone Nov 27, 2023
c45a2f6
okay, this looks relatively good
bassosimone Nov 27, 2023
28f93aa
other changes
bassosimone Nov 27, 2023
d849894
x
bassosimone Nov 27, 2023
5091608
add measurements
bassosimone Nov 27, 2023
f1d7137
x
bassosimone Nov 27, 2023
c228c4e
add measurements
bassosimone Nov 27, 2023
8d668fe
x
bassosimone Nov 27, 2023
29bfdd4
x
bassosimone Nov 27, 2023
310ab28
x
bassosimone Nov 27, 2023
dfdf673
meas
bassosimone Nov 27, 2023
9110384
meas
bassosimone Nov 27, 2023
1a8c235
obs
bassosimone Nov 27, 2023
29fec45
x
bassosimone Nov 27, 2023
05f8838
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
b6c643f
x
bassosimone Nov 28, 2023
cc60691
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
0956494
fix potential bug with failed DNS lookups
bassosimone Nov 28, 2023
fae9155
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
0c6d012
x
bassosimone Nov 28, 2023
0b3a979
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
7a6f00b
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
1040d73
simplify
bassosimone Nov 28, 2023
cfe7643
x
bassosimone Nov 28, 2023
9138fb6
x
bassosimone Nov 28, 2023
a6cf23b
[ci skip] Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
5b7aa00
Butcher lte and make sure tests are aligned with v0.4
bassosimone Nov 28, 2023
4331276
we need to trust everything that v0.4 emits
bassosimone Nov 28, 2023
383eb69
x
bassosimone Nov 28, 2023
1488d05
Merge branch 'master' into issue/2634
bassosimone Nov 28, 2023
4217859
x
bassosimone Nov 28, 2023
99d5e1e
add classic filter
bassosimone Nov 28, 2023
6eeb352
[ci skip]
bassosimone Nov 28, 2023
7d5bed9
x
bassosimone Nov 28, 2023
e1f87b1
x
bassosimone Nov 28, 2023
f7f5c3e
x
bassosimone Nov 28, 2023
90457e9
Merge branch 'master' into issue/2634
bassosimone Nov 29, 2023
b83ec13
Merge branch 'master' into issue/2634
bassosimone Nov 29, 2023
cb2a89d
x
bassosimone Nov 29, 2023
dd28dd4
x
bassosimone Nov 29, 2023
fa605cf
x
bassosimone Nov 29, 2023
99fecb3
rewrite analysis
bassosimone Nov 29, 2023
37320b3
x
bassosimone Nov 29, 2023
ec9f57c
[ci skip] Merge branch 'master' into issue/2634
bassosimone Nov 29, 2023
b13693e
x
bassosimone Nov 29, 2023
f25c811
Merge branch 'master' into issue/2634
bassosimone Nov 30, 2023
b62b6a2
x
bassosimone Nov 30, 2023
c4f8916
x
bassosimone Nov 30, 2023
6711640
x
bassosimone Nov 30, 2023
2498a31
xx
bassosimone Nov 30, 2023
6186f67
Merge branch 'master' into issue/2634
bassosimone Nov 30, 2023
e312857
x
bassosimone Nov 30, 2023
6f620fc
Merge branch 'master' into issue/2634
bassosimone Nov 30, 2023
f9e3e29
x
bassosimone Nov 30, 2023
56f1659
x
bassosimone Nov 30, 2023
1b9355e
x
bassosimone Nov 30, 2023
280d5f3
x
bassosimone Nov 30, 2023
eca31f0
x
bassosimone Nov 30, 2023
041186c
x
bassosimone Nov 30, 2023
7eda6aa
x
bassosimone Nov 30, 2023
22cc17c
x
bassosimone Nov 30, 2023
61522d3
Merge branch 'master' into issue/2634
bassosimone Nov 30, 2023
601e4d3
x
bassosimone Nov 30, 2023
f4c9643
Merge branch 'master' into issue/2634
bassosimone Nov 30, 2023
8bd14be
Merge branch 'master' into issue/2634
bassosimone Nov 30, 2023
3c92520
x
bassosimone Nov 30, 2023
fd5ea63
x
bassosimone Nov 30, 2023
39723b8
[ci skip]
bassosimone Nov 30, 2023
64b43f0
x
bassosimone Nov 30, 2023
3367af4
x
bassosimone Nov 30, 2023
5b63d6a
x
bassosimone Nov 30, 2023
fa6723b
adapt
bassosimone Nov 30, 2023
78d9218
x
bassosimone Nov 30, 2023
339 changes: 339 additions & 0 deletions internal/experiment/webconnectivitylte/analysisclassic.go
@@ -0,0 +1,339 @@
package webconnectivitylte

import (
"github.com/ooni/probe-cli/v3/internal/minipipeline"
"github.com/ooni/probe-cli/v3/internal/model"
"github.com/ooni/probe-cli/v3/internal/optional"
"github.com/ooni/probe-cli/v3/internal/runtimex"
)

// AnalysisEngineClassic is an alternative analysis engine that aims to produce
// results that are backward compatible with Web Connectivity v0.4.
func AnalysisEngineClassic(tk *TestKeys, logger model.Logger) {
tk.analysisClassic(logger)
}

func (tk *TestKeys) analysisClassic(logger model.Logger) {
// Since we run after all tasks have completed (or so we assume) we're
// not going to use any form of locking here.

// 1. produce web observations
container := minipipeline.NewWebObservationsContainer()
container.IngestDNSLookupEvents(tk.Queries...)
container.IngestTCPConnectEvents(tk.TCPConnect...)
container.IngestTLSHandshakeEvents(tk.TLSHandshakes...)
container.IngestHTTPRoundTripEvents(tk.Requests...)

// be defensive in case the control request or response are not defined
if tk.ControlRequest != nil && tk.Control != nil {
// Implementation note: the only error that can happen here is when the input
// doesn't parse as a URL, which should have caused measurer.go to fail
runtimex.Try0(container.IngestControlMessages(tk.ControlRequest, tk.Control))
}

// 2. filter observations to only include results collected by the
// system resolver, which approximates v0.4's results
classic := minipipeline.ClassicFilter(container)

// 3. produce a web observations analysis based on the web observations
woa := minipipeline.AnalyzeWebObservations(classic)

// 4. determine the DNS consistency
tk.DNSConsistency = analysisClassicDNSConsistency(woa)

// 5. compute the HTTPDiff values
tk.setHTTPDiffValues(woa)

// 6. compute blocking & accessible
analysisClassicComputeBlockingAccessible(woa, tk)
}

func analysisClassicDNSConsistency(woa *minipipeline.WebAnalysis) optional.Value[string] {
switch {
case woa.DNSLookupUnexpectedFailure.Len() <= 0 && // no unexpected failures; and
woa.DNSLookupSuccessWithInvalidAddressesClassic.Len() <= 0 && // no invalid addresses; and
(woa.DNSLookupSuccessWithValidAddressClassic.Len() > 0 || // good addrs; or
woa.DNSLookupExpectedFailure.Len() > 0): // expected failures
return optional.Some("consistent")

case woa.DNSLookupSuccessWithInvalidAddressesClassic.Len() > 0 || // unexpected addrs; or
woa.DNSLookupUnexpectedFailure.Len() > 0: // unexpected failures
return optional.Some("inconsistent")

default:
return optional.None[string]() // none of the above
}
}
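
Review note: the three-way outcome of analysisClassicDNSConsistency is easier to check in isolation. The following is a minimal sketch, assuming a hypothetical `dnsCounts` struct in place of the `minipipeline.WebAnalysis` sets and using the empty string in place of `optional.None[string]()`; neither name is part of this PR.

```go
package main

import "fmt"

// dnsCounts is a hypothetical stand-in for the lengths of the sets that
// analysisClassicDNSConsistency consults. Field names are illustrative only.
type dnsCounts struct {
	unexpectedFailures int
	invalidAddrs       int
	validAddrs         int
	expectedFailures   int
}

// classify mirrors the switch in analysisClassicDNSConsistency. The empty
// string plays the role of optional.None[string]() in this sketch.
func classify(c dnsCounts) string {
	switch {
	case c.unexpectedFailures <= 0 && c.invalidAddrs <= 0 &&
		(c.validAddrs > 0 || c.expectedFailures > 0):
		return "consistent"
	case c.invalidAddrs > 0 || c.unexpectedFailures > 0:
		return "inconsistent"
	default:
		return "" // no DNS evidence either way
	}
}

func main() {
	fmt.Printf("%q\n", classify(dnsCounts{validAddrs: 1}))
	fmt.Printf("%q\n", classify(dnsCounts{validAddrs: 1, invalidAddrs: 1}))
	fmt.Printf("%q\n", classify(dnsCounts{}))
}
```

Note that a single invalid address or unexpected failure is enough to flip the result to "inconsistent", even when valid addresses are also present.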

func (tk *TestKeys) setHTTPDiffValues(woa *minipipeline.WebAnalysis) {
const bodyProportionFactor = 0.7
if !woa.HTTPFinalResponseDiffBodyProportionFactor.IsNone() {
value := woa.HTTPFinalResponseDiffBodyProportionFactor.Unwrap() > bodyProportionFactor
tk.BodyLengthMatch = &value
}

if !woa.HTTPFinalResponseDiffUncommonHeadersIntersection.IsNone() {
value := len(woa.HTTPFinalResponseDiffUncommonHeadersIntersection.Unwrap()) > 0
tk.HeadersMatch = &value
}

if !woa.HTTPFinalResponseDiffStatusCodeMatch.IsNone() {
value := woa.HTTPFinalResponseDiffStatusCodeMatch.Unwrap()
tk.StatusCodeMatch = &value
}

if !woa.HTTPFinalResponseDiffTitleDifferentLongWords.IsNone() {
value := len(woa.HTTPFinalResponseDiffTitleDifferentLongWords.Unwrap()) <= 0
tk.TitleMatch = &value
}
}

type analysisClassicTestKeysProxy interface {
// setBlockingString sets blocking to a string.
setBlockingString(value string)

// setBlockingNil sets blocking to nil.
setBlockingNil()

// setBlockingFalse sets Blocking to false.
setBlockingFalse()

// httpDiff returns true if there's an http-diff.
httpDiff() bool
}

var _ analysisClassicTestKeysProxy = &TestKeys{}

// httpDiff implements analysisClassicTestKeysProxy.
func (tk *TestKeys) httpDiff() bool {
if tk.StatusCodeMatch != nil && *tk.StatusCodeMatch {
if tk.BodyLengthMatch != nil && *tk.BodyLengthMatch {
return false
}
if tk.HeadersMatch != nil && *tk.HeadersMatch {
return false
}
if tk.TitleMatch != nil && *tk.TitleMatch {
return false
}
// fallthrough
}
return true
}
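
Review note: httpDiff treats a nil pointer the same as false, so a missing comparison never counts as a match. The sketch below restates that logic as a standalone function, assuming a hypothetical `diffFlags` struct and helper names (`isTrue`, `ptr`) that are not part of this PR.

```go
package main

import "fmt"

// diffFlags is a hypothetical stand-in for the four *bool fields of TestKeys
// that httpDiff reads.
type diffFlags struct {
	statusCodeMatch *bool
	bodyLengthMatch *bool
	headersMatch    *bool
	titleMatch      *bool
}

// isTrue reports whether p is non-nil and points to true.
func isTrue(p *bool) bool { return p != nil && *p }

// httpDiff returns false only when the status code matches and at least one
// of body length, headers, or title also matches. Otherwise it returns true.
func httpDiff(f diffFlags) bool {
	if !isTrue(f.statusCodeMatch) {
		return true
	}
	return !(isTrue(f.bodyLengthMatch) || isTrue(f.headersMatch) || isTrue(f.titleMatch))
}

// ptr returns a pointer to a copy of b.
func ptr(b bool) *bool { return &b }

func main() {
	fmt.Println(httpDiff(diffFlags{statusCodeMatch: ptr(true), titleMatch: ptr(true)}))
	fmt.Println(httpDiff(diffFlags{statusCodeMatch: ptr(true)}))
	fmt.Println(httpDiff(diffFlags{bodyLengthMatch: ptr(true)}))
}
```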

// setBlockingFalse implements analysisClassicTestKeysProxy.
func (tk *TestKeys) setBlockingFalse() {
tk.Blocking = false
tk.Accessible = true
}

// setBlockingNil implements analysisClassicTestKeysProxy.
func (tk *TestKeys) setBlockingNil() {
if !tk.DNSConsistency.IsNone() && tk.DNSConsistency.Unwrap() == "inconsistent" {
tk.Blocking = "dns"
tk.Accessible = false
} else {
tk.Blocking = nil
tk.Accessible = nil
}
}

// setBlockingString implements analysisClassicTestKeysProxy.
func (tk *TestKeys) setBlockingString(value string) {
if !tk.DNSConsistency.IsNone() && tk.DNSConsistency.Unwrap() == "inconsistent" {
tk.Blocking = "dns"
} else {
tk.Blocking = value
}
tk.Accessible = false
}

func analysisClassicComputeBlockingAccessible(woa *minipipeline.WebAnalysis, tk analysisClassicTestKeysProxy) {
// minipipeline.NewLinearWebAnalysis produces a woa.Linear sorted
//
// 1. by descending TagDepth;
//
// 2. with TagDepth being equal, by descending [WebObservationType];
//
// 3. with [WebObservationType] being equal, by ascending failure string;
//
// This means that you divide the list in groups like this:
//
// +------------+------------+------------+------------+
// | TagDepth=3 | TagDepth=2 | TagDepth=1 | TagDepth=0 |
// +------------+------------+------------+------------+
//
// Where TagDepth=3 is the last redirect and TagDepth=0 is the initial request.
//
// Each group is further divided as follows:
//
// +------+-----+-----+-----+
// | HTTP | TLS | TCP | DNS |
// +------+-----+-----+-----+
//
// Where each group may be empty. The first non-empty group is about the
// operation that failed for the current TagDepth.
//
// Within each group, successes sort before failures because the empty
// string has priority over non-empty strings.
//
// So, when walking the list from index 0 to index N, you encounter the
// latest redirects first, you observe the more complex operations first,
// and you see successes before failures.
for _, entry := range woa.Linear {

// 1. As a special case, handle a "final" response first. We define as "final"
// a successful response whose status code is 2xx, 4xx, or 5xx.
if !entry.HTTPResponseIsFinal.IsNone() && entry.HTTPResponseIsFinal.Unwrap() {

// 1.1. Handle the case of a successful response over TLS.
if !entry.TLSHandshakeFailure.IsNone() && entry.TLSHandshakeFailure.Unwrap() == "" {
tk.setBlockingFalse()
return
}

// 1.2. Handle the case of missing HTTP control.
if entry.ControlHTTPFailure.IsNone() {
tk.setBlockingNil()
return
}

// 1.3. Figure out whether the measurement and the control are close enough.
if !tk.httpDiff() {
tk.setBlockingFalse()
return
}

// 1.4. There's something different in the two responses.
tk.setBlockingString("http-diff")
return
}

// 2. Let's now focus on failed HTTP round trips.
if entry.Type == minipipeline.WebObservationTypeHTTPRoundTrip &&
!entry.Failure.IsNone() && entry.Failure.Unwrap() != "" {

// 2.1. Handle the case of a missing HTTP control. Maybe
// the control server is unreachable or blocked.
if entry.ControlHTTPFailure.IsNone() {
tk.setBlockingNil()
return
}

// 2.2. Handle the case where both the probe and the control failed.
if entry.ControlHTTPFailure.Unwrap() != "" {
// TODO(bassosimone): returning this result is wrong and we
// should also set Accessible to false. However, v0.4
// does this and we should play along for the A/B testing.
tk.setBlockingFalse()
return
}

// 2.3. Handle the case where just the probe failed.
tk.setBlockingString("http-failure")
return
}

// 3. Handle the case of TLS failure.
if entry.Type == minipipeline.WebObservationTypeTLSHandshake &&
!entry.Failure.IsNone() && entry.Failure.Unwrap() != "" {

// 3.1. Handle the case of missing TLS control information. The control
// only provides information for the first request. Once we start following
// redirects we do not have TLS/TCP/DNS control.
if entry.ControlTLSHandshakeFailure.IsNone() {

// 3.1.1 Handle the case of missing an expectation about what
// accessing the website should lead to, which is set forth by
// the control accessing the website and telling us.
if entry.ControlHTTPFailure.IsNone() {
tk.setBlockingNil()
return
}

// 3.1.2. Otherwise, if the control worked, that's blocking.
tk.setBlockingString("http-failure")
return
}

// 3.2. Handle the case where both probe and control failed.
if entry.ControlTLSHandshakeFailure.Unwrap() != "" {
// TODO(bassosimone): returning this result is wrong and we
// should set Accessible and Blocking to false. However, v0.4
// does this and we should play along for the A/B testing.
tk.setBlockingNil()
return
}

// 3.3. Handle the case where just the probe failed.
tk.setBlockingString("http-failure")
return
}

// 4. Handle the case of TCP failure.
if entry.Type == minipipeline.WebObservationTypeTCPConnect &&
!entry.Failure.IsNone() && entry.Failure.Unwrap() != "" {

// 4.1. Handle the case of missing TCP control info.
if entry.ControlTCPConnectFailure.IsNone() {

// 4.1.1 Handle the case of missing an expectation about what
// accessing the website should lead to.
if entry.ControlHTTPFailure.IsNone() {
tk.setBlockingNil()
return
}

// 4.1.2. Otherwise, if the control worked, that's blocking.
tk.setBlockingString("http-failure")
return
}

// 4.2. Handle the case where both probe and control failed.
if entry.ControlTCPConnectFailure.Unwrap() != "" {
// TODO(bassosimone): returning this result is wrong and we
// should set Accessible and Blocking to false. However, v0.4
// does this and we should play along for the A/B testing.
tk.setBlockingFalse()
return
}

// 4.3. Handle the case where just the probe failed.
tk.setBlockingString("tcp_ip")
return
}

// 5. Handle the case of DNS failure
if entry.Type == minipipeline.WebObservationTypeDNSLookup &&
!entry.Failure.IsNone() && entry.Failure.Unwrap() != "" {

// 5.1. Handle the case of missing DNS control info.
if entry.ControlDNSLookupFailure.IsNone() {

// 5.1.1 Handle the case of missing an expectation about what
// accessing the website should lead to.
if entry.ControlHTTPFailure.IsNone() {
tk.setBlockingFalse()
return
}

// 5.1.2. Otherwise, if the control worked, that's blocking.
tk.setBlockingString("dns")
return
}

// 5.2. Handle the case where both probe and control failed.
if entry.ControlDNSLookupFailure.Unwrap() != "" {
// TODO(bassosimone): returning this result is wrong and we
// should set Accessible and Blocking to false. However, v0.4
// does this and we should play along for the A/B testing.
tk.setBlockingFalse()
return
}

// 5.3. Handle the case where just the probe failed.
tk.setBlockingString("dns")
return
}
}
}
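
Review note: the walk in analysisClassicComputeBlockingAccessible depends on the ordering of woa.Linear described in its leading comment. The following is a minimal sketch of that ordering, assuming a hypothetical `entry` struct and hypothetical numeric values for the observation types (DNS lowest, HTTP highest); the real types live in the minipipeline package and may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical observation kinds, ordered so that more complex operations
// have larger values. The actual minipipeline constants may differ.
const (
	kindDNS = iota
	kindTCP
	kindTLS
	kindHTTP
)

// entry is a hypothetical, reduced stand-in for a linear web observation.
type entry struct {
	tagDepth int
	kind     int
	failure  string // the empty string means success
}

// sortLinear orders entries by descending tagDepth, then descending kind,
// then ascending failure string (so successes come first within a group).
func sortLinear(es []entry) {
	sort.SliceStable(es, func(i, j int) bool {
		a, b := es[i], es[j]
		if a.tagDepth != b.tagDepth {
			return a.tagDepth > b.tagDepth
		}
		if a.kind != b.kind {
			return a.kind > b.kind
		}
		return a.failure < b.failure
	})
}

func main() {
	es := []entry{
		{tagDepth: 0, kind: kindDNS, failure: ""},
		{tagDepth: 1, kind: kindTCP, failure: "connection_refused"},
		{tagDepth: 1, kind: kindTCP, failure: ""},
		{tagDepth: 0, kind: kindHTTP, failure: ""},
		{tagDepth: 1, kind: kindDNS, failure: ""},
	}
	sortLinear(es)
	for _, e := range es {
		fmt.Printf("depth=%d kind=%d failure=%q\n", e.tagDepth, e.kind, e.failure)
	}
}
```

With this ordering, the first entry that matches one of the numbered cases in the loop belongs to the deepest redirect and the most complex operation recorded for it, which is why the loop can return as soon as a case matches.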
15 changes: 15 additions & 0 deletions internal/experiment/webconnectivitylte/analysiscore.go
@@ -36,6 +36,10 @@ const (
analysisFlagSuccess
)

// AnalysisEngineFn is the function that runs the analysis engine for
// processing and scoring measurements collected by LTE.
var AnalysisEngineFn func(tk *TestKeys, logger model.Logger) = AnalysisEngineOrig

// analysisToplevel is the toplevel function that analyses the results
// of the experiment once all network tasks have completed.
//
@@ -95,6 +99,17 @@ const (
// As an improvement over Web Connectivity v0.4, we also attempt to identify
// special subcases of a null, null result to provide the user with more information.
func (tk *TestKeys) analysisToplevel(logger model.Logger) {
AnalysisEngineFn(tk, logger)
}

// AnalysisEngineOrig is the original analysis engine we wrote for LTE. This engine
// aims to detect and report about all the possible ways in which the measured website
// is blocked. As of 2023-11-30, we still consider this engine experimental.
func AnalysisEngineOrig(tk *TestKeys, logger model.Logger) {
tk.analysisOrig(logger)
}

func (tk *TestKeys) analysisOrig(logger model.Logger) {
// Since we run after all tasks have completed (or so we assume) we're
// not going to use any form of locking here.

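
Review note: AnalysisEngineFn is a package-level function variable, so callers (for example, tests doing A/B comparisons) can swap the engine without touching analysisToplevel. Below is a minimal sketch of that pattern, assuming hypothetical lowercase names (`testKeys`, `engineFn`, `withEngine`) that are not part of this PR.

```go
package main

import "fmt"

// testKeys is a hypothetical, reduced stand-in for TestKeys.
type testKeys struct {
	engine string // records which engine ran, for illustration
}

// engineOrig and engineClassic stand in for AnalysisEngineOrig and
// AnalysisEngineClassic.
func engineOrig(tk *testKeys)    { tk.engine = "orig" }
func engineClassic(tk *testKeys) { tk.engine = "classic" }

// engineFn plays the role of AnalysisEngineFn: it defaults to the original
// engine and can be reassigned.
var engineFn func(tk *testKeys) = engineOrig

// analysisToplevel delegates to whichever engine is currently selected.
func (tk *testKeys) analysisToplevel() { engineFn(tk) }

// withEngine temporarily selects fn, runs body, and restores the previous
// engine afterwards. It is not safe for concurrent use.
func withEngine(fn func(*testKeys), body func()) {
	saved := engineFn
	defer func() { engineFn = saved }()
	engineFn = fn
	body()
}

func main() {
	tk := &testKeys{}
	tk.analysisToplevel()
	fmt.Println(tk.engine)

	withEngine(engineClassic, func() {
		tk.analysisToplevel()
		fmt.Println(tk.engine)
	})
}
```

Because the variable is global mutable state, a helper that restores the previous value is one way to keep tests from leaking the selected engine into each other.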
5 changes: 3 additions & 2 deletions internal/experiment/webconnectivitylte/analysisdns.go
@@ -10,6 +10,7 @@ import (

"github.com/ooni/probe-cli/v3/internal/model"
"github.com/ooni/probe-cli/v3/internal/netxlite"
"github.com/ooni/probe-cli/v3/internal/optional"
)

const (
@@ -62,11 +63,11 @@ func (tk *TestKeys) analysisDNSToplevel(logger model.Logger, lookupper model.Geo
tk.analysisDNSUnexpectedAddrs(logger, lookupper)
if tk.DNSFlags != 0 {
logger.Warn("DNSConsistency: inconsistent")
- tk.DNSConsistency = "inconsistent"
+ tk.DNSConsistency = optional.Some("inconsistent")
tk.BlockingFlags |= analysisFlagDNSBlocking
} else {
logger.Info("DNSConsistency: consistent")
- tk.DNSConsistency = "consistent"
+ tk.DNSConsistency = optional.Some("consistent")
}
}
