feat(engine): allow runner to return many measurements #527

bassosimone · 2021-09-29T22:35:47Z

Checklist

I have read the contribution guidelines
reference issue for this pull request: webconnectivity: write prototype of new experiment ooni/probe#1733
related ooni/spec pull request: N/A

Location of the issue tracker: https://github.com/ooni/probe

Description

This is required to implement websteps, which is currently tracked
by ooni/probe#1733.

We introduce the concept of an async runner. An async runner posts
measurements on a channel until it is done. When it is done, it
closes the channel to notify the reader.
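A minimal sketch of the channel pattern described above. `Measurement`, `asyncRunner`, and `collect` are hypothetical names, not the actual probe-cli types. The runner posts measurements and closes the channel when done, which ends the reader's `range` loop:

```go
package main

import "fmt"

// Measurement is a hypothetical stand-in for the engine's measurement type.
type Measurement struct {
	Input string
}

// asyncRunner is a hypothetical async runner: it posts one measurement per
// input on the returned channel and closes the channel when it is done.
func asyncRunner(inputs []string) <-chan *Measurement {
	out := make(chan *Measurement)
	go func() {
		// Closing the channel tells the reader no more measurements will come.
		defer close(out)
		for _, input := range inputs {
			out <- &Measurement{Input: input}
		}
	}()
	return out
}

// collect reads measurements until the runner closes the channel.
func collect(ch <-chan *Measurement) []*Measurement {
	var all []*Measurement
	for m := range ch { // the loop ends when the channel is closed
		all = append(all, m)
	}
	return all
}

func main() {
	for _, m := range collect(asyncRunner([]string{"a", "b", "c"})) {
		fmt.Println(m.Input)
	}
}
```

Closing the channel from the producer side is what lets the reader detect completion without a separate "done" signal.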

With this change, sync experiments must strictly return either
a non-nil measurement or a non-nil error, never both.

While this is pretty much the standard convention in Go, some
parts of the codebase were not robust to this assumption and
attempted to submit a measurement after the measure call had
returned an error.

Luckily, we had enough tests to catch this change in our assumptions,
which is why this diff also includes documentation and test changes.


bassosimone · 2021-09-29T22:39:42Z

internal/engine/experiment_integration_test.go

-	if measurement == nil {
-		t.Fatal("expected non nil measurement here")
+	if measurement != nil {
+		t.Fatal("expected nil measurement here")


This is the first place where our assumptions about the measurement XOR error return value were too lax.

bassosimone · 2021-09-29T22:41:13Z

pkg/oonimkall/internal/tasks/runner.go

+			// implemented async measurements, the case where there is an error
+			// and we also have a valid measurement can't happen anymore. So,
+			// now the only valid strategy here is to continue.
+			continue


This is the second place where our assumptions about the measurement XOR error return value were too lax.

bassosimone · 2021-09-29T22:52:46Z

Here's the failure that occurred above:

goroutine 70 [select]:
github.com/ooni/probe-cli/v3/internal/ptx.(*SnowflakeDialer).dialContext(0xfa99c0, {0xca6a10, 0xc0000bc008})
	/home/runner/work/probe-cli/probe-cli/internal/ptx/snowflake.go:73 +0x3fd
github.com/ooni/probe-cli/v3/internal/ptx.(*SnowflakeDialer).DialContext(0x39, {0xca6a10, 0xc0000bc008})
	/home/runner/work/probe-cli/probe-cli/internal/ptx/snowflake.go:45 +0x59
github.com/ooni/probe-cli/v3/internal/ptx.TestSnowflakeDialerWorksWithMocks(0xc0003b56c0)
	/home/runner/work/probe-cli/probe-cli/internal/ptx/snowflake_test.go:62 +0xa7
testing.tRunner(0xc0003b56c0, 0xbfde00)
	/opt/hostedtoolcache/go/1.17.1/x64/src/testing/testing.go:1259 +0x230
created by testing.(*T).Run
	/opt/hostedtoolcache/go/1.17.1/x64/src/testing/testing.go:1306 +0x727
FAIL	github.com/ooni/probe-cli/v3/internal/ptx	600.114s

And here's the link to the build: https://github.com/ooni/probe-cli/pull/527/checks?check_run_id=3750039913

I am going to merge anyway, since this failure really seems unrelated (it's testing snowflake in there).

I am also going to create an issue for this.

Since #527, if an experiment returns an error, the corresponding measurement is not submitted, because returning an error means that something fundamental went wrong (e.g., we could not parse the input URL). This diff ensures that all experiments return an error only when something fundamental went wrong and return nil otherwise. Reference issue: ooni/probe#1808.



This bug is one of those bugs that help one stay humble and focused on improving the codebase. Of course I <facepalmed> when I understood the root cause. When we refactored the codebase to support returning multiple measurements per run, which happened in #527, we did not move the annotations below the `if` that checks whether the measurement was successful. While I am not going to whip myself too much over this, it's clearly a bummer that we didn't notice this bug back then. On top of this, it's also quite sad that it took us so long to notice that this bug was in the tree. The lesson (hopefully) learned is that we need to be more careful when we refactor, and that we should always ask not only whether we have tests, but also whether those tests could be improved to give us more confidence about correctness. The reference issue is ooni/probe#2173.


This commit backports 9b08dca from the master branch. Its original commit message is the one quoted above (reference issue: ooni/probe#2173).

bassosimone requested a review from hellais as a code owner September 29, 2021 22:35


bassosimone mentioned this pull request Sep 29, 2021

ci: timeout when testing snowflake ooni/probe#1791

Open

bassosimone merged commit ff1c170 into master Sep 29, 2021

bassosimone deleted the issue/1733 branch September 29, 2021 22:54

bassosimone mentioned this pull request Oct 14, 2021

cli: segfault when experiment returns nil, err ooni/probe#1816

Closed

bassosimone mentioned this pull request Jan 7, 2022

fix: ensure experiments return nil when we want to submit #654

Merged

bassosimone mentioned this pull request Jan 14, 2022

nettests: do not return result _and_ error ooni/probe#1808

Closed

bassosimone mentioned this pull request Jun 29, 2022

fix(oonimkall): only set annotations on success #821

Merged


bassosimone mentioned this pull request Jan 13, 2023

webconnectivity: add to measurements which TH is being used ooni/probe#2073

Closed
