
engine: each experiment calls the input-fetching API it needs? #2381

Closed
bassosimone opened this issue Dec 7, 2022 · 1 comment
Assignees
Labels
funder/drl2022-2024 · methodology (issues related to the testing methodology) · needs investigation (this issue needs extra data and investigation) · ooni/probe-engine · priority/high · refactoring · research prototype

Comments

@bassosimone
Contributor

This (currently work-in-progress) issue describes an alternative solution for passing richer input to experiments, one that is perhaps conducive to less complexity inside the core of ooniprobe and more flexibility in terms of data formats.

Let's kick off our discussion by observing that we already have N possible input formats (Web Connectivity, Psiphon, DNSCheck, Tor) and M consumers of those formats (the Web Connectivity format is also used by urlgetter and would be used by websteps; Psiphon is only used by Psiphon, DNSCheck only by DNSCheck, and Tor only by Tor). Additionally, in a run-from-the-command-line or OONI Run v2 scenario, some orthogonal input is provided by either command-line settings or the OONI Run v2 descriptor.

This situation has led us to (1) converge on the lowest common denominator for passing inputs to experiments (i.e., strings containing URLs) and (2) use additional mechanisms for providing inputs to experiments where the input does not fit this model (consider, e.g., how Psiphon and Tor download their own input and have no string-based input).

What this situation is telling us, though, is that we actually have a single kind of experiment: one that fetches its own input, formatted according to the input type it understands, and processes it accordingly. Obviously, even if we did that, there would remain other bottleneck places where experiments assume string input (e.g., the database format). Yet, if this were possible, we could replace the somewhat complex way in which experiments run with or without input with a model where each experiment does the right thing for itself (which, in cases such as Telegram, means having no input at all).
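To make the idea concrete, here is a minimal hypothetical sketch of what such an engine loop could look like. The `Experiment`, `Target`, `FetchTargets`, and `Measure` names are invented for illustration and are not the actual ooni/probe-engine API; the point is only that the engine stops distinguishing with-input from without-input experiments.

```go
// Hypothetical sketch (invented names, not the real ooni/probe-engine API):
// each experiment fetches its own targets via whatever API it needs.
package main

import "fmt"

// Target is deliberately opaque to the engine: only the experiment
// that fetched a target knows its concrete type.
type Target interface{}

// Experiment both fetches and measures its own targets.
type Experiment interface {
	Name() string
	// FetchTargets calls whatever input API this experiment needs
	// (URLs from check-in, a Psiphon config, Tor targets, or nothing).
	FetchTargets() ([]Target, error)
	Measure(target Target) error
}

// telegram needs no input: it returns a single empty target.
type telegram struct{}

func (telegram) Name() string                    { return "telegram" }
func (telegram) FetchTargets() ([]Target, error) { return []Target{nil}, nil }
func (telegram) Measure(Target) error            { return nil }

// webConnectivity fetches string URLs (hard-coded here for the sketch).
type webConnectivity struct{}

func (webConnectivity) Name() string { return "web_connectivity" }
func (webConnectivity) FetchTargets() ([]Target, error) {
	return []Target{"https://example.com/"}, nil
}
func (webConnectivity) Measure(t Target) error {
	fmt.Println("measuring", t.(string))
	return nil
}

// runAll no longer special-cases experiments with or without input:
// every experiment fetches its own targets and measures each of them.
func runAll(exps []Experiment) error {
	for _, exp := range exps {
		targets, err := exp.FetchTargets()
		if err != nil {
			return err
		}
		for _, t := range targets {
			if err := exp.Measure(t); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	if err := runAll([]Experiment{telegram{}, webConnectivity{}}); err != nil {
		panic(err)
	}
	fmt.Println("done")
}
```

Note how, under this sketch, "no input" is just the degenerate case of `FetchTargets` returning one empty target, so the engine loop stays uniform.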

Now, this discussion makes sense conceptually, but changing the code to behave as described may be quite difficult. I am not sure. Hence this issue: we want to explore the design space and work on small prototypes to understand whether this (in my opinion desirable) design change is doable or too hard given the current codebase.

@bassosimone bassosimone self-assigned this Dec 7, 2022
@bassosimone bassosimone added funder/drl2022-2024 methodology issues related to the testing methodology needs investigation This issue needs extra data and investigation ooni/probe-engine priority/high refactoring research prototype labels Feb 1, 2023
bassosimone added a commit to ooni/2023-05-richer-input that referenced this issue Jun 12, 2023
The ooni/probe#2381 issue advocates for each
experiment _fetching_ its own targets through the correct API.

This repository assumes that we have all the targets in a single unified
structure provided by the check-in v2 API.

However, the code in this repository also assumes that targets are opaque
and that each experiment can handle its own kind of target.

Consider this sentence from the above-mentioned issue:

> we actually have one single kind of experiment, the one which fetches
> its own input, formatted according to the input type understood by the
> experiment, and processes it accordingly.

It is the most important sentence of the issue. The original focus
was on "fetching its own input". However, rereading the sentence after
a few months, it seems the important concept was actually allowing
each experiment to _handle_ its own targets (we called targets "inputs"
in the issue because we had not yet had a chance to spell out the
difference between inputs and targets).

This is the reason why I think the work done in this repository
has helped to explore the design space described in the issue above.
@bassosimone
Contributor Author

We invested significant effort writing a prototype of this issue in ooni/probe-cli#1005. While the prototype itself could still be useful, because it contains bits of code that are worth merging into master, the issue itself can now be considered complete. The ooni/2023-05-richer-input@9a0e0ed commit explains why I think that is the case.
