
engine: each experiment calls the input-fetching API it needs? #2381

Closed
bassosimone opened this issue Dec 7, 2022 · 1 comment
Assignees
Labels
funder/drl2022-2024 · methodology (issues related to the testing methodology) · needs investigation (this issue needs extra data and investigation) · ooni/probe-engine · priority/high · refactoring · research prototype

Comments

@bassosimone
Contributor

This (currently work-in-progress) issue describes an alternative solution for passing richer input to experiments, one that is perhaps conducive to less complexity inside the core of ooniprobe and more flexibility in terms of data formats.

Let's kick off our discussion by observing that we already have N possible input formats (Web Connectivity, Psiphon, DNSCheck, Tor) and M consumers of those formats (the Web Connectivity format is also used by urlgetter and would be used by websteps; Psiphon is only used by Psiphon, DNSCheck only by DNSCheck, and Tor only by Tor). Additionally, in a run-from-the-command-line or OONI Run v2 scenario, some orthogonal input is provided by either command-line settings or the OONI Run v2 descriptor.

This situation has led us to (1) converge on the lowest common denominator for passing inputs to experiments (i.e., strings containing URLs) and (2) use additional mechanisms for providing inputs to experiments where the input does not fit this model (consider, e.g., how Psiphon and Tor download their own input and have no string-based input).

What this situation is telling us, though, is that we actually have a single kind of experiment: one that fetches its own input, formatted according to the input type it understands, and processes it accordingly. Obviously, even if we did that, there would remain other bottleneck places where experiments assume string input (e.g., the database format). Yet, if this were possible, we could replace the somewhat complex way in which experiments run with or without input with a model where each experiment does the right thing for itself (which, in cases such as Telegram, means having no input at all).
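To make the idea concrete, here is a minimal hypothetical sketch of what such an engine loop could look like. The `Experiment`, `Target`, `FetchTargets`, and `Measure` names are invented for illustration and are not the actual ooni/probe-engine API; the point is only that the engine stops distinguishing with-input from without-input experiments.

```go
// Hypothetical sketch (invented names, not the real ooni/probe-engine API):
// each experiment fetches its own targets via whatever API it needs.
package main

import "fmt"

// Target is deliberately opaque to the engine: only the experiment
// that fetched a target knows its concrete type.
type Target interface{}

// Experiment both fetches and measures its own targets.
type Experiment interface {
	Name() string
	// FetchTargets calls whatever input API this experiment needs
	// (URLs from check-in, a Psiphon config, Tor targets, or nothing).
	FetchTargets() ([]Target, error)
	Measure(target Target) error
}

// telegram needs no input: it returns a single empty target.
type telegram struct{}

func (telegram) Name() string                    { return "telegram" }
func (telegram) FetchTargets() ([]Target, error) { return []Target{nil}, nil }
func (telegram) Measure(Target) error            { return nil }

// webConnectivity fetches string URLs (hard-coded here for the sketch).
type webConnectivity struct{}

func (webConnectivity) Name() string { return "web_connectivity" }
func (webConnectivity) FetchTargets() ([]Target, error) {
	return []Target{"https://example.com/"}, nil
}
func (webConnectivity) Measure(t Target) error {
	fmt.Println("measuring", t.(string))
	return nil
}

// runAll no longer special-cases experiments with or without input:
// every experiment fetches its own targets and measures each of them.
func runAll(exps []Experiment) error {
	for _, exp := range exps {
		targets, err := exp.FetchTargets()
		if err != nil {
			return err
		}
		for _, t := range targets {
			if err := exp.Measure(t); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	if err := runAll([]Experiment{telegram{}, webConnectivity{}}); err != nil {
		panic(err)
	}
	fmt.Println("done")
}
```

Note how, under this sketch, "no input" is just the degenerate case of `FetchTargets` returning one empty target, so the engine loop stays uniform.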

Now, this discussion makes sense conceptually, but changing the code to behave as described may be quite difficult. I am not sure. Hence this issue: we want to explore the design space and work on small prototypes to understand whether this (in my opinion desirable) design change is doable or too hard given the current codebase.

@bassosimone bassosimone self-assigned this Dec 7, 2022
@bassosimone bassosimone added funder/drl2022-2024 methodology issues related to the testing methodology needs investigation This issue needs extra data and investigation ooni/probe-engine priority/high refactoring research prototype labels Feb 1, 2023
bassosimone added a commit to ooni/2023-05-richer-input that referenced this issue Jun 12, 2023
The ooni/probe#2381 issue advocates for each
experiment _fetching_ its own targets through the correct API.

This repository assumes that we have all the targets in a single unified
structure provided by the check-in v2 API.

However, the code in this repository also assumes that targets are opaque
and that each experiment can handle its own kind of target.

Consider this sentence from the above-mentioned issue:

> we actually have one single kind of experiment, the one which fetches
> its own input, formatted according to the input type understood by the
> experiment, and processes it accordingly.

It is the most important sentence of the issue. The original focus
was on "fetching its own input". However, rereading the sentence after
a few months, it seems the important concept was actually allowing
each experiment to _handle_ its own targets (we called targets "inputs"
in the issue because we had not yet had a chance to spell out the
difference between inputs and targets).

This is the reason why I think the work done in this repository
has helped to explore the design space described in the issue above.
@bassosimone
Contributor Author

We invested significant effort writing a prototype of this issue in ooni/probe-cli#1005. While the prototype itself could still be useful, because it contains bits of code that are worth merging into master, the issue itself can now be considered complete. The ooni/2023-05-richer-input@9a0e0ed commit explains why I think that is the case.
