engine: each experiment calls the input-fetching API it needs? #2381
Labels: funder/drl2022-2024, methodology (issues related to the testing methodology), needs investigation (this issue needs extra data and investigation), ooni/probe-engine, priority/high, refactoring, research prototype
Comments
bassosimone added the funder/drl2022-2024, methodology, needs investigation, ooni/probe-engine, priority/high, refactoring, and research prototype labels on Feb 1, 2023.
bassosimone added a commit to ooni/2023-05-richer-input that referenced this issue on Jun 12, 2023:
The ooni/probe#2381 issue advocates for each experiment _fetching_ its own targets through the correct API. This repository, instead, assumes that all the targets arrive in a single unified structure provided by the check-in v2 API. However, the code in this repository also assumes that targets are opaque, so each experiment can handle different kinds of targets. Consider this sentence from the above-mentioned issue:

> we actually have one single kind of experiment, the one which fetches
> its own input, formatted according to the input type understood by the
> experiment, and processes it accordingly.

It is the most important sentence of the issue. The original focus was on "fetching its own input". However, rereading the sentence after a few months, the important concept seems to have been allowing each experiment to _handle_ its own targets (the issue calls targets "inputs" because, at the time, we had not yet had a chance to spell out the difference between inputs and targets). This is why I think the work done in this repository has helped to explore the design space described in the issue above.
We invested significant effort in writing a prototype for this issue in ooni/probe-cli#1005. While the prototype itself could still be useful, because it contains code worth merging into master, the issue itself can now be considered complete. The ooni/2023-05-richer-input@9a0e0ed commit explains why I think that's the case.
This (currently work-in-progress) issue describes an alternative solution for passing richer input to experiments that is perhaps conducive to less complexity inside the core of ooniprobe and to more flexibility in terms of data formats.
Let's kick off our discussion by observing that we already have N possible input formats (Web Connectivity, Psiphon, DNSCheck, Tor) and M experiments consuming them (Web Connectivity input is actually also used by urlgetter and would be used by websteps, while Psiphon input is only used by Psiphon, DNSCheck input only by DNSCheck, and Tor input only by Tor). Additionally, in a command-line or OONI Run v2 scenario, the command-line settings or the OONI Run v2 descriptor provide some orthogonal input.
This situation has led us to (1) converge on the lowest common denominator for passing input to experiments (i.e., strings containing URLs) and (2) use additional mechanisms to provide input to experiments whose input does not fit this model (think, e.g., of how Psiphon and Tor download their own input and have no string-based input).
What this situation is telling us, though, is that we actually have one single kind of experiment, the one which fetches its own input, formatted according to the input type understood by the experiment, and processes it accordingly. Obviously, even if we did that, we would still have other bottlenecks where experiments assume string input (e.g., the database format). Yet, if this were possible, we could replace the rather complex way in which experiments run with or without input with a model where each experiment does the right thing for itself (which, in cases such as Telegram, is to not have input at all).
Now, this discussion makes sense conceptually, but changing the code to behave as described may or may not be difficult; I am not entirely sure. Hence this issue. We want to explore the design space and work on small prototypes to understand whether this (in my opinion desirable) design change is doable or too hard given the current codebase.