test-lists: Create script to automatically delete expired and parked domains #1227
I am going to close this issue as a duplicate because:
Because this issue covers both cases, it is a full duplicate of those two issues.
This pull request introduces the gardener, a tool to curate the test lists. With @sloncocs, @hellais, @agrabeli and other colleagues from OONI and Netalytica, we have been working on improving the policies for updating the test lists for quite some time. The tool included in this pull request helps address one of the easiest cases, i.e., the one where a domain inside the test lists no longer exists. Because it is important to balance removing the domain against the possibility that the domain could still be censored, this tool does not automatically remove a domain from the test lists. Rather, the `dnsreport` subcommand just produces a CSV report that a researcher can inspect to decide whether to keep the domain. That said, the committed tool also includes a `dnsfix` subcommand that applies the `dnsreport` results to the test lists and removes the entries in that report for which we did not observe any anomalous or confirmed measurement in the last month.

This pull request touches upon several issues we opened related to managing the test lists:

* ooni/probe#1748 advocates for creating a gardener prototype, which we did here;
* ooni/probe#1747 advocates for automatically removing expired domains, which is now possible by combining the `dnsreport` and `dnsfix` gardener subcommands;
* ooni/probe#1745 advocates for creating a process for test lists maintenance, which we can now start doing thanks to the gardener tool introduced in this PR;
* ooni/ooni.org#1227 advocates for creating a script to automatically remove expired and parked domains, which we start to address here by providing a documented way of removing uncensored, expired domains;
* ooni/ooni.org#363 is an umbrella issue about collaborating with Netalytica and writing software to make the collaboration easier; we have done that by releasing a tool that starts moving us in the right direction, helps us know which domains have expired, and automatically removes _some_ of them.
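The conservative `dnsfix` rule described above can be illustrated with a minimal Go sketch: an entry becomes a removal candidate only if its domain no longer resolves *and* OONI saw neither anomalies nor confirmed blocking for it in the last month. The `ReportRow` type, its fields, and the `shouldRemove` / `filterEntries` functions are hypothetical names chosen for illustration; they are not the gardener's actual API or report schema.

```go
package main

import "fmt"

// ReportRow is a HYPOTHETICAL row of a dnsreport-style CSV report.
// The real gardener report may use different columns and names.
type ReportRow struct {
	URL            string // test-list URL
	NXDOMAIN       bool   // true if DNS says the domain does not exist
	AnomalyCount   int    // anomalous OONI measurements in the last month (assumed field)
	ConfirmedCount int    // confirmed-blocking measurements in the last month (assumed field)
}

// shouldRemove applies the conservative rule: only domains that no longer
// exist AND show no sign of censorship are candidates for removal.
func shouldRemove(r ReportRow) bool {
	return r.NXDOMAIN && r.AnomalyCount == 0 && r.ConfirmedCount == 0
}

// filterEntries splits the report into URLs to keep and URLs to remove,
// preserving the original order in both slices.
func filterEntries(rows []ReportRow) (keep, remove []string) {
	for _, r := range rows {
		if shouldRemove(r) {
			remove = append(remove, r.URL)
		} else {
			keep = append(keep, r.URL)
		}
	}
	return keep, remove
}

func main() {
	rows := []ReportRow{
		{URL: "http://expired.example/", NXDOMAIN: true},
		{URL: "http://blocked.example/", NXDOMAIN: true, ConfirmedCount: 3},
		{URL: "http://alive.example/"},
	}
	keep, remove := filterEntries(rows)
	fmt.Println("keep:", keep)
	fmt.Println("remove:", remove)
}
```

Note how an expired domain with confirmed blocking is still kept: this mirrors the balance the PR describes between pruning dead entries and preserving entries that still help fingerprint censorship.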
Updating the test lists is a delicate balancing exercise between removing what is now parked or expired and keeping what is still heavily censored and helps us fingerprint censorship in a country. It took us quite some time, and lots of internal and external discussion, to figure out the requirements for the gardener. Now that all this discussion is finally being converted into pull requests, we should all celebrate a bit to acknowledge that this work is a stepping stone towards making the whole test lists ecosystem easier to maintain and evolve. 🥳 🥳 🥳 🥳

The related test-lists pull request is citizenlab/test-lists#1247.
Given that the Citizen Lab test lists (https://github.com/citizenlab/test-lists/tree/master/lists) were originally created by OpenNet Initiative researchers between 2008 and 2012, they include many URLs with expired and parked domains.
It would therefore be great if we could create a script that automatically detects and deletes URLs with expired and parked domains.
This would significantly simplify the test list review process of researchers, and it would also improve OONI measurement quality.
This activity has been included as an OONI challenge in Roskomsvoboda's DEMHACK hackathon (September 2022): https://demhack.ru/
If this activity is not implemented as part of the hackathon, the OONI team should pick it up.
[Update 2023-03-15: we did half of the work; please see https://github.com/ooni/probe/issues/1826, which covers the remaining part of the work originally covered by this issue.]