testlists: semi-automatically detect and remove domains that are now for sale (aka parked domains) #1826

Open
bassosimone opened this issue Oct 18, 2021 · 2 comments
Labels: bug, data quality, GSoC, methodology, priority/medium

Comments

bassosimone commented Oct 18, 2021

While doing #1707, I came across domains that are now for sale.

Here are some examples:

  1. http://makemodel.net/

  2. http://www.kamayutmedia.com/

  3. http://zheg.nastie.co.uk/

In general, these domains are not easy to spot. I noticed these ones because they are among the websites for which the old and the new TH report very different body sizes due to encoding.
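As a rough illustration of how such a discrepancy could be turned into a list of candidates for manual review, here is a minimal Python sketch. The function names, the input shape (URL mapped to a pair of body sizes), and the 0.5 threshold are all assumptions made for this example; nothing here is taken from the actual TH code. A large gap only suggests that a page deserves a look, it does not prove that the domain is parked.

```python
from typing import Dict, List, Tuple


def relative_size_gap(old_size: int, new_size: int) -> float:
    """Return the relative difference between two body sizes, in [0, 1]."""
    largest = max(old_size, new_size)
    if largest == 0:
        # Both bodies are empty, so there is no discrepancy to report.
        return 0.0
    return abs(old_size - new_size) / largest


def flag_for_review(
    sizes: Dict[str, Tuple[int, int]], threshold: float = 0.5
) -> List[str]:
    """Return the URLs whose two body sizes differ by more than threshold.

    `sizes` maps a URL to (old_size, new_size); this input shape and the
    default threshold are hypothetical.
    """
    return sorted(
        url
        for url, (old, new) in sizes.items()
        if relative_size_gap(old, new) > threshold
    )
```

The output is meant to feed a human reviewer, which matches the semi-automatic approach in the issue title.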

Based on a chat I had today with @agrabeli, we definitely want to remove those domains from the test lists.

bassosimone added the bug and data quality labels on Oct 18, 2021
bassosimone self-assigned this on Oct 18, 2021
bassosimone added the GSoC and methodology labels on Feb 18, 2022
bassosimone changed the title from "testlists: detect domains that are now for sale" to "testlists: detect domains that are now for sale (aka parked domains)" on Apr 27, 2022
ainghazal commented Sep 27, 2022

There's this tool, but the results are not very good: https://github.com/gr3atest/excludeparked
There are also some possibly useful heuristics and experiments by someone at Cisco: https://umbrella.cisco.com/blog/discovery-of-new-suspicious-domains-using-authoritative-dns-traffic-and-parked-domains-analysis

I think this definitely calls for crowdsourced curation. After a few suspicious -> verified signals, templates for common domain parkers could perhaps be identified based on page structure.
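To make the template idea a bit more concrete, here is a minimal Python sketch, using only the standard library. It reduces a page to its sequence of opening tags, hashes that sequence, and treats a hash as a known parker template once it has been seen in enough manually verified parked pages. The names `structure_fingerprint`, `known_templates`, `looks_parked`, and the `min_count=3` cutoff are hypothetical choices for this example, not an existing tool or API.

```python
import hashlib
from collections import Counter
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the sequence of opening tag names, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def structure_fingerprint(html):
    """Hash the tag sequence so pages sharing a layout share a fingerprint."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    joined = ">".join(collector.tags)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def known_templates(verified_pages, min_count=3):
    """Return fingerprints seen in at least min_count verified parked pages."""
    counts = Counter(structure_fingerprint(page) for page in verified_pages)
    return {fp for fp, n in counts.items() if n >= min_count}


def looks_parked(html, templates):
    """Report whether the page matches a known parker template exactly."""
    return structure_fingerprint(html) in templates
```

Exact matching on the full tag sequence is brittle, since a single extra element changes the hash. A fuzzier comparison would likely be needed in practice, so matches are best treated as suggestions for a curator rather than as verdicts.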

ainghazal commented

This paper from 2015 trains a classifier with several (10+) features from the page and domain: https://www.securitee.org/files/parking-sensors_ndss2015.pdf. They use an ad-hoc random forest model, which might be impractical, but some of the features might mix well into a cheaper kind of score.
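As a sketch of what a "cheaper kind of score" might look like, the Python snippet below adds up a few hand-weighted signals instead of training a model. The three features, the weights, and the 500-character cutoff are illustrative assumptions for this example; they are not the feature set or parameters from the paper.

```python
from dataclasses import dataclass


@dataclass
class PageFeatures:
    """Hypothetical per-page signals; not the paper's actual features."""

    sale_phrase: bool            # body contains e.g. "this domain is for sale"
    external_link_ratio: float   # fraction of links leaving the domain, 0..1
    text_chars: int              # amount of visible text, in characters


def parked_score(features: PageFeatures) -> float:
    """Combine the signals into a score in [0, 1]; higher means more suspicious."""
    score = 0.0
    if features.sale_phrase:
        score += 0.5
    # Clamp the ratio so out-of-range inputs cannot push the score past 1.
    ratio = min(max(features.external_link_ratio, 0.0), 1.0)
    score += 0.3 * ratio
    # Very little visible text is treated as a weak hint (assumed cutoff).
    if features.text_chars < 500:
        score += 0.2
    return score
```

A score like this could rank domains for manual review, with the weights tuned later against whatever verified examples the curation effort produces.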

bassosimone changed the title from "testlists: detect domains that are now for sale (aka parked domains)" to "testlists: semi-automatically detect domains that are now for sale (aka parked domains)" on Mar 15, 2023
bassosimone changed the title from "testlists: semi-automatically detect domains that are now for sale (aka parked domains)" to "testlists: semi-automatically detect and remove domains that are now for sale (aka parked domains)" on Mar 15, 2023