testlists: semi-automatically detect and remove domains that are now for sale (aka parked domains) #1826
Comments
There's this tool (https://github.com/gr3atest/excludeparked), but its results are not very good. I think this definitely calls for crowdsourced curation. After a few suspicious -> verified signals, templates for common domain parkers could perhaps be identified based on page structure.
This paper from 2015 trains a classifier on several (10+) features from the page and domain: https://www.securitee.org/files/parking-sensors_ndss2015.pdf
They use an ad-hoc random forest model, which might be impractical for us, but some of the features might combine well into a cheaper kind of score.
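To make the "cheaper kind of score" idea concrete, here is a minimal sketch in Python. The two features (a for-sale phrase in the visible text, and very little visible text overall) and the phrase list, weights, and threshold are illustrative assumptions, not the feature set from the paper or anything already in the test-lists tooling. The names `parked_score` and `is_suspicious` are hypothetical.

```python
import re

# Hypothetical phrase list; a real one would come from curated, verified examples.
FOR_SALE_PHRASES = (
    "domain is for sale",
    "buy this domain",
    "this domain may be for sale",
    "domain parking",
)

# Crude HTML stripping, good enough for a sketch (not a real HTML parser).
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b.*?</\1>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def visible_text(html: str) -> str:
    """Return lowercased text with scripts, styles, and tags removed."""
    without_scripts = SCRIPT_STYLE_RE.sub(" ", html)
    text = TAG_RE.sub(" ", without_scripts)
    return " ".join(text.split()).lower()


def parked_score(html: str) -> int:
    """Add up points for cheap signals; higher means more likely parked."""
    text = visible_text(html)
    score = 0
    if any(phrase in text for phrase in FOR_SALE_PHRASES):
        score += 2  # assumed weight: explicit for-sale wording
    if len(text) < 200:
        score += 1  # assumed weight: almost no visible content
    return score


def is_suspicious(html: str, threshold: int = 2) -> bool:
    """Flag a page for manual review; the threshold is an assumption."""
    return parked_score(html) >= threshold
```

A score like this would only mark a domain as suspicious. The removal decision would still go through the manual verification step suggested above.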
While doing #1707, I came across domains that are now for sale.
Here are some examples:
http://makemodel.net/
http://www.kamayutmedia.com/
http://zheg.nastie.co.uk/
In general, they are not easy to spot. I only noticed these because they are among the websites for which the old and the new TH report very different body sizes due to encoding.
Based on a chat I had today with @agrabeli, we definitely want to remove those domains from the test lists.
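The body-size discrepancy mentioned above could serve as a cheap first filter for picking URLs to look at by hand. The sketch below is only an illustration: the input shape (a mapping from URL to the two reported body sizes), the function names, and the 0.5 cutoff are all assumptions, and a large gap does not by itself mean a domain is parked.

```python
from typing import Dict, List, Tuple


def relative_size_gap(old_size: int, new_size: int) -> float:
    """Difference between the two body sizes as a fraction of the larger one."""
    largest = max(old_size, new_size)
    if largest == 0:
        return 0.0  # both bodies empty: no gap to report
    return abs(old_size - new_size) / largest


def candidates_for_review(
    sizes: Dict[str, Tuple[int, int]], min_gap: float = 0.5
) -> List[str]:
    """Return URLs whose (old, new) body sizes differ by at least min_gap."""
    return [
        url
        for url, (old_size, new_size) in sizes.items()
        if relative_size_gap(old_size, new_size) >= min_gap
    ]
```

For example, a URL with reported sizes of 1000 and 400 bytes has a gap of 0.6 and would be listed, while 1000 versus 990 would not.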