Request to Carefully Look Through The Domains #362
Comments
Skimming the list, I found some trackers, crappy hosts, ad domains, popup porn ads, subdomains of main domains that are blocked for good reason, and so on. Analyzing this list would take days to weeks. At the moment I don't have the time ... |
Interesting dataset - might be handy for TLD discovery but other than that it's almost impossible to analyze/test |
Given scant interest and negative feedback, I will look through the list myself to find 'a few' FPs based on my mood. Stay tuned. |
Thanks for the help. |
Please re-open it. I haven't finished it. |
I haven't even finished those beginning with 'a'. |
Ok, sorry. You can also post the domains in this issue, no need to open a new one for each. But, as you like ... |
Some numbers for those interested:
Of course, being part of the top 1 million most visited websites doesn't mean that it's a legit domain, so be careful about jumping to conclusions. I applaud your efforts, by the way. Maybe a small subset of this list can be used to check whether a list is fit for inclusion, same as what oisd does. For example, some legit-looking webshops are loaded from the NoTracking list, which in turn got them from here, which with all due respect looks like a pretty obscure and infrequently updated list. That raises the question of whether NoTracking has a strict enough inclusion policy (and in turn HaGeZi as well). |
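To make the "fit for inclusion" idea concrete, here is a minimal sketch of such a check. It is not oisd's actual procedure (that is not described in this thread); the function name `toplist_overlap` and the sample domains are made up for illustration. It only reports how much of a candidate source overlaps with a toplist subset, so a maintainer can review the hits by hand.

```python
def toplist_overlap(candidate_domains, toplist_sample):
    """Return (ratio, hits): the share of candidate entries that also appear
    in a toplist sample, and the overlapping domains themselves.

    Hypothetical helper; a high ratio is only a hint that a source may
    contain false positives, not proof.
    """
    # Normalise entries and drop blank lines.
    candidate = {d.strip().lower() for d in candidate_domains if d.strip()}
    top = {d.strip().lower() for d in toplist_sample if d.strip()}
    hits = sorted(candidate & top)
    ratio = len(hits) / len(candidate) if candidate else 0.0
    return ratio, hits


# Made-up example data.
ratio, hits = toplist_overlap(
    ["shop.example.com", "tracker.example.net", "ads.example.org", ""],
    ["shop.example.com", "news.example.com"],
)
print(f"{ratio:.2f}", hits)
```

As noted above, being on a toplist does not make a domain legit, so the hits are candidates for manual review rather than automatic whitelist entries.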
Thanks for the advice, I'll see how I can get a handle on this. @notracking: What is your view on that? I would consider removing the source mentioned. |
@sr093906 |
I hope the adjustments will save me a lot of time. |
@sr093906 The "cleaned" Ultimate is online. You should find fewer now ... |
@hagezi Thanks for the notification. I will continue the check later. https://github.com/MISP/misp-warninglists/ Whitelist resources. Maybe some lists will be helpful. |
@sr093906 I've done more cleanup, the build is running now and will be through in a few hours. I'll let you know ... |
@sr093906 Update is live, cleaned pro to ultimate. FYI: Toplists: https://github.com/hagezi/dns-data-collection/tree/main/top |
Thanks for letting me know. I will check. |
@sr093906 STOP posting potential phishing domains to whitelist, check the phishing sources and report them there. If they are removed from the phishing lists, they disappear from my lists too! Thanks, |
@sr093906 Please spare me these Chrome toplist crap sites from the lower ranks. For my TIF I use the Umbrella/Tranco toplists as a whitelist, so if a host you reported is blocked by my TIF, it is not on either toplist. Report them upstream if you think they are false positives. Thanks, |
I have now spent hours on these issues. I cleaned up the lists using the Chrome Toplist. Everything that was safe to remove was removed. Done. |
Well, Stonecrushers list is basically a scraped version of: Though I will remove/disable it because it should have (at least) excluded their "Problematische Online-Shops" list, which mostly has shops with bad service (based on user reports). |
Thanks! |
https://github.com/zakird/crux-top-lists
So, please treat them as domains visited by real humans. Based on that assumption, quite a few can/should be treated as FPs.
The list is generated by downloading the repo's latest csv file and stripping http:// and https://
After that, entries seen in Fake, Threat Intelligence Feeds, DoH/VPN/TOR/Proxy Bypass (complete edition), Safesearch not supported, Dynamic DNS, Badware Hoster and Personal are removed.
Finally, common entries between the processed file and the raw domain version of ultimate blacklist are listed.
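The three steps above can be sketched roughly as follows. This is an illustration only, not the script that produced the attached file: the function names are made up, the CSV is assumed to have an `origin,rank` header, and the lists are passed in as plain Python sets instead of being downloaded.

```python
import csv
import io


def strip_scheme(origin):
    """Drop a leading http:// or https:// plus any trailing slash or dot."""
    origin = origin.strip().lower()
    for scheme in ("https://", "http://"):
        if origin.startswith(scheme):
            origin = origin[len(scheme):]
            break
    return origin.rstrip("/.")


def load_toplist(csv_text):
    """Parse toplist CSV text (assumed header: origin,rank) into hostnames."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {strip_scheme(row["origin"]) for row in reader if row.get("origin")}


def candidate_false_positives(toplist, excluded_lists, blocklist):
    """Toplist hosts not in any excluded list that the blocklist still blocks."""
    remaining = set(toplist)
    for excluded in excluded_lists:
        remaining -= set(excluded)
    # What is left and also on the blocklist is a candidate for manual review.
    return sorted(remaining & set(blocklist))


# Made-up sample data standing in for the real CSV and lists.
sample_csv = (
    "origin,rank\n"
    "https://blog.example.com,1000\n"
    "http://bad.example.net,5000\n"
    "https://shop.example.org,10000\n"
)
toplist = load_toplist(sample_csv)
excluded = [{"bad.example.net"}]  # e.g. entries from a threat intelligence feed
blocklist = {"blog.example.com", "bad.example.net", "other.example"}
print(candidate_false_positives(toplist, excluded, blocklist))
# ['blog.example.com']
```

Matching here is exact hostname equality, so a subdomain on the toplist does not match a blocked parent domain (or the reverse); handling that would need extra logic.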
There are some betting and porn sites there, of course. As for the others, some are clearly FPs, like those starting with blog., login, and others.
5.txt