This repository has been archived by the owner on Feb 17, 2023. It is now read-only.

Bots not marked as bots #18

Open
grotos opened this issue May 11, 2015 · 3 comments

grotos commented May 11, 2015

Here is a list of User-Agent strings that are not marked as bots but actually are bots:

"ADmantX Platform Semantic Analyzer - ADmantX Inc. - www.admantx.com - support@admantx.com"
"Apache-HttpClient/4.2.3 (java 1.5)"
"Apache-HttpClient/4.3 (java 1.5)"
"Apache-HttpClient/4.3.3 (java 1.5)"
"Application"
"CATExplorador/1.0beta (sistemes at domini dot cat; http://domini.cat/catexplorador.html)"
"COMODOSpider/Nutch-1.2"
"Comodo Spider 1.2"
"Comodo-Webinspector-Crawler 2.1"
"Faraday v0.8.9"
"GigablastOpenSource/1.0"
"GoogleBot 1.0"
"Google_Analytics_Snippet_Validator"
"HTTPClient/1.0 (2.3.4.1, ruby 1.9.3 (2013-06-27))"
"HTTPClient/1.0 (2.4.0, ruby 1.9.3 (2013-06-27))"
"Java/1.6.0_29"
"Java/1.6.0_45"
"Java/1.7.0_09"
"Java/1.7.0_21"
"Java/1.7.0_40"
"Java/1.7.0_60-ea"
"Java/1.7.0_65"
"Mozilla/2.0 (compatible; crw)"
"Mozilla/3.0 (compatible; Indy Library)"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.2)"
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDR; .NET4.0C; .NET4.0E; .NET CLR 1.1.4322; Tablet PC 2.0); 360Spider"
"Mozilla/4.0 (compatible; Netcraft Web Server Survey)"
"Mozilla/4.0 (compatible; Synapse)"
"Mozilla/4.0 (compatible; Win32; WinHttp.WinHttpRequest.5)"
"Mozilla/4.0 (compatible; http://search.thunderstone.com/texis/websearch/about.html)"
"Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)"
"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36 AlexaToolbar/alxg-3.1"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1; 360Spider"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1; 360Spider(compatible; HaosouSpider; http://www.haosou.com/help/help_3_2.html)"
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b"
"Mozilla/5.0 (Windows NT 6.1; Win64; x64) KomodiaBot/1.0"
"Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google (+https://developers.google.com/+/web/snippet/)"
"Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
"Mozilla/5.0 (Windows NT 6.2; WOW64) Runet-Research-Crawler (itrack.ru/research/cmsrate; rating@itrack.ru)"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.9.0.13) Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) Survey/2.3 (fr.wsdata.com)"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.9.0.13) Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) SurveyBot/2.3 (DomainTools)"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; )  Firefox/1.5.0.11; 360Spider"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11)  Firefox/1.5.0.11; 360Spider"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.11 (KHTML, like Gecko) DumpRenderTree/0.0.0.0 Safari/536.11"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36"
"Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20100101 Firefox/21.0 WordPress.com mShots"
"Mozilla/5.0 (compatible; Google-Site-Verification/1.0)"
"Mozilla/5.0 (compatible; IstellaBot/1.18.81 +http://www.tiscali.it/)"
"Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1) (http://name911.com)"
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider"
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider(compatible; HaosouSpider; http://www.haosou.com/help/help_3_2.html)"
"Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; +info@netcraft.com)"
"Mozilla/5.0 (compatible; Owler/0.4; +; )"
"Mozilla/5.0 (compatible; PageAnalyzer/1.1;)"
"Mozilla/5.0 (compatible; XML Sitemaps Generator; http://www.xml-sitemaps.com) Gecko XML-Sitemaps/1.0"
"Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)"
"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
"Mozilla/5.0(compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"
"Mozilla/5.0(compatible;Sosospider/2.0;+http://help.soso.com/webspider.htm)"
"Porkbun/Mustache (Website Analysis; http://porkbun.com; tech@porkbun.com)"
"PycURL/7.23.1"
"Python-urllib/1.17"
"Python-urllib/2.6"
"Python-urllib/2.7"
"Python-urllib/3.4"
"Robosourcer/1.0"
"Ruby"
"Sosospider+(+http://help.soso.com/webspider.htm)"
"W3C_Validator/1.3 http://validator.w3.org/services"
"WebTarantula.com Crawler"
"Wget/1.12 (linux-gnu)"
"Wget/1.13.4 (linux-gnu)"
"WhatWeb/0.4.8-dev"
"Who.is Bot"
"WinInet Test"
"YisouSpider"
"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.13.1.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
"curl/7.35.0"
"ip-web-crawler.com"
"panscient.com"
"python-requests/1.1.0 CPython/2.7.4 Linux/3.8.0-19-generic"
"python-requests/1.2.0 CPython/2.7.4 Linux/3.8.0-33-generic"
"python-requests/2.2.1 CPython/2.7.6 Linux/3.13.0-24-generic"
"spotinfluence/Nutch-1.4 (Spot Influence crawler; http://spotinfluence.com; hello at spotinfluence dot com)"
"visaduhoc.info Crawler"
"wsr-agent/1.0"
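Many entries in this list fall into two groups. Some name themselves as crawlers (`Spider`, `Crawler`, `Bot`, `Nutch`, `Preview`, `Validator`). Others are generic HTTP client libraries (`curl`, `Wget`, `Java`, `Python-urllib`, `python-requests`, `Apache-HttpClient`). The following is a minimal sketch of a fallback heuristic in Go that assumes a simple keyword/prefix approach. The function name `looksLikeBot` and both token lists are hypothetical illustrations and do not reflect the library's actual detection logic:

```go
package main

import (
	"fmt"
	"strings"
)

// botKeywords is a hypothetical list of case-insensitive substrings that
// many self-identifying crawlers include somewhere in their User-Agent.
var botKeywords = []string{"bot", "spider", "crawler", "nutch", "preview", "validator"}

// clientPrefixes is a hypothetical list of (lowercased) prefixes used by
// generic HTTP client libraries rather than interactive browsers.
var clientPrefixes = []string{
	"curl/", "wget/", "java/", "python-urllib/", "python-requests/",
	"apache-httpclient/", "pycurl/", "httpclient/",
}

// looksLikeBot reports whether ua starts with a known client prefix or
// contains one of the bot keywords, ignoring case.
func looksLikeBot(ua string) bool {
	lower := strings.ToLower(ua)
	for _, p := range clientPrefixes {
		if strings.HasPrefix(lower, p) {
			return true
		}
	}
	for _, k := range botKeywords {
		if strings.Contains(lower, k) {
			return true
		}
	}
	return false
}

func main() {
	for _, ua := range []string{
		"Wget/1.13.4 (linux-gnu)",
		"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider",
		"Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)",
	} {
		fmt.Printf("%v\t%s\n", looksLikeBot(ua), ua)
	}
}
```

Some entries, such as "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)", contain no bot marker, so no string heuristic like this one will flag them. A bare "bot" substring can also produce false positives on unrelated product names.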

@crackcomm (Contributor)

Some more data from me: https://gist.github.com/crackcomm/40bad73724f14369b602
The gist's second revision was made after #26.

@vodolaz095

+1


blixt commented Mar 6, 2017

This one also appears to fail, possibly because it has more than one section (so the bot pattern doesn't match) and because it uses HTTPS (the site regexp appears to match only http://...).

Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)
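To illustrate the HTTPS point: a URL pattern written as `http://` cannot match `+https://api.slack.com/robots`, whereas `https?://` accepts both schemes. Both patterns below are hypothetical stand-ins for the library's site regexp, which is not reproduced in this thread:

```go
package main

import (
	"fmt"
	"regexp"
)

// httpOnly is a hypothetical pattern that, like the behavior described
// above, recognizes only plain-HTTP site URLs inside a User-Agent.
var httpOnly = regexp.MustCompile(`\+?http://[^\s)]+`)

// anyScheme accepts both http:// and https:// URLs.
var anyScheme = regexp.MustCompile(`\+?https?://[^\s)]+`)

func main() {
	ua := "Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)"
	fmt.Printf("%q\n", httpOnly.FindString(ua))  // "" (no http:// URL present)
	fmt.Printf("%q\n", anyScheme.FindString(ua)) // "+https://api.slack.com/robots"
}
```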

Here are a couple more user agents that I consider bots, in addition to those already flagged by Bot():

AppEngine-Google; (+http://code.google.com/appengine; appid: s~something)
Slack-ImgProxy (+https://api.slack.com/robots)
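Until such agents are covered upstream, one option is to combine an existing detector's verdict with an application-level prefix list. This is a sketch under assumptions: `isBot`, `extraBotPrefixes`, and the `base` callback are hypothetical names. In practice, `base` would wrap whatever check you already use, such as the library's `Bot()` result:

```go
package main

import (
	"fmt"
	"strings"
)

// extraBotPrefixes is a hypothetical application-specific list built from
// agents that the base detector misses (taken from the comments above).
var extraBotPrefixes = []string{
	"AppEngine-Google;",
	"Slack-ImgProxy",
	"Slackbot-LinkExpanding",
}

// isBot returns true if base already flags ua, or if ua starts with one of
// the extra prefixes (case-sensitive).
func isBot(ua string, base func(string) bool) bool {
	if base(ua) {
		return true
	}
	for _, p := range extraBotPrefixes {
		if strings.HasPrefix(ua, p) {
			return true
		}
	}
	return false
}

func main() {
	// Stand-in base detector that flags nothing, to isolate the extra list.
	never := func(string) bool { return false }
	fmt.Println(isBot("Slack-ImgProxy (+https://api.slack.com/robots)", never)) // true
	fmt.Println(isBot("Mozilla/5.0 (X11; Linux x86_64)", never))               // false
}
```

Passing the base detector as a function keeps the supplementary list independent of any particular library, so the list can be dropped once the upstream detection catches up.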

mssola added this to the v1.0 release milestone Aug 31, 2017
megumiimai added a commit to megumiimai/user_agent that referenced this issue Oct 1, 2019

5 participants