Adds detection for various bots (#7987)
* Improves detection for generic bots
* Adds detection for PHP
* Improves detection for generic bots
* Improves detection for generic bots
* Adds detection for SnoopSecInspect
* Improves detection for generic bots
* Adds detection for ModatScanner
* Adds detection for researchcyber.net
* Adds detection for CrystalSemanticsBot
* Improves detection for generic bots
* Improves detection for PHP
* Adds detection for go-network
* Adds detection for najdu.s.holubem.eu
* Improves detection for Siteimprove

ref #7979
liviuconcioiu authored Feb 6, 2025
1 parent 1bbdbfe commit c543321
Showing 4 changed files with 152 additions and 2 deletions.
18 changes: 18 additions & 0 deletions Tests/Parser/Client/fixtures/library.yml
@@ -749,3 +749,21 @@
     type: library
     name: vimeo.php
     version: 3.0.8
+-
+  user_agent: php7.4
+  client:
+    type: library
+    name: PHP
+    version: "7.4"
+-
+  user_agent: PHP/5.3.93
+  client:
+    type: library
+    name: PHP
+    version: 5.3.93
+-
+  user_agent: localhost.localdomain/go-network-v2.0.1
+  client:
+    type: library
+    name: go-network
+    version: 2.0.1
91 changes: 91 additions & 0 deletions Tests/fixtures/bots.yml
@@ -8532,3 +8532,94 @@
     producer:
       name: Incsub, LLC.
       url: https://incsub.com/
+-
+  user_agent: Mozilla/5.0 (compatible)
+  bot:
+    name: Generic Bot
+-
+  user_agent: John Recon/1.0.0
+  bot:
+    name: Generic Bot
+-
+  user_agent: 'SPARK COMMIT: 08059e95dacafe0bf6e5782f8e2c8ec9cd8c5a17'
+  bot:
+    name: Generic Bot
+-
+  user_agent: Mozilla/5.0 (compatible; SnoopSecInspect/1.1; +https://snoopsec.us.to/)
+  bot:
+    name: SnoopSecInspect
+    category: Security Checker
+    url: https://web.archive.org/web/20241206193253/https://snoopsec.us.to/
+-
+  user_agent: Jesus Christ of Nazareth is LORD
+  bot:
+    name: Generic Bot
+-
+  user_agent: masjesu
+  bot:
+    name: Generic Bot
+-
+  user_agent: Komaru_The_Cat
+  bot:
+    name: Generic Bot
+-
+  user_agent: Kowai/1.0
+  bot:
+    name: Generic Bot
+-
+  user_agent: Hakai/2.0
+  bot:
+    name: Generic Bot
+-
+  user_agent: Mozilla/5.0 (compatible; ModatScanner/1.0; +https://modat.io/)
+  bot:
+    name: ModatScanner
+    category: Security Checker
+    url: https://www.modat.io/scanning
+    producer:
+      name: Modat B.V.
+      url: https://www.modat.io/
+-
+  user_agent: https://researchcyber.net/
+  bot:
+    name: researchcyber.net
+    category: Security Checker
+    url: https://web.archive.org/web/20241219082407/https://researchcyber.net/
+-
+  user_agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; CrystalSemanticsBot http://www.crystalsemantics.com/user-agent/)
+  bot:
+    name: CrystalSemanticsBot
+    category: Crawler
+    url: https://web.archive.org/web/20121230203310/http://www.crystalsemantics.com/user-agent/
+    producer:
+      name: Crystal Semantics Ltd.
+      url: https://web.archive.org/web/20121029062239/http://www.crystalsemantics.com/
+-
+  user_agent: LoliSec/2.0
+  bot:
+    name: Generic Bot
+-
+  user_agent: LMAO/2.0
+  bot:
+    name: Generic Bot
+-
+  user_agent: Vyhledavac sluzeb | hlavicka smerovani | najdu.s.holubem.eu
+  bot:
+    name: najdu.s.holubem.eu
+    category: Crawler
+    url: https://najdu.s.holubem.eu/
+-
+  user_agent: Vyhledavac sluzeb | hlavicka | najdu.s.holubem.eu
+  bot:
+    name: najdu.s.holubem.eu
+    category: Crawler
+    url: https://najdu.s.holubem.eu/
+-
+  user_agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.9.0 Safari/534.34 Siteimprove (Accessibility)
+  bot:
+    name: Siteimprove
+    category: Search bot
+    url: https://siteimprove.com/
+    producer:
+      name: Siteimprove GmbH
+      url: https://siteimprove.com/
35 changes: 33 additions & 2 deletions regexes/bots.yml
@@ -2366,7 +2366,7 @@
     name: 'WooRank sprl'
     url: 'https://www.woorank.com/'

-- regex: 'by Siteimprove\.com'
+- regex: 'Siteimprove'
   name: 'Siteimprove'
   category: 'Search bot'
   url: 'https://siteimprove.com/'
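
The Siteimprove hunk shortens the pattern from `by Siteimprove\.com` to `Siteimprove`. The following is a minimal Python sketch of why the shorter pattern is needed for the new fixture string. It assumes the pattern is applied as a case-insensitive substring search; the `matches` helper and the older-style sample string are illustrative and not part of the project.

```python
import re

# Patterns copied from the diff: the removed one and its replacement.
OLD_PATTERN = r"by Siteimprove\.com"
NEW_PATTERN = r"Siteimprove"

# User agent taken from the fixture added in Tests/fixtures/bots.yml.
FIXTURE_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.34 (KHTML, like Gecko) "
    "PhantomJS/1.9.0 Safari/534.34 Siteimprove (Accessibility)"
)

# Illustrative older-style string (an assumption, not taken from this commit).
OLDER_STYLE_UA = "Mozilla/5.0 (compatible; SiteCheck-sitecrawl by Siteimprove.com)"


def matches(pattern: str, user_agent: str) -> bool:
    """Return True if the pattern occurs anywhere in the user agent."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


if __name__ == "__main__":
    # The old pattern misses the fixture because "by " and ".com" are absent.
    print(matches(OLD_PATTERN, FIXTURE_UA))      # False
    print(matches(NEW_PATTERN, FIXTURE_UA))      # True
    # The shorter pattern still covers strings the old one matched.
    print(matches(NEW_PATTERN, OLDER_STYLE_UA))  # True
```

The trade-off is that the shorter pattern matches any user agent containing "Siteimprove", so it is broader than the one it replaces.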
@@ -4955,8 +4955,39 @@
     name: 'Incsub, LLC.'
     url: 'https://incsub.com/'

+- regex: 'SnoopSecInspect'
+  name: 'SnoopSecInspect'
+  category: 'Security Checker'
+  url: 'https://web.archive.org/web/20241206193253/https://snoopsec.us.to/'
+
+- regex: 'ModatScanner'
+  name: 'ModatScanner'
+  category: 'Security Checker'
+  url: 'https://www.modat.io/scanning'
+  producer:
+    name: 'Modat B.V.'
+    url: 'https://www.modat.io/'
+
+- regex: 'researchcyber\.net'
+  name: 'researchcyber.net'
+  category: 'Security Checker'
+  url: 'https://web.archive.org/web/20241219082407/https://researchcyber.net/'
+
+- regex: 'CrystalSemanticsBot'
+  name: 'CrystalSemanticsBot'
+  category: 'Crawler'
+  url: 'https://web.archive.org/web/20121230203310/http://www.crystalsemantics.com/user-agent/'
+  producer:
+    name: 'Crystal Semantics Ltd.'
+    url: 'https://web.archive.org/web/20121029062239/http://www.crystalsemantics.com/'
+
+- regex: 'najdu\.s\.holubem\.eu'
+  name: 'najdu.s.holubem.eu'
+  category: 'Crawler'
+  url: 'https://najdu.s.holubem.eu/'
+
 # Generic bots
-- regex: 'nuhk|grub-client|Download Demon|SearchExpress|Microsoft URL Control|borg|altavista|dataminr\.com|teoma|oegp|http%20client|htdig|mogimogi|larbin|scrubby|searchsight|semanticdiscovery|snappy|vortex(?!(?: Build|Plus| CM62| HD65))|zeal(?!ot)|dataparksearch|findlinks|BrowserMob|URL2PNG|ZooShot|GomezA|Google SketchUp|Read%20Later|7Siters|centuryb\.o\.t9|InterNaetBoten|EasyBib AutoCite|Bidtellect|tomnomnom/meg|cortex|Re-re Studio|adreview|AHC/|NameOfAgent|Request-Promise|ALittle Client|Hello,? world|wp_is_mobile|0xAbyssalDoesntExist|Anarchy99|^revolt|nvd0rz|xfa1|Hakai|gbrmss|fuck-your-hp|IDBTE4M CODE87|Antoine|Insomania|Hells-Net|b3astmode|Linux Gnu \(cow\)|Test Certificate Info|iplabel|Magellan|TheSafex?Internetx?Search|Searcherx?web|kirkland-signature|LinkChain|survey-security-dot-txt|infrawatch|Time/|r00ts3c-owned-you|nvdorz|Root Slut|NiggaBalls|BotPoke|GlobalWebSearch|xx032_bo9vs83_2a|sslshed|geckotrail|Wordup|Keydrop|^xenu|^(?:chrome|firefox|Abcd|Dark|KvshClient|Node.js|Report Runner|url|Zeus|ZmEu)$'
+- regex: 'nuhk|grub-client|Download Demon|SearchExpress|Microsoft URL Control|borg|altavista|dataminr\.com|teoma|oegp|http%20client|htdig|mogimogi|larbin|scrubby|searchsight|semanticdiscovery|snappy|vortex(?!(?: Build|Plus| CM62| HD65))|zeal(?!ot)|dataparksearch|findlinks|BrowserMob|URL2PNG|ZooShot|GomezA|Google SketchUp|Read%20Later|7Siters|centuryb\.o\.t9|InterNaetBoten|EasyBib AutoCite|Bidtellect|tomnomnom/meg|cortex|Re-re Studio|adreview|AHC/|NameOfAgent|Request-Promise|ALittle Client|Hello,? world|wp_is_mobile|0xAbyssalDoesntExist|Anarchy99|^revolt|nvd0rz|xfa1|Hakai|gbrmss|fuck-your-hp|IDBTE4M CODE87|Antoine|Insomania|Hells-Net|b3astmode|Linux Gnu \(cow\)|Test Certificate Info|iplabel|Magellan|TheSafex?Internetx?Search|Searcherx?web|kirkland-signature|LinkChain|survey-security-dot-txt|infrawatch|Time/|r00ts3c-owned-you|nvdorz|Root Slut|NiggaBalls|BotPoke|GlobalWebSearch|xx032_bo9vs83_2a|sslshed|geckotrail|Wordup|Keydrop|\(compatible\)|John Recon|SPARK COMMIT|masjesu|Komaru_The_Cat|Jesus Christ of Nazareth is LORD|Kowai|Hakai|LoliSec|LMAO|^xenu|^(?:chrome|firefox|Abcd|Dark|KvshClient|Node.js|Report Runner|url|Zeus|ZmEu)$'
   name: 'Generic Bot'

 # Generic detections
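
The hunk above adds five named bot entries ahead of the "Generic bots" entry and extends the generic alternation. The following Python sketch shows how those rules could interact, assuming rules are tried in file order with the first match winning and that matching is case-insensitive. The `RULES` list and the `detect_bot` helper are hypothetical, and the generic pattern is abbreviated to the alternatives this commit adds.

```python
import re
from typing import Optional

# Abbreviated: only the alternatives this commit appends to the generic pattern.
GENERIC_ADDITIONS = (
    r"\(compatible\)|John Recon|SPARK COMMIT|masjesu|Komaru_The_Cat|"
    r"Jesus Christ of Nazareth is LORD|Kowai|Hakai|LoliSec|LMAO"
)

# Named entries first, generic fallback last, mirroring the order in the diff.
RULES = [
    ("SnoopSecInspect", r"SnoopSecInspect"),
    ("ModatScanner", r"ModatScanner"),
    ("researchcyber.net", r"researchcyber\.net"),
    ("CrystalSemanticsBot", r"CrystalSemanticsBot"),
    ("najdu.s.holubem.eu", r"najdu\.s\.holubem\.eu"),
    ("Generic Bot", GENERIC_ADDITIONS),
]


def detect_bot(user_agent: str) -> Optional[str]:
    """Return the name of the first rule that matches, or None."""
    for name, pattern in RULES:
        if re.search(pattern, user_agent, re.IGNORECASE):
            return name
    return None


if __name__ == "__main__":
    for ua in (
        "Mozilla/5.0 (compatible)",
        "Mozilla/5.0 (compatible; SnoopSecInspect/1.1; +https://snoopsec.us.to/)",
        "https://researchcyber.net/",
        "Kowai/1.0",
    ):
        print(ua, "->", detect_bot(ua))
```

Note that `\(compatible\)` requires the closing parenthesis directly after "compatible", so a string such as `Mozilla/5.0 (compatible; ModatScanner/1.0; +https://modat.io/)` does not hit that generic alternative.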
10 changes: 10 additions & 0 deletions regexes/client/libraries.yml
@@ -664,3 +664,13 @@
   name: 'vimeo.php'
   version: '$1'
   url: 'https://github.com/vimeo/vimeo.php'
+
+- regex: '^PHP/?(\d+[.\d]+)'
+  name: 'PHP'
+  version: '$1'
+  url: ''
+
+- regex: 'go-network-v(\d+[.\d]+)'
+  name: 'go-network'
+  version: '$1'
+  url: ''
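
Both library entries capture the version in group 1 and expose it through `version: '$1'`. The following Python sketch runs the two patterns against the three fixture strings added in Tests/Parser/Client/fixtures/library.yml. It assumes case-insensitive matching, which the `php7.4` fixture implies; the `LIBRARY_RULES` list and the `detect_library` helper are hypothetical.

```python
import re
from typing import Optional, Tuple

# Patterns copied from the diff; group 1 plays the role of '$1'.
LIBRARY_RULES = [
    ("PHP", r"^PHP/?(\d+[.\d]+)"),
    ("go-network", r"go-network-v(\d+[.\d]+)"),
]


def detect_library(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (name, version) for the first matching rule, or None."""
    for name, pattern in LIBRARY_RULES:
        match = re.search(pattern, user_agent, re.IGNORECASE)
        if match:
            return name, match.group(1)
    return None


if __name__ == "__main__":
    print(detect_library("php7.4"))      # ('PHP', '7.4')
    print(detect_library("PHP/5.3.93"))  # ('PHP', '5.3.93')
    print(detect_library("localhost.localdomain/go-network-v2.0.1"))  # ('go-network', '2.0.1')
```

Because `\d+[.\d]+` needs at least two characters, a single-digit version such as `PHP/8` would not match either pattern as written.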
