Add HTTP_SEC_CH_UA as possible header to detect crawler #536

Merged Sep 14, 2024 (1 commit)

Conversation

module17
Contributor

@module17 module17 commented Sep 9, 2024

Requests from Facebook are arriving from the same IP address with a variety of User-Agent headers.

For some of these requests, usually those attempting to emulate a mobile device, the User-Agent does not identify the request as coming from Facebook. When the HTTP_SEC_CH_UA header is present, however, it carries the same "HeadlessChrome" value on every request.

Sample of HTTP_USER_AGENT headers from these requests:

"Mozilla/5.0 (Linux; Android 11; Pixel 5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.6312.4 Mobile Safari/537.36"
"facebookexternalhit/1.1"
"facebookexternalhit/1.1"
"Mozilla/5.0 (Linux; Android 11; Pixel 5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.6312.4 Mobile Safari/537.36"
"facebookexternalhit/1.1"
"facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
"Mozilla/5.0 (Linux; Android 11; Pixel 5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.6312.4 Mobile Safari/537.36"
"facebookexternalhit/1.1"
"facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
"meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)"
"Mozilla/5.0 (Linux; Android 11; Pixel 5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.6312.4 Mobile Safari/537.36"
"facebookexternalhit/1.1"
"facebookexternalhit/1.1"
"Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Mobile/15E148 Safari/604.1"
"Mozilla/5.0 (Linux; Android 11; Pixel 5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.6312.4 Mobile Safari/537.36"

And the matching HTTP_SEC_CH_UA values for the same requests, in the same order:

"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""
"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""
"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""
"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""
"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""
null
"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""
"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""
null
"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""
"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""
"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""
"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""
"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""
"\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\""

Matching on this header will detect these requests as crawlers.
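
As a rough illustration of the idea (not the library's actual implementation), the check can be sketched in Python. The pattern list, the `HEADERS_TO_CHECK` tuple, and the `is_crawler` function are hypothetical names made up for this example; the header values are taken from the samples above.

```python
import re

# Hypothetical, deliberately tiny pattern list for illustration only;
# it is NOT the pattern list shipped by the library.
CRAWLER_PATTERN = re.compile(
    r"facebookexternalhit|meta-externalagent|HeadlessChrome",
    re.IGNORECASE,
)

# Headers to inspect, using the CGI-style names that appear in this PR.
HEADERS_TO_CHECK = ("HTTP_USER_AGENT", "HTTP_SEC_CH_UA")


def is_crawler(server_vars):
    """Return True if any inspected header matches a crawler pattern."""
    for name in HEADERS_TO_CHECK:
        value = server_vars.get(name)
        # A header may be absent (the "null" rows in the samples above).
        if value and CRAWLER_PATTERN.search(value):
            return True
    return False


MOBILE_UA = (
    "Mozilla/5.0 (Linux; Android 11; Pixel 5) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/123.0.6312.4 Mobile Safari/537.36"
)
HEADLESS_CH_UA = (
    '"Chromium";v="128", "Not;A=Brand";v="24", "HeadlessChrome";v="128"'
)

# The mobile-looking User-Agent alone gives nothing to match on,
# but the accompanying Sec-CH-UA value does.
print(is_crawler({"HTTP_USER_AGENT": MOBILE_UA}))
print(is_crawler({"HTTP_USER_AGENT": MOBILE_UA, "HTTP_SEC_CH_UA": HEADLESS_CH_UA}))
```

With only the User-Agent inspected, the emulated Pixel 5 request passes as a normal browser; adding HTTP_SEC_CH_UA to the inspected headers is what lets the "HeadlessChrome" brand be seen.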

@JayBizzle
Owner

❤️

@JayBizzle JayBizzle merged commit 240104e into JayBizzle:master Sep 14, 2024
14 checks passed
@tsukasa-mixer

tsukasa-mixer commented Sep 28, 2024

This change now detects all mobile browsers derived from chromium as bots

This header is a complete alternative to the classic user-agent.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Sec-CH-UA
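
For context, a Sec-CH-UA value is a list of brand/version pairs, so an ordinary Chromium-based browser also sends this header, just with different brands. The sketch below is illustrative only: `parse_sec_ch_ua` and `has_brand` are hypothetical helpers, the regular-expression parser is a simplification rather than a full structured-field parser, and the "Google Chrome" value is an assumed example of what a regular browser might send. It does not reproduce the false positive reported here, whose cause is not stated in this thread.

```python
import re

# Simplified matcher for `"brand";v="version"` items. This is a sketch,
# not a complete structured-field parser.
BRAND_ITEM = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v="([^"]*)"')


def parse_sec_ch_ua(value):
    """Split a Sec-CH-UA value into a list of (brand, version) tuples."""
    return BRAND_ITEM.findall(value or "")


def has_brand(value, brand):
    """Return True only if `brand` appears as an exact brand name."""
    return any(name == brand for name, _ in parse_sec_ch_ua(value))


# Value taken from the samples in this PR.
headless = '"Chromium";v="128", "Not;A=Brand";v="24", "HeadlessChrome";v="128"'
# Assumed example of a regular browser's value (not from this thread).
regular = '"Chromium";v="128", "Not;A=Brand";v="24", "Google Chrome";v="128"'

print(parse_sec_ch_ua(headless))
print(has_brand(headless, "HeadlessChrome"))
print(has_brand(regular, "HeadlessChrome"))
```

Both values share the "Chromium" brand, so any pattern that happens to match text common to all Chromium-derived values would flag regular browsers as well; comparing exact brand names is one way to keep the check narrow.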

@module17
Contributor Author

module17 commented Sep 30, 2024

> This change now detects all mobile browsers derived from chromium as bots
>
> This header is a complete alternative to the classic user-agent.
>
> https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Sec-CH-UA

Can you elaborate and provide examples? Does #542 resolve your observed issue?
