Mask Playwright's "headless" headers #401

Closed
vdusek opened this issue Aug 6, 2024 · 0 comments · Fixed by #545
Labels
enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team.

vdusek commented Aug 6, 2024

Description

  • When running headless browsers with Playwright, certain HTTP headers (like User-Agent and Sec-Ch-Ua) reveal the browser is operating in headless mode by including the substring "headless". This makes it easy for anti-scraping systems to detect and block automated browsers.
  • To prevent detection, these headers should be replaced with more realistic, browser-like values that mimic typical, non-headless browser behavior.
  • Example of header values for headless Chromium:
{
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Host": "httpbin.org",
    "Priority": "u=0, i",
    "Sec-Ch-Ua": "\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\"",
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": "\"Linux\"",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/128.0.6613.18 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-66d04117-141b301674c02e4e2136f1f1"
}
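
The masking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the fix that was merged for this issue: `find_headless_headers` and `mask_headless_headers` are hypothetical helper names, and the replacement brand `"Google Chrome"` is an assumption about what a regular (non-headless) Chrome build reports in `Sec-Ch-Ua`.

```python
from typing import Dict, List


def find_headless_headers(headers: Dict[str, str]) -> List[str]:
    """Return the names of headers whose value contains "headless" (case-insensitive)."""
    return [name for name, value in headers.items() if "headless" in value.lower()]


def mask_headless_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` with the headless markers replaced.

    Hypothetical helper: only User-Agent and Sec-Ch-Ua are rewritten,
    all other headers are passed through unchanged.
    """
    masked = dict(headers)
    for name, value in headers.items():
        lowered = name.lower()
        if lowered == "user-agent":
            # "HeadlessChrome/128.0.6613.18" -> "Chrome/128.0.6613.18"
            masked[name] = value.replace("HeadlessChrome/", "Chrome/")
        elif lowered == "sec-ch-ua":
            # Assumption: a regular Chrome build lists the brand "Google Chrome".
            masked[name] = value.replace('"HeadlessChrome"', '"Google Chrome"')
    return masked
```

In Playwright for Python, the masked values could then be passed through the `user_agent` and `extra_http_headers` options of `browser.new_context()`. Whether that is enough to override the client-hint headers in every case is not something this issue confirms.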

Relevant links

@vdusek vdusek added enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team. labels Aug 6, 2024
@vdusek vdusek self-assigned this Aug 26, 2024
@vdusek vdusek added this to the 97th sprint - Tooling team milestone Aug 26, 2024
vdusek added a commit that referenced this issue Sep 17, 2024
### Description

- This is the first version of the header generator, providing common
HTTP headers including user agent.
- The user agent is picked at random from a pool of 1000 user agents
sampled from the Apify fingerprint dataset.
- This is integrated into the HTTPX client and will be further used in
the Playwright fingerprint injector (#401).
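
A rough sketch of the idea described in this list, assuming a small in-memory pool: the two user-agent strings below are illustrative placeholders rather than entries from the Apify fingerprint dataset, and `generate_common_headers` is a hypothetical name, not the API introduced by this commit.

```python
import random
from typing import Dict, Optional

# Placeholder pool; the commit describes a pool of 1000 real user agents.
USER_AGENT_POOL = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36",
]


def generate_common_headers(rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Build a small set of common HTTP headers with a randomly chosen user agent."""
    chooser = rng if rng is not None else random
    return {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "User-Agent": chooser.choice(USER_AGENT_POOL),
    }
```

Passing a seeded `random.Random` makes the choice reproducible in tests, while the default falls back to the module-level generator.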

### Issues

- Closes: #402 

### Testing

- New unit tests implemented.

### Checklist

- [x] CI passed
@vdusek vdusek changed the title Add fingerprint injector for Playwright crawler Add fingerprint injector and integrate it into Playwright crawler Sep 17, 2024
vdusek added a commit that referenced this issue Sep 17, 2024
vdusek added a commit that referenced this issue Sep 19, 2024
vdusek added a commit that referenced this issue Sep 23, 2024
vdusek added a commit that referenced this issue Sep 25, 2024
vdusek added a commit that referenced this issue Sep 26, 2024
@vdusek vdusek changed the title Add fingerprint injector and integrate it into Playwright crawler Mask Playwright's "headless" headers Sep 26, 2024
vdusek added a commit that referenced this issue Sep 26, 2024
@vdusek vdusek closed this as completed in d1445e4 Sep 27, 2024