Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/playwright (PlaywrightCrawler)
Issue description
The docs never mention that only http proxies are supported. I think using http proxies is a security risk. Digging deeper, you end up in here, which crawlee uses. I think it should support HTTPS proxies as well.
Code sample
import { Configuration, Dataset, PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://Username:Password@proxyUrl:PORT'],
});

const crawler = new PlaywrightCrawler(
    {
        proxyConfiguration,
        // Use the requestHandler to process each of the crawled pages.
        async requestHandler({ request, page, enqueueLinks, log, crawler }) {
            const title = await page.title();
            const content = await page.content();
            log.info(`Title of ${request.loadedUrl} is '${title}'`);

            // Save results as JSON to ./storage/datasets/default
            await Dataset.pushData({ title, url: request.loadedUrl, content });

            // Extract links from the current page
            // and add them to the crawling queue.
            await enqueueLinks();
        },
        maxRequestsPerCrawl: 1,
        maxConcurrency: 20,
        retryOnBlocked: true,
        maxRequestRetries: 10,
    },
    new Configuration({
        persistStorage: false,
        maxUsedCpuRatio: 0.95,
        availableMemoryRatio: 0.5,
    }),
);

// `url` is the target URL; it is not defined in the original report.
await crawler.run([url]);
Package version
crawlee@3.11.0 proxy-chain@2.5.1
Node.js version
v20.10.0 typescript@5.5.2
Operating system
macOS
Apify platform
Tick me if you encountered this issue on the Apify platform
I have tested this on the next release
No response
Other context
No response
Hello - and thank you for your interest in this project.
Can you please provide a reproduction scenario for the issue you are having?
"I think using http proxies are a security risk"
Note that this is not true - if you are connecting to the target server via HTTPS, the traffic is still end-to-end encrypted. With HTTP proxies, this is achieved via the HTTP CONNECT method, which creates an opaque data tunnel from the client through the proxy server to the target, through which the encrypted data is transferred. The intermediate proxy server cannot read this data (as it's encrypted).
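To make the tunnel setup a bit more concrete, here is a minimal sketch of the only plaintext message the client sends to an HTTP proxy before the TLS handshake with the target begins. The helper name `buildConnectRequest` and the host `example.com` are made up for illustration; this is not code from crawlee or proxy-chain, and real clients typically add further headers (such as proxy credentials).

```typescript
// Hypothetical helper: builds the raw CONNECT request a client would send
// to an HTTP proxy in order to open a tunnel to an HTTPS target.
function buildConnectRequest(targetHost: string, targetPort: number): string {
    // CONNECT uses the "authority form": host:port, with no path or scheme.
    const authority = `${targetHost}:${targetPort}`;
    const lines = [
        `CONNECT ${authority} HTTP/1.1`,
        `Host: ${authority}`,
    ];
    // An empty line terminates the header section.
    return lines.join('\r\n') + '\r\n\r\n';
}

// The proxy learns the target host and port from this request and nothing more.
// After it answers with a 2xx status, it only relays bytes, and those bytes
// are the TLS session between the client and the target.
const connectRequest = buildConnectRequest('example.com', 443);
console.log(JSON.stringify(connectRequest));
```

The point of the sketch is that the target's host and port are visible to the proxy, while the request path, headers, and body of the actual HTTPS request travel inside the encrypted tunnel.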
If you are connecting to an HTTP target server (or you decide to fiddle around with the TLS settings - see e.g. comments under this issue), the proxy can indeed act as a MITM and read your traffic - but you really have to want this - it will never happen with the default