Protocols
Sebastian Nagel edited this page Dec 2, 2022 · 4 revisions
The following network protocols are implemented in StormCrawler:
See HTTPProtocol for the effect of metadata content on protocol behaviour.
To change the implementation, add the following lines to your crawler-conf.yaml:
http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
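
As a rough sketch of where these lines go, the fragment below assumes that your crawler-conf.yaml uses the usual layout with a single top-level `config:` map; check your own file, since the nesting is an assumption and is not stated on this page. Both the `http` and `https` schemes are set so that they are handled by the same implementation.

```yaml
# Hypothetical excerpt of crawler-conf.yaml (the top-level "config:" key is assumed)
config:
  # use the OkHttp-based implementation for plain HTTP ...
  http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
  # ... and for HTTPS
  https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
```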
Features | HttpClient | OkHttp | Selenium |
---|---|---|---|
Basic authentication | Y | Y | N |
proxy (w. credentials?) | Y / Y | Y / Y | ? |
interruptible / trimmable #463 | N / Y | Y / Y | Y / N |
cookies | Y | Y | N |
response headers | Y | Y | N |
trust all certificates | N | Y | N |
HEAD method | Y | Y | N |
POST method | N | Y | N |
verbatim response header | Y | Y | N |
verbatim request header | N | Y | N |
IP address capture | N | Y | N |
navigation and javascript | N | N | Y |
HTTP/2 | N | Y | (Y) |
configurable connection pool | N | Y | N |
- OkHttp: HTTP/2 is supported if the JDK includes ALPN (Java 9 and later, or Java 8 builds from early/mid 2020 onwards).
- HttpClient: HTTP/2 is not yet supported.
- Selenium: whether HTTP/2 is used depends on the driver.
Since #829, the HTTP protocol versions used are configurable via http.protocol.versions
(see also the comments in crawler-default.yaml). For example, to force the use of HTTP/1.1 only:
http.protocol.versions:
- "http/1.1"
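
A list with more than one entry could be used to allow several versions. The sketch below assumes that the values follow OkHttp's protocol identifiers, where `"h2"` denotes HTTP/2, and that the order of the list expresses preference. Both points are assumptions; the comments in crawler-default.yaml remain the reference for the accepted values and for which protocol implementations honour the setting.

```yaml
# Hypothetical example: allow HTTP/2 and HTTP/1.1
# ("h2" is OkHttp's identifier for HTTP/2; assumed to be accepted here)
http.protocol.versions:
- "h2"
- "http/1.1"
```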