Protocols

Sebastian Nagel edited this page Dec 2, 2022 · 4 revisions

The following network protocols are implemented in StormCrawler:

File

HTTP/S

See HTTPProtocol for the effect of metadata content on protocol behaviour.

To change the HTTP protocol implementation, for example to the OkHttp-based one, add the following lines to your crawler-conf.yaml:

  http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
  https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"

Feature grid

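| Features                       | HTTPClient | OKhttp | Selenium |
|--------------------------------|------------|--------|----------|
| Basic authentication           | Y          | Y      | N        |
| proxy (w. credentials?)        | Y / Y      | Y / Y  | ?        |
| interruptible / trimmable #463 | N / Y      | Y / Y  | Y / N    |
| cookies                        | Y          | Y      | N        |
| response headers               | Y          | Y      | N        |
| trust all certificates         | N          | Y      | N        |
| HEAD method                    | Y          | Y      | N        |
| POST method                    | N          | Y      | N        |
| verbatim response header       | Y          | Y      | N        |
| verbatim request header        | N          | Y      | N        |
| IP address capture             | N          | Y      | N        |
| navigation and javascript      | N          | N      | Y        |
| HTTP/2                         | N          | Y      | (Y)      |
| configurable connection pool   | N          | Y      | N        |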

HTTP/2

Since #829, the HTTP protocol version used is configurable via http.protocol.versions (see also the comments in crawler-default.yaml). For example, to force the use of HTTP/1.1 only:

http.protocol.versions:
- "http/1.1"
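
As a further sketch, the two settings on this page can be combined to prefer HTTP/2 and fall back to HTTP/1.1. It rests on several assumptions: that the OkHttp implementation is selected (the only one marked Y for HTTP/2 in the feature grid), that the settings sit under the top-level config: key of crawler-conf.yaml, that "h2" is the accepted identifier for HTTP/2 over TLS (as in OkHttp's protocol naming), and that the list is read in order of preference. Check the comments in crawler-default.yaml for the values your version actually accepts.

```yaml
# Sketch only: the key nesting and the "h2" identifier are assumptions, see above.
config:
  # Use the OkHttp-based protocol implementation for both schemes
  http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
  https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"

  # Protocol versions, assumed to be listed in order of preference
  http.protocol.versions:
  - "h2"        # HTTP/2 (assumed identifier)
  - "http/1.1"  # fallback for servers without HTTP/2 support
```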