HTTP response only results in garbage bytes #206

Closed
bitsgalore opened this issue Jun 21, 2018 · 5 comments

Comments

@bitsgalore

bitsgalore commented Jun 21, 2018

I'm trying to run the latest Heritrix build (build heritrix-3.3.0-20180529.100446-105-dist.tar.gz which I downloaded here) for some tests.

I start Heritrix with the command below:

~/heritrix-3.3.0-SNAPSHOT/bin/heritrix -a foo

This works, but when I open http://localhost:8443/ in my browser (Firefox), it only shows 6 garbled characters (Chromium returns an ERR_INVALID_HTTP_RESPONSE error). Saving the page and opening it in a hex editor shows these 7 bytes:

15 03 03 00 02 02 0A

Some info on Java on my system:

openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)

Here's the Heritrix log file:

Thu Jun 21 13:15:54 CEST 2018 Starting heritrix
Linux johan-HP-ProBook-640-G1 4.10.0-38-generic #42~16.04.1-Ubuntu SMP Tue Oct 10 16:32:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)
JAVA_OPTS= -Xmx256m
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31394
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31394
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Oracle Corporation OpenJDK Runtime Environment 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore adhoc.keystore -destkeystore adhoc.keystore -deststoretype pkcs12".
Using ad-hoc HTTPS certificate with fingerprint...
SHA1:55:BA:62:92:98:5A:DB:26:1B:08:70:D8:90:5D:9C:F3:A4:E7:BF:81
Verify in browser before accepting exception.
2018-06-21 11:15:55.239 WARNING thread-1 org.archive.crawler.framework.Engine.findJobConfigs() invalid job directory: ./jobs/.gitignore where job expected from: ./jobs/.gitignore
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
engine listening at port 8443
operator login set per command-line
NOTE: We recommend a longer, stronger password, especially if your web 
interface will be internet-accessible.
Heritrix version: 3.3.0-SNAPSHOT-2018-05-29T09:43:19Z

The log contains a number of warnings, but I have no idea if they are related to this.

Perhaps I'm doing something wrong myself (this is my first attempt at installing and running Heritrix). Anyway, if anyone could give me a hint on how to make this work, that would be really helpful. (Side note: I initially tried the "stable" 3.2 release, but gave up on that because of the dependency on Java 7.)

@anjackson
Collaborator

You need to go to https://localhost:8443/ because it's only accessible over SSL. Not sure if there's an elegant way to handle this and bounce users to HTTPS automatically?
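
For anyone curious about those seven bytes: they look like a TLS alert record rather than random garbage, which fits with the server expecting HTTPS on that port. Below is a minimal Python sketch that decodes them, assuming the standard TLS record layout (1-byte content type, 2-byte version, 2-byte length, then a 2-byte alert body). The function name `decode_tls_alert` and the small lookup tables are made up for illustration and cover only a handful of alert codes.

```python
import struct

# A few values from the TLS alert protocol; this is not the complete list.
ALERT_LEVELS = {1: "warning", 2: "fatal"}
ALERT_DESCRIPTIONS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    40: "handshake_failure",
    70: "protocol_version",
    80: "internal_error",
}


def decode_tls_alert(data: bytes) -> dict:
    """Decode a single TLS alert record (5-byte header + 2-byte alert body)."""
    if len(data) < 7:
        raise ValueError("need at least 7 bytes for an alert record")
    # Header: content type, version major, version minor, payload length.
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    if content_type != 0x15:
        raise ValueError("content type is not 0x15 (alert)")
    if length != 2:
        raise ValueError("alert payload should be exactly 2 bytes")
    level, description = data[5], data[6]
    return {
        "record_version": (major, minor),
        "level": ALERT_LEVELS.get(level, f"unknown ({level})"),
        "description": ALERT_DESCRIPTIONS.get(description, f"unknown ({description})"),
    }


# The bytes quoted in the report above.
print(decode_tls_alert(bytes.fromhex("15 03 03 00 02 02 0A")))
```

Under those assumptions the bytes decode as a fatal `unexpected_message` alert with record version 3.3 (the value TLS 1.2 uses), which is a plausible reply from a TLS listener that was handed a plain HTTP request.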

@bitsgalore
Author

@anjackson Thanks Andy. Turns out the docs actually mention this but I had overlooked it. Works now!

I'll close this issue now.

@guitarscape

It seems that 3.3 does not allow HTTP access? Is there a way to force HTTP (not HTTPS) so that we can use Heritrix behind a proxy?

@xdpirate

xdpirate commented Jul 6, 2023

I just ran into this same problem. An HTTP-to-HTTPS redirect, or a note in the docs saying you must access the web UI over HTTPS, would be appreciated.
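
As a stopgap, a redirect can be approximated outside Heritrix with a tiny helper process. The sketch below is not a Heritrix feature. It assumes a separate, hypothetical plain-HTTP port (8080 here), because port 8443 itself only speaks TLS, and the names `https_location`, `RedirectHandler`, and `serve` are invented for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

HTTPS_PORT = 8443  # port the Heritrix web UI listens on in this thread
HTTP_PORT = 8080   # hypothetical plain-HTTP port for the redirector


def https_location(host_header, path, https_port=HTTPS_PORT):
    """Build the https:// URL for a request that arrived over plain HTTP."""
    host = host_header or "localhost"
    if host.startswith("["):
        # IPv6 literal such as [::1]:8080 -> keep the bracketed address only.
        host = host[: host.index("]") + 1]
    else:
        # Drop any :port suffix from a hostname or IPv4 address.
        host = host.split(":", 1)[0]
    return f"https://{host}:{https_port}{path}"


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with a redirect to the HTTPS endpoint."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", https_location(self.headers.get("Host"), self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET


def serve(port=HTTP_PORT):
    """Run the redirector on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling `serve()` and then opening http://localhost:8080/ should bounce the browser to https://localhost:8443/, where the ad-hoc certificate still has to be accepted as usual.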
