HTTP protocol : store response headers verbatim in metadata #317

Closed
jnioche opened this issue Jul 19, 2016 · 2 comments

Comments

@jnioche
Contributor

jnioche commented Jul 19, 2016

WARCRecordFormat uses the value of the metadata key _response.headers_ to include the HTTP headers in the WARC representation. When the headers are included, the WARCTypeValue would be 'response' instead of 'resource'.

Similarly, we'll need to store the request for [https://github.com/DigitalPebble/sc-warc/issues/1].
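
To illustrate the choice of WARC type described above, here is a minimal sketch. It is not the actual WARCRecordFormat code: the class `WarcTypeSketch`, its method names, and the use of a plain `Map<String, String>` for the metadata are assumptions made for the example. Only the key name `_response.headers_` and the two type values come from this issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the real WARCRecordFormat.
class WarcTypeSketch {

    // Metadata key named in this issue.
    static final String RESPONSE_HEADERS_KEY = "_response.headers_";

    // "response" when verbatim headers were stored, "resource" otherwise.
    static String warcType(Map<String, String> metadata) {
        String headers = metadata.get(RESPONSE_HEADERS_KEY);
        return (headers != null && !headers.isEmpty()) ? "response" : "resource";
    }

    // Builds the record block: verbatim headers, one blank line, then the body.
    // Without stored headers the block is just the body.
    static byte[] recordBlock(Map<String, String> metadata, byte[] content) {
        String headers = metadata.get(RESPONSE_HEADERS_KEY);
        if (headers == null || headers.isEmpty()) {
            return content;
        }
        // Strip trailing line breaks, then terminate the header section with CRLF CRLF.
        String normalized = headers.replaceAll("[\\r\\n]+$", "") + "\r\n\r\n";
        byte[] head = normalized.getBytes(StandardCharsets.ISO_8859_1);
        byte[] block = new byte[head.length + content.length];
        System.arraycopy(head, 0, block, 0, head.length);
        System.arraycopy(content, 0, block, head.length, content.length);
        return block;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(RESPONSE_HEADERS_KEY, "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n");
        byte[] body = "hello".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(warcType(metadata));
        System.out.println(recordBlock(metadata, body).length + " bytes in block");
    }
}
```

The sketch falls back to 'resource' whenever the key is missing, so pages fetched by a protocol implementation that does not store the headers would still produce a valid record.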

@jnioche jnioche added this to the 1.1 milestone Jul 19, 2016
@jnioche
Contributor Author

jnioche commented Jul 21, 2016

Storing the request headers is not easily doable with HttpClient, as the user agent info is not accessible from the HttpGet object.

@jnioche jnioche changed the title from "HTTP protocol : store request and response headers verbatim in metadata" to "HTTP protocol : store response headers verbatim in metadata" Jul 21, 2016
@anjackson

In case it's useful in the future, I think this is how Heritrix does it, i.e. it wraps the input and output streams at the socket level and records what happens so it can be picked apart afterwards.
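
The stream-wrapping idea can be sketched roughly as follows. This is a generic recording wrapper written for illustration, not Heritrix's code: the class name `RecordingInputStreamSketch` and the in-memory buffer are assumptions, and a real crawler would more likely spool to disk and wrap the output stream in the same way to capture the request.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: copies every byte read from the wrapped stream into a buffer.
class RecordingInputStreamSketch extends FilterInputStream {

    private final ByteArrayOutputStream recorded = new ByteArrayOutputStream();

    RecordingInputStreamSketch(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            recorded.write(b);
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so it is covered too.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            recorded.write(buf, off, n);
        }
        return n;
    }

    // mark/reset is disabled: replayed bytes would otherwise be recorded twice.
    // Limitation of this sketch: bytes passed over with skip() are not recorded.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public synchronized void mark(int readlimit) {
        // intentionally a no-op
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    // Everything the consumer has read so far, exactly as it came off the wire.
    byte[] recordedBytes() {
        return recorded.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
                .getBytes(StandardCharsets.ISO_8859_1);
        RecordingInputStreamSketch in =
                new RecordingInputStreamSketch(new ByteArrayInputStream(wire));
        byte[] buf = new byte[8];
        while (in.read(buf, 0, buf.length) != -1) {
            // a real client would parse the response here
        }
        System.out.println(new String(in.recordedBytes(), StandardCharsets.ISO_8859_1));
    }
}
```

In practice the wrapper would sit around the socket's streams, so the HTTP library keeps parsing as usual while the raw bytes stay available for the WARC record.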
