URL Canonicalization Changed
The output of WaybackURLKeyMaker and other canonicalizers based on BasicURLCanonicalizer has changed for URLs that
contain percent-encoded sequences that are not valid UTF-8. For example, a URL containing "%C3%23" is now normalised to
"%c3%23", whereas previous releases produced "%25c3%23". This change brings webarchive-commons more in line with pywb,
surt (Python), warcio.js and RFC 3986. CDX file compatibility with these newer tools should improve, but CDX
files generated by the new release that contain such URLs may not work correctly with existing versions of
OpenWayback that use the older webarchive-commons. #102
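To see why "%C3%23" falls into this category: the byte 0xC3 opens a two-byte UTF-8 sequence, but 0x23 ("#") is not a valid continuation byte. The sketch below shows one way to check whether decoded bytes form valid UTF-8 using only the JDK. The class and method names (Utf8Check, isValidUtf8) are hypothetical and are not part of webarchive-commons; this is not the canonicalizer's own logic.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a webarchive-commons class.
final class Utf8Check {
    // Returns true if the bytes decode as strict UTF-8, false otherwise.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Malformed or unmappable input: not valid UTF-8.
            return false;
        }
    }
}
```

With this helper, the bytes 0xC3 0x23 are rejected, while 0xC3 0xA9 (the encoding of "é") are accepted.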
Bug fixes
- WAT: Duplicated payload metadata values for "Actual-Content-Length" and "Trailing-Slop-Length" #103
- ObjectPlusFilesOutputStream.hardlinkOrCopy now uses Files.createLink() instead of executing ln. This
prevents the potential for security vulnerabilities from command line option injection and improves portability.
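As a rough illustration of the hard-link approach, the following sketch tries Files.createLink() and falls back to a plain copy when linking is unsupported or fails. It is a minimal example under assumptions, not the actual code of ObjectPlusFilesOutputStream; the class name and method signature are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; the real hardlinkOrCopy in webarchive-commons may differ.
final class HardlinkOrCopySketch {
    // Returns true if a hard link was created, false if the file was copied.
    static boolean hardlinkOrCopy(Path source, Path target) throws IOException {
        try {
            // No external process is started, so a file name can never be
            // misread as a command-line option.
            Files.createLink(target, source);
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            // Linking is unavailable (for example across file systems):
            // fall back to copying. This still fails if target already exists.
            Files.copy(source, target);
            return false;
        }
    }
}
```

Because the paths are passed to the file system API directly rather than to a shell command, a file name beginning with "-" is handled like any other name.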
Dependency upgrades
- fastutil removed
- dsiutils removed
Deprecations
The following classes and enum members have been marked deprecated as a step towards removal of the dependency on
Apache Commons HttpClient 3.1.
- org.archive.httpclient.HttpRecorderGetMethod
- org.archive.httpclient.HttpRecorderMethod
- org.archive.httpclient.HttpRecorderPostMethod
- org.archive.httpclient.SingleHttpConnectionManager
- org.archive.httpclient.ThreadLocalHttpConnectionManager
- org.archive.util.binsearch.impl.http.ApacheHttp31SLR
- org.archive.util.binsearch.impl.http.ApacheHttp31SLRFactory
- org.archive.util.binsearch.impl.http.HTTPSeekableLineReaderFactory.HttpLibs.APACHE_31