webarchive-commons-1.3.0

ato released this 20 Dec 05:21
· 1 commit to master since this release

URL Canonicalization Changed

The output of WaybackURLKeyMaker and other canonicalizers based on BasicURLCanonicalizer has changed for URLs that
contain percent-encoded sequences which are not valid UTF-8. For example, a URL containing "%C3%23" is now normalised
to "%c3%23", whereas previous releases produced "%25c3%23". This change brings webarchive-commons more in line with
pywb, surt (Python), warcio.js and RFC 3986. CDX file compatibility with these newer tools should improve. Note,
however, that CDX files generated by the new release which contain such URLs may not work correctly with existing
versions of OpenWayback that use the older webarchive-commons. #102
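
To gauge whether existing CDX data is likely to be affected, it can help to find URLs whose percent-encoded bytes are
not valid UTF-8. The following is a minimal, self-contained sketch using only the JDK. It does not call
webarchive-commons and is not the library's canonicalization code; the class and method names are hypothetical, and it
assumes that "affected" means the percent-decoded bytes fail strict UTF-8 decoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of webarchive-commons.
final class PercentUtf8Check {

    // Returns the value of a hex digit byte, or -1 if it is not a hex digit.
    private static int hex(byte b) {
        return Character.digit(b & 0xFF, 16);
    }

    // Replaces each well-formed %XX escape with the byte it encodes;
    // every other byte is copied through unchanged.
    static byte[] percentDecode(String s) {
        byte[] in = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length
                    && hex(in[i + 1]) >= 0 && hex(in[i + 2]) >= 0) {
                out.write((hex(in[i + 1]) << 4) | hex(in[i + 2]));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(in[i]);
            }
        }
        return out.toByteArray();
    }

    // True if the percent-decoded bytes form valid UTF-8.
    static boolean decodesToValidUtf8(String url) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(percentDecode(url)));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte UTF-8 sequence, but 0x23 ('#') is not a continuation byte.
        System.out.println(decodesToValidUtf8("http://example.com/%C3%23")); // false
        // 0xC3 0xA9 is the valid UTF-8 encoding of U+00E9.
        System.out.println(decodesToValidUtf8("http://example.com/%C3%A9")); // true
    }
}
```

URLs for which decodesToValidUtf8 returns false are the ones whose keys may differ between releases, so they are the
entries worth re-checking before mixing CDX files from old and new versions.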

Bug fixes

  • WAT: Duplicated payload metadata values for "Actual-Content-Length" and "Trailing-Slop-Length" #103
  • ObjectPlusFilesOutputStream.hardlinkOrCopy now uses Files.createLink() instead of executing the ln command. This
    removes a potential command-line option injection vulnerability and improves portability.
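
The link-or-copy pattern behind the second fix can be sketched with the JDK alone. This is an illustration under
assumptions rather than the actual ObjectPlusFilesOutputStream code: the class name, the boolean return value and the
fallback conditions are hypothetical. Because no external process is started, a file name that begins with "-" cannot
be misread as a command-line option.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not the webarchive-commons implementation.
final class HardlinkOrCopy {

    // Tries to create a hard link at 'target' pointing to 'source'.
    // Falls back to a plain copy if linking is unsupported or fails.
    // Returns true if a hard link was created, false if the file was copied.
    static boolean hardlinkOrCopy(Path source, Path target) throws IOException {
        try {
            // Files.createLink(link, existing): the new link path comes first.
            Files.createLink(target, source);
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            // For example: the file system has no hard links, or the two paths
            // are on different volumes. The copy throws if 'target' already exists.
            Files.copy(source, target);
            return false;
        }
    }
}
```

A call such as HardlinkOrCopy.hardlinkOrCopy(Paths.get("a.dat"), Paths.get("backup/a.dat")) leaves the target with the
same content either way; only the return value tells the two cases apart.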

Dependency upgrades

  • fastutil removed
  • dsiutils removed

Deprecations

The following classes and enum members have been marked deprecated as a step towards removal of the dependency on
Apache Commons HttpClient 3.1.

  • org.archive.httpclient.HttpRecorderGetMethod
  • org.archive.httpclient.HttpRecorderMethod
  • org.archive.httpclient.HttpRecorderPostMethod
  • org.archive.httpclient.SingleHttpConnectionManager
  • org.archive.httpclient.ThreadLocalHttpConnectionManager
  • org.archive.util.binsearch.impl.http.ApacheHttp31SLR
  • org.archive.util.binsearch.impl.http.ApacheHttp31SLRFactory
  • org.archive.util.binsearch.impl.http.HTTPSeekableLineReaderFactory.HttpLibs.APACHE_31