
Releases: iipc/webarchive-commons

webarchive-commons-1.3.0

20 Dec 05:21
@ato

URL Canonicalization Changed

The output of WaybackURLKeyMaker and other canonicalizers based on BasicURLCanonicalizer has changed for URLs that
contain percent-encoded sequences that are not valid UTF-8. For example, when a URL contains "%C3%23" it will now be normalised to
"%c3%23", whereas previous releases produced "%25c3%23". This change brings webarchive-commons more in line with pywb,
surt (Python), warcio.js and RFC 3986. While CDX file compatibility with these newer tools should improve, note that CDX
files generated by the new release which contain such URLs may not work correctly with existing versions of
OpenWayback that use the older webarchive-commons. #102
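To illustrate the distinction, here is a minimal sketch in plain Java. The class PercentEncodingDemo and its methods are hypothetical and are not the BasicURLCanonicalizer implementation. Decoding "%C3%23" yields the bytes 0xC3 0x23, which are not valid UTF-8 because the lead byte 0xC3 must be followed by a continuation byte (0x80 to 0xBF). The sketch keeps such well-formed escape triplets instead of escaping their '%' as "%25". It lowercases the hex digits only to match the example output above, and it assumes that a stray '%' not followed by two hex digits is still escaped as "%25".

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class PercentEncodingDemo {
    static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static boolean isEscapeAt(String s, int i) {
        return s.charAt(i) == '%' && i + 2 < s.length()
                && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2));
    }

    // Hypothetical helper: decodes every %XX triplet to a raw byte and
    // reports whether the resulting byte sequence is valid UTF-8.
    static boolean decodesToValidUtf8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            if (isEscapeAt(s, i)) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(s.charAt(i)); // assumes ASCII input for simplicity
            }
        }
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(out.toByteArray()));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Hypothetical helper: keeps well-formed %XX triplets (hex lowercased)
    // regardless of whether they decode to valid UTF-8; only a stray '%'
    // is escaped as %25.
    static String normalizeEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (isEscapeAt(s, i)) {
                sb.append('%').append(s.substring(i + 1, i + 3).toLowerCase(Locale.ROOT));
                i += 2;
            } else if (s.charAt(i) == '%') {
                sb.append("%25");
            } else {
                sb.append(s.charAt(i));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String path = "%C3%23";
        System.out.println(decodesToValidUtf8(path)); // false
        System.out.println(normalizeEscapes(path));   // %c3%23
    }
}
```

Note that the sketch decides what to do from the syntax of the escape alone; whether the decoded bytes happen to be valid UTF-8 no longer changes the output.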

Bug fixes

  • WAT: Fixed duplicated payload metadata values for "Actual-Content-Length" and "Trailing-Slop-Length" #103
  • ObjectPlusFilesOutputStream.hardlinkOrCopy now uses Files.createLink() instead of executing ln. This
    eliminates a potential command-line option injection vulnerability and improves portability.
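A minimal sketch of the link-then-copy pattern using only java.nio.file. The class and method names are hypothetical and this is not the library's implementation; falling back to Files.copy is an assumption about what "OrCopy" means. Because the paths are passed as Path objects rather than as arguments to an external ln process, a file name beginning with "-" can never be parsed as a command-line option. Note the argument order of Files.createLink(link, existing).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class HardlinkOrCopyDemo {
    // Hypothetical helper: try a hard link first; if the platform or file
    // system refuses, fall back to copying the file contents.
    static void hardlinkOrCopy(Path existing, Path link) throws IOException {
        try {
            // No child process is started, so nothing is interpreted as a CLI flag.
            Files.createLink(link, existing);
        } catch (UnsupportedOperationException | IOException e) {
            Files.copy(existing, link, StandardCopyOption.COPY_ATTRIBUTES);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hardlink-demo");
        // A leading '-' would look like an option to ln, but is harmless here.
        Path src = Files.write(dir.resolve("-n.warc"), "payload".getBytes(StandardCharsets.UTF_8));
        Path dst = dir.resolve("copy.warc");
        hardlinkOrCopy(src, dst);
        System.out.println(new String(Files.readAllBytes(dst), StandardCharsets.UTF_8)); // payload
    }
}
```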

Dependency upgrades

  • fastutil removed
  • dsiutils removed

Deprecations

The following classes and enum members have been marked deprecated as a step towards removal of the dependency on
Apache Commons HttpClient 3.1.

  • org.archive.httpclient.HttpRecorderGetMethod
  • org.archive.httpclient.HttpRecorderMethod
  • org.archive.httpclient.HttpRecorderPostMethod
  • org.archive.httpclient.SingleHttpConnectionManager
  • org.archive.httpclient.ThreadLocalHttpConnectionManager
  • org.archive.util.binsearch.impl.http.ApacheHttp31SLR
  • org.archive.util.binsearch.impl.http.ApacheHttp31SLRFactory
  • org.archive.util.binsearch.impl.http.HTTPSeekableLineReaderFactory.HttpLibs.APACHE_31

webarchive-commons-1.2.0

29 Nov 07:43
@ato

New features

  • MetaData is now multivalued to support repeated WARC and HTTP headers. #98
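Multivalued storage matters because WARC records may repeat fields such as WARC-Concurrent-To, and HTTP responses often repeat Set-Cookie; a single-valued map silently keeps only one of the values. The following sketch shows a case-insensitive, append-only header map. The class MultiValuedHeaders and its methods are hypothetical and do not reflect the actual MetaData API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultiValuedHeaders {
    // Field names compare case-insensitively, as in HTTP and WARC headers.
    private final Map<String, List<String>> fields = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Appends rather than overwrites, so every repeated header is kept.
    public void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Returns all values in insertion order, or an empty list if absent.
    public List<String> getAll(String name) {
        List<String> values = fields.get(name);
        return values == null ? Collections.emptyList() : Collections.unmodifiableList(values);
    }

    // Returns the first value, mimicking a single-valued lookup.
    public String getFirst(String name) {
        List<String> values = fields.get(name);
        return values == null ? null : values.get(0);
    }

    public static void main(String[] args) {
        MultiValuedHeaders h = new MultiValuedHeaders();
        h.add("WARC-Concurrent-To", "<urn:uuid:11111111-1111-1111-1111-111111111111>");
        h.add("WARC-Concurrent-To", "<urn:uuid:22222222-2222-2222-2222-222222222222>");
        h.add("Set-Cookie", "a=1");
        h.add("set-cookie", "b=2");
        System.out.println(h.getAll("warc-concurrent-to").size()); // 2
        System.out.println(h.getAll("Set-Cookie"));                // [a=1, b=2]
    }
}
```

Keeping a getFirst-style lookup alongside getAll is one way such a change can stay convenient for callers that only ever expected a single value.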

Dependency upgrades

  • commons-io 2.18.0
  • commons-lang 2.6
  • guava 33.3.1-jre
  • hadoop 3.4.1
  • htmlparser 2.1
  • httpcore 4.4.16
  • json 20240303
  • junit 4.13.2

webarchive-commons-1.1.11

27 Nov 13:05
@ato

Bug fixes

  • Fixed URLParser and WaybackURLKeyMaker failing on URLs with IPv6 address hostnames #100
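The general failure mode is easy to reproduce with naive parsing. An IPv6 literal such as [2001:db8::1] itself contains colons, so splitting the authority on ':' misidentifies both host and port. A minimal sketch of bracket-aware splitting follows. splitHostPort is a hypothetical helper, not the URLParser fix, and it assumes the authority has no userinfo component.

```java
public class HostPortSplit {
    // Hypothetical helper: splits a URL authority (without userinfo) into
    // {host, port}. Colons inside "[...]" belong to the IPv6 literal; only a
    // colon after the closing ']' introduces the port. Port is null if absent.
    static String[] splitHostPort(String authority) {
        if (authority.startsWith("[")) {
            int close = authority.indexOf(']');
            if (close < 0) {
                throw new IllegalArgumentException("Unterminated IPv6 literal: " + authority);
            }
            String host = authority.substring(0, close + 1);
            String rest = authority.substring(close + 1);
            String port = rest.startsWith(":") ? rest.substring(1) : null;
            return new String[] {host, port};
        }
        int colon = authority.lastIndexOf(':');
        if (colon < 0) {
            return new String[] {authority, null};
        }
        return new String[] {authority.substring(0, colon), authority.substring(colon + 1)};
    }

    public static void main(String[] args) {
        String authority = "[2001:db8::1]:8080";
        // A naive split on ':' breaks the address apart.
        System.out.println(authority.split(":")[0]);  // [2001
        String[] hp = splitHostPort(authority);
        System.out.println(hp[0] + " port " + hp[1]); // [2001:db8::1] port 8080
    }
}
```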

webarchive-commons-1.1.10

15 Oct 08:46
@ato

Fixes

Dependency Upgrades

  • commons-collections 3.2.2
  • commons-io 2.14.0
  • dsiutils 2.2.8
  • guava 33.3.0-jre
  • hadoop 3.4.0 (now optional)
  • pig 0.17.0
  • org.json 20231013
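Assuming "optional" here refers to Maven's optional-dependency flag, Hadoop is no longer pulled in transitively, so applications that rely on Hadoop-backed functionality need to declare Hadoop themselves. A small sketch of a runtime presence check using Class.forName without initializing the class. The class name OptionalDependencyCheck is hypothetical, and org.apache.hadoop.fs.FileSystem is used only as a representative Hadoop class.

```java
public class OptionalDependencyCheck {
    // Returns true if the named class can be loaded; 'false' for the
    // initialize argument avoids running the class's static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalDependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Present only if the application itself declares a Hadoop dependency.
        boolean hadoop = isOnClasspath("org.apache.hadoop.fs.FileSystem");
        System.out.println(hadoop
                ? "Hadoop available"
                : "Hadoop not on classpath; skipping Hadoop-based code paths");
    }
}
```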

Dependency Removals

  • joda-time (was unused)

webarchive-commons-1.1.9

08 May 20:28
2e8cdea

webarchive-commons-1.1.9 (2019-05-07)


Closed issues:

  • CompressedWARCReader does not work for Common Crawl WARC files. #81
  • Fixing bad dates in WARC file #80
  • upgrade to commons-collections.jar 3.2.2 #76

Merged pull requests:

  • use commons-collections v3.2.2 to avoid v3.2.1 vulnerability #77 (ndushay)