Consider upgrading heritrix 1.x dependency from 1.14.1 (adjusted) to 1.14.4 #100

Open
kurtlenfesty opened this issue Feb 19, 2019 · 0 comments

The current heritrix dependency for harvest-agent-h1, as per its pom.xml, is:

		<dependency>
			<groupId>org.archive</groupId>
			<artifactId>heritrix</artifactId>
			<version>1.14.1</version>
			<scope>compile</scope>
		</dependency>

Once the replace-jboss-soap-with-latest-jaxws-rt branch is merged into master, that dependency will change to:

		<dependency>
			<groupId>org.archive</groupId>
			<artifactId>heritrix</artifactId>
			<version>1.14.1-webcuratortool-2.0.1</version>
			<scope>compile</scope>
		</dependency>

This version of heritrix is based on the 1.14.1 version source code downloaded from sourceforge, specifically:
https://sourceforge.net/projects/archive-crawler/files/archive-crawler%20%28heritrix%201.x%29/1.14.1/

and then packaged into a source code repository with bug fixes and dependency changes:
https://github.com/WebCuratorTool/heritrix-1-14-adjust

which also requires an updated commons-httpclient and commons-pool:
https://github.com/WebCuratorTool/commons-httpclient-heritrix-1-14
https://github.com/WebCuratorTool/commons-pool-heritrix-1-14

There is a more recent version of heritrix available on sourceforge, 1.14.4:
https://sourceforge.net/projects/archive-crawler/files/archive-crawler%20%28heritrix%201.x%29/1.14.4/

This version presumably contains bug fixes and optimisations, which a code comparison of the two source trees would reveal. About 120 source files differ between heritrix 1.14.1 and heritrix 1.14.4.
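
One way to reproduce the file count is a small script that walks both unpacked source trees and lists the files that differ. The following is only a sketch: the function name `changed_sources`, the `.java` filter, and the directory names in the usage note are assumptions, not part of the Heritrix or Web Curator Tool build.

```python
import filecmp
from pathlib import Path


def changed_sources(old_root, new_root, suffix=".java"):
    """List files (relative paths) that differ between two source trees.

    A file counts as changed if its contents differ, or if it exists in
    only one of the two trees. Only files ending in `suffix` are checked.
    """
    old_root, new_root = Path(old_root), Path(new_root)
    pattern = "*" + suffix
    old_files = {p.relative_to(old_root) for p in old_root.rglob(pattern) if p.is_file()}
    new_files = {p.relative_to(new_root) for p in new_root.rglob(pattern) if p.is_file()}

    # Files present in only one tree were added or removed.
    changed = set(old_files ^ new_files)

    for rel in old_files & new_files:
        # shallow=False compares file contents rather than os.stat() metadata.
        if not filecmp.cmp(old_root / rel, new_root / rel, shallow=False):
            changed.add(rel)

    return sorted(p.as_posix() for p in changed)
```

For example, `changed_sources("heritrix-1.14.1/src", "heritrix-1.14.4/src")` (hypothetical paths to the two unpacked SourceForge archives) returns the list of differing `.java` files; its length gives the file count, and each entry is a candidate for a manual diff.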

We may decide to upgrade the Heritrix 1 version to include the changes up to 1.14.4. That would involve merging the 1.14.4 source changes with the changes already made in the https://github.com/WebCuratorTool/heritrix-1-14-adjust code repository.

This may become a moot point if the Web Curator Tool drops support for Heritrix 1.x.
