WAVA Tool

Web Archive Visualization and Analysis (WAVA) Tool is the standalone version of the Harvest Visualization and Analysis feature in the WebCurator Tool (WCT). It has been customized to run outside of the WCT, but still depends on core WCT code that keeps the projects in sync. WAVA Tool does not include the Patching functionality that the WCT feature has.

The WAVA Tool is intended to be used with harvests/crawls of single websites or harvests/crawls of a small set of websites. It is not designed to work with domain crawls or similar large scale harvest/crawls.

Requirements

Apache Maven 3+ (required to build from source)
Gradle 5.6+ (required to build from source)
Java - WAVA Tool has been developed and testing on Java 8
WebCurator Tool project - WAVA Tool requires the webcurator-core dependency from the WebCurator Tool project to be compiled and installed in the local maven repository

Configration

Configuration for the tool is set in application.properties. Located in the project here src/main/resources/application.properties

Common properties to configure:

Configure the port the tool runs on. This is the port used to access the tool in your browser.

server.port=8080
Configure the directory the tool loads harvests from

arcDigitalAssetStoreService.baseDir=/app/harvests
Configure the URL for linking to a Wayback instance of a harvest

harvestResourceUrlMapper.urlMap=http://localhost:8090/wayback/*/

Build

From the project directory run:

./gradlew clean install

The project binary is built in build/libs/

Harvest Setup

The WAVA tool can access harvests located in the directory specified in the arcDigitalAssetStoreService.baseDir property.

Place each harvest in its own sub-folder within the base directory.

E.g.

app/harvests/
|-- website_1/
  |-- website_1_file_1.warc
  |-- website_1_file_2.warc
|-- website_2/
  |-- website_2_file_1.warc
  |-- website_2_file_2.warc

Run

java -jar wava-tool-<version>.war

Open localhost:8080 in your web browser

Limitations

WAVA Tool is generally limited by the resources available. Two scenarios can cause slowness or unresponsiveness

A harvest with a large number of URLs (5000+) within a small number of domains can cause unresponsiveness within the URL views, paricularly when filtering.
A harvest with a large number of domains can cause the visualization to be slow to draw and interact with.

Documentation

See the WebCurator Tool documentation for details

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
gradle/wrapper		gradle/wrapper
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.gradle		build.gradle
expand.png		expand.png
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WAVA Tool

Requirements

Configration

Build

Harvest Setup

Run

Limitations

Documentation

About

Releases 1

Packages

Contributors 2

Languages

License

WebCuratorTool/wava-tool

Folders and files

Latest commit

History

Repository files navigation

WAVA Tool

Requirements

Configration

Build

Harvest Setup

Run

Limitations

Documentation

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Languages

Packages