WatchedCDXSource: Dynamically Adding CDX Indexes

Roger G. Coram edited this page May 22, 2015 · 2 revisions

OpenWayback can monitor a directory structure for CDX indexes. When OpenWayback launches, any files found in the watched directory that match the filters property are added as CDX indexes. While it is running, files added to a watched directory are picked up in the same way, and files that are removed are no longer referenced.

By default, OpenWayback watches only the single specified directory. Setting the optional recursive property to true makes it watch subdirectories as well.

To configure this, open the CDXCollection.xml file and, in the configuration for localcdxcollection, change the source property as follows:

<property name="source">
    <bean class="org.archive.wayback.resourceindex.WatchedCDXSource">
        <property name="recursive" value="false" />
        <property name="filters">
            <list>
                <value>^.+\.cdx$</value>
            </list>
        </property>
        <property name="path" value="${wayback.basedir}/cdx-index/" />
    </bean>
</property>

Here, ${wayback.basedir}/cdx-index/ is the directory in which CDX files are stored. The recursive and filters properties are optional and default to the values shown above.
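
Because recursive and filters are optional, a configuration that only needs subdirectories watched can be shorter. The following is a minimal sketch under that assumption: it sets recursive to true and leaves filters out so that the default pattern applies. The path is the same example directory used above.

```xml
<property name="source">
    <bean class="org.archive.wayback.resourceindex.WatchedCDXSource">
        <!-- Watch subdirectories of the path below as well. -->
        <property name="recursive" value="true" />
        <!-- filters is omitted here, so the default ^.+\.cdx$ should apply. -->
        <property name="path" value="${wayback.basedir}/cdx-index/" />
    </bean>
</property>
```

With this in place, a CDX file dropped into any subdirectory of cdx-index/ should be picked up in the same way as one placed in the top-level directory.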

This is similar in functionality to CompositeSearchResultSource, except that the CDX files need not be listed explicitly; only their parent directory is specified.
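
For comparison, a CompositeSearchResultSource configuration lists each CDX file individually. The sketch below assumes the CDXSources list property as it appears in the stock CDXCollection.xml; the two file names are hypothetical placeholders, not files that ship with OpenWayback.

```xml
<property name="source">
    <bean class="org.archive.wayback.resourceindex.CompositeSearchResultSource">
        <property name="CDXSources">
            <list>
                <!-- Hypothetical file names: every index must be listed explicitly. -->
                <value>${wayback.basedir}/cdx-index/index-1.cdx</value>
                <value>${wayback.basedir}/cdx-index/index-2.cdx</value>
            </list>
        </property>
    </bean>
</property>
```

Adding a third index to this configuration means editing the list by hand, whereas with WatchedCDXSource it is enough to copy the file into the watched directory.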