From 7485c7358c4bb4a54fccaff9e70252cfcce550ff Mon Sep 17 00:00:00 2001
From: Jan Heinrich Reimer
Date: Fri, 24 Nov 2023 09:18:32 +0100
Subject: [PATCH] Update documentation

---
 README.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2df63dd9..3f738284 100644
--- a/README.md
+++ b/README.md
@@ -252,7 +252,7 @@ A pointer to the WARC file is stored in the SERP index so that we can quickly ac
-#### Import from AQL-22
+### Imports
 
 We support automatically importing providers and parsers from the AQL-22 YAML-file format (see [`data/selected-services.yaml`](data/selected-services.yaml)).
 
@@ -263,6 +263,8 @@ aql providers import
 aql parsers url-query import
 aql parsers url-page import
 aql parsers url-offset import
+aql parsers warc-query import
+aql parsers warc-snippets import
 ```
 
 We also support importing a previous crawl of captures from the AQL-22 file system backend:
@@ -271,6 +273,12 @@ We also support importing a previous crawl of captures from the AQL-22 file syst
 aql captures import aql-22
 ```
 
+Last, we support importing all archives from the [Archive-It]() web archive service:
+
+```shell
+aql archives import archive-it
+```
+
 ### Cluster (Helm/Kubernetes)
 
 Running the Archive Query Log on a cluster is recommended for large-scale crawls. We provide a Helm chart that automatically starts crawling and parsing jobs for you