Integrating Arclight with Digital Content, IIIF, and ArchivesSpace
This is a Python package that uses the directory structure defined in the Digital Object Discovery Storage Specification for a IIIF pipeline.
iiiflow functions create pyramidal TIFFs, thumbnails, HOCR, and text transcriptions, and combine them all into a IIIF v3 manifest.
In addition to the Python dependencies in requirements.txt, there are some OS dependencies:
- Pyramidal TIFFs require vips
- Thumbnail generation requires ImageMagick (should probably be changed to vips)
- HOCR requires tesseract
- A/V transcriptions require whisper
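A quick way to confirm these tools are installed before running the pipeline is to look them up on your PATH. The sketch below is a hypothetical helper, not part of iiiflow, and the executable names are assumptions (ImageMagick 7 installs `magick`, while older releases install `convert`).

```python
import shutil

# Executables behind each OS dependency listed above. The names are assumptions:
# ImageMagick 7 installs `magick`, while older releases install `convert`.
REQUIRED_TOOLS = {
    "pyramidal tiffs": ["vips"],
    "thumbnails": ["magick", "convert"],
    "hocr": ["tesseract"],
    "a/v transcription": ["whisper"],
}


def check_os_dependencies(tools=REQUIRED_TOOLS):
    """Map each feature to the first matching executable found on PATH, or None."""
    report = {}
    for feature, candidates in tools.items():
        # shutil.which returns the executable's full path, or None if it is not found.
        report[feature] = next((c for c in candidates if shutil.which(c)), None)
    return report


if __name__ == "__main__":
    for feature, executable in check_os_dependencies().items():
        print(f"{feature:<20} {executable or 'MISSING'}")
```

Running it prints one line per feature, with MISSING next to any tool that still needs to be installed.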
iiiflow expects a .iiiflow.yml config file in your home directory (~) that defines the path to the root of your Digital Object Discovery Storage, the path to an error log, and a base URL for where your images are hosted.
---
discovery_storage_root: /path/to/digital_object_root
manifest_url_root: https://my.server.org
error_log_file: /path/to/errors.log
audio_thumbnail_file: ./fixtures/thumbnail.jpg
For audio thumbnails and the tests to work, set audio_thumbnail_file to either a local path or an accessible URL for an image file.
Optionally, you can pass the path to any .yml file as the last argument of any iiiflow function:
from iiiflow import create_ptif
create_ptif("collection1", "object1", "path/to/config.yml")
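As a rough illustration of how a config lookup like this could work, here is a sketch that reads an explicit path or falls back to ~/.iiiflow.yml. It is not iiiflow's code: it uses a minimal parser for flat `key: value` lines instead of a real YAML library, and the hypothetical `object_dir` helper assumes objects live at `<discovery_storage_root>/<collection>/<object>`, which may not match the specification.

```python
from pathlib import Path

# Keys from the example config; audio_thumbnail_file is treated as optional here
# because it is only needed for audio thumbnails and the tests.
REQUIRED_KEYS = ("discovery_storage_root", "manifest_url_root", "error_log_file")


def parse_flat_yaml(text):
    """Parse flat `key: value` lines. A simplification, not a full YAML parser."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and the `---` document marker.
        if not line or line.startswith("#") or line == "---":
            continue
        # Split on the first colon only, so values like https://... stay intact.
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def load_config(config_path=None):
    """Read config_path if given, otherwise ~/.iiiflow.yml."""
    path = Path(config_path) if config_path else Path.home() / ".iiiflow.yml"
    config = parse_flat_yaml(path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"{path} is missing {', '.join(missing)}")
    return config


def object_dir(config, collection_id, object_id):
    """Hypothetical layout: <discovery_storage_root>/<collection>/<object>."""
    return Path(config["discovery_storage_root"]) / collection_id / object_id
```

Calling `load_config()` reads the home-directory file, while `load_config("path/to/config.yml")` mirrors the optional last argument.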
Creates a 300x300 thumbnail.jpg
from iiiflow import make_thumbnail
make_thumbnail("collection1", "object1")
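Thumbnail generation relies on ImageMagick. The sketch below assumes an ImageMagick `-thumbnail 300x300` call, which scales an image to fit inside a 300x300 box while keeping its aspect ratio. The command layout, the `fitted_size` helper, and the `make_thumbnail_sketch` wrapper are illustrative, not taken from iiiflow, and ImageMagick's own rounding may differ by a pixel.

```python
import subprocess


def thumbnail_command(source, destination="thumbnail.jpg", size=300, executable="convert"):
    """ImageMagick command that fits `source` inside a size x size box."""
    # -thumbnail resizes while keeping the aspect ratio and strips most metadata.
    return [executable, str(source), "-thumbnail", f"{size}x{size}", str(destination)]


def fitted_size(width, height, box=300):
    """Approximate output dimensions for an aspect-preserving WxH geometry."""
    scale = min(box / width, box / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def make_thumbnail_sketch(source, destination="thumbnail.jpg"):
    """Hypothetical wrapper; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(thumbnail_command(source, destination), check=True)
```

A 4000x3000 scan therefore becomes roughly 300x225, and a 1000x2000 portrait page roughly 150x300.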
Creates pyramidal TIFFs, using the .ptif extension to distinguish them from traditional TIFFs.
from iiiflow import create_ptif
create_ptif("collection1", "object1")
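A pyramidal TIFF stores the full image plus successively half-sized copies, each cut into tiles, so an image server can return any zoom level quickly. This is a sketch of a vips call, assuming `vips tiffsave` with its `--tile` and `--pyramid` options; the 256-pixel tile size and JPEG compression are assumptions, not iiiflow's actual settings. The `pyramid_levels` helper estimates how many halvings are needed before the image fits in one tile, and libvips' exact stopping rule may differ.

```python
from pathlib import Path


def ptif_path(source):
    """Swap the extension for .ptif, e.g. page1.tif -> page1.ptif."""
    return Path(source).with_suffix(".ptif")


def ptif_command(source, tile_size=256):
    """vips tiffsave call that writes a tiled, pyramidal, JPEG-compressed TIFF."""
    return [
        "vips", "tiffsave", str(source), str(ptif_path(source)),
        "--tile", "--pyramid",
        "--tile-width", str(tile_size), "--tile-height", str(tile_size),
        "--compression", "jpeg",
    ]


def pyramid_levels(width, height, tile_size=256):
    """Estimate levels: keep halving until the whole image fits in one tile."""
    levels = 1
    while width > tile_size or height > tile_size:
        width, height = (width + 1) // 2, (height + 1) // 2  # halve, rounding up
        levels += 1
    return levels
```

Because each level holds about a quarter of the pixels of the one above it, the extra levels add roughly a third to the pixel count of the full-resolution image.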
Creates HOCR using tesseract.
from iiiflow import create_hocr
create_hocr("collection1", "object1")
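tesseract writes HOCR when given the `hocr` config, appending `.hocr` to the output base name. The sketch below shows that call and a standard-library way to pull plain text back out of HOCR markup by reading its `ocr_line` and `ocrx_word` elements. Both helpers are hypothetical and not part of iiiflow.

```python
from html.parser import HTMLParser

# HOCR classes that tesseract uses for line-level elements.
LINE_CLASSES = {"ocr_line", "ocr_header", "ocr_caption", "ocr_textfloat"}


def hocr_command(image, output_base):
    """`tesseract IMAGE OUTPUTBASE hocr` writes OUTPUTBASE.hocr."""
    return ["tesseract", str(image), str(output_base), "hocr"]


class HocrTextExtractor(HTMLParser):
    """Collect words from ocrx_word spans, one text line per HOCR line element."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities such as &amp;
        self.lines = []
        self._words = []
        self._in_word = False

    def _finish_line(self):
        words = [w for w in self._words if w]
        if words:
            self.lines.append(" ".join(words))
        self._words = []

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if classes & LINE_CLASSES:
            self._finish_line()  # a new line starts, so close the previous one
        elif "ocrx_word" in classes:
            self._in_word = True
            self._words.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self._in_word:
            self._in_word = False

    def handle_data(self, data):
        if self._in_word:
            self._words[-1] += data.strip()

    def close(self):
        super().close()
        self._finish_line()


def hocr_to_text(markup):
    parser = HocrTextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.lines)
```

Keeping the HOCR file and deriving plain text from it means the word coordinates stay available for search highlighting later.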
Creates transcriptions of A/V files using whisper.
from iiiflow import create_transcription
create_transcription("collection1", "object1")
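A/V transcription relies on whisper. This sketch assumes the openai-whisper command-line tool and its `--model`, `--output_dir`, and `--output_format` options; the `base` model, the VTT output format, and the list of A/V file extensions are assumptions chosen for illustration, not iiiflow's settings.

```python
from pathlib import Path

# Hypothetical set of extensions treated as audio or video.
AV_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".mp4", ".mov", ".webm"}


def is_av_file(filename):
    return Path(filename).suffix.lower() in AV_EXTENSIONS


def transcription_command(media_file, output_dir, model="base", output_format="vtt"):
    """openai-whisper CLI call that writes its transcript into output_dir."""
    return [
        "whisper", str(media_file),
        "--model", model,
        "--output_dir", str(output_dir),
        "--output_format", output_format,
    ]


def transcription_jobs(filenames, output_dir, model="base"):
    """One whisper command per A/V file, in a stable (sorted) order."""
    return [
        transcription_command(name, output_dir, model)
        for name in sorted(filenames)
        if is_av_file(name)
    ]
```

Larger whisper models are generally more accurate but noticeably slower, so the model is left as a parameter.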
Validates metadata.yml using rules defined in the Digital Object Discovery Storage Specification.
from iiiflow import validate_metadata
validate_metadata("collection1", "object1")
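The actual validation rules live in the Digital Object Discovery Storage Specification and are not reproduced here. The sketch below only illustrates the pattern of collecting every rule violation into a list instead of stopping at the first one; every field name and allowed value in it is hypothetical, and `validate_metadata_dict` is not iiiflow's `validate_metadata`.

```python
import re

# Hypothetical rules for illustration only; the real rules come from the
# Digital Object Discovery Storage Specification.
REQUIRED_FIELDS = ("title", "date_display", "resource_type")
ALLOWED_RESOURCE_TYPES = {"Image", "Text", "Audio", "Video"}
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def validate_metadata_dict(metadata):
    """Return a list of error messages; an empty list means the record passed."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            errors.append(f"missing required field: {field}")
    resource_type = metadata.get("resource_type")
    if resource_type and resource_type not in ALLOWED_RESOURCE_TYPES:
        errors.append(f"invalid resource_type: {resource_type!r}")
    date = metadata.get("date_normal")
    if date is not None and not ISO_DATE.match(str(date)):
        errors.append(f"date_normal is not YYYY, YYYY-MM, or YYYY-MM-DD: {date!r}")
    return errors
```

Reporting all problems at once makes it easier to fix a metadata.yml file in a single pass.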
Combines the derivatives into a IIIF v3 manifest.
from iiiflow import create_manifest
create_manifest("collection1", "object1")
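For orientation, this sketch builds the skeleton of a IIIF Presentation 3.0 manifest by hand: one Canvas per image, each painted by a single image Annotation. The URL layout under `manifest_url_root`, the image service paths, and the thumbnail location are assumptions for illustration, not the URLs iiiflow actually generates.

```python
import json

IIIF_CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def build_manifest(manifest_url_root, collection_id, object_id, label, images):
    """Build a minimal IIIF v3 manifest; images is a list of (filename, width, height)."""
    base = f"{manifest_url_root.rstrip('/')}/{collection_id}/{object_id}"  # hypothetical layout
    canvases = []
    for index, (filename, width, height) in enumerate(images, start=1):
        canvas_id = f"{base}/canvas/{index}"
        service_id = f"{base}/iiif/{filename}"  # hypothetical IIIF Image API endpoint
        annotation = {
            "id": f"{canvas_id}/annotation/1",
            "type": "Annotation",
            "motivation": "painting",  # the image "paints" the canvas
            "body": {
                "id": f"{service_id}/full/max/0/default.jpg",
                "type": "Image",
                "format": "image/jpeg",
                "width": width,
                "height": height,
                "service": [{"id": service_id, "type": "ImageService3", "profile": "level1"}],
            },
            "target": canvas_id,
        }
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            "items": [{"id": f"{canvas_id}/page/1", "type": "AnnotationPage", "items": [annotation]}],
        })
    return {
        "@context": IIIF_CONTEXT,
        "id": f"{base}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "thumbnail": [{"id": f"{base}/thumbnail.jpg", "type": "Image", "format": "image/jpeg"}],
        "items": canvases,
    }


if __name__ == "__main__":
    manifest = build_manifest("https://my.server.org", "collection1", "object1", "Object 1",
                              [("page1.ptif", 4000, 3000)])
    print(json.dumps(manifest, indent=2))
```

Pointing each image body at an Image API service lets viewers request tiles from the pyramidal TIFFs instead of downloading each full image.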
To run the tests with all dependencies installed:
docker-compose run test