DAG definitions for the various workflows at Blinken OSA Archivum.
Runs on Docker, so the only prerequisite is having Docker installed on your system.
- Clone the repository.
- Run the `start-airflow.sh` script, which builds and starts the necessary Docker containers.
Workflow for handling the digitized files of analogue video cassettes (VHS, BETA) and creating Archival Information Packages.
The workflow is defined as a DAG, which recursively calls itself as long as there are files left to process. The files for this DAG are in the `dags\av_tasks` directory:
- `collect_files.py`: Picks up the next file to process. The file should be named according to the barcode pattern (with the predefined extension), or should be in a directory whose name matches the barcode pattern.
- `create_directories.py`: Creates the target directory structure of the Archival Information Package (AIP). The structure is the following:
    .
    └── HU_OSA_00000001
        ├── Content
        │   ├── Access
        │   │   └── HU_OSA_00000001.mp4
        │   └── Preservation
        │       └── HU_OSA_00000001.mpg
        └── Metadata
            ├── Access
            │   ├── HU_OSA_00000001.md5
            │   ├── HU_OSA_00000001.sha512
            │   └── HU_OSA_00000001_md_tech.json
            └── Preservation
                ├── HU_OSA_00000001.md5
                ├── HU_OSA_00000001.sha512
                ├── HU_OSA_00000001_md_descriptive.json
                └── HU_OSA_00000001_md_tech.json
- `copy_master_files.py`: Moves the master files to their directory under `AIP/Content/Preservation`.
- `create_checksums.py`: Creates MD5 and SHA-512 checksums for the master file.
- `create_video_info.py`: Creates technical metadata for the master file using `ffprobe`, under `AIP/Metadata/Preservation/<Barcode>_md_tech.json`.
- `get_decriptive_metadata.py`: Fetches the descriptive metadata from the Archival Management System, if it exists.
- `push_to_ams.py`: Using the barcode, pushes the technical metadata to OSA's Archival Management System and sets the `digital_version_exists` flag.
- `encode_masters.py`: Creates high-quality access copies using `ffmpeg`, under `AIP/Content/Access`.
- `create_checksums.py`: Creates MD5 and SHA-512 checksums for the access copy.
- `create_video_info.py`: Creates technical metadata for the access copy using `ffprobe`, under `AIP/Metadata/Access/<Barcode>_md_tech.json`.
- `send_email.py`: Sends an email to the staff members of the AV digitization procedure.
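The file selection done by `collect_files.py` can be sketched roughly as below. This is not the repository's code: `find_next_file` is a hypothetical helper, and the `HU_OSA_\d{8}` pattern (inferred from the example barcode in the AIP tree) and the `.mpg` extension are assumed stand-ins for `AV_BARCODE_PATTERN` and `AV_MASTER_FILE_EXTENSION`.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical defaults; the deployment reads these from
# AV_BARCODE_PATTERN and AV_MASTER_FILE_EXTENSION instead.
BARCODE_PATTERN = r"HU_OSA_\d{8}"
MASTER_EXTENSION = ".mpg"


def find_next_file(
    input_dir: Path,
    pattern: str = BARCODE_PATTERN,
    extension: str = MASTER_EXTENSION,
) -> Optional[Tuple[str, Path]]:
    """Return (barcode, path) of the next master file, or None if none is left."""
    barcode_re = re.compile(pattern)
    # Sorting makes the processing order deterministic between runs.
    for path in sorted(input_dir.rglob(f"*{extension}")):
        # Case 1: the file itself is named after the barcode.
        if barcode_re.fullmatch(path.stem):
            return path.stem, path
        # Case 2: the file sits in a directory named after the barcode.
        if barcode_re.fullmatch(path.parent.name):
            return path.parent.name, path
    return None
```

For example, `find_next_file(Path("/opt/input"))` would return a `(barcode, path)` pair. A `None` result is a natural point at which a self-calling DAG could stop calling itself.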
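The AIP skeleton from `create_directories.py` can be created along these lines. This is a minimal sketch that only mirrors the directory tree shown above; `create_aip_directories` and `AIP_SUBDIRS` are hypothetical names, not taken from the repository.

```python
from pathlib import Path

# Leaf directories of the Archival Information Package, following the tree above.
AIP_SUBDIRS = (
    "Content/Access",
    "Content/Preservation",
    "Metadata/Access",
    "Metadata/Preservation",
)


def create_aip_directories(output_dir: Path, barcode: str) -> Path:
    """Create the empty AIP structure for one barcode and return its root."""
    aip_root = output_dir / barcode
    for subdir in AIP_SUBDIRS:
        # parents=True creates the intermediate levels (e.g. Content/);
        # exist_ok=True lets the task be re-run safely after a failure.
        (aip_root / subdir).mkdir(parents=True, exist_ok=True)
    return aip_root
```

Making the step idempotent matters in Airflow, where a failed task is typically retried from the beginning.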
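Both `create_checksums.py` steps produce an MD5 and a SHA-512 file per media file. The sketch below computes both digests in a single chunked pass using the standard `hashlib` module. The helper names are hypothetical, and the assumption that each `.md5`/`.sha512` file holds only the hex digest is ours; the repository may use a different format (for example `md5sum`-style `digest  filename` lines).

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digests(path: Path, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute MD5 and SHA-512 of a file in one pass."""
    hashers = {"md5": hashlib.md5(), "sha512": hashlib.sha512()}
    with path.open("rb") as fh:
        # Read 1 MiB chunks so multi-gigabyte video files never sit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def write_checksum_files(media_file: Path, metadata_dir: Path, barcode: str) -> Dict[str, str]:
    """Write <barcode>.md5 and <barcode>.sha512 into metadata_dir (assumed format)."""
    digests = file_digests(media_file)
    for algorithm, digest in digests.items():
        (metadata_dir / f"{barcode}.{algorithm}").write_text(digest + "\n")
    return digests
```

For the master file, `metadata_dir` would be `AIP/Metadata/Preservation`; for the access copy, `AIP/Metadata/Access`.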
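The `create_video_info.py` step could call `ffprobe` roughly as follows. The flags used (`-v quiet`, `-print_format json`, `-show_format`, `-show_streams`) are standard `ffprobe` options, but storing the full `ffprobe` output unchanged as `<Barcode>_md_tech.json` is an assumption; the real task may select or rename fields. `ffprobe_command` and `write_tech_metadata` are hypothetical names.

```python
import json
import shutil
import subprocess
from pathlib import Path
from typing import List


def ffprobe_command(media_file: Path) -> List[str]:
    """Build an ffprobe call that prints container and stream info as JSON."""
    return [
        "ffprobe",
        "-v", "quiet",             # suppress log output on stderr
        "-print_format", "json",   # machine-readable output
        "-show_format",            # container-level metadata (duration, bitrate...)
        "-show_streams",           # per-stream metadata (codec, resolution...)
        str(media_file),
    ]


def write_tech_metadata(media_file: Path, metadata_dir: Path, barcode: str) -> Path:
    """Run ffprobe and save its JSON output as <barcode>_md_tech.json."""
    if shutil.which("ffprobe") is None:
        raise RuntimeError("ffprobe is not available on PATH")
    result = subprocess.run(
        ffprobe_command(media_file), capture_output=True, text=True, check=True
    )
    info = json.loads(result.stdout)  # fails loudly if the output is not JSON
    target = metadata_dir / f"{barcode}_md_tech.json"
    target.write_text(json.dumps(info, indent=2))
    return target
```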
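The README only states that `encode_masters.py` creates "high-quality" access copies with `ffmpeg`. The encoding settings below (H.264 at CRF 18, `yadif` deinterlacing for interlaced analogue sources, AAC audio, `+faststart` for web playback) are purely illustrative assumptions, not the repository's parameters; `access_copy_command` is a hypothetical helper.

```python
import subprocess
from pathlib import Path
from typing import List


def access_copy_command(master: Path, access_dir: Path, barcode: str,
                        extension: str = ".mp4") -> List[str]:
    """Build an ffmpeg call that encodes an access copy (illustrative settings)."""
    target = access_dir / f"{barcode}{extension}"
    return [
        "ffmpeg", "-y",              # overwrite a partial file from a failed run
        "-i", str(master),
        "-vf", "yadif",              # deinterlace the analogue (interlaced) source
        "-c:v", "libx264",           # H.264 video
        "-preset", "slow",
        "-crf", "18",                # low CRF: visually near-lossless quality
        "-pix_fmt", "yuv420p",       # widest player compatibility
        "-c:a", "aac", "-b:a", "192k",
        "-movflags", "+faststart",   # move the index to the front for streaming
        str(target),
    ]


def encode_access_copy(master: Path, access_dir: Path, barcode: str) -> None:
    # check=True makes a failed encode raise, so the Airflow task fails too.
    subprocess.run(access_copy_command(master, access_dir, barcode), check=True)
```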
So that the workflow can find the source and target locations and its other settings, configure the following in `docker-compose.yml`:
- The directory containing the source files should be mapped to `/opt/input`.
- The directory containing the target files should be mapped to `/opt/output`.
- `AMS_API`: URL of the API of the Archival Management System
- `AMS_API_TOKEN`: API token of the Archival Management System
- `AV_FINAL_DIR`: Directory of the target files
- `AV_MASTER_FILE_EXTENSION`: File extension of the source (master) files
- `AV_ACCESS_FILE_EXTENSION`: File extension of the access files
- `AV_BARCODE_PATTERN`: Regular expression matching the registered barcodes
- `AV_STAFF_EMAIL_LIST`: Comma-separated list of the AV staff email addresses
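A task could load these settings as shown in the sketch below, assuming they reach the containers as environment variables (for example via an `environment:` section in `docker-compose.yml`). `AVSettings` and `load_settings` are hypothetical names, not the repository's own. Compiling the barcode pattern and splitting the email list once makes misconfiguration fail at start-up rather than halfway through a run.

```python
import os
import re
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class AVSettings:
    ams_api: str
    ams_api_token: str
    final_dir: str
    master_extension: str
    access_extension: str
    barcode_pattern: re.Pattern
    staff_emails: List[str]


def load_settings(env: Mapping[str, str] = os.environ) -> AVSettings:
    """Read the AV workflow settings; a missing variable raises KeyError."""
    return AVSettings(
        ams_api=env["AMS_API"],
        ams_api_token=env["AMS_API_TOKEN"],
        final_dir=env["AV_FINAL_DIR"],
        master_extension=env["AV_MASTER_FILE_EXTENSION"],
        access_extension=env["AV_ACCESS_FILE_EXTENSION"],
        # An invalid regular expression raises re.error here, at load time.
        barcode_pattern=re.compile(env["AV_BARCODE_PATTERN"]),
        # "a@x.org, b@x.org" -> ["a@x.org", "b@x.org"]; empty items are dropped.
        staff_emails=[
            e.strip() for e in env["AV_STAFF_EMAIL_LIST"].split(",") if e.strip()
        ],
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test outside the containers.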