The AnnotationTool facilitates the work of the project Nichtstaatliches Recht der Metallindustrie at the Max Planck Institute for Legal History and Legal Theory. Its current functions are:
- Splitting a PDF into single image files, which can then be processed further with other tools such as ScanTailor Advanced.
- Uploading the images to a server via SSH. In our project, we use ocr4all to do the OCR work.
- Downloading the OCR data from a server via SSH. Attention: The program will try to execute a shell script located on that server, which is not provided here. You should disable that part of the code if you want to use the application yourself.
- Automatically annotating TEI XML files according to a list of signal words for each tag.
- Providing a manual transcription interface for sources that cannot be processed with ocr4all (e.g. handwritten sources). Attention: The script uses a template for the TEI XML file that contains the project's header. You need to adapt the template file /source/ to your own needs.
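To illustrate what signal-word annotation means, here is a minimal sketch using only the Python standard library. It is not the AnnotationTool's actual implementation: the mapping `SIGNAL_WORDS`, its example words, and the function `annotate_paragraph` are hypothetical. The sketch assumes a TEI `<p>` element that contains only plain text (no child elements) and matches whole words exactly and case-sensitively.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical mapping (not from the project): TEI tag -> signal words that trigger it.
SIGNAL_WORDS = {
    "orgName": ["Gesamtmetall", "Verband"],
    "date": ["1950", "1951"],
}


def annotate_paragraph(p, signal_words):
    """Wrap signal words found in p.text in TEI child elements.

    Assumption: p has no child elements yet; otherwise it is returned unchanged.
    """
    if len(p) or not p.text:
        return p
    # Reverse lookup: signal word -> TEI tag name.
    lookup = {word: tag for tag, words in signal_words.items() for word in words}
    # The capturing group keeps the text between words (spaces, punctuation) as tokens.
    tokens = re.split(r"(\w+)", p.text)
    p.text = ""
    last = None  # most recently created annotation element
    for tok in tokens:
        tag = lookup.get(tok)
        if tag is None:
            # Plain text goes into p.text before the first annotation,
            # afterwards into the tail of the previous annotation element.
            if last is None:
                p.text += tok
            else:
                last.tail = (last.tail or "") + tok
        else:
            last = ET.SubElement(p, f"{{{TEI_NS}}}{tag}")
            last.text = tok
    return p


if __name__ == "__main__":
    # Serialize the TEI namespace as the default namespace (no "ns0:" prefixes).
    ET.register_namespace("", TEI_NS)
    para = ET.fromstring(f'<p xmlns="{TEI_NS}">Der Verband tagte 1950.</p>')
    annotate_paragraph(para, SIGNAL_WORDS)
    print(ET.tostring(para, encoding="unicode"))
```

In this example, "Verband" should end up inside an `<orgName>` element and "1950" inside a `<date>` element, while the surrounding text is kept as element text and tails. A real TEI file would require walking all relevant elements and handling mixed content, which this sketch deliberately leaves out.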
- Windows:
- Linux: Not available yet.
- macOS: Not available yet.
Windows: After installing the dependencies, running `pyinstaller.exe --clean -F source\` builds a single executable file.
Planned features:
- Add @resp/@cert attributes for automated annotations.
- Tag words containing line breaks.
You can find example files from the project here:
Feel free to leave comments or suggestions here, or send them privately to