Authors: Dominik Messinger, Alexander Weigl and Ge Wu
License: GPLv3
University project for the lecture Text Indexing. The idea is to search audio and video files for keywords by building an inverted index beforehand.
We introduce the concept of timed documents. A timed document contains the document's text sliced into blocks, each carrying time information. These documents are produced by preprocessing audio and video files and can be stored in an XML format. The inverted index is generated from these timed documents.
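To make the idea concrete, here is a minimal sketch of a timed document and an inverted index built from it. It is not the project's actual code: the names `TimedIndexSketch`, `Block`, `Posting` and `InvertedIndex`, the use of seconds as the time unit, and the simple tokenizer are all assumptions made for illustration. The XML storage format is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; class and field names are hypothetical.
public class TimedIndexSketch {

    /** One slice of a timed document: text plus its time span (assumed to be in seconds). */
    static final class Block {
        final double start;
        final double end;
        final String text;

        Block(double start, double end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** One index entry: the document and the start time of the block containing the term. */
    static final class Posting {
        final String docId;
        final double start;

        Posting(String docId, double start) {
            this.docId = docId;
            this.start = start;
        }
    }

    /** Maps each lower-cased term to the blocks in which it occurs. */
    static final class InvertedIndex {
        private final Map<String, List<Posting>> postings = new HashMap<>();

        /** Adds all blocks of one timed document to the index. */
        void add(String docId, List<Block> blocks) {
            for (Block block : blocks) {
                // Split on anything that is not a letter or digit; record each term once per block.
                Set<String> seen = new HashSet<>();
                for (String token : block.text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                    if (token.isEmpty() || !seen.add(token)) {
                        continue;
                    }
                    postings.computeIfAbsent(token, k -> new ArrayList<>())
                            .add(new Posting(docId, block.start));
                }
            }
        }

        /** Returns the postings for a keyword, or an empty list if it was never indexed. */
        List<Posting> search(String keyword) {
            return postings.getOrDefault(keyword.toLowerCase(Locale.ROOT), Collections.emptyList());
        }
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block(0.0, 5.0, "Suffix arrays and suffix trees"));
        blocks.add(new Block(5.0, 9.5, "Inverted index construction"));

        InvertedIndex index = new InvertedIndex();
        index.add("lecture01", blocks);

        for (Posting hit : index.search("index")) {
            System.out.println(hit.docId + " @ " + hit.start + "s");
        }
    }
}
```

Each posting stores the start time of the matching block, so a keyword hit can be mapped back to a position in the audio or video file.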
Java Dependencies:
* Apache Commons IO
* Apache Commons Lang
* JavaTuples
* jdom
* json-simple
External Dependencies:
* a working Tesseract installation (binaries for win32 are included)
* ffmpeg (binaries for win32 are included)