DominikMe/multimedia-textsearch

multimedia-textsearch

Authors: Dominik Messinger, Alexander Weigl and Ge Wu
License: gpl-v3

Description

University project, lecture Text Indexing. The idea is to search within audio/video for keywords by building an inverted index beforehand.

We introduce the concept of timed documents. A timed document contains the document's text sliced into blocks, each carrying time information. These documents are produced by preprocessing audio and video files and can be stored in an XML format. The inverted index is generated from these timed documents.
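
The idea can be illustrated with a minimal sketch, assuming each block carries a start and end time in seconds and that a posting records the document and the start time of the block containing the term. All names here (`TimedIndexSketch`, `Block`, `Posting`, `addDocument`, `search`) are hypothetical and are not taken from the project's source; the tokenization is deliberately simple.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch only: every class, field, and method name is hypothetical.
class TimedIndexSketch {

    /** One block of a timed document: a slice of text plus the time span it covers. */
    static final class Block {
        final double startSeconds;
        final double endSeconds;
        final String text;

        Block(double startSeconds, double endSeconds, String text) {
            this.startSeconds = startSeconds;
            this.endSeconds = endSeconds;
            this.text = text;
        }
    }

    /** One index entry: the document a term occurs in and the start time of its block. */
    static final class Posting {
        final String documentId;
        final double startSeconds;

        Posting(String documentId, double startSeconds) {
            this.documentId = documentId;
            this.startSeconds = startSeconds;
        }
    }

    /** Adds each distinct lower-cased word of every block to the index. */
    static void addDocument(Map<String, List<Posting>> index, String documentId, List<Block> blocks) {
        for (Block block : blocks) {
            Set<String> seen = new HashSet<>(); // one posting per term per block
            for (String token : block.text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.isEmpty() || !seen.add(token)) {
                    continue;
                }
                index.computeIfAbsent(token, k -> new ArrayList<>())
                     .add(new Posting(documentId, block.startSeconds));
            }
        }
    }

    /** Returns the postings for a keyword, or an empty list if it was never indexed. */
    static List<Posting> search(Map<String, List<Posting>> index, String keyword) {
        List<Posting> postings = index.get(keyword.toLowerCase(Locale.ROOT));
        return postings == null ? Collections.<Posting>emptyList() : postings;
    }

    public static void main(String[] args) {
        Map<String, List<Posting>> index = new TreeMap<>();
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block(0.0, 5.0, "Suffix arrays and suffix trees"));
        blocks.add(new Block(5.0, 9.5, "An inverted index over trees"));
        addDocument(index, "lecture01", blocks);

        // Each hit tells the caller where in the recording to jump to.
        for (Posting hit : search(index, "trees")) {
            System.out.println(hit.documentId + " @ " + hit.startSeconds + "s");
        }
    }
}
```

Because the index is built before any query runs, a keyword lookup in this sketch is a single map access, and each returned posting points to a position in the audio or video file rather than to the file as a whole.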

Dependencies

Java Dependencies:

* Apache Commons IO
* Apache Commons Lang
* JavaTuples
* jdom
* json-simple

External Dependencies:

* a working tesseract installation (binaries for win32 are included)
* ffmpeg (binaries for win32 are included)
