Code for the seminar “Information Retrieval” (see the seminar schedule)
Module | Content | Resources/Dependencies | Literature |
basic | Corpus, linear search | Shakespeare | IIR ch. 1 |
boole | Term-document matrix, inverted index, list intersection, positional index, PositionalIntersect | | IIR ch. 1 + 2 |
preprocess | Preprocessing: tokenization, stemming | Snowball stemmer | IIR ch. 1 + 2 |
tolerant | Tolerant retrieval: Levenshtein, Soundex | Apache Commons Lang, Apache Commons Codec | IIR ch. 3 |
ranked | Ranked retrieval: term weighting, vector space model | | IIR ch. 6 + 7 |
evaluation | Evaluation: precision, recall, F-measure | | IIR ch. 8 |
web | Crawler, WebDocument | Apache Xerces, NekoHTML | IIR ch. 19 + 20 |
lucene | Lucene: indexer and searcher | lucene-core, lucene-queryparser, lucene-analyzers-common | Lucene in Action |
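
To give a flavor of the list intersection covered in the boole module, here is a minimal sketch in Java. It is not the seminar's own code: the class name `PostingsIntersection`, the method `intersect`, and the document IDs are hypothetical and chosen for illustration only. It assumes that both postings lists are sorted in ascending order of document ID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of the seminar code.
public class PostingsIntersection {

    /**
     * Intersects two postings lists with the classic two-pointer merge.
     * Assumption: both lists are sorted ascending by document ID.
     */
    public static List<Integer> intersect(List<Integer> p1, List<Integer> p2) {
        List<Integer> answer = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < p1.size() && j < p2.size()) {
            int cmp = Integer.compare(p1.get(i), p2.get(j));
            if (cmp == 0) {
                // Document ID occurs in both lists: keep it, advance both pointers.
                answer.add(p1.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                // Smaller ID is in p1: advance p1 only.
                i++;
            } else {
                // Smaller ID is in p2: advance p2 only.
                j++;
            }
        }
        return answer;
    }

    public static void main(String[] args) {
        // Illustrative document IDs for two query terms.
        List<Integer> first = List.of(1, 2, 4, 11, 31, 45, 173, 174);
        List<Integer> second = List.of(2, 31, 54, 101);
        System.out.println(intersect(first, second)); // prints [2, 31]
    }
}
```

Because both lists are traversed only once, the merge takes time linear in the combined length of the two lists, which is why an inverted index keeps its postings sorted.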