Skip to content
This repository has been archived by the owner on Mar 1, 2020. It is now read-only.

Zshoham/Scobo-Engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Scobo-Engine

authors: shoham zarfati, hod twito

This is our project for the course Information Retrieval

User Guide

To launch the application go to Scobo/bin there you will find runnable files for Linux and windows, to run the application on windows double click the windows_runnable.bat file, to run the application on Linux enter the command ./linux_runnable.sh

note: in order to run the application on windows java 1.8.131 must be installed, on Linux newer java versions are supported only if javafx matching the java version is installed in usr/share/openjfx/lib

After launching the application you will see a user interface that is split into two parts - Options in the upper area and actions in th lower area.

Options include :

  • corpus path - the path to a folder containing the corpus the engine will be run on, the folder must contain
    • a folder called "corpus" containing folders in which the files that will be parsed are found.
    • a file called "stop_words.txt" containing a line break separated list of words the engine should ignore
  • index path - path to a folder where the engine will create its inverted index (at the end of the run this will include three files - dictionary file, document map file, and inverted index) this folder must be able to hold data equal to about 50% of the original corpus size.
  • use stemmer - if selected the engine will use a stemmer to try and normalize the terms it creates.
  • use semantic - if selected the engine will use semantic analysis when processing a query.
  • log path - path to a file that will contain the engine run log
  • parser batch size - the number of files the engine will read from disk at once for parsing
  • save - will save the selected configuration to be loaded at the next run of the application.

Actions include :

  • run - runs the parser/indexer with the selected options.
  • reset - deletes all files the engine created in the run and clears the RAM.
  • load dictionary - loads the dictionary into memory.
  • show dictionary - opens a window that will show a table of terms and their frequencies sorted alphabetically by the terms.

Querying:

in order to run a query at least one of the fields "query", "query file path" must contain information, and a dictionary must be loaded.

  • query - free text query, any text entered here will be parsed and similar documents will be shown in a table containing document names, in the table view there are two options:
    • show entities - creates popup with the dominant entities in the selected document.
    • save resut - saves the result of the query to a specified location, the results file is saved in a format appropriate for trec_eval.
  • quert file path - the file provided will be parsed and the queries extracted and then sent to be processed, once the results are ready a popup will appear asking for a location to store the results file, once again the results are formated for use with trec_eval.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages