# kpe

This project is the successor of the keyphrase extraction system SZTERGAK, which participated in the SemEval shared task on keyphrase extraction from scientific publications.

### How to get the code running

#### The fast lane

Enter the following commands in the project directory (assuming a UNIX-like environment with the zip and ant commands available):

```sh
./getLibs.sh
ant
ant KpeMainNoTraining
```

#### The (not so) fast lane

Running the getLibs.sh script downloads the dependent libraries needed to build the project. Note that this script assumes a UNIX-like environment (with the zip command available). This step can take a while depending on your Internet connection, as it downloads approximately 210 MB of data. Open a terminal and, in the project directory, type:

```sh
./getLibs.sh
```
Afterwards, all the necessary libraries can be found in the lib/ directory.
You can now compile the project in your favorite IDE or with ant: if ant is available on your machine, simply run `ant` in the project directory.
When running the project, you can choose whether or not to train a new model. The two options are available through the `ant KpeMain` and `ant KpeMainNoTraining` targets, respectively.
The behavior of the main class is controlled by the contents of the config.txt and config_no_training.txt files. The former file is annotated with comments (text following //) that explain the various parameters that can be set in the config files.
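
Since the config files mark comments with `//`, a reader of such a file only needs the text before the comment marker. The following is a minimal sketch of that idea, not the project's own parser: the class `ConfigLines` and its methods are hypothetical, and the sketch makes no assumption about how the remaining (non-comment) text of a line is structured.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the kpe code base.
public class ConfigLines {

    // Returns the trimmed text preceding "//", or the whole trimmed line if it has no comment.
    // Note: this naive split would also cut values that themselves contain "//" (e.g. URLs).
    static String stripComment(String line) {
        int idx = line.indexOf("//");
        return (idx >= 0 ? line.substring(0, idx) : line).trim();
    }

    // Reads a config file and keeps only the non-empty, comment-free parts of its lines.
    static List<String> read(Path file) throws IOException {
        List<String> result = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String content = stripComment(line);
            if (!content.isEmpty()) {
                result.add(content);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "config.txt");
        for (String content : read(file)) {
            System.out.println(content);
        }
    }
}
```

Run from the project directory, this prints config.txt with its comments removed, which can help when comparing the settings of config.txt and config_no_training.txt.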

#### The structure of the config file

Comments at the end of every line of config.txt explain the structure of the config files.
The only thing not detailed there is how to encode feature combinations as integers, i.e. the format in which the framework expects them.
To encode a feature combination as an integer, open the plain text file resources/features, select the rows containing the names of the desired features, and add up the integers next to them; the sum is the integer describing the selected set of features.
For example, the feature encoding 1060891 (which equals 1+2+8+16+4096+8192+1048576, hence encoding the features referred to as WikiFeature, TfIdfFeature, SuffixFeature, StrangeOrthographyFeature, PosFeature, MweFeature and FirstIndexFeature in the resources/features file) is a reasonable choice of features in our experience across various domains.
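
The arithmetic above can be sketched in Java. All seven values quoted in the example are distinct powers of two, so adding them up behaves like setting bits in a bitmask, and a code can be split back into its components. The class `FeatureEncoding` and its methods are hypothetical illustrations, not part of the kpe code base, and the decoding step assumes that every value in resources/features is a distinct power of two (true for the values quoted here, but not checked against the whole file).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the integer feature encoding; not code from the kpe project.
public class FeatureEncoding {

    // Adds up the integers listed next to the selected features in resources/features.
    static int encode(int... featureValues) {
        int code = 0;
        for (int value : featureValues) {
            code += value;
        }
        return code;
    }

    // Splits a non-negative code into its power-of-two components, smallest first.
    // Assumes every feature value is a distinct power of two.
    static List<Integer> decode(int code) {
        List<Integer> components = new ArrayList<>();
        for (int bit = 0; bit < Integer.SIZE - 1; bit++) {
            int value = 1 << bit;
            if ((code & value) != 0) {
                components.add(value);
            }
        }
        return components;
    }

    public static void main(String[] args) {
        int code = encode(1, 2, 8, 16, 4096, 8192, 1048576);
        System.out.println(code);         // 1060891
        System.out.println(decode(code)); // [1, 2, 8, 16, 4096, 8192, 1048576]
    }
}
```

Because the values are distinct powers of two, summing them is equivalent to a bitwise OR as long as each feature is selected only once; adding the same value twice would silently yield a different feature set.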

#### Writing custom readers

This code was primarily written for the SemEval shared task dataset. However, arbitrary readers can be added to the project by implementing the `hu.u_szeged.kpe.readers.KpeReader` interface, in the same way as `hu.u_szeged.kpe.readers.SemEvalReader` and `hu.u_szeged.kpe.readers.GeneralReader` do.
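
The methods required by `KpeReader` are not listed in this README, so the sketch below deliberately does not implement that interface. It only illustrates the kind of input handling a custom reader for a new dataset might need: loading every .txt file of a directory as one document, keyed by its file name. The class `PlainTextCorpus` and its `load` method are hypothetical; see `SemEvalReader` or `GeneralReader` for how the actual interface is implemented.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for a custom dataset; it does NOT implement hu.u_szeged.kpe.readers.KpeReader.
public class PlainTextCorpus {

    // Loads each *.txt file in dir as one document; keys are file names without the ".txt" extension.
    // A TreeMap keeps the documents in a deterministic (alphabetical) order.
    static Map<String, String> load(Path dir) throws IOException {
        Map<String, String> docs = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                String id = name.substring(0, name.length() - ".txt".length());
                docs.put(id, Files.readString(file, StandardCharsets.UTF_8));
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, String> doc : load(dir).entrySet()) {
            System.out.println(doc.getKey() + ": " + doc.getValue().length() + " characters");
        }
    }
}
```

In a real reader, logic like this would sit behind whatever methods `KpeReader` declares, possibly alongside loading gold-standard keyphrases for training.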

### Related publications

- Gábor Berend: Opinion Expression Mining by Exploiting Keyphrase Extraction. Fifth International Joint Conference on Natural Language Processing (IJCNLP).
- Gábor Berend and Richárd Farkas: Feature Engineering for Keyphrase Extraction. SemEval-2 workshop, Evaluation Exercises on Semantic Evaluation (ACL SIGLEX event), 2010. [PDF](http://www.aclweb.org/anthology/S/S10/S10-1040.pdf)
