How Palmetto can be used
If you are using Palmetto for an experiment or something similar that leads to a publication, please cite the paper "Exploring the Space of Topic Coherence Measures" that you can find on the project website.
There are three different ways to use Palmetto.
You only want to evaluate your topics or word sets? Then you can simply write a client for the REST interface of our web service. The coherence for a word set can be requested using a URL of the form
http://palmetto.aksw.org/palmetto-webapp/service/<coherence>?words=<words>
where <words> are the space-separated words and <coherence> is the name of the coherence. At the moment, the following values can be used:
ca
cp
cv
npmi
uci
umass
The response contains the coherence.
If you want to request the C_P coherence for the word set "cake", "apple", "banana", "cherry", "chocolate", the URL should look like this
http://palmetto.aksw.org/palmetto-webapp/service/cp?words=cake%20apple%20banana%20cherry%20chocolate
and the response should be of type text/plain and look like
0.8696366974967441
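A client for this interface can be quite small. The following is a minimal Python sketch that only uses the standard library; the helper names coherence_url and fetch_coherence are hypothetical and not part of Palmetto, and the sketch assumes that the response body is a single number as shown above.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Service URL and coherence names as listed above.
BASE_URL = "http://palmetto.aksw.org/palmetto-webapp/service"
COHERENCES = ("ca", "cp", "cv", "npmi", "uci", "umass")


def coherence_url(coherence, words, base_url=BASE_URL):
    """Build the GET URL for one word set (hypothetical helper)."""
    if coherence not in COHERENCES:
        raise ValueError(f"unknown coherence: {coherence!r}")
    # The words are joined with spaces; quote() encodes each space as %20.
    return f"{base_url}/{coherence}?words={quote(' '.join(words))}"


def fetch_coherence(coherence, words, base_url=BASE_URL, timeout=30):
    """Send the GET request and parse the text/plain body as a float.

    Assumption: the body contains nothing but the coherence value.
    """
    with urlopen(coherence_url(coherence, words, base_url), timeout=timeout) as response:
        return float(response.read().decode("utf-8").strip())
```

Calling coherence_url("cp", ["cake", "apple", "banana", "cherry", "chocolate"]) yields the example URL shown above. The base_url parameter makes it possible to point the same code at a locally running instance.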
An alternative URL that can be used is
http://palmetto.aksw.org/palmetto-webapp/service/calculate?coherence=<coherence>&words=<words>
Note that it is recommended to send GET requests because of recent problems that seemed to be caused by POST requests (#10, #11).
Thanks to Ivan Ermilov, there is a Python client available at https://github.com/dice-group/palmetto-py
In case you want to run Palmetto locally and use its web API described above, we suggest using the dockerized version. Alternatively, you can build the project locally.
In either case, you need to download and extract the index (unless you have your own index).
Let's assume the index has been extracted to the path /path/to/indexes. After extraction, the directory should contain the wikipedia_bd directory and the wikipedia_bd.histogram file. If your index has a different name, you should have a look at the configuration section below.
path
+- to
+- indexes
+- wikipedia_bd
+- wikipedia_bd.histogram
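The expected layout can be checked before the service is started. The following Python sketch is only an illustration; the function missing_index_parts is a hypothetical helper, and it assumes the naming shown in the tree above (a directory named after the index plus a file with the suffix .histogram next to it).

```python
from pathlib import Path


def missing_index_parts(index_root, index_name="wikipedia_bd"):
    """Return a list describing which parts of the index layout are missing.

    An empty list means that both the index directory and the histogram
    file were found directly inside index_root.
    """
    root = Path(index_root)
    missing = []
    if not (root / index_name).is_dir():
        missing.append(f"{index_name}/ (directory)")
    if not (root / f"{index_name}.histogram").is_file():
        missing.append(f"{index_name}.histogram (file)")
    return missing
```

For the example path used here, missing_index_parts("/path/to/indexes") should return an empty list once the index has been extracted correctly.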
Once the index is in place, the container can be started as follows:
docker run -p 7777:8080 -d -v /path/to/indexes/:/usr/local/indexes/:ro dicegroup/palmetto-service
Afterwards, the demo application can be accessed at http://localhost:7777/.
Note that the default values of parameters defined in the configuration file can be overridden by using environment variables. To increase the maximum number of words to 15, add the following parameter to the command above:
-e org.aksw.palmetto.webapp.resources.AbstractCoherenceResource.maxWords=15
It should also be noted that, by default, the name of the index is assumed to be wikipedia_bd. If this is not the case for your index, you should set the path of the index within the Docker container, including the index name:
-e org.aksw.palmetto.webapp.resources.AbstractCoherenceResource.indexPath=/usr/local/indexes/my-own-index
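Both overrides are options of docker run, so they have to be placed before the image name. The following Python sketch assembles the complete command from the pieces shown above; docker_run_args and its parameters are hypothetical names chosen for illustration, not part of Palmetto.

```python
import shlex

# Prefix of the configuration keys used in the examples above.
KEY_PREFIX = "org.aksw.palmetto.webapp.resources.AbstractCoherenceResource"
IMAGE = "dicegroup/palmetto-service"


def docker_run_args(index_dir, host_port=7777, max_words=None, index_name=None):
    """Build the argument list for docker run (hypothetical helper)."""
    args = [
        "docker", "run",
        "-p", f"{host_port}:8080",
        "-d",
        # Mount the index directory read-only into the container.
        "-v", f"{index_dir.rstrip('/')}/:/usr/local/indexes/:ro",
    ]
    if max_words is not None:
        args += ["-e", f"{KEY_PREFIX}.maxWords={max_words}"]
    if index_name is not None:
        args += ["-e", f"{KEY_PREFIX}.indexPath=/usr/local/indexes/{index_name}"]
    # The image name comes last, after all options.
    args.append(IMAGE)
    return args


if __name__ == "__main__":
    print(shlex.join(docker_run_args("/path/to/indexes", max_words=15)))
```

The returned list can be passed to subprocess.run directly, or printed with shlex.join to obtain a command line for the shell.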
To compile the project (e.g., after adapting the code base itself), you need a JDK as well as Maven installed.
You need to clone this git repo and build it using Maven.
git clone https://github.com/dice-group/Palmetto.git
cd Palmetto/palmetto
mvn clean install
cd ../webApp
In Palmetto/webApp/src/main/resources/palmetto.properties you should edit indexPath to point to the directory of your index.
There are several ways to run the web service in an application server. An easy way is to execute the following command from within the webApp directory:
mvn org.apache.tomcat.maven:tomcat7-maven-plugin:2.2:run -Dmaven.tomcat.port=7777
After that, the service should be running. You can test it by opening http://localhost:7777/palmetto-webapp/ in a browser, where you should see the demo UI.
You would like to use Palmetto locally? No problem, it can be built as a runnable jar.
You will have to download a Lucene index containing the preprocessed Wikipedia from here. By extracting the files you should get a wikipedia_bd directory and a wikipedia_bd.histogram file. Note that the file has to be in the same directory as the wikipedia_bd directory.
There is a Dutch index that has been created by van der Zwaan, Marx and Kamps. It can be downloaded here.
You can either download the runnable jar file from here or check out the master branch and build it yourself using
cd palmetto
mvn clean compile assembly:single
The program can be started using
java -jar palmetto-0.1.5-exec.jar <some-path>/wikipedia_bd <coherence> <topics-file>
You have to insert the path to the wikipedia_bd directory (the program will assume that the histogram file can be found under <some-path>/wikipedia_bd.histogram).
The last two parameters are the coherence type and a file containing your topics (see below).
At the moment, there are six common coherence types that you can run directly with this jar.
C_A
C_P
C_V
NPMI
UCI
UMass
The file containing your topics should contain exactly one topic per line. In every line, the top words of the topic are listed, separated by a single space. Your file should look like this:
company sell corporation own acquire purchase buy business sale owner
age population household female family census live average median income
The jar will simply print out the coherence of each topic.
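If your topics come out of a topic model as lists of words, the file can be generated with a few lines of code. The following Python sketch writes the format described above; write_topics_file is a hypothetical helper, and rejecting words that contain whitespace is a precaution of this sketch, since such a word would be read as several words.

```python
from pathlib import Path


def write_topics_file(topics, path):
    """Write one topic per line, top words separated by a single space."""
    lines = []
    for words in topics:
        cleaned = [word.strip() for word in words]
        for word in cleaned:
            # A word with inner whitespace would be split into several words.
            if not word or any(char.isspace() for char in word):
                raise ValueError(f"not a single word: {word!r}")
        lines.append(" ".join(cleaned))
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")


if __name__ == "__main__":
    write_topics_file(
        [
            ["company", "sell", "corporation", "own", "acquire"],
            ["age", "population", "household", "female", "family"],
        ],
        "topics.txt",
    )
```

The resulting file (topics.txt in this example, an arbitrary name) can then be passed to the jar as the <topics-file> argument.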
You want to include Palmetto in your own project? You can check out the latest stable version using
git clone -b v0.1.5 https://github.com/dice-group/Palmetto.git
install it locally using
cd Palmetto/palmetto
mvn install
and add it as a Maven dependency
<dependency>
  <groupId>org.aksw</groupId>
  <artifactId>palmetto</artifactId>
  <version>0.1.5</version>
</dependency>
Another way is to download it directly from our Maven repository by adding the following lines to your project's pom file:
<repositories>
  <repository>
    <id>maven.aksw.internal</id>
    <name>University Leipzig, AKSW Maven2 Repository</name>
    <url>https://maven.aksw.org/archiva/repository/internal</url>
  </repository>
  <repository>
    <id>maven.aksw.snapshots</id>
    <name>University Leipzig, AKSW Maven2 Repository</name>
    <url>https://maven.aksw.org/archiva/repository/snapshots</url>
  </repository>
</repositories>
If you want to know how to use the coherence inside your source code, you should 1) read the paper to understand the parts a coherence comprises and 2) take a look into the org.aksw.palmetto.Palmetto class.