
How Palmetto can be used

Michael Röder edited this page Sep 26, 2023 · 20 revisions

If you are using Palmetto for an experiment or something similar that leads to a publication, please cite the paper "Exploring the Space of Topic Coherence Measures" that you can find on the project website.

There are three ways to use Palmetto.

As web service

You only want to evaluate your topics or word sets? Then you can simply write a client for the REST interface of our web service. The coherence of a word set can be requested using a URL of the form

http://palmetto.aksw.org/palmetto-webapp/service/<coherence>?words=<words>

where <words> are the space-separated words and <coherence> is the name of the coherence. At the moment, the following values can be used:

  • ca
  • cp
  • cv
  • npmi
  • uci
  • umass

The response contains the coherence.

If you want to request the C_P coherence for the word set "cake","apple","banana","cherry","chocolate", the URL should look like this

http://palmetto.aksw.org/palmetto-webapp/service/cp?words=cake%20apple%20banana%20cherry%20chocolate

and the response should be text/plain like

0.8696366974967441

An alternative URL that can be used is

http://palmetto.aksw.org/palmetto-webapp/service/calculate?coherence=<coherence>&words=<words>

Note that sending GET requests is recommended, because recent problems seemed to be caused by POST requests (#10, #11).
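
As an illustration, both URL forms described above could be built and requested with the Python standard library. This is a minimal sketch, not an official client: the function names are hypothetical, and parsing the text/plain body as a float is an assumption based on the example response shown above.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Coherence names and base URL as listed on this page.
COHERENCES = {"ca", "cp", "cv", "npmi", "uci", "umass"}
BASE_URL = "http://palmetto.aksw.org/palmetto-webapp/service"


def build_url(coherence, words, base_url=BASE_URL):
    """Return a URL of the form <base>/<coherence>?words=<words>."""
    if coherence not in COHERENCES:
        raise ValueError(f"unknown coherence: {coherence!r}")
    # The words are joined by spaces, which are percent-encoded as %20.
    return f"{base_url}/{coherence}?words={quote(' '.join(words))}"


def build_calculate_url(coherence, words, base_url=BASE_URL):
    """Return the alternative form <base>/calculate?coherence=...&words=..."""
    if coherence not in COHERENCES:
        raise ValueError(f"unknown coherence: {coherence!r}")
    query = urlencode(
        {"coherence": coherence, "words": " ".join(words)}, quote_via=quote
    )
    return f"{base_url}/calculate?{query}"


def fetch_coherence(coherence, words, timeout=30):
    """Send a GET request and parse the body as a float (assumed format)."""
    with urlopen(build_url(coherence, words), timeout=timeout) as response:
        return float(response.read().decode("utf-8").strip())
```

Calling build_url("cp", ["cake", "apple", "banana", "cherry", "chocolate"]) reproduces the example URL above. The optional base_url parameter is there so the same helpers can point at a locally running service.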

Python client

Thanks to Ivan Ermilov, there is a Python client available at https://github.com/dice-group/palmetto-py

Local Palmetto service

In case you want to run Palmetto locally and use its web API described above, we suggest using the dockerized version. The alternative is to build the project locally.

Download and extract the index

In any case, you need to download and extract the index (unless you have your own index).

Using Docker image

Let's assume the index has been extracted to the path /path/to/indexes. After extraction, the directory should contain the wikipedia_bd directory and the wikipedia_bd.histogram file. If your index has a different name, you should have a look at the configuration section below.

path
+- to
  +- indexes
    +- wikipedia_bd
    +- wikipedia_bd.histogram
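
Before starting the container, the layout above can be checked with a few lines of Python. This is only a sketch: check_index is a hypothetical helper, and it verifies just the two entries named above, not the contents of the index.

```python
from pathlib import Path


def check_index(indexes_dir, index_name="wikipedia_bd"):
    """Return a list of problems found in the extracted index layout."""
    root = Path(indexes_dir)
    index_dir = root / index_name
    histogram = root / f"{index_name}.histogram"
    problems = []
    if not index_dir.is_dir():
        problems.append(f"missing directory: {index_dir}")
    if not histogram.is_file():
        problems.append(f"missing file: {histogram}")
    return problems


if __name__ == "__main__":
    # /path/to/indexes is the placeholder path used in this section.
    for problem in check_index("/path/to/indexes"):
        print(problem)
```
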

Execution

The container can then be run as follows:

docker run -p 7777:8080 -d -v /path/to/indexes/:/usr/local/indexes/:ro dicegroup/palmetto-service

Once the container is running, the demo application can be accessed at http://localhost:7777/.

Configuration

Note that the default values of parameters defined in the configuration file can be overridden by using environment variables. To increase the maximum number of words to 15, we add the following parameter to the command above:

-e org.aksw.palmetto.webapp.resources.AbstractCoherenceResource.maxWords=15

It should also be noted that by default, the name of the index is assumed to be wikipedia_bd. If this is not the case for your index, you should set the path of the index within the Docker container including the index name:

-e org.aksw.palmetto.webapp.resources.AbstractCoherenceResource.indexPath=/usr/local/indexes/my-own-index
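
The -e options have to appear before the image name in the docker run command. The following Python sketch assembles the full command from the pieces shown in this section. The helper name docker_run_args is hypothetical, and the snippet only prints the command rather than starting a container.

```python
import shlex

# Common prefix of the two configuration keys shown above.
PREFIX = "org.aksw.palmetto.webapp.resources.AbstractCoherenceResource"


def docker_run_args(index_dir, host_port=7777, overrides=None):
    """Build the argument list for the docker run command of this section."""
    args = ["docker", "run", "-p", f"{host_port}:8080", "-d"]
    # Each override becomes one "-e key=value" option before the image name.
    for key, value in (overrides or {}).items():
        args += ["-e", f"{key}={value}"]
    args += ["-v", f"{index_dir}:/usr/local/indexes/:ro"]
    args.append("dicegroup/palmetto-service")
    return args


command = docker_run_args(
    "/path/to/indexes/", overrides={f"{PREFIX}.maxWords": 15}
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run as is, which avoids any shell quoting issues.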

Compile the project

To compile the project (e.g., after adapting the code base itself), you need a JDK as well as Maven installed.

2. Clone and build repo

You need to clone this git repo and build it using Maven.

git clone https://github.com/dice-group/Palmetto.git
cd Palmetto/palmetto
mvn clean install
cd ../webApp

3. Configure the path of the index

In Palmetto/webApp/src/main/resources/palmetto.properties you should edit indexPath to point to the directory of your index.

4. Run tomcat

There are several ways to run the web service in an application server. An easy way could be to execute the following command from within the webApp directory:

mvn org.apache.tomcat.maven:tomcat7-maven-plugin:2.2:run -Dmaven.tomcat.port=7777

After that, the service should be running. You can test it by clicking on the following link: http://localhost:7777/palmetto-webapp/. Now, you should see the demo UI.

As Java program

You would like to use Palmetto locally? No problem, it can be built as a runnable jar.

1. Download and extract the index

You will have to download a Lucene index containing the preprocessed Wikipedia from here. Extracting the files should give you a wikipedia_bd directory and a wikipedia_bd.histogram file. Note that the file has to be in the same directory as the wikipedia_bd directory.

There is a Dutch index that has been created by van der Zwaan, Marx and Kamps. It can be downloaded here.

2. Download the program

You can either download the runnable jar file from here or check out the master branch and build it yourself using

cd palmetto
mvn clean compile assembly:single

3. Run Palmetto

The program can be started using

java -jar palmetto-0.1.5-exec.jar <some-path>/wikipedia_bd <coherence> <topics-file>

You have to insert the path to the wikipedia_bd directory (the program will assume that the histogram file can be found under <some-path>/wikipedia_bd.histogram). The last two parameters are the coherence type and a file containing your topics (see below).

Coherences

At the moment, there are six common coherence types that you can run directly with this jar.

  • C_A
  • C_P
  • C_V
  • NPMI
  • UCI
  • UMass

Topics file

The file containing your topics should have one single topic per line. In every line the top words of your topic are listed, separated by a single space. Your file should look like this:

company sell corporation own acquire purchase buy business sale owner
age population household female family census live average median income
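
If your topics live in a Python program, a file in this format could be written as sketched below. write_topics_file is a hypothetical helper, and the file name topics.txt is only an example.

```python
from pathlib import Path


def write_topics_file(path, topics):
    """Write one topic per line, top words separated by a single space."""
    lines = []
    for words in topics:
        # Reject empty words and words containing whitespace, since both
        # would break the space-separated format.
        if any(len(word.split()) != 1 for word in words):
            raise ValueError(f"invalid word in topic: {words!r}")
        lines.append(" ".join(words))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    write_topics_file(
        "topics.txt",
        [
            ["company", "sell", "corporation", "own", "acquire"],
            ["age", "population", "household", "female", "family"],
        ],
    )
```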

Output

The jar will simply print out the coherences of the topics.
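
To call the jar from a script, the command from step 3 could be wrapped as in the following sketch. The helper names are hypothetical. Since the exact output format is not specified here, the standard output is returned unparsed, and the coherence name is passed through unchanged on the assumption that it is spelled as in the list above.

```python
import subprocess


def palmetto_command(jar, index_dir, coherence, topics_file):
    """Build the java command line shown in step 3."""
    return ["java", "-jar", jar, index_dir, coherence, topics_file]


def run_palmetto(jar, index_dir, coherence, topics_file):
    """Run the jar and return whatever it prints, without parsing it."""
    result = subprocess.run(
        palmetto_command(jar, index_dir, coherence, topics_file),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```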

You want to include Palmetto in your own project? You can check out the last stable version using

git clone -b v0.1.5 https://github.com/dice-group/Palmetto.git

install it locally using

cd Palmetto/palmetto
mvn install

and add it as a Maven dependency

    <dependency>
        <groupId>org.aksw</groupId>
        <artifactId>palmetto</artifactId>
        <version>0.1.5</version>
    </dependency>

Another way is to download it directly from our Maven repository by adding the following lines to your project's pom file:

    <repositories>
        <repository>
            <id>maven.aksw.internal</id>
            <name>University Leipzig, AKSW Maven2 Repository</name>
            <url>https://maven.aksw.org/archiva/repository/internal</url>
        </repository>
        <repository>
            <id>maven.aksw.snapshots</id>
            <name>University Leipzig, AKSW Maven2 Repository</name>
            <url>https://maven.aksw.org/archiva/repository/snapshots</url>
        </repository>
    </repositories>

If you want to know how to use the coherence inside your source code, you should 1) read the paper to understand the parts a coherence comprises and 2) take a look at the org.aksw.palmetto.Palmetto class.