LP: Implement GERBIL instance for the evaluation of Link Prediction #440

MichaelRoeder commented Aug 1, 2023

Goal

Provide a GERBIL instance that is able to evaluate Link Prediction results.

Proposed user workflow

  1. User chooses type of evaluation
    1. Head prediction
    2. Tail prediction
    3. Relation prediction
    4. Combinations
  2. User uploads the system answer ➡️ Need to define a file format (e.g., JSON)
  3. User chooses dataset
    1. We provide a list of available datasets
    2. Uploading another dataset might be an option later on
  4. Evaluation starts
    1. Evaluation per sub-task (head, tail, and relation prediction)
    2. Generation of a summary over the tasks in case a combination of tasks has been selected

Development details

We prepared the LinkPrediction branch for this development.

Note: all classes and methods should get some nice Javadoc comments 😉

1. Define Experiment Types

  1. Add the 7 new experiment types to the ExperimentType enum. (This will create a lot of compile errors that we will have to fix.) I could imagine something like the following:
HP ("Head Prediction","..."),
PP ("Predicate Prediction","..."),
TP ("Tail Prediction","..."),
HTP ("Head&Tail Prediction","..."),
HPP ("Head&Predicate Prediction","..."),
PTP ("Predicate&Tail Prediction","..."),
LP ("Link Prediction","...")
  2. Update the experiment type hierarchy in the equalsOrContainsType method. I would suggest adding a default: return false to the switch-case statements of the old ExperimentTypes (e.g., A2KB). For the new types, you simply have to define that LP returns true for all 7 new types, HTP returns true for HP and TP, and so on (see the sketch after this list).
  3. Replace the names of the other experiment types in the config file with the new names. We only want to have our new experiment types in the UI.
  4. There will probably be compiler errors that I am simply not aware of at the moment. Just let me know in case you encounter issues that you cannot solve.
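
As a rough, hedged sketch (the exact method layout has to be taken from the existing enum), the hierarchy for the new types in equalsOrContainsType could look like this:

// Sketch of the new cases inside ExperimentType#equalsOrContainsType;
// the cases of the old experiment types are omitted and would simply
// end with "default: return false" as suggested above.
public boolean equalsOrContainsType(ExperimentType type) {
    switch (this) {
    case LP:
        // LP covers all seven new experiment types
        return (type == LP) || (type == HTP) || (type == HPP) || (type == PTP)
                || (type == HP) || (type == PP) || (type == TP);
    case HTP:
        return (type == HTP) || (type == HP) || (type == TP);
    case HPP:
        return (type == HPP) || (type == HP) || (type == PP);
    case PTP:
        return (type == PTP) || (type == PP) || (type == TP);
    case HP:
    case PP:
    case TP:
        // the single-prediction types only contain themselves
        return this == type;
    default:
        // the old experiment types keep their current logic
        return false;
    }
}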

2. Define file format for result files

First, we should define the exact format of the file that the user should upload. There are several options (the examples below are for tail prediction).

JSON format

[
  {
    "subject":"http://example.org/subject1",
    "predicate":"http://example.org/property",
    "predictions":[
      { "iri":"http://example.org/object1", "value":1.567 },
      { "iri":"http://example.org/object2", "value":0.58 }
    ]
  },
  {
    "subject":"http://example.org/subject2",
    "predicate":"http://example.org/property",
    "predictions":[
      { "iri":"http://example.org/object1", "value":0.15 },
      { "iri":"http://example.org/object2", "value":0.798 }
    ]
  }
]

RDF (e.g., Turtle) with reification

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[] a rdf:Statement ;
    rdf:subject <http://example.org/subject1> ;
    rdf:predicate <http://example.org/property> ;
    rdf:object <http://example.org/object1> ;
    <http://w3id.org/dice-research/gerbil-lp/value> "1.567"^^xsd:double .

The RDF variant is nice but very verbose, while the JSON variant is shorter but would require us to define our own structure.

3. Internal representation

Internally, GERBIL works with the Document class. Each Document typically represents a single task that a system has to solve. The additional information attached to a Document is represented as a Marking. This is not exactly the best structure for link prediction, but I think it is easier to force our data into this structure than to rename all the classes everywhere (I know that this is a dirty solution, but it might be the easiest for now 😉).

Following this structure, our document would represent a pair of IRIs for which the third IRI should be predicted. The two known IRIs would have to be stored in the document. We could put them into the text, but that doesn't sound good. Instead, we could define three classes (in org.aksw.gerbil.datatypes.marking) so that we can add them as markings (Oh no... this is really dirty 😅):

public class Subject extends Annotation {
    // nothing to implement except the constructors of the Annotation class...
}

public class Predicate extends Annotation {
    // nothing to implement except the constructors of the Annotation class...
}

public class Object extends Annotation {
    // nothing to implement except the constructors of the Annotation class...
}

Defining them as extensions of the Annotation class simplifies our effort.

The predictions can simply be stored as instances of the ScoredAnnotation class.

Each document needs an IRI, which is used to map dataset and system documents to each other. I would suggest generating this IRI from the two IRIs that are given for the document.
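
A minimal, hedged sketch of how such a document could be assembled (the DocumentImpl/setDocumentURI/addMarking calls follow the existing NIF classes, while the IRI scheme, the single-IRI constructor, and the Subject/Predicate classes from above are assumptions):

// Hedged sketch: builds one tail prediction task document.
// Surrounding class and imports are omitted.
public static Document createTailPredictionDocument(String subjectIri, String predicateIri) {
    Document document = new DocumentImpl();
    // generate the document IRI from the two known IRIs so that dataset
    // and system documents can be mapped onto each other
    document.setDocumentURI("http://w3id.org/dice-research/gerbil-lp/task?s="
            + URLEncoder.encode(subjectIri, StandardCharsets.UTF_8) + "&p="
            + URLEncoder.encode(predicateIri, StandardCharsets.UTF_8));
    document.addMarking(new Subject(subjectIri));
    document.addMarking(new Predicate(predicateIri));
    return document;
}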

4. Provide an Annotator interface

We need to tell the framework what the system's task should be. That can be done quite easily:

  1. Define a new interface for the link prediction:
package org.aksw.gerbil.annotator;

import java.util.List;
// plus imports of Annotator, Document, ScoredAnnotation, and GerbilException from the GERBIL code base

public interface LinkPredictor extends Annotator {

    /** Returns the system's scored predictions for the given task document. */
    List<ScoredAnnotation> predictLinks(Document document) throws GerbilException;
}
  2. Extend the existing decorators (they are located in https://github.com/dice-group/gerbil/tree/master/src/main/java/org/aksw/gerbil/annotator/decorator). The extension is quite simple and all of them follow the same (not so nicely designed) structure. If you have issues, let me know and I will extend them.

5. Provide parsing of the answer file

Create a class that implements the LinkPredictor interface defined above. The class should take a file as input, parse it, and keep the parsed content in an internal Map. You can simply extend the existing InstanceListBasedAnnotator class; you just need to add the parsing of the file.

⚠️ This implementation should get a JUnit test.
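
A hedged sketch of such a class, parsing the JSON format proposed above (the ScoredAnnotation constructor, the generateDocumentIri helper from step 3, and the use of org.json are assumptions; alternatively, the class could extend InstanceListBasedAnnotator as mentioned):

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.json.JSONArray;
import org.json.JSONObject;
// GERBIL imports (LinkPredictor, Document, ScoredAnnotation, ...) omitted

public class FileBasedLinkPredictor implements LinkPredictor {

    // maps the generated document IRI to the parsed predictions
    private final Map<String, List<ScoredAnnotation>> predictions = new HashMap<>();

    public FileBasedLinkPredictor(File answerFile) throws IOException {
        String content = new String(Files.readAllBytes(answerFile.toPath()), StandardCharsets.UTF_8);
        JSONArray tasks = new JSONArray(content);
        for (int i = 0; i < tasks.length(); ++i) {
            JSONObject task = tasks.getJSONObject(i);
            // generateDocumentIri is the (hypothetical) helper that derives the
            // document IRI from the two known IRIs, as described in step 3
            String documentIri = generateDocumentIri(task.getString("subject"), task.getString("predicate"));
            List<ScoredAnnotation> scored = new ArrayList<>();
            JSONArray preds = task.getJSONArray("predictions");
            for (int j = 0; j < preds.length(); ++j) {
                JSONObject prediction = preds.getJSONObject(j);
                // assumption: ScoredAnnotation offers a (String iri, double score) constructor
                scored.add(new ScoredAnnotation(prediction.getString("iri"), prediction.getDouble("value")));
            }
            predictions.put(documentIri, scored);
        }
    }

    @Override
    public List<ScoredAnnotation> predictLinks(Document document) {
        return predictions.getOrDefault(document.getDocumentURI(), Collections.emptyList());
    }

    // methods inherited from the Annotator interface (e.g., the annotator name) are omitted
}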

6. Provide adapters for datasets

The datasets need adapter(s) that are able to read them. The adapter's constructor should take the path to a file. The class has to be able to load the correct answers from the file. It also needs to be able to load the triples that are necessary to apply the filtering in the next step.

The filtering issue could be solved during the evaluation if we add to each document all subjects/predicates/objects that are already known from the training data. However, this would mean that we may need another Marking class, similar to the Marking classes we defined further above.

⚠️ This implementation should get a JUnit test.
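
A hedged sketch of such an adapter for tail prediction, assuming the dataset file contains one tab-separated triple per line (the usual format of link prediction benchmarks) and reusing the createTailPredictionDocument sketch from step 3:

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
// GERBIL imports (Document, ...) and the marking classes from step 3 omitted

public class FileBasedLinkPredictionDataset {

    public List<Document> loadDocuments(File datasetFile) throws IOException {
        List<Document> documents = new ArrayList<>();
        for (String line : Files.readAllLines(datasetFile.toPath(), StandardCharsets.UTF_8)) {
            String[] parts = line.split("\t");
            if (parts.length < 3) {
                continue; // skip empty or malformed lines
            }
            // the first two IRIs define the task, the third IRI is the gold answer;
            // the fully qualified name avoids the clash with java.lang.Object
            Document document = createTailPredictionDocument(parts[0], parts[1]);
            document.addMarking(new org.aksw.gerbil.datatypes.marking.Object(parts[2]));
            documents.add(document);
        }
        return documents;
    }
}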

7. Implement filtering

If we add the already known subjects/predicates/objects to the Document instances of the dataset (as described as a possible solution in the previous step), the filtering can simply be done as part of the evaluation: before the ScoredAnnotation instances are ranked according to their score, the annotations are filtered based on the already known triples, which can be retrieved from the Document of the dataset.
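
A hedged sketch of that filtering step (the single-IRI accessor on the annotations and the way the known IRIs are obtained from the dataset Document are assumptions):

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
// GERBIL imports (ScoredAnnotation, ...) omitted

public class KnownTripleFilter {

    /**
     * Removes all predictions whose IRI is already known to complete the given pair
     * in the training data, except for the gold answer itself (the usual "filtered setting").
     */
    public static List<ScoredAnnotation> filter(List<ScoredAnnotation> predictions,
            Set<String> knownIris, String goldIri) {
        List<ScoredAnnotation> filtered = new ArrayList<>(predictions.size());
        for (ScoredAnnotation prediction : predictions) {
            // assumption: a single-IRI accessor; the real Annotation class may use getUris()
            String iri = prediction.getUri();
            if (iri.equals(goldIri) || !knownIris.contains(iri)) {
                filtered.add(prediction);
            }
        }
        return filtered;
    }
}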

I will have to double check how exactly we can implement this part. 🤔

⚠️ This implementation should get a JUnit test.

8. Implement evaluation metrics

The evaluation metric(s) should implement the Evaluator interface. Their general workflow should be the following:

  1. Create ranking based on provided scores (shared ranks in case of a tie)
  2. Calculate MRR (see appendix of https://openreview.net/pdf?id=BkxSmlBFvr)
  3. Calculate Hits@N

The results are simply instances of the DoubleEvaluationResult class.
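
A hedged sketch of the rank determination and the two metrics (the getUri()/getConfidence() accessors are assumptions, and the exact tie convention should be double-checked against the appendix of the paper linked above):

import java.util.List;
// GERBIL imports (ScoredAnnotation, ...) omitted

public class LinkPredictionMetrics {

    /** Shared rank of the gold answer; -1 if the gold IRI has not been predicted at all. */
    public static int rankOfGold(List<ScoredAnnotation> predictions, String goldIri) {
        Double goldScore = null;
        for (ScoredAnnotation prediction : predictions) {
            if (prediction.getUri().equals(goldIri)) {
                goldScore = prediction.getConfidence();
                break;
            }
        }
        if (goldScore == null) {
            return -1;
        }
        // predictions with a strictly higher score push the gold answer down;
        // predictions with the same score share the resulting rank
        int higher = 0;
        for (ScoredAnnotation prediction : predictions) {
            if (prediction.getConfidence() > goldScore) {
                ++higher;
            }
        }
        return higher + 1;
    }

    /** Mean reciprocal rank over all documents; missing gold answers contribute 0. */
    public static double meanReciprocalRank(List<Integer> ranks) {
        double sum = 0;
        for (int rank : ranks) {
            sum += (rank > 0) ? (1.0 / rank) : 0.0;
        }
        return sum / ranks.size();
    }

    /** Fraction of documents for which the gold answer is ranked within the top n. */
    public static double hitsAtN(List<Integer> ranks, int n) {
        int hits = 0;
        for (int rank : ranks) {
            if (rank > 0 && rank <= n) {
                ++hits;
            }
        }
        return ((double) hits) / ranks.size();
    }
}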

⚠️ This implementation should get a JUnit test.

Finally, the evaluation has to be put together. This is done in the EvaluatorFactory. I can do that as soon as the previous tasks are done 🙂
