Supervised Training

The structure of this folder mirrors that of the contrastive folder. Here, however, supervised learning is used to train a classifier that determines whether a pair of sentences constitutes a paraphrase (label=1) or not (label=0). To train a supervised model, call the main script with --mode=supervised:

python main.py --mode=supervised --config=Supervised_SGD

Files

File                  Description
learning_manager.py   Learning Manager class to define the training process
predictor.py          Perform inference on a dataset of sentence pairs
model_configs.py      Script to write the model_configs.json
models.py             Model definition and optimizer selection

Model Card

Note: The weights are not released publicly. To request them, please contact us with your desired use case via ss56pupo(at)studserv.uni-leipzig.de.

The models are sentence-transformers models based on the encoder all-MiniLM-L6-v2. The encoder maps each sentence pair to a single 384-dimensional embedding, and a single linear layer on top of it estimates the probability that the pair is a paraphrase.
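
To make the classification head concrete, here is a minimal sketch in plain Python of one linear layer applied to a 384-dimensional pair embedding. It assumes the head produces a single logit that is passed through a sigmoid; the actual head in models.py may differ (for example, two logits followed by a softmax). The names `paraphrase_probability`, `weights`, and `bias` are illustrative and are not taken from the repository.

```python
import math

EMBEDDING_DIM = 384  # embedding size of all-MiniLM-L6-v2


def paraphrase_probability(embedding, weights, bias=0.0):
    """Apply one linear layer and a sigmoid to a pair embedding.

    Assumption: a single-logit head; this is not copied from models.py.
    """
    if len(embedding) != EMBEDDING_DIM or len(weights) != EMBEDDING_DIM:
        raise ValueError(f"expected vectors of length {EMBEDDING_DIM}")

    # Linear layer: dot product of weights and embedding, plus the bias.
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias

    # Numerically stable sigmoid: never exponentiate a large positive number.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)
```

With all weights and the bias set to zero, the function returns 0.5 for any embedding, which is the expected output of an untrained head of this form.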


Usage

To use the model, install the packages listed in requirements.txt:

pip install -r requirements.txt

To apply the model, use the Predictor class provided in predictor.py. You need to provide two inputs:

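import predictor as p
from datasets import load_dataset

# Example values; adjust them to your setup.
model_name = "Supervised_SGD"  # name of the trained model (assumed to match the config name)
batch_size = 32  # assumed example value, not prescribed by the repository

# Load the MRPC validation split from GLUE.
dataset = load_dataset(path="glue", name="mrpc")["validation"]

# Tokenize the sentence pairs, then run inference.
predictor = p.Predictor(model_name=model_name)
predictor.tokenize_dataset(dataset)
logits, labels = predictor.predict(return_logits=True, batch_size=batch_size)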
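
Once the model outputs have been converted to binary predictions, precision, recall, and F1 for the paraphrase class (label=1) can be computed directly. The sketch below is a plain-Python illustration under that assumption; it is not the repository's evaluation code, and how logits are turned into 0/1 predictions depends on the shape of the classification head. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(labels, predictions):
    """Return (precision, recall, f1) for the positive class (label 1)."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with labels `[1, 0, 1, 1, 0]` and predictions `[1, 1, 0, 1, 0]` there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.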

Evaluation Results

The F1 scores, precision, and recall values for each model can be found in the evaluation folder. The columns relate to the following datasets, which are available on request via Hugging Face:

Background

The models were developed as part of a student research project to compare the performance of contrastive learning on text alignment with that of traditional supervised learning.

Intended uses

The models are intended to be used for paraphrase detection, for instance in the text alignment subtask of text reuse identification. By default, input text longer than 256 word pieces is truncated.

Training procedure

The models were trained on a custom dataset derived from ParaBank and PAWS. All models were trained for a maximum of ten epochs; training stopped earlier when validation performance did not improve. The name of each model reflects the optimizer that was used to train it.

  • Supervised_SGD: Optimizer = SGD
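
The stopping rule described in the training procedure (at most ten epochs, ending earlier when validation performance stops improving) can be sketched as a generic early-stopping loop. This is an illustration only, not the code in learning_manager.py: the `patience` value, the "higher is better" validation score, and the callback names `train_one_epoch` and `evaluate` are all assumptions.

```python
def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=10, patience=2):
    """Train for up to max_epochs, stopping once validation stops improving.

    train_one_epoch: callable taking the epoch index (hypothetical hook).
    evaluate: callable taking the epoch index and returning a validation
        score where higher is better (hypothetical hook).
    patience: number of consecutive non-improving epochs tolerated (assumed).
    Returns (best_score, epochs_run).
    """
    best_score = float("-inf")
    epochs_without_improvement = 0
    epochs_run = 0

    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        score = evaluate(epoch)
        epochs_run += 1

        if score > best_score:
            # New best validation score: reset the patience counter.
            best_score = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation performance has stopped improving

    return best_score, epochs_run
```

A real training loop would typically also save a checkpoint whenever the best score improves, so that the best model rather than the last one is kept.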

Hyperparameters

The values used in training are summarized in model_configs.json.
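
To inspect the hyperparameters, the file can be loaded with the standard library. The sketch below assumes only that model_configs.json is valid JSON and makes no assumption about its keys; the helper names `load_model_configs` and `describe` are hypothetical.

```python
import json
from pathlib import Path


def load_model_configs(path="model_configs.json"):
    """Parse the config file and return its contents as Python objects."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def describe(configs):
    """Return a readable, indented rendering of the parsed configs."""
    return json.dumps(configs, indent=2, sort_keys=True)
```

Calling `print(describe(load_model_configs()))` from the folder that contains the file prints all stored values.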