- This repository contains the source code for the paper "ArgLegalSumm: Improving Abstractive Summarization of Legal Documents with Argument Mining", which appears at COLING 2022.
- To request the annotations of both summaries and articles with argument roles, please contact Dr. Kevin D. Ashley (ashley@pitt.edu). Note that you must first obtain the unannotated data through an agreement with the Canadian Legal Information Institute (CanLII) (https://www.canlii.org/en/).
- Argument role classification uses LegalBERT by default, while document summarization uses the Longformer Encoder-Decoder (LED) by default (see the loading sketch after the list of dependencies below).
- transformers
- pytorch
- pytorch-lightning (for training the argument classifier)
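Below is a minimal sketch (not the repo's exact code) of loading the two default backbones with Hugging Face transformers. The checkpoint names and the number of argument-role labels are assumptions; the repo's config may point to different checkpoints.

```python
# Minimal loading sketch; checkpoint names and label count are assumptions.
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    LEDForConditionalGeneration,
)

# Argument role classifier backbone (a LegalBERT checkpoint, assumed)
clf_tokenizer = AutoTokenizer.from_pretrained("nlpaueb/legal-bert-base-uncased")
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "nlpaueb/legal-bert-base-uncased",
    num_labels=3,  # number of argument roles is an assumption
)

# Summarizer backbone (Longformer Encoder-Decoder, assumed checkpoint)
sum_tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")
sum_model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")
```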
- SummEval [link]
- training script [link]
- generation script [link]
- Note that you can choose the model and adjust the input and summary lengths through the config file, without modifying the training scripts. An illustrative sketch follows.
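For illustration, here is a rough sketch of how such a config-driven setup might look. The config path and every key used below (`model_name`, `max_input_length`, `max_summary_length`) are hypothetical placeholders, not the repo's actual schema; consult the config file in this repo for the real names.

```python
# Illustrative sketch only: config-driven choice of model and lengths.
# All config keys and the file path are hypothetical placeholders.
import yaml
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

with open("config.yaml") as f:                    # hypothetical path
    cfg = yaml.safe_load(f)

tokenizer = AutoTokenizer.from_pretrained(cfg["model_name"])   # e.g. an LED checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained(cfg["model_name"])

document_text = "..."                             # a legal opinion loaded elsewhere
inputs = tokenizer(
    document_text,
    max_length=cfg["max_input_length"],           # hypothetical key
    truncation=True,
    return_tensors="pt",
)
summary_ids = model.generate(
    **inputs,
    max_length=cfg["max_summary_length"],         # hypothetical key
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```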
The special tokens used to highlight argument roles in our data are split into two groups.
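As a rough sketch of how such markers can be handled, the snippet below registers placeholder marker strings as additional special tokens so the tokenizer keeps them as single units and the model's embedding table is resized accordingly. The token strings shown are placeholders, not necessarily the exact markers used in our data.

```python
# Sketch: registering argument-role markers as special tokens.
# The marker strings below are placeholders, not the repo's actual tokens.
from transformers import AutoTokenizer, LEDForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

# Hypothetical markers wrapping argumentative / non-argumentative sentences
argument_markers = ["<arg>", "</arg>", "<non-arg>", "</non-arg>"]
tokenizer.add_special_tokens({"additional_special_tokens": argument_markers})
model.resize_token_embeddings(len(tokenizer))

text = "<arg> The appellant's claim fails on the second ground. </arg>"
print(tokenizer.tokenize(text))  # markers stay intact as single tokens
```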
- Note that we have made our model's best predictions on the test set available for use.
- predictions [link]
If you follow up on this project, please cite this work using the following BibTeX:
@inproceedings{elaraby-litman-2022-arglegalsumm,
title = "{A}rg{L}egal{S}umm: Improving Abstractive Summarization of Legal Documents with Argument Mining",
author = "Elaraby, Mohamed and
Litman, Diane",
booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
month = oct,
year = "2022",
address = "Gyeongju, Republic of Korea",
publisher = "International Committee on Computational Linguistics",
url = "https://aclanthology.org/2022.coling-1.540",
pages = "6187--6194",
abstract = "A challenging task when generating summaries of legal documents is the ability to address their argumentative nature. We introduce a simple technique to capture the argumentative structure of legal documents by integrating argument role labeling into the summarization process. Experiments with pretrained language models show that our proposed approach improves performance over strong baselines.",
}