Dataset and code for the paper "CivilSum: A Dataset for Abstractive Summarization of Court Decisions"

ra-MANUJ-an/CivilSum


Legal Document Summarization with CivilSum

Code License | Data License | Python 3.9+

Dataset and code for the paper "CivilSum: A Dataset for Abstractive Summarization of Court Decisions", which appears in SIGIR 2024.

Introduction

We provide both the code base and the dataset used in the research to facilitate the exploration and evaluation of legal document summarization methods. Our contribution is CivilSum, a collection of 23,350 legal case decisions from the Supreme Court of India and Indian High Courts, each paired with a human-written summary. CivilSum stands out for its larger volume of legal decisions and for its shorter, more abstractive summaries, which make it a challenging benchmark for legal summarization.
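To make "more abstractive" concrete, the sketch below computes one common proxy for abstractiveness: the fraction of summary n-grams that do not occur in the source document. This is an illustrative assumption, not necessarily the measure used in the paper. The function names and the whitespace tokenization are hypothetical choices made for brevity.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(document, summary, n=2):
    """Fraction of the summary's distinct n-grams that are absent from the document.

    Assumption: lowercased whitespace tokenization, which is cruder than
    what a real evaluation would use.
    """
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        return 0.0  # summary too short to contain any n-gram
    document_ngrams = ngrams(document.lower().split(), n)
    return len(summary_ngrams - document_ngrams) / len(summary_ngrams)
```

A higher ratio means the summary reuses less of the source wording. Averaging this value over a dataset gives a rough way to compare how abstractive different corpora are.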


Dataset

To download the data, use the following links:

| Data split | # samples |
| --- | --- |
| train.csv | 21,015 |
| validation.csv | 1,168 |
| test.csv | 1,167 |
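After downloading, a quick sanity check is to load each split and compare its row count with the table above. The following is a minimal sketch using only the standard library. It assumes the three files sit in a single local directory and that each has a header row. The column names are not stated here, so the code does not depend on them. `load_split` and `check_sizes` are hypothetical helper names, not part of this repository.

```python
import csv
from pathlib import Path

# Split sizes as listed in the table above.
EXPECTED_SIZES = {"train": 21015, "validation": 1168, "test": 1167}

# Court decisions can be very long, so raise the csv module's per-field limit.
csv.field_size_limit(2**31 - 1)


def load_split(data_dir, split):
    """Read <data_dir>/<split>.csv into a list of dicts keyed by the header row."""
    path = Path(data_dir) / f"{split}.csv"
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def check_sizes(data_dir, expected=EXPECTED_SIZES):
    """Return {split: (found, expected)} for each split whose row count differs."""
    mismatches = {}
    for split, n_expected in expected.items():
        n_found = len(load_split(data_dir, split))
        if n_found != n_expected:
            mismatches[split] = (n_found, n_expected)
    return mismatches
```

Calling `check_sizes("path/to/data")` (the path is a placeholder) returns an empty dict when all three splits match the listed counts, which together add up to the 23,350 decisions mentioned in the introduction.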

Summarization Experiments

The experiments folder contains scripts for reproducing the summarization experiments using Longformer, FactorSum, and Llama-2.

Copyright Information

In accordance with the provisions of the Indian Copyright Act, 1957, the judicial pronouncements are readily accessible and can be found on the website by searching for the name of the specific case. The headnotes or summaries of these judicial pronouncements are protected under the Indian Copyright Act, 1957, with copyright held by Patiala Law House (Copyright © 2016 Patiala Law House).

Furthermore, this dataset's license is restricted to specific purposes such as academic or educational research or study. The judicial pronouncements from the aforementioned website are used within the confines of the license provided, and this use therefore does not infringe the provisions of the Copyright Act.

We release our corpus under the CC BY-NC-SA 4.0 license.

Citation

coming soon
