uOttawa at LegalLens-2024: Transformer-based Classification

Project Overview

This repository contains the code and models used for the LegalLens-2024 shared task, which focuses on detecting legal violations within unstructured textual data and associating these violations with potentially affected individuals.

The project is divided into two subtasks:

Subtask A: Legal Named Entity Recognition (L-NER) - Identifying legal violations using Named Entity Recognition (NER) techniques.
Subtask B: Legal Natural Language Inference (L-NLI) - Linking legal violations to potentially affected individuals using Natural Language Inference (NLI).

This repository evaluates how effective transformer models such as BERT, RoBERTa, and DeBERTa are on these legal tasks.

Dataset

The datasets used for this project are provided by the LegalLens-2024 shared task organizers. They contain textual data representing legal violations and related entities. Each subtask uses different subsets of this data:

Subtask A: L-NER dataset includes tokenized text and corresponding named entities (violation, violation by, violation on, law). Explore the dataset.
Subtask B: L-NLI dataset involves sentence pairs (premise and hypothesis) with labels (Entailment, Neutral, Contradict). Explore the dataset.

Data Split

Training Set: Provided by the organizers.
Validation Set: 20% of the training data.
Test Set: A hidden set, separate from the training and validation data, used by the organizers for final evaluation.
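A 20% hold-out of this kind can be drawn reproducibly with a seeded shuffle. The sketch below is illustrative only: the function name `train_val_split` and the seed value are assumptions, not taken from the repository, which may use a different utility such as scikit-learn's `train_test_split`.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(
    examples: Sequence[T], val_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle a copy of the data and hold out `val_fraction` for validation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(examples)               # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)   # seeded RNG makes the split reproducible
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)


train, val = train_val_split(list(range(100)))
print(len(train), len(val))  # 80 20
```

Fixing the seed keeps the validation subset identical across runs, so validation scores from different models remain comparable.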

Model Architectures

Subtask A: Legal Named Entity Recognition (L-NER)

Model: Fine-tuned DeBERTa-v3-base.
Library: Used spaCy for tokenization and NER pipeline configuration.
Evaluation: The model achieves an F1-score of 86.37%.

Subtask B: Legal Natural Language Inference (L-NLI)

Model: Combined RoBERTa (ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli) with a custom-built CNN for keyword detection.
Library: Used the Hugging Face Transformers library for tokenization and model training.
Evaluation: Achieved an F1-score of 72.4% on the hidden test set, placing 5th in the competition.
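How the CNN is attached to RoBERTa is not spelled out above. One plausible arrangement, shown here as a hypothetical PyTorch sketch rather than the authors' exact architecture, runs 1-D convolutions with max-over-time pooling over the encoder's token states (so each filter acts as a learned n-gram "keyword" detector) and concatenates the result with the first-token vector before a 3-way classifier. The class name, filter count, and kernel sizes are assumptions; in a real pipeline `hidden_states` would be the `last_hidden_state` of the RoBERTa checkpoint named above (hidden size 1024), loaded with Hugging Face `AutoModel`.

```python
from typing import Sequence

import torch
import torch.nn as nn


class HybridNLIHead(nn.Module):
    """Hypothetical head: first-token vector + CNN max-pooled n-gram features -> NLI logits."""

    def __init__(self, hidden_size: int = 1024, n_filters: int = 128,
                 kernel_sizes: Sequence[int] = (2, 3, 4), num_labels: int = 3,
                 dropout: float = 0.1):
        super().__init__()
        # One 1-D convolution per kernel size, sliding over the token axis.
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden_size, n_filters, k, padding=k // 2) for k in kernel_sizes
        )
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size + n_filters * len(kernel_sizes), num_labels)

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len), 1 = real token
        seq_len = hidden_states.size(1)
        cls_vec = hidden_states[:, 0]                # first-token summary vector
        x = hidden_states.transpose(1, 2)            # Conv1d expects (batch, channels, seq_len)
        mask = attention_mask.unsqueeze(1).bool()    # (batch, 1, seq_len), broadcasts over filters
        pooled = []
        for conv in self.convs:
            feats = torch.relu(conv(x))
            feats = feats[..., :seq_len]             # even kernels yield seq_len + 1 outputs; trim to match the mask
            feats = feats.masked_fill(~mask, float("-inf"))  # padding positions can never win the max
            pooled.append(feats.max(dim=-1).values)  # strongest response per filter anywhere in the pair
        features = torch.cat([cls_vec] + pooled, dim=-1)
        return self.classifier(self.dropout(features))


head = HybridNLIHead(hidden_size=1024)
hidden = torch.randn(2, 16, 1024)                # random stand-in for RoBERTa-large token states
mask = torch.ones(2, 16, dtype=torch.long)
print(head(hidden, mask).shape)                  # torch.Size([2, 3])
```

Max-over-time pooling keeps only each filter's strongest response, which lets short, decisive phrases influence the prediction wherever they occur. How the original model fuses the two feature sets may differ from this sketch.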

Usage

You can use the code directly in .

Subtask A: Legal Named Entity Recognition (L-NER)

Data Preprocessing: Tokenize the text using spaCy's tokenizer and prepare the dataset for NER training.
Model Training: Fine-tune DeBERTa-v3-base on the L-NER dataset and save the model.
Model Testing: Load the trained model and evaluate it on the test dataset.
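spaCy expresses NER training data as character-offset entity spans on a text, while token-level datasets are often distributed as BIO tags. The following is a minimal conversion sketch, assuming BIO-tagged tokens joined by single spaces; the function name and tag names such as `B-VIOLATION` are hypothetical placeholders for the dataset's actual labels.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def bio_to_char_spans(tokens: List[str], tags: List[str]) -> Tuple[str, List[Span]]:
    """Join tokens with single spaces and convert BIO tags into character-offset entity spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Span] = []
    offset = 0
    current = None  # [start, end, label] of the entity being built
    for token, tag in zip(tokens, tags):
        start, end = offset, offset + len(token)
        if tag.startswith("I-") and current is not None and current[2] == tag[2:]:
            current[1] = end                      # continue the open entity
        else:
            if current is not None:
                spans.append(tuple(current))      # close the previous entity
                current = None
            if tag != "O":                        # B-X (or a stray I-X) opens a new entity
                current = [start, end, tag[2:]]
        offset = end + 1                          # +1 for the joining space
    if current is not None:
        spans.append(tuple(current))
    return " ".join(tokens), spans


tokens = ["The", "company", "sold", "user", "data", "without", "consent", "."]
tags = ["O", "B-VIOLATION_BY", "B-VIOLATION", "I-VIOLATION",
        "I-VIOLATION", "I-VIOLATION", "I-VIOLATION", "O"]
text, spans = bio_to_char_spans(tokens, tags)
print(spans)  # [(4, 11, 'VIOLATION_BY'), (12, 42, 'VIOLATION')]
```

Each `(start, end, label)` triple can then be turned into a spaCy `Span` with `Doc.char_span(start, end, label=label)`, and the resulting docs serialized with `DocBin` for `spacy train`.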

Subtask B: Legal Natural Language Inference (L-NLI)

Data Preprocessing: Prepare the NLI dataset by tokenizing premises and hypotheses.
Model Training: Train the combined RoBERTa-CNN model for NLI and save the model.
Model Testing: Load the trained model and evaluate it on the test dataset.
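Before tokenization, premises, hypotheses, and labels are typically split into parallel lists. The sketch below assumes records with `premise`, `hypothesis`, and `label` fields and a hypothetical label-to-id mapping built from the label names listed in the Dataset section; the example records are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from dataset label strings to class ids.
# When reusing a pretrained NLI head, align it with the checkpoint's config.id2label instead.
LABEL2ID: Dict[str, int] = {"Entailment": 0, "Neutral": 1, "Contradict": 2}


def build_nli_inputs(records: List[Dict[str, str]]) -> Tuple[List[str], List[str], List[int]]:
    """Split records into parallel premise, hypothesis, and label-id lists."""
    premises, hypotheses, labels = [], [], []
    for i, rec in enumerate(records):
        label = rec["label"].strip()
        if label not in LABEL2ID:
            raise ValueError(f"record {i}: unknown label {label!r}")
        premises.append(rec["premise"].strip())
        hypotheses.append(rec["hypothesis"].strip())
        labels.append(LABEL2ID[label])
    return premises, hypotheses, labels


records = [
    {"premise": "The app shared location data with advertisers.",
     "hypothesis": "Users may have been affected.", "label": "Entailment"},
    {"premise": "No personal data was collected.",
     "hypothesis": "User data was sold.", "label": "Contradict"},
]
premises, hypotheses, labels = build_nli_inputs(records)
print(labels)  # [0, 2]
```

The two text lists can then be encoded as pairs with a Hugging Face tokenizer, e.g. `tokenizer(premises, hypotheses, truncation=True, padding=True)`, which packs each premise and hypothesis into a single sequence separated by the model's special tokens.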

Results

Subtask A: Legal Named Entity Recognition (L-NER)

Best F1-score: 86.37%
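To make the metric concrete, here is a minimal sketch of micro-averaged, exact-match entity F1 over character spans. It is an assumption that the reported score uses this span-level definition; the organizers' scorer may work at token level or use a library such as seqeval. The span labels are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def span_f1(gold: List[Set[Span]], pred: List[Set[Span]]) -> float:
    """Micro-averaged entity F1: a predicted span is correct only if start, end and label all match."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # exact matches
        fp += len(p - g)  # predicted spans absent from gold
        fn += len(g - p)  # gold spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


gold = [{(4, 11, "VIOLATION_BY"), (12, 42, "VIOLATION")}]
pred = [{(4, 11, "VIOLATION_BY"), (12, 26, "VIOLATION")}]  # second span has a wrong end boundary
print(span_f1(gold, pred))  # 0.5
```

Under exact matching, a span with a slightly wrong boundary counts as both a false positive and a false negative, which is why boundary errors weigh heavily on this metric.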

Subtask B: Legal Natural Language Inference (L-NLI)

Best F1-score on validation: 88.6%
Final F1-score on hidden test set: 72.4%
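Multi-class NLI scores are commonly reported as macro-F1, the unweighted mean of per-class F1; whether the figures above use exactly this averaging is an assumption here. A dependency-free sketch (the function name is hypothetical):

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# 0 = Entailment, 1 = Neutral, 2 = Contradict (hypothetical ids)
print(round(macro_f1([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 1]), 4))  # 0.6667
```

For the same inputs this should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`. Because every class counts equally, weak performance on a rarer class lowers macro-F1 more than it lowers accuracy.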

Citation

@misc{meghdadi2024uottawalegallens2024transformerbasedclassification,
      title={uOttawa at LegalLens-2024: Transformer-based Classification Experiments}, 
      author={Nima Meghdadi and Diana Inkpen},
      year={2024},
      eprint={2410.21139},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.21139}, 
}

License

This project is licensed under the MIT License. See the LICENSE file for details.
