Repo for the paper Unsupervised Simplification of Legal Texts https://arxiv.org/pdf/2209.00557
We have gathered a new dataset for legal text simplification. To this end, we selected 1000 random legal sentences from the Caselaw Access Project of Harvard Law School. Then, in collaboration with the faculty and students of Bilkent Law School, we produced 3 different simplified reference files for these 1000 sentences. We hope that this dataset can serve as a benchmark for future legal text simplification studies.
To run the algorithm proposed in the paper, run the following commands. Python 3.6 or above is required.
conda create -n uslt python=3.10
conda activate uslt
git clone https://github.com/koc-lab/lex-simple.git
cd lex-simple
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python -m spacy download en
cd scripts
python run_uslt.py
After running the code above, you will have a .txt file with lexical simplifications. To apply structural simplification on top of lexical simplification, follow the steps in https://github.com/Lambda-3/DiscourseSimplification/tree/master. In particular, run
cd .. #make sure you are in the main directory
git clone https://github.com/koc-lab/SentenceSplitting.git
cd DiscourseSimplification
mvn clean install -DskipTests
Next, create the directory edu/stanford/nlp/models/pos-tagger/english-left3words under DiscourseSimplification, and move the Stanford NLP taggers from this drive link into it: https://drive.google.com/drive/folders/1GQerFiPgzFnS2lawIfAz8C_NsLbdQUJG?usp=share_link Then, create an empty file called 'input.txt' in the DiscourseSimplification directory and paste into it the lexically simplified document generated by run_uslt.py. Then, run
mvn clean compile exec:java
cd ..
python decode_sentence_splitting.py
You have now generated the final .txt file!
For evaluation, you need to install easse; to do so, follow the guide at https://github.com/feralvam/easse
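The manual preparation before the final Maven command (creating the tagger directory and filling input.txt) can also be scripted. The following is a minimal sketch, not part of this repo: the function name `prepare_discourse_input` and its parameters are hypothetical, and it assumes that input.txt belongs in the top-level DiscourseSimplification directory and that you have already downloaded the tagger files from the drive link.

```python
import shutil
from pathlib import Path

# Relative tagger location named in the instructions above.
TAGGER_SUBDIR = Path("edu/stanford/nlp/models/pos-tagger/english-left3words")


def prepare_discourse_input(repo_dir, lexical_output, tagger_files=()):
    """Create the tagger directory and input.txt (hypothetical helper).

    repo_dir       -- path to the DiscourseSimplification checkout
    lexical_output -- the .txt file produced by run_uslt.py (name is up to you)
    tagger_files   -- tagger files already downloaded from the drive link
    """
    repo_dir = Path(repo_dir)

    # Create edu/stanford/nlp/models/pos-tagger/english-left3words if missing.
    tagger_dir = repo_dir / TAGGER_SUBDIR
    tagger_dir.mkdir(parents=True, exist_ok=True)

    # Copy each downloaded tagger file into that directory.
    for tagger in tagger_files:
        shutil.copy(tagger, tagger_dir / Path(tagger).name)

    # Assumption: input.txt lives in the repo root; it receives the
    # lexically simplified text, replacing any previous content.
    input_path = repo_dir / "input.txt"
    shutil.copyfile(lexical_output, input_path)
    return tagger_dir, input_path
```

Call it before `mvn clean compile exec:java`, for example `prepare_discourse_input("DiscourseSimplification", "lexical_output.txt", ["english-left3words-distsim.tagger"])`, where both file names are placeholders for your own paths.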
After gathering the text outputs, run
python eval.py
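Before evaluating, it can help to check that the system output and the three reference files are line-aligned with the original sentences. The sketch below is an illustration only and does not reproduce eval.py: the helper names `read_lines` and `load_aligned` are hypothetical, and it assumes that every file holds one sentence per line.

```python
def read_lines(path):
    """Read a text file into a list of sentences, one per line."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_aligned(orig_path, sys_path, ref_paths):
    """Load original, system, and reference sentences and verify alignment.

    Returns (orig_sents, sys_sents, refs_sents), where refs_sents is a list
    with one list of sentences per reference file.
    """
    orig_sents = read_lines(orig_path)
    sys_sents = read_lines(sys_path)
    refs_sents = [read_lines(p) for p in ref_paths]

    # Every file must contain exactly one line per original sentence.
    named = [("system output", sys_sents)]
    named += [(f"reference {i}", r) for i, r in enumerate(refs_sents, start=1)]
    for name, sents in named:
        if len(sents) != len(orig_sents):
            raise ValueError(
                f"{name} has {len(sents)} lines, expected {len(orig_sents)}"
            )
    return orig_sents, sys_sents, refs_sents
```

The three returned lists have the shape that EASSE's `corpus_sari(orig_sents=..., sys_sents=..., refs_sents=...)` accepts, so they can be passed to it directly if you want a score outside of eval.py.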