HOLJ Plus

This project extracts UK House of Lords judgements from 1996 to 2009: https://publications.parliament.uk/pa/ld/ldjudgmt.htm

The HTML pages are scraped for the text of each case, which is then cleaned up so that the majority judgement can be annotated. We select 231 of these cases and merge them with the HOLJ corpus to create the 300-case HOLJ+ corpus.
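The cleanup step turns each judgement page into plain text. The sketch below shows one way to do this with only the standard library's html.parser. It is not the code in extract.py or format.py. The names html_to_text and _TextExtractor and the choice of which tags count as paragraph breaks are illustrative assumptions.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script>/<style>."""

    SKIP = {"script", "style"}
    # Assumed set of tags that should start a new line in the output.
    BLOCK = {"p", "br", "div", "h1", "h2", "h3", "h4", "li", "tr"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; into &
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Hypothetical: return the visible text of a page, one paragraph per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

Splitting on block-level tags keeps paragraph boundaries, which matters if the text is later annotated sentence by sentence.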

Getting Started

To get the full corpus used in our research, run "holjplus.py". This should give you ~750 House of Lords judgements in plain-text format (HOLJ+). To get the 300-case corpus we use for majority-opinion research, then merge the existing HOLJ corpus with the HOLJ+ corpus using "merge.py".
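After holjplus.py finishes, it can help to confirm how many case files were produced and whether any are empty, for example because a page failed to download. This is a minimal sketch. It assumes the output is a folder of .txt files, and the folder name "HOLJ+" used below is a hypothetical placeholder, not necessarily where holjplus.py writes.

```python
from pathlib import Path


def corpus_summary(corpus_dir: str) -> tuple[int, list[str]]:
    """Hypothetical helper: count .txt case files and list any empty ones.

    Assumes one case per .txt file directly inside corpus_dir.
    """
    files = sorted(p for p in Path(corpus_dir).glob("*.txt") if p.is_file())
    # A file containing only whitespace is treated as empty.
    empty = [p.name for p in files if not p.read_text(encoding="utf-8").strip()]
    return len(files), empty
```

For example, `corpus_summary("HOLJ+")` returns the file count (the README reports roughly 750) and the names of any empty files.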

"merge.py" can also be used to extend and combine our corpus, or any other corpus of .txt files. See "merge.py" for details.
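Conceptually, combining .txt corpora means gathering the case files from several folders into one. The following sketch illustrates that idea. It is not the interface of merge.py, and the function name merge_corpora and its rename-on-collision rule are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def merge_corpora(sources: list[str], target: str) -> list[Path]:
    """Hypothetical: copy every *.txt file from each source folder into target.

    If a file name already exists in target, the incoming file gets a numeric
    suffix (x_1.txt, x_2.txt, ...) instead of overwriting it.
    Returns the paths of the copied files.
    """
    out_dir = Path(target)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sources:
        for path in sorted(Path(src).glob("*.txt")):
            dest = out_dir / path.name
            n = 1
            while dest.exists():
                dest = out_dir / f"{path.stem}_{n}{path.suffix}"
                n += 1
            shutil.copy2(path, dest)  # copy2 also preserves file timestamps
            copied.append(dest)
    return copied
```

Renaming on collision avoids silently losing a case when two corpora use the same file name, at the cost of possibly keeping a duplicate that must be removed by hand.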

Prerequisites

Running the tests

To run the built-in tests, run format.py, extract.py and scrape.py.
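If you want to run all three scripts in one go and see which ones fail, a small runner like the one below would do. It is a sketch, not part of the repository. It assumes each script signals failure through a non-zero exit code, and the name run_scripts is hypothetical.

```python
import subprocess
import sys


def run_scripts(scripts: list[str]) -> dict[str, int]:
    """Hypothetical: run each script with the current interpreter, in order.

    Returns a mapping from script path to exit code (0 means success).
    All scripts are run even if an earlier one fails.
    """
    results = {}
    for script in scripts:
        completed = subprocess.run([sys.executable, script])
        results[script] = completed.returncode
    return results
```

Calling `run_scripts(["format.py", "extract.py", "scrape.py"])` from the repository root runs the three scripts and reports each exit code.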

Contributing

scrape.py - functions adapted from a realpython.com tutorial.

Authors

Josef Valvoda

License

This project is licensed under the MIT License; see LICENSE.md for details.

Repository contents

HOLJ+.zip
LICENSE
README.md
extract.py
format.py
holjplus.py
merge.py
scrape.py