Mapping Prejudice: Racial Term Identification from Historical Deed Documents

This repository contains the Mapping Prejudice project pipeline for identifying racially restrictive language in historical deed documents. It provides two core features:

Entity Identification: Uses ModernBERT to recognize and extract racially restrictive terms and phrases based on their context within the document.
Document Classification: Uses BERT-base with the TARS (Task-Aware Representation of Sentences) framework to determine whether any sentence in the document contains racially restrictive language.

Installation:

With `pip`

Install the required dependencies:

pip install -r requirements.txt

With Poetry (Recommended)

Poetry is recommended for better dependency and environment management. To install Poetry and then the project dependencies:

pip install -r poet.txt
poetry install --no-root

Usage:

The pipeline supports two types of input: a JSON-formatted string or local directory paths. File input from S3 buckets is also supported via boto3.
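The commands below do not show which flag selects S3 input. One workaround is to copy the S3 objects into a local directory with boto3 and then use the Local Data mode. The sketch below assumes exactly that. `download_s3_prefix` is a hypothetical helper, not part of this repository. It takes a boto3 S3 client (for example `boto3.client("s3")`) as an argument so it can be tested without AWS credentials, and it uses only the client's `get_paginator("list_objects_v2")` and `download_file` calls.

```python
from pathlib import Path


def download_s3_prefix(s3_client, bucket: str, prefix: str, dest_dir: str) -> list[Path]:
    """Copy every object under `prefix` in `bucket` into `dest_dir`.

    Hypothetical helper. `s3_client` is expected to be a boto3 S3 client.
    """
    dest = Path(dest_dir)
    downloaded = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            # Keep the path relative to the prefix so subfolders are preserved;
            # fall back to the file name if the key equals the prefix.
            relative = key[len(prefix):].lstrip("/") or Path(key).name
            target = dest / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            s3_client.download_file(bucket, key, str(target))
            downloaded.append(target)
    return downloaded
```

The returned directory can then be passed to --path_data in the Local Data commands.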

Entity Identification

This module extracts racially restrictive terms from deed text using contextual entity recognition.
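This README does not say how the model's output is turned into extracted terms. If the ModernBERT model is a token classifier that emits BIO tags such as "B-TERM", "I-TERM" and "O" (an assumption, as is the label name TERM), the tagged tokens can be grouped into term spans as sketched below. `bio_to_spans` is a hypothetical helper. It also assumes word-level tokens, so subword pieces would need to be merged first.

```python
def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans.

    Assumes tags of the form "B-<LABEL>", "I-<LABEL>" or "O". An "I-" tag that
    does not continue a span with the same label starts a new span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        if prefix == "I" and tag_label == label:
            words.append(token)  # continue the current span
            continue
        if label is not None:  # close the open span
            spans.append((label, " ".join(words)))
            label, words = None, []
        if prefix in ("B", "I"):  # start a new span
            label, words = tag_label, [token]
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans
```

For example, tokens "no person other than the Caucasian race" tagged O O O O O B-TERM I-TERM yield [("TERM", "Caucasian race")].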

JSON Input

# pip
python3 run_identification.py --json_input <input_json>

# Poetry
poetry run python3 run_identification.py --json_input <input_json>

Local Data

# pip
python3 run_identification.py --local --path_data <path_to_local_data> --output_dir <output_directory> --output_file <output_file_name>

# Poetry
poetry run python3 run_identification.py --local --path_data <path_to_local_data> --output_dir <output_directory> --output_file <output_file_name>

Document Classification

This module classifies whether any sentence in a document contains racially restrictive language.
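The document-level decision described above reduces to an "any" over sentence-level predictions. The sketch below shows only that aggregation. `split_sentences` is a naive hypothetical splitter (OCR'd deed text may need something more robust), and `is_restrictive` is a placeholder for the sentence-level BERT/TARS classifier, which is not reproduced here.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def document_is_restrictive(text: str, is_restrictive: Callable[[str], bool]) -> bool:
    """Flag the document if any sentence is classified as restrictive.

    any() stops at the first positive sentence, so the remaining sentences
    are not sent to the classifier.
    """
    return any(is_restrictive(s) for s in split_sentences(text))
```

In testing, a simple keyword function can stand in for `is_restrictive`. In the pipeline that role belongs to the trained classifier.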

JSON Input

# pip
python3 run_classification.py --json_input <input_json>

# Poetry
poetry run python3 run_classification.py --json_input <input_json>

Local Data

# pip
python3 run_classification.py --local --path_data <path_to_local_data> --output_dir <output_directory> --output_file <output_file_name>

# Poetry
poetry run python3 run_classification.py --local --path_data <path_to_local_data> --output_dir <output_directory> --output_file <output_file_name>
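When calling either script from Python, passing an argument list instead of a shell string avoids quoting problems with the JSON payload, which contains quotes and spaces. `build_command` below is a hypothetical helper that mirrors the flags shown above. The JSON schema expected by --json_input is not documented here, so any field names you pass in (such as "text") are placeholders.

```python
import json


def build_command(script: str, *, json_input=None, path_data=None,
                  output_dir=None, output_file=None, use_poetry=False) -> list[str]:
    """Assemble the argument list for run_identification.py or run_classification.py."""
    cmd = ["poetry", "run", "python3", script] if use_poetry else ["python3", script]
    if json_input is not None:
        # Serialize dicts/lists; pass pre-built JSON strings through unchanged.
        payload = json_input if isinstance(json_input, str) else json.dumps(json_input)
        cmd += ["--json_input", payload]
    elif path_data is not None:
        cmd += ["--local", "--path_data", str(path_data)]
        # Only add the output flags that were supplied.
        if output_dir is not None:
            cmd += ["--output_dir", str(output_dir)]
        if output_file is not None:
            cmd += ["--output_file", str(output_file)]
    else:
        raise ValueError("provide json_input or path_data")
    return cmd
```

The resulting list can be run with subprocess.run(build_command(...), check=True) from the repository root.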
