Dallas Deed OCR

Overview

This project processes public records from Dallas to extract mortgage principal amounts using OCR (Optical Character Recognition). It includes scripts for data scraping, image processing, and visualization.
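The core step is pulling a principal amount out of OCR text from a deed of trust. The sketch below shows one way to do that. It assumes the OCR text is already available as a string, and that the amount appears as a dollar figure after a cue phrase such as "principal" or "the Note states". The cue list and the `extract_principal` helper are hypothetical illustrations, not the logic in `scripts/find_principal_dallas.py`.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical cue phrases; real deeds of trust may word this differently.
PRINCIPAL_CUE = re.compile(r"principal|note\s+states|owes\s+lender", re.IGNORECASE)
# A dollar amount such as "$200,000.00", "$ 200000.00" or "$200,000".
DOLLAR_AMOUNT = re.compile(r"\$\s*(\d[\d,]*(?:\.\d{2})?)")


def extract_principal(ocr_text: str) -> Optional[Decimal]:
    """Return the first dollar amount after a principal cue phrase, or None if absent."""
    cue = PRINCIPAL_CUE.search(ocr_text)
    if cue is None:
        return None
    # Only look for an amount after the cue, to skip fees or other figures earlier on the page.
    amount = DOLLAR_AMOUNT.search(ocr_text, cue.end())
    if amount is None:
        return None
    # Strip thousands separators; Decimal avoids float rounding on currency values.
    return Decimal(amount.group(1).replace(",", ""))
```

Using `Decimal` keeps cents exact. Anchoring the search after a cue phrase is a simple heuristic to avoid picking up unrelated amounts, such as recording fees, that may appear earlier in the OCR output.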

Directory Structure

data/: Contains parcel shapefiles and the main database.
images/: Downloaded images used for OCR.
scripts/: Python scripts for data processing.
results/: Output files including CSVs and videos.
tests/: Test scripts to verify functionality.
audio/: Audio files included in the repository.

Setup

To set up the environment, follow these steps:

Create the conda environment:
```
conda env create -f environment.yml
```
Activate the environment:
```
conda activate deed_ocr_dallas
```

Usage

Initialize the database:
```
python scripts/check_db.py
```
Run the main script:
```
python scripts/find_principal_dallas.py
```
Visualize the results:
```
python scripts/visualize_principal.py
```
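The three Usage steps run in a fixed order. A small driver like the sketch below can chain them from the repository root. It assumes each script takes no command-line arguments and is meant to be run from the repository root. `pipeline_commands` and `run_pipeline` are hypothetical helpers and are not part of the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script order follows the Usage section; assumes each script takes no arguments.
PIPELINE_SCRIPTS = [
    "scripts/check_db.py",
    "scripts/find_principal_dallas.py",
    "scripts/visualize_principal.py",
]


def pipeline_commands(repo_root: str = ".") -> List[List[str]]:
    """Build one command per step, using the current interpreter (e.g. the conda env's python)."""
    root = Path(repo_root).resolve()
    return [[sys.executable, str(root / script)] for script in PIPELINE_SCRIPTS]


def run_pipeline(repo_root: str = ".") -> None:
    """Run each step in order from the repository root; stop at the first failure."""
    root = Path(repo_root).resolve()
    for cmd in pipeline_commands(repo_root):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(cmd, check=True, cwd=root)
```

Call `run_pipeline()` inside the activated `deed_ocr_dallas` environment. `sys.executable` then resolves to that environment's Python.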

Contributing

Contributions are welcome. Please fork the repository and submit a pull request.

About

This script collects Dallas parcel-level mortgage data in reverse chronological order. Collecting all data since 2020 would cost roughly $3,000 in cloud compute, or take about one month of compute time on a single computer with the specs listed below.
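A saved search-results file in the repository is named after the query it came from. That query covers deeds of trust (`_docTypes=DT`) recorded from 2020 to the present, with `limit=250`, `offset=0`, `sort=desc` and `sortBy=recordedDate`. Sorting descending by recorded date is what makes the collection run newest-first. The sketch below rebuilds that query for successive pages. The host and path are not recorded in the repository, so `BASE_URL` is a placeholder, and `search_url` and `page_offsets` are hypothetical helpers.

```python
import math
from typing import List
from urllib.parse import urlencode

# Placeholder: the real host/path is not recorded, only the query string.
BASE_URL = "https://example.invalid/results"
PAGE_SIZE = 250  # matches limit=250 in the saved query


def search_url(offset: int, end_date: str = "20240426") -> str:
    """Build one page of the deed-of-trust search, newest records first."""
    params = {
        "_docTypes": "DT",
        "_recordedYears": "2020-Present",
        "department": "RP",
        "limit": PAGE_SIZE,
        "offset": offset,
        "recordedDateRange": f"18000101,{end_date}",
        "searchOcrText": "false",
        "searchType": "quickSearch",
        "searchValue": "deed of trust",
        "sort": "desc",            # descending...
        "sortBy": "recordedDate",  # ...by recording date, so offset 0 is the newest page
    }
    return f"{BASE_URL}?{urlencode(params)}"


def page_offsets(total_records: int) -> List[int]:
    """Offsets needed to page through total_records results, PAGE_SIZE at a time."""
    n_pages = math.ceil(total_records / PAGE_SIZE)
    return [i * PAGE_SIZE for i in range(n_pages)]
```

Because results arrive newest-first, a partial run still yields the most recent records. This is why a run can be stopped early when the full 2020-to-present collection is too costly.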

License

This project is licensed under the GNU General Public License, version 3 or later (GPLv3+). See the LICENSE file for details.

System specs

CPU: x86_64
RAM: 15.46 GB
Storage: 1006.85 GB
OS: Linux 5.15.146.1-microsoft-standard-WSL2

Repository: dhardestylewis/texas_mortgage_data

Folders and files

Latest commit

History

Repository files navigation

Dallas Deed OCR

Overview

Directory Structure

Setup

Usage

Contributing

License

System specs

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages