omagid-crp/house-pfd-parser

Latest commit

f5a982d · Sep 30, 2019

16 Commits

This script parses House financial disclosure reports available at http://clerk.house.gov/public_disc/financial-search.aspx. It is a modification of <a href="https://github.com/PublicI/pfd-parser">this script</a>, created by the Center for Public Integrity, which parses executive branch financial disclosure reports.

The script accepts three command line arguments: filePath, outPath, and noContentPath. filePath is the directory containing the PDFs to parse. outPath is the directory where the script saves CSVs of the parsed data. noContentPath is the directory where the script moves PDFs with unreadable content. If no arguments are given, filePath defaults to {current_directory}/data/input/, outPath defaults to {current_directory}/data/output/, and noContentPath defaults to {current_directory}/data/no-content/.
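
The default-path behaviour described above can be illustrated with a minimal Python sketch. This is not the script's own code. The function name resolve_paths is hypothetical, and the sketch assumes the arguments are positional, in the order filePath, outPath, noContentPath. It also assumes that any argument left out falls back to its default, whereas the README only states what happens when no arguments are given at all.

```python
import os
import sys


def resolve_paths(argv, cwd=None):
    """Return (file_path, out_path, no_content_path) with README-style defaults.

    Hypothetical helper for illustration only; not part of the parser itself.
    """
    cwd = cwd if cwd is not None else os.getcwd()
    # Trailing "" makes os.path.join end each default with a path separator,
    # matching the {current_directory}/data/input/ form used in the README.
    defaults = [
        os.path.join(cwd, "data", "input", ""),
        os.path.join(cwd, "data", "output", ""),
        os.path.join(cwd, "data", "no-content", ""),
    ]
    # Assumption: arguments are positional and any omitted one uses its default.
    given = list(argv[:3])
    return tuple(given[i] if i < len(given) else defaults[i] for i in range(3))


if __name__ == "__main__":
    file_path, out_path, no_content_path = resolve_paths(sys.argv[1:])
    print(file_path, out_path, no_content_path, sep="\n")
```

Called with an empty argument list, the helper returns the three default directories under the current working directory. Called with three paths, it returns them unchanged.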

About

Parses House Personal Financial Disclosure Reports
