Scrape lobbyist employer expenditures #29

Closed
hancush opened this issue Sep 10, 2024 · 5 comments · Fixed by #31

@hancush
Member

hancush commented Sep 10, 2024

The current lobbyist scrape captures employer name (ClientName), but there is some additional metadata we could capture: https://docs.google.com/spreadsheets/d/1-c2Ony5hGjpchOfwPKhYyoJ9FPji0dfkgt0mTcBS9mo/edit?usp=sharing

Clarify with Marjorie which employer metadata lobbyist scrapes should include, then capture it.

@hancush
Member Author

hancush commented Sep 11, 2024

@antidipyramid I'll email Marjorie to see what, if any, additional data about the employers she wants in the scrapes.

@hancush self-assigned this Sep 11, 2024
@hancush changed the title from "Scrape lobbyist employer" to "Scrape lobbyist employer expenditures" Sep 11, 2024
@hancush
Member Author

hancush commented Sep 11, 2024

🚨 Glad I asked! We need to scrape the lobbyist employer expenditures from https://login.cfis.sos.state.nm.us/#/lobbyistexpendituresearch/31.

@antidipyramid
Contributor

@hancush There is a good amount of data processing going on in lobbyists.mk.

Do we want to do some kind of processing on the employer expenditures?

@hancush
Member Author

hancush commented Sep 17, 2024

Great question, @antidipyramid. Some context on lobbyist (and lobbyist employer) scraping: The search interface does not include one very important piece of information: the beneficiary of the expenditure / contribution. So, the original lobbyist scrape downloads all of a lobbyist's filings, then parses information out of those PDFs.

It looks like lobbyist employers file the same information in the same format, e.g., https://login.cfis.sos.state.nm.us//ReportsOutput//LAR/4a27c051-7b49-456a-9936-98d595384a08.pdf

I wonder if we could simply plug them into the existing pipeline (perhaps with some modifications, since rather than a lobbyist associated with a client [employer], there will only be clients [employers])?
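To make that modification concrete, here is a minimal sketch of a record the pipeline could pass around, where the lobbyist is optional so that employer-only filings fit the same shape. The class and field names (`FilingSource`, `client_name`, `lobbyist_name`, `report_url`) are hypothetical and not taken from the existing code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilingSource:
    """One filing to download and parse (hypothetical shape, not the repo's)."""

    client_name: str                      # the employer; always present
    report_url: str                       # link to the filing PDF
    lobbyist_name: Optional[str] = None   # None for employer-only filings

    @property
    def filer_type(self) -> str:
        # Employer filings have no associated lobbyist.
        return "lobbyist" if self.lobbyist_name else "employer"
```

With a shape like this, the PDF parsing step would not need to know who filed the report, only where the PDF lives.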

@hancush
Member Author

hancush commented Sep 17, 2024

It looks like filings for lobbyist employers come from the https://login.cfis.sos.state.nm.us/api//ExploreClients/Disclosures endpoint, while filings for lobbyists come from https://login.cfis.sos.state.nm.us/api//ExploreClients/Fillings. See the network request when you click on "Filings" here: https://login.cfis.sos.state.nm.us/#/exploreClientDetailPublic/mDJ2oXreU_grMhUIIWBeHHwquY7yN_7SNrmbDh6rMxI1/10/2024

If you can modify the script that retrieves filings so it works for both lobbyists and lobbyist employers, I think you can use the rest of the pipeline (PDF parsing) as is, or close to it! What do you think?
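One way the shared retrieval script might choose between the two endpoints is sketched below. The endpoint URLs are the ones quoted above; the function name, the `filer_type` values, and the lookup table are assumptions for illustration. The request method and parameters each endpoint expects are not specified here, so the sketch only resolves the URL.

```python
# Endpoint URLs as quoted in this thread; everything else is hypothetical.
API_BASE = "https://login.cfis.sos.state.nm.us/api//ExploreClients"

FILINGS_ENDPOINTS = {
    "lobbyist": f"{API_BASE}/Fillings",
    "employer": f"{API_BASE}/Disclosures",
}


def filings_endpoint(filer_type: str) -> str:
    """Return the filings endpoint for a lobbyist or a lobbyist employer."""
    try:
        return FILINGS_ENDPOINTS[filer_type]
    except KeyError:
        valid = ", ".join(sorted(FILINGS_ENDPOINTS))
        raise ValueError(
            f"Unknown filer type {filer_type!r}; expected one of: {valid}"
        ) from None
```

The existing script could then take the filer type as an argument and leave the rest of its request logic unchanged, assuming both endpoints return filings in a compatible shape.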

2 participants