Scrape lobbyist employer expenditures #29

hancush · 2024-09-10T19:51:14Z

The current lobbyist scrape captures employer name (ClientName), but there is some additional metadata we could capture: https://docs.google.com/spreadsheets/d/1-c2Ony5hGjpchOfwPKhYyoJ9FPji0dfkgt0mTcBS9mo/edit?usp=sharing

Clarify with Marjorie which employer metadata lobbyist scrapes should include, then capture it.

hancush · 2024-09-11T13:35:21Z

@antidipyramid I'll email Marjorie to see what, if any, additional data about the employers she wants in the scrapes.

hancush · 2024-09-11T17:07:56Z

🚨 Glad I asked! We need to scrape the lobbyist employer expenditures from https://login.cfis.sos.state.nm.us/#/lobbyistexpendituresearch/31.

antidipyramid · 2024-09-17T14:35:07Z

@hancush There is a good amount of data processing going on in lobbyists.mk.

Do we want to do some kind of processing on the employer expenditures?

hancush · 2024-09-17T14:53:16Z

Great question, @antidipyramid. Some context on lobbyist (and lobbyist employer) scraping: The search interface does not include one very important piece of information: the beneficiary of the expenditure / contribution. So, the original lobbyist scrape downloads all of a lobbyist's filings, then parses information out of those PDFs.

It looks like lobbyist employers file the same information in the same format, e.g., https://login.cfis.sos.state.nm.us//ReportsOutput//LAR/4a27c051-7b49-456a-9936-98d595384a08.pdf

I wonder if we could simply plug them into the existing pipeline (perhaps with some modifications, since rather than a lobbyist associated with a client [employer], there will only be clients [employers])?
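To make the "only clients [employers]" case concrete, here is a rough sketch of a record the pipeline could pass around, with the lobbyist made optional. Everything in it is hypothetical and for illustration only: `FilingSource`, its field names, and `filer_type` are not taken from the existing code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FilingSource:
    """Hypothetical record describing who a filing PDF belongs to."""

    client_name: str                      # employer name (ClientName in the scrape)
    report_url: str                       # link to the filing PDF
    lobbyist_name: Optional[str] = None   # absent for employer filings

    @property
    def filer_type(self) -> str:
        # A filing with no lobbyist attached is treated as an employer filing.
        return "lobbyist" if self.lobbyist_name else "employer"
```

With a shape like this, downstream steps would only need to branch where the lobbyist is actually used.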

hancush · 2024-09-17T14:57:55Z

Looks like there's an endpoint that gets filings for lobbyist employers, https://login.cfis.sos.state.nm.us/api//ExploreClients/Disclosures (for lobbyists, it's https://login.cfis.sos.state.nm.us/api//ExploreClients/Fillings). See the network request when you click on "Filings" here: https://login.cfis.sos.state.nm.us/#/exploreClientDetailPublic/mDJ2oXreU_grMhUIIWBeHHwquY7yN_7SNrmbDh6rMxI1/10/2024

If you can modify the script that retrieves filings so it works for both lobbyists and lobbyist employers, I think you can use the rest of the pipeline (PDF parsing) as is, or close to it! What do you think?
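A minimal sketch of one way to do that, assuming the only difference between the two cases is the endpoint path. The function names (`filings_url`, `fetch_filings`), the use of a GET request, and the `params` argument are assumptions; the real query parameters and response shape should be copied from the network request, not from this sketch.

```python
from typing import Any, Mapping

API_ROOT = "https://login.cfis.sos.state.nm.us/api//ExploreClients"

# Endpoint paths as seen in the network requests for each filer type.
FILINGS_PATHS = {
    "lobbyist": "Fillings",
    "employer": "Disclosures",
}


def filings_url(filer_type: str) -> str:
    """Return the filings endpoint for a lobbyist or a lobbyist employer."""
    try:
        path = FILINGS_PATHS[filer_type]
    except KeyError:
        raise ValueError(f"unknown filer type: {filer_type!r}") from None
    return f"{API_ROOT}/{path}"


def fetch_filings(session: Any, filer_type: str, params: Mapping[str, Any]) -> Any:
    """Fetch filings with a requests-style session.

    Assumption: the endpoint accepts a GET with query parameters. The
    parameters themselves are supplied by the caller, and the decoded JSON
    is returned unchanged.
    """
    response = session.get(filings_url(filer_type), params=params)
    response.raise_for_status()
    return response.json()
```

Passing the session in keeps the function easy to test with a stub, and a `requests.Session` can be dropped in for real scrapes.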

hancush self-assigned this Sep 11, 2024

hancush changed the title ~~Scrape lobbyist employer~~ Scrape lobbyist employer expenditures Sep 11, 2024

This was referenced Sep 20, 2024

Lobbyist employer scraper #31

Merged

Upload excel file of independent expenditures to S3 #32

Merged

antidipyramid closed this as completed in #31 Oct 7, 2024
