Add State Attorney General Opinion Scrapers #168
Maybe this should be a living ticket where we update the list of new scrapers to add. If so, we could rename this issue, or create a new issue to hold the living list. Do you think we should create separate child issues for each new scraper, so we can close those as they're done? When adding to our living list, it would be helpful to indicate:
Maybe it could be a three-column table that we add new desired scrapers to, with the info above. If this sounds reasonable, could you update this issue (or create a new issue for the living list) to use this format for your Maryland example above? And if we want this to be a living issue with links to sub-issues, could you also add the new scrapers from #167 to the list here, with the pertinent info (items 1-3 above)?
If you're new here and can help, please say which scraper you're able to work on, and check out the README to get started.
dear lord! Is there any strategy here, or just start working from the top? I'll take care of the others in #167 first so we can close that ticket.
The right way to do it is to start with the most populous states. The fun way is to find easy ones with big archives that are simple to traverse.
opinions/united_states/federal_special/ag.py
opinions/united_states/state_special/mdag.py
--
Mike Lissner
Executive Director
Free Law Project
https://free.law
…cludes a backscraper that should be run after deployment. Related to #168
…is included and should be run after deployment. It will capture 533 cases and take only about 1 second to run. Relates to #168
…f HTML variation across the history of opinions, so I've added multiple example files for coverage. A backscraper is included and should be run after deployment, taking around 2 minutes and yielding 18,377 cases. Relates to #168
I don't know if this is useful or not, but the Alabama AG has opinions going back to 1979. They're numbered differently depending on period:
And these are the correct ranges for 1994-2000: 9400001 - 9400267. The PDFs are named according to the above scheme. So, an example from the three different date formats: https://www.alabamaag.gov/Documents/opin/9400001.pdf
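Given that numbering, a backscraper for the 1994-2000 period could enumerate candidate PDF URLs directly. A minimal sketch, assuming the pattern is a two-digit year followed by a five-digit serial under the `Documents/opin` path shown above (the helper name `candidate_urls` is hypothetical, and the other two date formats would need their own patterns):

```python
# Base URL taken from the example link above; the numbering scheme
# (two-digit year + zero-padded five-digit serial) is assumed from
# the 9400001 - 9400267 range described for 1994-2000.
BASE = "https://www.alabamaag.gov/Documents/opin/{num}.pdf"

def candidate_urls(year, first, last):
    """Yield candidate PDF URLs for serials first..last in a given year."""
    yy = year % 100  # e.g. 1994 -> 94
    for serial in range(first, last + 1):
        yield BASE.format(num=f"{yy:02d}{serial:05d}")

urls = list(candidate_urls(1994, 1, 267))
```

A real backscraper would then request each URL and skip 404s, since some serials in a range may be missing.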
That's great. One day, perhaps, we'll get to this, but absent a volunteer picking it up, this work is outside our budget for the moment.
I'm just learning Python, so maybe if I ever get to a place where I understand *args and **kwargs, I can help. But at least the information's there now. :)
Notes on AGs Missing
I suppose the Google Drive sources could maybe be tackled with Selenium, but I wasn't up for figuring that out. Rhode Island is a mystery: I do think their opinions exist, but I don't know where.
Future Notes
This was the bare minimum, and it doesn't set up backscraping of opinions.
I also moved all the AG scrapers into a new folder, and moved all the previously added ones into that directory.
Two top-level tasks here:
Trawl the internet and find all the available sources.
Make the scrapers.
I'll develop a list below of all scrapers we want to build.
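For the "make the scrapers" task, the core work per source is turning a listing page into (title, URL, date) rows. A stdlib-only sketch against a made-up listing table; the markup, class name, and column layout here are purely illustrative, not any real AG site's structure, and scrapers in this repo would follow the project's own site conventions instead:

```python
from html.parser import HTMLParser

# Hypothetical listing markup; every real AG site will differ.
SAMPLE = """
<table id="opinions">
  <tr><td><a href="/opin/9400001.pdf">Opinion 94-00001</a></td><td>1994-01-03</td></tr>
  <tr><td><a href="/opin/9400002.pdf">Opinion 94-00002</a></td><td>1994-01-10</td></tr>
</table>
"""

class OpinionTableParser(HTMLParser):
    """Collect (title, href, date) triples from a simple two-column table."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_link = False
        self._href = None
        self._text = []
        self._pending = None  # (title, href) waiting for its date cell

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)
        elif self._pending and data.strip():
            # First non-empty text after the link is the date cell.
            title, href = self._pending
            self.rows.append((title, href, data.strip()))
            self._pending = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
            self._pending = ("".join(self._text).strip(), self._href)

parser = OpinionTableParser()
parser.feed(SAMPLE)
```

Each new scraper then mostly differs in where these three fields live in the markup; sites that render listings with JavaScript (like the Google Drive ones mentioned above) would need a browser-driven approach instead.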