feat(NewOpinionSite): support returning nested data structures #952

Open
wants to merge 13 commits into
base: main

grossir
Contributor

@grossir grossir commented Mar 6, 2024

Related to #883

  • add jsonschema dependencies
  • create JSON Schemas for each scraped object, corresponding to courtlistener's Django Models
  • validate scraped data using JSONSchemaValidator
  • support nested objects data structures, which allow passing OpinionClusters to caller
  • nested objects also make it easier to pass previously unscraped objects to CL, like OriginatingCourtInformation
  • found an easy way to test secondary pages
  • update `tex` scraper to the new scraper class (related to Texas Supreme and Appellate scrapers enhancements #902)
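
To illustrate the nested-object idea from the list above, here is a minimal sketch of what a nested return value might look like. The function `build_case` and every field name inside the dicts are assumptions made for illustration; the actual keys are defined by the PR's JSON Schemas and courtlistener's Django models, which are not shown here.

```python
# Hypothetical sketch: a scraped case returned as nested dicts instead of
# flat, parallel lists. All field names below are illustrative assumptions.

def build_case(docket_number, case_name, download_url, lower_court=None):
    """Assemble one scraped case as a nested structure."""
    docket = {
        "docket_number": docket_number,
        # The cluster travels with its docket, so the caller receives it whole
        "OpinionCluster": {
            "case_name": case_name,
            "Opinions": [{"download_url": download_url}],
        },
    }
    if lower_court is not None:
        # Previously unscraped objects can ride along as another nested key
        docket["OriginatingCourtInformation"] = {"court_name": lower_court}
    return {"Docket": docket}


case = build_case(
    "23-0001",
    "Doe v. Roe",
    "https://example.com/opinion.pdf",
    lower_court="Example County District Court",
)
```

With this shape the caller can walk from the docket down to each opinion without re-associating values by index.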

… dicts

- create new base class for OpinionSite
- add example for new deferring objects on `nev`
- add example of json validation
- add jsonschema dependencies
- create JSON Schemas for each scraped object, corresponding to courtlistener's Django Models
- validate scraped data using JSONSchemaValidator
- support nested objects data structures, which allow passing OpinionClusters to caller
- nested objects also make it easier to pass previously unscraped objects to CL, like OriginatingCourtInformation
- found an easy way to test secondary pages
- update `tex` scraper to the new scraper class

sentry-io bot commented Mar 6, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: juriscraper/opinions/united_states/state/tex.py

| Function | Unhandled Issue |
| --- | --- |
| `__init__` | Unretrievable: 'https://courtlistener.com/schemas/Docket.json' cl.scrapers.management.commands.cl_scrape_opin... (Event Count: 1) |


grossir and others added 6 commits March 8, 2024 18:05
Update texapp_1 through texapp_14 classes, and example files
Add tests for required properties, for types and formats, and for additional properties, to ensure the validator and the schemas work as expected

Related to freelawproject#838
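
The three kinds of checks named in this commit (required properties, types and formats, additional properties) can be sketched with the standard library alone. This is a hand-rolled stand-in for illustration, not the `jsonschema`-based validator the PR uses; `SCHEMA`, `validate`, and the field names are assumptions.

```python
# Hypothetical stand-in for schema validation, stdlib only. It handles just
# the three rule kinds the commit mentions; a real JSON Schema validator
# covers far more.
from datetime import date

SCHEMA = {
    "type": "object",
    "required": ["docket_number", "date_filed"],
    "properties": {
        "docket_number": {"type": "string"},
        "date_filed": {"type": "string", "format": "date"},
    },
    "additionalProperties": False,
}


def validate(obj, schema):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    # 1. Required properties
    for key in schema.get("required", []):
        if key not in obj:
            errors.append(f"missing required property: {key}")
    props = schema.get("properties", {})
    for key, value in obj.items():
        # 2. Additional properties
        if key not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected property: {key}")
            continue
        rule = props[key]
        # 3. Types, then formats
        if rule.get("type") == "string" and not isinstance(value, str):
            errors.append(f"wrong type for {key}")
            continue
        if rule.get("format") == "date":
            try:
                date.fromisoformat(value)
            except ValueError:
                errors.append(f"bad date format for {key}")
    return errors
```

Returning every error at once, rather than raising on the first, makes it easier to see all the problems in one scraped object during a test run.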
@grossir grossir marked this pull request as ready for review March 13, 2024 00:51
@grossir grossir closed this Mar 13, 2024
@grossir grossir reopened this Mar 13, 2024
Add rfc3339 and rfc3986 dependencies
@grossir grossir requested a review from flooie March 13, 2024 00:55
Deferred values are now collected explicitly, in 2 stages:
- getting a deferred download_url
- getting other deferred values

Since download_url is needed for duplicate checking, it needs to be collected before the checking happens.
After this check, we can collect other deferred values.
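
The two-stage collection described above can be sketched as follows. This assumes a deferred value is represented as a zero-argument callable; `resolve`, `process`, and the `seen_urls` set are hypothetical names, and the PR's actual mechanism may differ.

```python
# Hypothetical sketch of two-stage deferred collection. A deferred value is
# modeled as a zero-argument callable; anything else is already resolved.

def resolve(value):
    """Call a deferred value to get its result; pass plain values through."""
    return value() if callable(value) else value


def process(cases, seen_urls):
    """Resolve download_url first, skip duplicates, then resolve the rest."""
    results = []
    for case in cases:
        # Stage 1: the URL is needed for duplicate checking
        case["download_url"] = resolve(case["download_url"])
        if case["download_url"] in seen_urls:
            continue  # duplicate: the other deferred values are never fetched
        seen_urls.add(case["download_url"])
        # Stage 2: collect the remaining deferred values
        for key in list(case):
            case[key] = resolve(case[key])
        results.append(case)
    return results
```

The point of the ordering is that any extra requests behind the other deferred values are only made for cases that survive the duplicate check.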

Solves freelawproject#856
Solves: freelawproject#937

The scraper now opens the case page to look for the opinion document when it is not linked on the search results page.
This lookup is also deferred.