feat(NewOpinionSite): support returning nested data structures #952

grossir · 2024-03-06T22:05:39Z

Related to #883

- Add jsonschema dependencies.
- Create JSON Schemas for each scraped object, corresponding to CourtListener's Django models.
- Validate scraped data using JSONSchemaValidator.
- Support nested object data structures, which allow passing OpinionClusters to the caller.
- Nested objects also make it easier to pass previously unscraped objects to CL, such as OriginatingCourtInformation.
- Found an easy way to test secondary pages.
- Update the `tex` scraper to the new scraper class (related to "Texas Supreme and Appellate scrapers enhancements" #902).
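
To illustrate the nested shape described in the list above, here is a minimal sketch of how one scraped case could be represented as nested dicts. The key names (`Docket`, `OpinionCluster`, `opinions`, `OriginatingCourtInformation`), the field names, the values, and the `iter_opinions` helper are all assumptions made for illustration. They are loosely modeled on CourtListener model names and are not necessarily the exact structure this PR returns.

```python
# Hypothetical nested result for a single case; every key and value is illustrative.
case = {
    "Docket": {
        "docket_number": "23-0001",
        "case_name": "Doe v. Roe",
        # A previously unscraped object can travel along as a nested dict.
        "OriginatingCourtInformation": {"docket_number": "2021-CV-123"},
        "OpinionCluster": {
            "date_filed": "2024-03-01",
            # Separate opinions of the same case are bundled in one cluster.
            "opinions": [
                {"type": "majority", "download_url": "https://example.com/a.pdf"},
                {"type": "dissent", "download_url": "https://example.com/b.pdf"},
            ],
        },
    }
}


def iter_opinions(case):
    """Yield (docket_number, opinion) pairs from a nested case dict."""
    docket = case["Docket"]
    for opinion in docket["OpinionCluster"]["opinions"]:
        yield docket["docket_number"], opinion


opinion_types = [opinion["type"] for _, opinion in iter_opinions(case)]
print(opinion_types)
```

With a flat list-per-field layout, the two opinions would have to be matched back to their cluster by position. Nesting keeps that relationship explicit for the caller.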


sentry-io · 2024-03-06T22:05:49Z

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: juriscraper/opinions/united_states/state/tex.py

| Function | Unhandled Issue |
| --- | --- |
| `__init__` | Unretrievable: 'https://courtlistener.com/schemas/Docket.json' cl.scrapers.management.commands.cl_scrape_opin... `Event Count:` 1 |


Deferred values are now collected explicitly, in 2 stages:
- getting a deferred download_url
- getting other deferred values

Since download_url is needed for duplicate checking, it needs to be collected before the checking happens. After this check, we can collect other deferred values. Solves freelawproject#856
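
The two-stage collection described above could be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: the `resolve` and `collect_case` helpers, the use of zero-argument callables to represent deferred values, and the `seen_urls` set standing in for the duplicate check are all hypothetical.

```python
def resolve(value):
    """Call a deferred value (a zero-argument callable); pass plain values through."""
    return value() if callable(value) else value


def collect_case(case, seen_urls):
    """Resolve deferred values in two stages around a duplicate check."""
    # Stage 1: resolve only download_url, because the duplicate check needs it.
    case["download_url"] = resolve(case["download_url"])
    if case["download_url"] in seen_urls:
        return None  # duplicate: the other deferred values are never fetched
    seen_urls.add(case["download_url"])

    # Stage 2: resolve every remaining deferred value.
    for key, value in list(case.items()):
        case[key] = resolve(value)
    return case


calls = []  # records which deferred functions actually ran


def fetch_url():
    calls.append("download_url")
    return "https://example.com/op1.pdf"


def fetch_judge():
    calls.append("judge")
    return "Smith"


seen = set()
first = collect_case(
    {"case_name": "A v. B", "download_url": fetch_url, "judge": fetch_judge}, seen
)
second = collect_case(
    {"case_name": "A v. B", "download_url": fetch_url, "judge": fetch_judge}, seen
)
print(first, second, calls)
```

In this sketch the second call stops after stage 1, so `fetch_judge` runs only once. That is the point of splitting the stages: extra requests for secondary pages are skipped for cases that turn out to be duplicates.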


grossir added 4 commits January 26, 2024 20:26

feat(newOpinionSite): change underlying data structures from lists to…

f95bdfc

… dicts
- create new base class for OpinionSite
- add example for new deferring objects on `nev`
- add example of json validation

wip

b4d8348

Merge branch 'main' into new_opinion_site_subclass

4a55b02

grossir mentioned this pull request Mar 6, 2024

Enhance Juriscraper to Support Bundling of Separate Opinions #883

Open

8 tasks

grossir and others added 6 commits March 8, 2024 18:05

feat(texcrimapp): update scraper

9928f2e

feat(texapp_1, texapp_2): update to new base class

ae96a6b

Also, update test files

feat(texapp): Change to NewOpinionSite class

d512239

Update texapp_1 through texapp_14 classes, and example files

[pre-commit.ci] auto fixes from pre-commit.com hooks

7baf232

for more information, see https://pre-commit.ci

feat(schema_validator): add tests

6e71219

Add tests for required properties, for types and formats, and for additional properties, to ensure the validator and the schemas work as expected. Related to freelawproject#838
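
The kinds of rules those tests exercise can be shown with a small, standard-library-only sketch. The `SCHEMA` dict, its field names, and the `check` helper below are hypothetical and written only for this example. `check` is a deliberately tiny stand-in covering four rule kinds, not the jsonschema library or the JSONSchemaValidator used in this PR.

```python
from datetime import date

# Hypothetical schema fragment using standard JSON Schema keywords.
SCHEMA = {
    "type": "object",
    "required": ["case_name", "date_filed"],
    "additionalProperties": False,
    "properties": {
        "case_name": {"type": "string"},
        "date_filed": {"type": "string", "format": "date"},
    },
}


def check(instance, schema):
    """Return a list of error strings; an empty list means the instance passed."""
    errors = []
    props = schema["properties"]
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property: {key}")
    for key, value in instance.items():
        if key not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"additional property not allowed: {key}")
            continue
        if props[key].get("type") == "string" and not isinstance(value, str):
            errors.append(f"wrong type for {key}")
        elif props[key].get("format") == "date":
            try:
                date.fromisoformat(value)
            except ValueError:
                errors.append(f"bad date format for {key}")
    return errors


valid = {"case_name": "A v. B", "date_filed": "2024-03-06"}
results = {
    "valid": check(valid, SCHEMA),
    "missing": check({"case_name": "A v. B"}, SCHEMA),
    "wrong_type": check({**valid, "case_name": 1}, SCHEMA),
    "bad_format": check({**valid, "date_filed": "03/06/2024"}, SCHEMA),
    "extra": check({**valid, "extra": 1}, SCHEMA),
}
print(results)
```

Each entry in `results` corresponds to one category of test named in the commit message: a passing object, a missing required property, a wrong type, a bad format, and a disallowed additional property.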

correct typo: date_judgement should be date_judgment

01425b9

grossir marked this pull request as ready for review March 13, 2024 00:51

grossir closed this Mar 13, 2024

grossir reopened this Mar 13, 2024

feat(schema_validator): update requirements.txt

b99e1c4

Add rfc3339 and rfc3986 dependencies

grossir requested a review from flooie March 13, 2024 00:55

grossir added 2 commits March 18, 2024 23:04

feat(alaska): Update to NewOpinionSite class

c52d88b

Solves: freelawproject#937. The scraper now goes into the case page to look for the opinion document when it is not linked on the search results page. This lookup is also deferred.

grossir mentioned this pull request Aug 5, 2024

Ideas to improve Juriscraper #1106

Open
