Down Scraper Fla 6th District Ct of Appeals #870

Closed
sentry-io bot opened this issue Jan 22, 2024 · 2 comments
@sentry-io

sentry-io bot commented Jan 22, 2024

https://6dca.flcourts.gov/Opinions/Most-Recent-Written-Opinions?sort=opinion/case_number%20asc&type=written&view=embed_custom&searchtype=opinions&recent_only=1&hide_search=1&hide_filters=1&limit=30&offset=0

Sentry Issue: COURTLISTENER-64S

HTTPError: 404 Client Error: Not Found for url: https://6dca.flcourts.gov/search?sort=opinion/disposition_date%20desc,%20opinion/case_number%20asc&view=full&searchtype=opinions&limit=10&scopes%5B%5D=sixth_district_court_of_appeal&type%5B%5D=pca&type%5B%5D=written&startdate=01/15/2024&enddate=01/22/2024&date%5Byear%5D=&date%5Bmonth%5D=&date%5Bday%5D=&query=&offset=10
(2 additional frame(s) were not displayed)
...
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 385, in handle
    self.parse_and_scrape_site(mod, options["full_crawl"])
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 348, in parse_and_scrape_site
    site = mod.Site().parse()
@grossir
Contributor

grossir commented Jan 31, 2024

Related issues:

The pagination URL query string has been updated, but because it is long, the change is hard to spot.

All Sentry events have the offset=10 parameter, which points to them happening during pagination.
However, some of these URLs load when opened manually and others don't.

The "old" URL when paginating looks like this (this format still works when getting the first page, though):

{'https://6dca.flcourts.gov/search?sort': ['opinion/disposition_date desc, opinion/case_number asc'],
 'view': ['full'],
 'searchtype': ['opinions'],
 'limit': ['10'],
 'scopes[]': ['sixth_district_court_of_appeal'],
 'type[]': ['pca', 'written'],
 'startdate': ['01/13/2024'],
 'enddate': ['01/20/2024'],
 'offset': ['10']}

New URL:

{'offset': ['40'],
 'view': ['full'],
 'startDate': ['01/16/2024'],
 'endDate': ['01/31/2024'],
 'searchType': ['opinions'],
 'scopes[0]': ['sixth_district_court_of_appeal'],
 'limit': ['10'],
 'sort': ['opinion/disposition_date desc, opinion/case_number asc'],
 'recentOnly': ['0'],
 'types[0]': ['pca'],
 'types[1]': ['written'],
 'activeOnly': ['0'],
 'active_only': ['0'],
 'nonActiveOnly': ['0'],
 'nonactive_only': ['0'],
 'show_scopes': ['0'],
 'hide_search': ['0'],
 'hide_filters': ['0'],
 'siteaccess': ['6dca'],
 'type[0]': ['pca'],
 'type[1]': ['written']}
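
A minimal sketch of how the two query strings could be compared programmatically rather than by eye. The URLs below are shortened, illustrative versions of the two formats (the `/search` path on the new one is an assumption), and `query_params` and `diff_params` are hypothetical helper names. Splitting off the query with `urlsplit` before calling `parse_qs` also keeps the scheme and host from being fused onto the first key, as happened with the `'https://6dca.flcourts.gov/search?sort'` key in the "old" dump.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url):
    """Parse only the query component of a URL into a dict of lists."""
    return parse_qs(urlsplit(url).query)


def diff_params(old_url, new_url):
    """Report which parameter names were dropped, added, or changed."""
    old, new = query_params(old_url), query_params(new_url)
    return {
        "only_old": sorted(old.keys() - new.keys()),
        "only_new": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Shortened, illustrative URLs (not the full query strings from the dumps).
old_url = (
    "https://6dca.flcourts.gov/search?view=full&searchtype=opinions&limit=10"
    "&scopes%5B%5D=sixth_district_court_of_appeal&type%5B%5D=pca"
    "&type%5B%5D=written&startdate=01/13/2024&enddate=01/20/2024&offset=10"
)
new_url = (
    "https://6dca.flcourts.gov/search?offset=40&view=full"
    "&startDate=01/16/2024&endDate=01/31/2024&searchType=opinions"
    "&scopes%5B0%5D=sixth_district_court_of_appeal&limit=10"
    "&types%5B0%5D=pca&types%5B1%5D=written"
)

result = diff_params(old_url, new_url)
for label, names in result.items():
    print(label, names)
```

On these inputs, the renamed parameters (for example `startdate` vs. `startDate`, `type[]` vs. `types[0]`) show up under `only_old` and `only_new`, which makes the format change easier to see.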

Example of a non-working URL (Sentry event):

@grossir
Contributor

grossir commented Feb 1, 2024

In the end the problem was not that the URL format had changed. The old format still works.

The problem was this block:

        if not self.html.xpath('.//li[@class="next disabled"]'):
            # If pagination is enabled, scrape the next page
            self.offset = self.offset + 10
            self.update_url()
            self.html = super()._download()
            self._process_html()

The Next pagination button does not appear if there are fewer than 10 records (the limit in the query). So whenever a lookup returned fewer than 10 results, the button did not exist, no disabled Next button was found, and the condition was still true, which led to paginating to a nonexistent page.
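
A minimal sketch of a stricter check, not necessarily the fix that was committed: paginate only when an enabled Next item is actually present. It assumes the enabled button is rendered as an `li` whose class list contains `next` without `disabled`, uses the standard library's `xml.etree.ElementTree` in place of lxml so it runs on its own, and `has_next_page` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def has_next_page(root):
    """Return True only if an enabled "next" pagination item exists."""
    for li in root.iter("li"):
        classes = (li.get("class") or "").split()
        if "next" in classes and "disabled" not in classes:
            return True
    return False


# Three illustrative page states (hypothetical markup).
NO_PAGER = "<div><p>3 results</p></div>"  # fewer than 10 results
LAST_PAGE = '<div><ul><li class="next disabled">Next</li></ul></div>'
MORE_PAGES = '<div><ul><li class="next">Next</li></ul></div>'

for name, markup in [
    ("no pager", NO_PAGER),
    ("last page", LAST_PAGE),
    ("more pages", MORE_PAGES),
]:
    print(name, has_next_page(ET.fromstring(markup)))
```

The original condition treats "no disabled Next button" as "there is a next page", so it is true for both the `NO_PAGER` and `MORE_PAGES` cases; the positive check above is true only for `MORE_PAGES`.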

I am not changing the URL to the new format, but let's keep it in mind in case the old format is retired.

Unrelated, but worth noting: PCA and PCD opinions in this source mean "Per Curiam", which is a field we can capture in our Opinion model. This is another instance of needing a more flexible return type: #883

grossir added a commit to grossir/juriscraper that referenced this issue Feb 1, 2024
Solves freelawproject#870

Pagination test was true when no pagination menu was available, leading to paginating to inexistent pages
@grossir grossir closed this as completed Feb 1, 2024