A bunch of backscrapers use the `date_utils.make_date_range_tuples` function to create the `back_scrape_iterable`. The function takes a `gap` value for the size in days of each interval.

As of now, we hard-code the `gap` value. However, we could make this a dynamic variable read from the caller's keyword arguments, with a sensible default in case it is not passed.

    def make_backscrape_iterable(self, kwargs):
        """Checks if backscrape start and end arguments have been passed
        by caller, and parses them accordingly

        :param kwargs: passed when initializing the scraper, may or
            may not contain backscrape controlling arguments
        :return None
        """
        start = kwargs.get("backscrape_start")
        end = kwargs.get("backscrape_end")

        if start:
            start = datetime.strptime(start, "%m/%d/%Y")
        else:
            start = self.first_opinion_date
        if end:
            end = datetime.strptime(end, "%m/%d/%Y")
        else:
            end = datetime.now()

        self.back_scrape_iterable = make_date_range_tuples(
            start, end, self.days_interval
        )

When running the backscraper, I have found that the `self.days_interval` I defined was too big in some scrapers for some time periods, so the backscraper is not getting all documents due to the page size. This would be easily solved by a dynamic argument.
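
To illustrate why the interval size matters, here is a minimal sketch of splitting a date range into windows. `split_into_intervals` is a hypothetical stand-in written for this example; it is not the actual `make_date_range_tuples` implementation, whose exact window boundaries may differ.

```python
from __future__ import annotations

from datetime import date, timedelta


def split_into_intervals(start: date, end: date, gap: int) -> list[tuple[date, date]]:
    """Hypothetical stand-in for make_date_range_tuples: split [start, end]
    into consecutive, non-overlapping windows of at most `gap` days each."""
    if gap < 1:
        raise ValueError("gap must be a positive number of days")
    windows = []
    window_start = start
    while window_start <= end:
        # Each window spans `gap` days, clipped to the overall end date
        window_end = min(window_start + timedelta(days=gap - 1), end)
        windows.append((window_start, window_end))
        window_start = window_end + timedelta(days=1)
    return windows


# Same date range, two interval sizes
wide = split_into_intervals(date(2023, 1, 1), date(2023, 1, 31), 30)
narrow = split_into_intervals(date(2023, 1, 1), date(2023, 1, 31), 7)
print(len(wide), len(narrow))  # 2 5
```

A smaller interval produces more, narrower windows, so each request should return fewer documents and is less likely to run into the page size.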
…rval dynamic
Solves freelawproject#1095
- Update sample_caller to catch `--days-interval` optional keyword argument
- Refactor make_backscrape_iterable that used days_interval as the AbstractSite default; all scrapers that used the same pattern are affected
- Changed default behaviour of `make_backscrape_iterable` to assume dates are passed in `%Y/%m/%d`, a more sensible format than `%m/%d/%Y`
- Also, add logger.info calls for the start and end date of download_backwards to all the scrapers that did not have it
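
As a rough sketch of how a caller could pick up the optional interval, the example below uses `argparse`. It is not the actual `sample_caller` code: the parser, the `--backscrape-start` and `--backscrape-end` flag names, and the way the kwargs dictionary is built are assumptions for illustration. Only `--days-interval` and the `%Y/%m/%d` format come from the commit message.

```python
import argparse
from datetime import datetime

DATE_FORMAT = "%Y/%m/%d"  # the format named in the commit message


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical caller CLI; every option is optional."""
    parser = argparse.ArgumentParser(description="Hypothetical backscrape caller")
    # Dates stay as strings here; the scraper parses them later
    parser.add_argument("--backscrape-start", default=None)
    parser.add_argument("--backscrape-end", default=None)
    parser.add_argument("--days-interval", type=int, default=None)
    return parser


# An explicit argument list stands in for sys.argv in this example
args = build_parser().parse_args(
    ["--backscrape-start", "2023/01/01", "--days-interval", "7"]
)

# Forward only the options the user actually passed,
# so the scraper can fall back to its own defaults for the rest
kwargs = {key: value for key, value in vars(args).items() if value is not None}
parsed_start = datetime.strptime(kwargs["backscrape_start"], DATE_FORMAT)
print(kwargs, parsed_start.date())
```

Leaving unset options out of `kwargs` means a `kwargs.get(...)` lookup in the scraper returns `None` for them, which matches the fallback pattern in `make_backscrape_iterable`.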
Code referenced in the issue:
- `make_date_range_tuples`: juriscraper/juriscraper/lib/date_utils.py, lines 123 to 152 in 01b0309
- hard-coded `gap` value: juriscraper/juriscraper/opinions/united_states/state/colo.py, lines 139 to 161 in 01b0309

Also, we could take this opportunity to refactor the most common case of creating the `back_scrape_iterable`, which takes 2 `datetime.date` as start and end dates, and `days_interval: int`, and save it as a function to be reused. This same pattern is being used in 14 scrapers:
https://github.com/search?q=repo%3Afreelawproject%2Fjuriscraper%20self.back_scrape_iterable%20%3D%20make_date_range_tuples&type=code
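
One possible shape for the shared helper is sketched below. The function name `resolve_backscrape_args`, the `days_interval` kwargs key, and the `%Y/%m/%d` default are assumptions for illustration, not existing juriscraper API. The sketch only resolves the three values; a scraper would still pass them to `make_date_range_tuples`.

```python
from __future__ import annotations

from datetime import date, datetime

DATE_FORMAT = "%Y/%m/%d"  # assumed input format for date strings


def resolve_backscrape_args(
    kwargs: dict, default_start: date, default_days_interval: int
) -> tuple[date, date, int]:
    """Hypothetical helper: return (start, end, days_interval), using the
    caller's kwargs when present and the scraper's defaults otherwise."""
    raw_start = kwargs.get("backscrape_start")
    raw_end = kwargs.get("backscrape_end")
    raw_interval = kwargs.get("days_interval")

    start = datetime.strptime(raw_start, DATE_FORMAT).date() if raw_start else default_start
    # Without an explicit end date, scrape up to today
    end = datetime.strptime(raw_end, DATE_FORMAT).date() if raw_end else date.today()
    days_interval = default_days_interval if raw_interval is None else int(raw_interval)
    if days_interval < 1:
        raise ValueError("days_interval must be a positive number of days")
    return start, end, days_interval


start, end, days_interval = resolve_backscrape_args(
    {"backscrape_start": "2020/01/15", "backscrape_end": "2020/02/01", "days_interval": "7"},
    default_start=date(2010, 1, 1),
    default_days_interval=30,
)
print(start, end, days_interval)
```

With something like this in place, each of the scrapers that share the pattern would only need to declare its own default start date and default interval.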