fix McKinsey crossword downloader (#114)
* fix bug in HTML parsing of McKinsey downloader
* fix date representation for leading-zero days and non-English locales

---------

Co-authored-by: Parker Higgins <parker@parkerhiggins.net>
iNtEgraIR2021 and thisisparker authored Aug 21, 2023
1 parent b7d1cc5 commit 9a42345
Showing 1 changed file with 6 additions and 3 deletions.
9 changes: 6 additions & 3 deletions xword_dl/downloader/mckinseydownloader.py
@@ -27,9 +27,12 @@ def matches_url(url_components):
     def find_by_date(self, dt):
         """
         date format: month-day-year (e.g. november-15-2022)
+        no leading zeros on dates (so: e.g., august-1-2023)
         crosswords are published every tuesday (as of november 2022)
         """
-        url_format = str(dt.strftime('%B-%d-%Y')).lower()
+        month_names = ['january','february','march','april','may','june','july',
+                       'august','september','october','november','december']
+        url_format = f'{month_names[dt.month-1]}-{dt.day}-{dt.year}'
         guessed_url = urllib.parse.urljoin(
             'https://www.mckinsey.com/featured-insights/the-mckinsey-crossword/',
             url_format)
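
For reference, the slug construction in this hunk can be tried on its own. The following is a minimal sketch, not the project's code: `MONTH_NAMES` and `crossword_slug` are hypothetical names introduced here for illustration. It shows why the explicit month list helps: `strftime('%B')` follows the active locale and `%d` zero-pads the day, whereas indexing a fixed English list and using `dt.day` directly does neither.

```python
import datetime

# English month names spelled out so the result does not depend on the
# process locale (strftime('%B') would follow the active locale).
MONTH_NAMES = ['january', 'february', 'march', 'april', 'may', 'june',
               'july', 'august', 'september', 'october', 'november',
               'december']


def crossword_slug(dt):
    """Hypothetical helper: build the month-day-year URL slug.

    dt.day is an int, so no leading zero is emitted for days 1-9.
    """
    return f'{MONTH_NAMES[dt.month - 1]}-{dt.day}-{dt.year}'


print(crossword_slug(datetime.date(2023, 8, 1)))    # august-1-2023
print(crossword_slug(datetime.date(2022, 11, 15)))  # november-15-2022
```

A misspelled entry in the month list would silently produce a wrong URL for that one month, so a quick check of all twelve names is worthwhile.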
@@ -40,8 +43,8 @@ def find_latest(self):
         index_res = requests.get(index_url)
         index_soup = BeautifulSoup(index_res.text, "html.parser")

-        latest_fragment = next(a for a in index_soup.select('a.item-title-link[href^="/featured-insights/the-mckinsey-crossword/"]')
-                               if a.find('h3'))['href']
+        latest_fragment = next(a for a in index_soup.select('a.mdc-c-link-heading[href^="/featured-insights/the-mckinsey-crossword/"]')
+                               if a.find('div'))['href']
         latest_absolute = urllib.parse.urljoin('https://www.mckinsey.com',
                                                latest_fragment)
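
The matching rule in this hunk (first `a.mdc-c-link-heading` link under the crossword path that contains a `div`) can be illustrated without network access. The sketch below is an assumption-laden stand-in, not the downloader's implementation: it swaps BeautifulSoup for the standard library's `html.parser` so it runs with no third-party packages, and `LatestLinkParser`, `find_latest_url`, and the `SAMPLE` markup are all hypothetical. The real index page may be structured differently.

```python
import urllib.parse
from html.parser import HTMLParser

PREFIX = '/featured-insights/the-mckinsey-crossword/'

# Invented markup for illustration only; not copied from the real site.
SAMPLE = '''
<a class="mdc-c-link-heading" href="/featured-insights/other/">
  <div>Not a crossword</div></a>
<a class="mdc-c-link-heading" href="/featured-insights/the-mckinsey-crossword/august-15-2023">
  <div>Latest puzzle</div></a>
<a class="mdc-c-link-heading" href="/featured-insights/the-mckinsey-crossword/august-8-2023">
  <div>Older puzzle</div></a>
'''


class LatestLinkParser(HTMLParser):
    """Hypothetical stdlib stand-in for the selector used in the diff."""

    def __init__(self):
        super().__init__()
        self.candidate = None  # href of the currently open matching <a>
        self.latest = None     # first matching href that contains a <div>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a':
            classes = (attrs.get('class') or '').split()
            href = attrs.get('href') or ''
            matches = ('mdc-c-link-heading' in classes
                       and href.startswith(PREFIX))
            self.candidate = href if matches else None
        elif tag == 'div' and self.candidate and self.latest is None:
            self.latest = self.candidate

    def handle_endtag(self, tag):
        if tag == 'a':
            self.candidate = None


def find_latest_url(html):
    """Return the absolute URL of the first match, or None if none exists."""
    parser = LatestLinkParser()
    parser.feed(html)
    if parser.latest is None:
        return None
    return urllib.parse.urljoin('https://www.mckinsey.com', parser.latest)


print(find_latest_url(SAMPLE))
```

Returning `None` when nothing matches is a design choice of this sketch. In the diff, `next(...)` is called without a default, so it raises `StopIteration` if the page markup changes again and no link matches.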
