The script does not find the date (Russian) #156

PetroffSky · 2024-07-03T05:48:18Z

The script does not find the date (Russian):

```python
from htmldate import find_date

url = "https://kamaz.ru/press/releases/kamaz_i_skolkovo_sozdadut_ekologicheski_chistyy_gruzovik/"

print(find_date(url, extensive_search=True))   # Returns None
print(find_date(url, extensive_search=False))  # Returns None
```

XPath selector for the date on the page: `//div[contains(text(), 'July 30, 2015')]`

adbar · 2024-07-16T15:14:28Z

Something has to be added to the extractors; otherwise the div element will not be processed (e.g. a rule matching elements whose class contains "news" or "detail").

PetroffSky · 2024-07-17T06:57:54Z

Hello! Sorry, my mistake. Here is the correct XPath:
`//div[contains(text(), '30 Июля 2015')]`

Also, here are the Russian month names in order, in two forms (the nominative, and the genitive that appears in dates such as "30 июля 2015"):

```python
months = ['январь', 'февраль', 'март', 'апрель', 'май', 'июнь', 'июль', 'август', 'сентябрь', 'октябрь', 'ноябрь', 'декабрь']
```

or

```python
months = ['января', 'февраля', 'марта', 'апреля', 'мая', 'июня', 'июля', 'августа', 'сентября', 'октября', 'ноября', 'декабря']
```
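To illustrate how the two lists could be used, here is a minimal sketch that maps both forms to month numbers and parses a day-month-year string such as `30 Июля 2015`. It is not htmldate code: the names `RU_MONTHS`, `RU_DATE` and `parse_russian_date` are hypothetical, and only the day-month-year order seen on the kamaz.ru page is handled.

```python
import re
from datetime import date

# Month names from the comment above, in order (index 0 = January).
NOMINATIVE = ['январь', 'февраль', 'март', 'апрель', 'май', 'июнь',
              'июль', 'август', 'сентябрь', 'октябрь', 'ноябрь', 'декабрь']
GENITIVE = ['января', 'февраля', 'марта', 'апреля', 'мая', 'июня',
            'июля', 'августа', 'сентября', 'октября', 'ноября', 'декабря']

# Hypothetical lookup: every lowercase form -> month number (1-12).
RU_MONTHS = {name: number
             for forms in (NOMINATIVE, GENITIVE)
             for number, name in enumerate(forms, start=1)}

# Hypothetical pattern for "day month year"; the month is any Cyrillic word,
# validated afterwards against RU_MONTHS.
RU_DATE = re.compile(r'\b(\d{1,2})\s+([а-яё]+)\s+(\d{4})\b', re.IGNORECASE)


def parse_russian_date(text):
    """Return the first valid "day month year" date found in text, or None."""
    for match in RU_DATE.finditer(text):
        day, month_name, year = match.groups()
        month = RU_MONTHS.get(month_name.lower())
        if month is None:
            continue  # Cyrillic word that is not a month name
        try:
            return date(int(year), month, int(day))
        except ValueError:
            continue  # impossible date such as "31 июня"
    return None


print(parse_russian_date('30 Июля 2015'))  # 2015-07-30
```

Matching the month as a generic Cyrillic word and then checking it against the lookup table keeps the regex short; the table itself is where new languages or forms would be added.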

adbar · 2024-07-19T09:19:22Z

I meant that someone needs to add a precise XPath target. Using `//div[contains(text(), ...)]` or simply `//div//text()` would hurt accuracy, because arbitrary dates mentioned in the text are often irrelevant.
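A small sketch of the accuracy problem, assuming lxml (which htmldate already depends on) and invented markup: the class names `news-date` and `article-body` are hypothetical and not taken from the kamaz.ru page. The broad `//div//text()` expression returns every text node inside every div, including an unrelated date in the article body, while a class-based target in the spirit of the "news"/"detail" suggestion returns only the date element.

```python
from lxml import html

# Hypothetical markup; the real page structure was not inspected.
SAMPLE = """
<html><body>
  <div class="news-date">30 Июля 2015</div>
  <div class="article-body">Проект стартовал 12 марта 2014 года.</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)


def texts(expression):
    """Evaluate an XPath expression and return its non-empty text nodes."""
    return [t.strip() for t in tree.xpath(expression) if t.strip()]


# Broad: any text inside any div, so dates in the body text are included too.
broad = texts('//div//text()')

# Targeted: only divs whose class mentions "news" or "detail".
targeted = texts("//div[contains(@class, 'news') or contains(@class, 'detail')]//text()")

print(broad)
print(targeted)
```

Here `broad` also contains the sentence mentioning 12 марта 2014, which a naive extractor could mistake for the publication date, whereas `targeted` contains only `30 Июля 2015`.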

As for the months: if you're interested, you could add them to the extractor in a pull request.

adbar added the `bug` (Something isn't working) label on Jul 16, 2024.
