The script does not find the date (Russian) #156

Open
PetroffSky opened this issue Jul 3, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@PetroffSky

The script does not find the date (Russian):
from htmldate import find_date

url = "https://kamaz.ru/press/releases/kamaz_i_skolkovo_sozdadut_ekologicheski_chistyy_gruzovik/"

print(find_date(url, extensive_search=True)) # Returns None
print(find_date(url, extensive_search=False)) # Returns None

XPath selector for the date on the page: //div[contains(text(), 'July 30, 2015')]

adbar added the bug label Jul 16, 2024
@adbar
Owner

adbar commented Jul 16, 2024

A matching expression has to be added to the extractors, otherwise the div element will not be processed (e.g. one whose class contains "news" or "detail").

@PetroffSky
Author

PetroffSky commented Jul 17, 2024

Hello! Sorry, my mistake. Here is the correct XPath:
//div[contains(text(), '30 Июля 2015')]
Also, here are the Russian month names in order, in two forms (nominative and genitive):
months = ['январь', 'февраль', 'март', 'апрель', 'май', 'июнь', 'июль', 'август', 'сентябрь', 'октябрь', 'ноябрь', 'декабрь']
or
months = ['января', 'февраля', 'марта', 'апреля', 'мая', 'июня', 'июля', 'августа', 'сентября', 'октября', 'ноября', 'декабря']

@adbar
Owner

adbar commented Jul 19, 2024

I meant that someone needs to add a precise XPath target. Using //div[contains(text())] or simply //div//text() would be bad for accuracy, because random dates in a text are often irrelevant.
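
The difference between a broad and a precise target can be sketched as follows. This is only an illustration, not htmldate's extractor: the page snippet and its class names are made up, and the standard library's ElementTree with a Python filter stands in for a full XPath expression such as //div[contains(@class, 'news') or contains(@class, 'detail')].

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page; the class names are assumptions for illustration.
PAGE = """<html><body>
<div class="news-detail__date">30 Июля 2015</div>
<div class="article-text">Проект стартовал 12 марта 2014 года.</div>
<div class="footer">Обновлено 1 января 2024</div>
</body></html>"""

root = ET.fromstring(PAGE)

# Broad selection: the text of every div, similar in spirit to //div//text().
# It also picks up dates from the article body and the footer.
broad = [div.text for div in root.iter("div")]


def has_target_class(element):
    """True if the class attribute contains "news" or "detail"."""
    css_class = element.get("class", "")
    return "news" in css_class or "detail" in css_class


# Precise selection: only divs whose class suggests a news/detail header.
precise = [div.text for div in root.iter("div") if has_target_class(div)]

print(len(broad))  # 3
print(precise)     # ['30 Июля 2015']
```

The broad selection returns three candidate strings, two of which contain dates unrelated to publication, while the class-based filter keeps only the intended one.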

As for the months, if you're interested, you could add them to the extractor in a pull request.
