find_date doesn't extract %D %b %Y formatted dates in free text #67
find_date returns the wrong date if date is formatted %D %b %Y
Hi @k-sareen, thanks for your feedback. It's not a bug since
Note about a quick fix: the issue can be resolved as follows, but the code gets slower and would have to be tested carefully. It can lead to false positives by extracting any date mentioned in the text, without disambiguation or any further clue about its relevance: Changes in
Ah right. I apologize, I seem to have misunderstood what kinds of dates
No problem, I could add this functionality to the library but I need some time to test it. Just out of curiosity: Which languages are you interested in?
I'm working with English text/articles only. Though I think you're right that this is a bit of a slippery slope, as it may potentially catch dates that are mentioned in the prose but are not the actual article date. I think it might be best to keep your library simple and I'll try and get around this edge case myself. Thank you again for your great work and for your insight! P.S. Should I close the issue?
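To illustrate the false-positive concern raised above, here is a minimal sketch (not htmldate's code) that collects every day/month/year date found in a piece of free text. The names `MONTH_NUMBERS`, `PATTERN` and `candidate_dates`, as well as the sample sentence, are made up for this illustration, and it assumes English month abbreviations only.

```python
import re
from datetime import date

# Hypothetical helper data: English month abbreviations mapped to month numbers.
MONTH_NUMBERS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

# Matches e.g. "19 Oct 2022" or "3 March 1998" (day, month name, 4-digit year).
PATTERN = re.compile(
    r"\b(\d{1,2}) (" + "|".join(MONTH_NUMBERS) + r")[a-z]* (\d{4})\b"
)


def candidate_dates(text):
    """Return every matching date in the text as an ISO string, in order."""
    found = []
    for day, month, year in PATTERN.findall(text):
        try:
            found.append(date(int(year), MONTH_NUMBERS[month], int(day)).isoformat())
        except ValueError:
            # Skip impossible dates such as "31 Feb 2022".
            continue
    return found


text = (
    "In a speech on 3 Mar 1998 the minister made the same point. "
    "This article was published on 19 Oct 2022."
)
print(candidate_dates(text))
```

Both dates are returned, and nothing in the text alone says which one is the article date, so a free-text search needs some extra heuristic or context to choose between them.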
Thanks for your feedback. You can leave the issue open; I'll think about it and close it if it goes beyond the scope of the library.
Full text search is now supported and your example above works.
For the following MWE:
htmldate outputs 2022-01-01 instead of the expected 2022-10-19.
I've traced the execution of the above call and I believe it is the search_page function that has the bug. It doesn't seem to recognize the above date pattern as a valid date and only grabs onto the 2022 part of the date string (which autocompletes the rest to 1st Jan).
I haven't found time to understand why the bug happens in detail, so I don't have a solution right now. I'll try to fix the bug and will make a PR if I can.
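The symptom described here can be reproduced with a small standalone sketch. This is not how search_page is implemented; it only assumes a two-stage lookup where a full day/month/year pattern is tried first and a bare year is the fallback. The function name `extract_date`, both regular expressions, and the sample string "19 Oct 2022" (the MWE input is not shown in this thread) are assumptions for illustration.

```python
import re
from datetime import date

# Hypothetical lookup table for English month abbreviations.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
MONTH_NUMBERS = {name: index for index, name in enumerate(MONTHS, start=1)}

# Full pattern: day, month name (abbreviated or spelled out), 4-digit year.
DAY_MONTH_YEAR = re.compile(
    r"\b(\d{1,2}) (" + "|".join(MONTHS) + r")[a-z]* (\d{4})\b"
)
# Fallback pattern: a bare year between 1900 and 2099.
YEAR_ONLY = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_date(text, use_full_pattern=True):
    """Return an ISO date string found in text, or None."""
    if use_full_pattern:
        match = DAY_MONTH_YEAR.search(text)
        if match:
            day, month, year = match.groups()
            try:
                return date(int(year), MONTH_NUMBERS[month], int(day)).isoformat()
            except ValueError:
                pass  # impossible date, fall through to the year-only search
    year_match = YEAR_ONLY.search(text)
    if year_match:
        # Only the year is known, so the rest defaults to 1 January.
        return year_match.group(0) + "-01-01"
    return None


sample = "Published on 19 Oct 2022 by the editor"
print(extract_date(sample))                          # full pattern is tried first
print(extract_date(sample, use_full_pattern=False))  # year-only fallback
```

With the full pattern disabled, only the year is picked up and the result collapses to 1 January, which matches the output reported above.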