[YouTube] Fix parsing short relative date formats (English only) #1068

Merged (2 commits) on Jun 18, 2023

Theta-Dev (Contributor)

I added support for the new short date format (e.g. "1wk ago").

Fixes #1067
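A minimal sketch of what parsing the short English form involves, assuming a hypothetical `ShortTimeAgoSketch` class (this is not NewPipeExtractor's actual timeago parser). Only the `wk` suffix comes from this PR; the other suffixes (`s`, `min`, `h`, `d`, `mo`, `y`) are assumptions about YouTube's short format. The number and unit suffix are matched together, the suffix is looked up in a table, and the amount is subtracted from a reference time.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortTimeAgoSketch {
    // Hypothetical mapping of English short unit suffixes to time units.
    // Only "wk" is taken from the PR; the remaining suffixes are assumptions.
    private static final Map<String, ChronoUnit> UNITS = Map.of(
            "s", ChronoUnit.SECONDS,
            "min", ChronoUnit.MINUTES,
            "h", ChronoUnit.HOURS,
            "d", ChronoUnit.DAYS,
            "wk", ChronoUnit.WEEKS,
            "mo", ChronoUnit.MONTHS,
            "y", ChronoUnit.YEARS);

    // A number, optional whitespace, a lowercase suffix, then "ago", e.g. "1wk ago".
    private static final Pattern SHORT_FORMAT =
            Pattern.compile("^(\\d+)\\s*([a-z]+)\\s+ago$");

    /** Returns the time the text refers to, relative to now, or empty if unrecognized. */
    public static Optional<OffsetDateTime> parse(String text, OffsetDateTime now) {
        // Locale.ROOT avoids locale-specific lowercasing (e.g. the Turkish dotless i).
        Matcher m = SHORT_FORMAT.matcher(text.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            return Optional.empty();
        }
        ChronoUnit unit = UNITS.get(m.group(2));
        if (unit == null) {
            // Long forms such as "weeks" are left to the existing parser.
            return Optional.empty();
        }
        long amount = Long.parseLong(m.group(1));
        return Optional.of(now.minus(amount, unit));
    }

    public static void main(String[] args) {
        OffsetDateTime now = OffsetDateTime.of(2023, 6, 18, 12, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(parse("1wk ago", now).orElseThrow());   // 2023-06-11T12:00Z
        System.out.println(parse("5mo ago", now).orElseThrow());   // 2023-01-18T12:00Z
        System.out.println(parse("3 weeks ago", now).isPresent()); // false
    }
}
```

Returning `Optional.empty()` for unknown suffixes lets a caller fall back to the existing long-form patterns instead of failing outright.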

Theta-Dev force-pushed the fix/short-date-pasing branch from 59ba106 to 019a6a9 on June 1, 2023 10:45
TobiGr (Contributor) commented Jun 6, 2023

LGTM, but what about the other languages? I'd guess we need to update them, too. Should we ask the community for help here?

Theta-Dev (Contributor, Author)

Parsing YouTube data in other languages is currently disabled and it would require more extensive changes to the dictionary and the parser.

I have implemented (and tested) a parser that works with all languages. Here is the dictionary for it. Note also that some special cases have to be handled separately (e.g. in French, "a" is both an article and the short form of "years").

https://code.thetadev.de/ThetaDev/rustypipe/src/branch/main/testfiles/dict/dictionary.json

AudricV added the bug and youtube labels Jun 10, 2023
AudricV (Member) commented Jun 17, 2023

> Parsing YouTube data in other languages is currently disabled

Yes, but clients can also use the timeago parser for services other than YouTube, even if YouTube is its main purpose.

Is the data you provided extracted from YouTube? Would you like to add support for other languages to our parser using a similar approach? I think we should have a parser that separates date units (seconds, hours, days, ...), digits (1, 2, 3, ...) and number units (tens, hundreds, thousands, ...) for each language we want to support.
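The proposed split into date units, digits and number units could look roughly like the following. This is a hypothetical sketch: `TimeAgoDictionarySketch`, `LanguageDictionary` and `TimeAgo` are illustrative names, and the English entries are made up for the example rather than taken from any real dictionary. The text is split into runs of digits or letters, each token is classified by the table it appears in, and number units either scale the current group ("three hundred") or close it ("two thousand").

```java
import java.time.temporal.ChronoUnit;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeAgoDictionarySketch {

    /** Hypothetical per-language dictionary split into the three proposed categories. */
    static final class LanguageDictionary {
        final Map<String, ChronoUnit> dateUnits; // e.g. "week", "wk" -> WEEKS
        final Map<String, Long> digits;          // e.g. "one" -> 1, "two" -> 2
        final Map<String, Long> numberUnits;     // e.g. "hundred" -> 100

        LanguageDictionary(Map<String, ChronoUnit> dateUnits,
                           Map<String, Long> digits,
                           Map<String, Long> numberUnits) {
            this.dateUnits = dateUnits;
            this.digits = digits;
            this.numberUnits = numberUnits;
        }
    }

    /** Parsed result: an amount of some date unit in the past. */
    static final class TimeAgo {
        final long amount;
        final ChronoUnit unit;

        TimeAgo(long amount, ChronoUnit unit) {
            this.amount = amount;
            this.unit = unit;
        }

        @Override
        public String toString() {
            return amount + " " + unit.name();
        }
    }

    // Runs of digits or runs of letters, so "1wk" splits into "1" and "wk".
    private static final Pattern TOKEN = Pattern.compile("\\d+|\\p{L}+");

    /** Returns null when no date unit is found; unknown tokens such as "ago" are skipped. */
    static TimeAgo parse(String text, LanguageDictionary dict) {
        long total = 0;   // completed groups, e.g. 2000 in "two thousand three hundred"
        long current = 0; // group being built, e.g. 300 in "two thousand three hundred"
        boolean sawNumber = false;
        ChronoUnit unit = null;

        Matcher m = TOKEN.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            String token = m.group();
            if (Character.isDigit(token.charAt(0))) {
                current += Long.parseLong(token);
                sawNumber = true;
            } else if (dict.digits.containsKey(token)) {
                current += dict.digits.get(token);
                sawNumber = true;
            } else if (dict.numberUnits.containsKey(token)) {
                long multiplier = dict.numberUnits.get(token);
                long base = current == 0 ? 1 : current;
                if (multiplier >= 1000) {
                    total += base * multiplier; // closes a group: "two thousand"
                    current = 0;
                } else {
                    current = base * multiplier; // scales within a group: "three hundred"
                }
                sawNumber = true;
            } else if (dict.dateUnits.containsKey(token)) {
                unit = dict.dateUnits.get(token);
            }
        }
        if (unit == null) {
            return null;
        }
        // A unit without any number (e.g. "week ago") is read as 1, an assumption of this sketch.
        return new TimeAgo(sawNumber ? total + current : 1, unit);
    }

    public static void main(String[] args) {
        LanguageDictionary english = new LanguageDictionary(
                Map.of("week", ChronoUnit.WEEKS, "weeks", ChronoUnit.WEEKS, "wk", ChronoUnit.WEEKS,
                        "year", ChronoUnit.YEARS, "years", ChronoUnit.YEARS),
                Map.of("a", 1L, "one", 1L, "two", 2L, "three", 3L),
                Map.of("hundred", 100L, "thousand", 1000L));

        System.out.println(parse("1wk ago", english));               // 1 WEEKS
        System.out.println(parse("a week ago", english));            // 1 WEEKS
        System.out.println(parse("two hundred years ago", english)); // 200 YEARS
    }
}
```

Keeping the three tables separate means adding a language is mostly a data change, although ambiguous tokens (such as the French "a" discussed in this thread) still need language-specific handling.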

> in French "a" is both an article and the short form of "years"

Did you mean the verb "avoir" (to have) in the third person singular ("a") and the short form of "years"? There is no article "a" in French, but there is "à".

Theta-Dev (Contributor, Author) commented Jun 17, 2023

This is the French term in question (5 years ago):

il y a 5 a

I currently have a special case for the French language which checks whether the string ends with "a".
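A hedged sketch of such an end-of-string check, using a hypothetical `FrenchShortYearSketch` class (not Theta-Dev's actual code). The idea is that the "a" inside the prefix "il y a" is the verb and must not be read as a unit, so only a trailing standalone "a" is treated as "years". Treating a no-break space (U+00A0) like a regular space is an assumption about how the text may be delivered.

```java
import java.util.Locale;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FrenchShortYearSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /**
     * Returns the number of years for strings like "il y a 5 a", or empty when the
     * string does not end with the standalone short unit "a" followed by no other text.
     */
    static OptionalLong parseShortYears(String text) {
        // Normalize: no-break space to space (assumption), trim, lowercase.
        String s = text.replace('\u00A0', ' ').strip().toLowerCase(Locale.ROOT);
        // Only a trailing standalone "a" means "years"; "ans" or other units fall through.
        if (!s.endsWith(" a")) {
            return OptionalLong.empty();
        }
        // Take the last number in the string as the amount.
        Matcher m = DIGITS.matcher(s);
        long last = -1;
        while (m.find()) {
            last = Long.parseLong(m.group());
        }
        // "il y a" alone also ends with " a" but has no number, so it is rejected here.
        return last < 0 ? OptionalLong.empty() : OptionalLong.of(last);
    }

    public static void main(String[] args) {
        System.out.println(parseShortYears("il y a 5 a").getAsLong());  // 5
        System.out.println(parseShortYears("il y a 5 ans").isPresent()); // false
        System.out.println(parseShortYears("il y a").isPresent());       // false
    }
}
```

Checking the end of the string, rather than looking up every token in a unit table, is what keeps the leading "il y a" from being misread as a year count.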

The data from the parsing dictionary is a combination of data extracted from YouTube and the CLDR repository.

AudricV changed the title "[YouTube] fix: parsing of short date formats" to "[YouTube] Fix parsing short date formats (English only)" Jun 18, 2023
AudricV (Member) commented Jun 18, 2023

Merging this PR as is, since supporting short time units in other languages would require overhauling the timeago parser system. Thanks for the fix!

AudricV changed the title "[YouTube] Fix parsing short date formats (English only)" to "[YouTube] Fix parsing short relative date formats (English only)" Jun 18, 2023
AudricV merged commit ad97f08 into TeamNewPipe:dev Jun 18, 2023
Labels: bug, youtube
Linked issue: [YouTube] New short date format
4 participants