Wrong guess of movies when the resolution/screen size does not end with 'p' #693

Open
nachocho opened this issue Apr 3, 2021 · 9 comments

Comments

@nachocho

nachocho commented Apr 3, 2021

Hi,
Not sure this qualifies as a bug, but I think so.
Subscene has several subtitles whose release info (which is supposed to come from actual movie releases) produces a wrong guess.

For example (note the resolution does not end with 'p'):
guessit('Gladiator.EXTENDED.2000.720.BrRip.264.YIFY')
guessit('Gladiator.EXTENDED.2000.1080.BrRip.264.YIFY')

produces
MatchesDict([('title', 'Gladiator'), ('edition', 'Extended'), ('year', 2000), ('season', 7), ('episode', 20), ('source', 'Blu-ray'), ('other', ['Reencoded', 'Rip']), ('release_group', '264.YIFY'), ('type', 'episode')])

and

MatchesDict([('title', 'Gladiator'), ('edition', 'Extended'), ('year', 2000), ('season', 10), ('episode', 80), ('source', 'Blu-ray'), ('other', ['Reencoded', 'Rip']), ('release_group', '264.YIFY'), ('type', 'episode')])

respectively. Both are wrong: for some reason guessit is interpreting the 720 as season 7 episode 20 and the 1080 as season 10 episode 80. There is no separator inside those numbers to start with, which is why I think this is wrong, and an assumption like that may break other release names. It also makes the guessed type wrong: these are movies, not episodes.

Another, tougher to guess, example is:

guessit('Gladiator 23.976 FPS')
guessit('Gladiator 25.000 FPS')

which produces results:

MatchesDict([('title', 'Gladiator'), ('episode', [23, 76]), ('season', 9), ('episode_title', 'FPS'), ('type', 'episode')])

and

MatchesDict([('title', 'Gladiator'), ('episode', [25, 0]), ('season', 0), ('episode_title', 'FPS'), ('type', 'episode')])

respectively. I know these are tough ones, maybe even invalid titles for guessit, but again, the way it assumes episodes and seasons looks odd. How does 23.976 translate into season 9 with episodes 23 and 76, and 25.000 into season 0 with episodes 25 and 0? This also makes it guess that these are episodes, not movies.

Here is another example:

guessit('Aliens DVD Silver Box Set 131 Min')

produces

MatchesDict([('title', 'Aliens'), ('source', 'DVD'), ('season', 1), ('episode', 31), ('episode_title', 'Min'), ('type', 'episode')])

again, guessing this is an episode instead of a movie, and treating the number 131 (with no separator whatsoever) as season and episode number.

I hope these examples help improve the product (which is great!) if the bug report is accepted.

Thanks

@Toilal
Member

Toilal commented Apr 19, 2021

for some reason guessit is interpreting the 720 as season 7 episode 20 and the 1080 as season 10 and episode 80

This is a common pattern for some episode numbering in the anime scene, which is why it's guessed as season/episode. I'm not sure I want to fix this one. Same for the 131 case: in fact, technically and statistically speaking, it's more likely to be an episode than a movie.
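To make the pattern concrete, here is a minimal sketch of the concatenated season/episode reading described above, assuming the last two digits are the episode and the leading digits are the season. The function name `split_concatenated` is hypothetical and this is not guessit's actual implementation; it only reproduces the splits reported in this issue.

```python
import re
from typing import Tuple


def split_concatenated(number: str) -> Tuple[int, int]:
    """Read a 3- or 4-digit token as (season, episode).

    Assumption for illustration: the last two digits are the episode,
    everything before them is the season.
    """
    if not re.fullmatch(r"\d{3,4}", number):
        raise ValueError(f"expected 3 or 4 digits, got {number!r}")
    return int(number[:-2]), int(number[-2:])


for token in ("720", "1080", "131", "976"):
    season, episode = split_concatenated(token)
    print(f"{token} -> season {season}, episode {episode}")
```

Under this reading, 720, 1080 and 131 split the same way as in the outputs quoted in the report, which is why a bare resolution or duration is indistinguishable from such an episode number without extra context.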

For the FPS thing, it's another problem and could be fixed with a new property, what about frame_rate ?

@ratoaq2
Member

ratoaq2 commented Apr 19, 2021

frame_rate is a good choice.

When I implemented https://github.com/ratoaq2/knowit I tried to have the names consistent with guessit and I used frame_rate for that

@nachocho
Author

for some reason guessit is interpreting the 720 as season 7 episode 20 and the 1080 as season 10 and episode 80

This is a common pattern for some episode numbering in the anime scene, which is why it's guessed as season/episode. I'm not sure I want to fix this one. Same for the 131 case: in fact, technically and statistically speaking, it's more likely to be an episode than a movie.

For the FPS thing, it's another problem and could be fixed with a new property, what about frame_rate ?

Well, I would think it is more common to have movies with release information of the form I gave (with a bare resolution, like 'Gladiator.EXTENDED.2000.720.BrRip.264.YIFY') than anime episode titles that use a single number. Honestly, merging season and episode into a single number looks plain wrong. But I understand the intention is to support them, and IMO these movie release names, which are VERY common, should ideally be supported too.

It would be nice to have support for the FPS, but correctly guessing movies with a bare resolution is more important IMO, because it is so common.

Thanks.

@Toilal
Member

Toilal commented Apr 19, 2021

I see and understand. This could happen, but with a flag/mode.

@Toilal
Member

Toilal commented Apr 29, 2021

@nachocho Does 131 Min stand for the duration of the media?

@Toilal
Member

Toilal commented Apr 29, 2021

In fact, frame_rate already exists, but it supports neither a .000 fractional part nor fps separated from the number by a space. I'll fix it.
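As an illustration of the two gaps mentioned above, here is a minimal standalone sketch using Python's `re` module. The pattern and the helper name `find_frame_rate` are assumptions made for this example and are not guessit's actual frame_rate rule; the sketch only shows a pattern that accepts a fractional part such as `.976` or `.000` and an optional separator before `fps`.

```python
import re
from typing import Optional

# Hypothetical pattern: 2-3 digits, an optional 1-3 digit fraction,
# an optional single separator, then "fps" (case-insensitive).
FRAME_RATE = re.compile(
    r"(?<![\d.])(\d{2,3}(?:\.\d{1,3})?)[ ._-]?fps\b",
    re.IGNORECASE,
)


def find_frame_rate(name: str) -> Optional[float]:
    """Return the frame rate found in a release name, or None."""
    match = FRAME_RATE.search(name)
    return float(match.group(1)) if match else None


for name in ("Gladiator 23.976 FPS", "Gladiator 25.000 FPS", "Gladiator 24fps"):
    print(name, "->", find_frame_rate(name))
```

Consuming the whole `23.976 FPS` span as one match is what would keep the digits from being reinterpreted as season and episode numbers.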

@nachocho
Author

nachocho commented Apr 29, 2021

@nachocho Does 131 Min stand for the duration of the media?

Yes, in this example:

guessit('Aliens DVD Silver Box Set 131 Min')

131 Min stands for the duration of 131 minutes. I do know that it is extremely difficult (if not impossible) to account for every single case out there. I would say the 131 Min can be disregarded if needed. The important thing here is to not treat this match as an episode of season 1 episode 31. Of course if you think also guessing the duration of the media is possible, even better.

@Toilal
Member

Toilal commented Apr 30, 2021

I can add a pattern to guess this duration so this will not be guessed as season/episode anymore.
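A duration pattern of that kind could look roughly like the sketch below. It is a standalone illustration with Python's `re` module; the regular expression and the helper name `find_duration_minutes` are hypothetical and do not reflect how guessit defines its rules.

```python
import re
from typing import Optional

# Hypothetical pattern: 2-3 digits, an optional single separator,
# then "min", "mins", "minute" or "minutes" (case-insensitive).
DURATION = re.compile(
    r"(?<!\d)(\d{2,3})[ ._-]?min(?:ute)?s?\b",
    re.IGNORECASE,
)


def find_duration_minutes(name: str) -> Optional[int]:
    """Return the running time in minutes found in a name, or None."""
    match = DURATION.search(name)
    return int(match.group(1)) if match else None


print(find_duration_minutes("Aliens DVD Silver Box Set 131 Min"))
```

Once `131 Min` is claimed as a duration, the number is no longer available to be split into season 1, episode 31.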

Gladiator.EXTENDED.2000.720.BrRip.264.YIFY
Gladiator.EXTENDED.2000.1080.BrRip.264.YIFY

Those cases are harder to solve... Maybe we could add screenSize patterns without the p when a year has already been guessed, but I have to check that it doesn't break other test cases.

@nachocho
Author

I can add a pattern to guess this duration so this will not be guessed as season/episode anymore.

Gladiator.EXTENDED.2000.720.BrRip.264.YIFY
Gladiator.EXTENDED.2000.1080.BrRip.264.YIFY

Those cases are harder to solve... Maybe we could add screenSize patterns without the p when a year has already been guessed, but I have to check that it doesn't break other test cases.

I agree the release info looks wrong, and resolution should have a 'p' at the end. On the other hand, resolutions are pretty standard, I would say if I get a 1080, 2160, instead of season and episode it is most likely a resolution and should be treated as such, regardless of where you find it. Season 10 ep 80 or season 21 ep 60 is really unlikely. Season 7 ep 20 could be more common, and maybe if 720 is found, some other things would need to be considered.
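The two ideas in this exchange (treat 1080 and 2160 as resolutions wherever they appear, and accept a bare 720 only when a year has already been seen) can be combined in a small sketch. Everything here is an assumption for illustration: the set names, the year pattern, and the function `guess_bare_screen_size` are hypothetical and are not part of guessit.

```python
import re
from typing import Optional

# Assumption: heights that are very unlikely to be season/episode numbers.
UNAMBIGUOUS_HEIGHTS = {"1080", "2160"}
# Assumption: heights that could plausibly be an episode number (season 7, episode 20).
AMBIGUOUS_HEIGHTS = {"720"}
YEAR = re.compile(r"^(?:19|20)\d{2}$")


def guess_bare_screen_size(name: str) -> Optional[str]:
    """Return a screen size such as '1080p' for a bare height token, or None."""
    year_seen = False
    for token in re.split(r"[ ._-]+", name):
        if YEAR.match(token):
            year_seen = True
        elif token in UNAMBIGUOUS_HEIGHTS:
            return token + "p"
        elif token in AMBIGUOUS_HEIGHTS and year_seen:
            # Only trust a bare 720 once a year has appeared earlier in the name.
            return token + "p"
    return None


for name in (
    "Gladiator.EXTENDED.2000.720.BrRip.264.YIFY",
    "Gladiator.EXTENDED.2000.1080.BrRip.264.YIFY",
    "Show.720.HDTV",
):
    print(name, "->", guess_bare_screen_size(name))
```

Gating 720 on a preceding year is a conservative choice: names that rely on the concatenated season/episode reading and carry no year stay untouched, at the cost of missing year-less movie names.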

This is just a thought, but of course you know better how to handle it.

Thanks.


3 participants