Subtitle score is not correlating with matching results #821

Closed
ronaldheft opened this issue Feb 14, 2020 · 17 comments
@ronaldheft

Describe the bug

I’ve been having an ongoing issue where the subtitle selected by Bazarr is not the ideal subtitle. Often it grabs a subtitle that matches a different release, even when the exact matching release is available.

This often occurs when multiple subtitles show a 100% score. It appears that Bazarr isn’t attempting to match all of the metadata fields. For example, sometimes the scenename is available to match against the release_group, but it doesn’t appear to be used.

I’ve documented all of this below with screenshots.

To Reproduce

  1. Manually search for a subtitle, where the scenename is available.
  2. Notice multiple results with a 100% score.
  3. Notice the release_group is not being used to match, as the release_group is showing a false match.
  4. When pulling the subtitles automatically, the incorrect result is chosen, as multiple results have a 100% score rating.

Expected behavior

All available metadata is used to calculate the score.

Screenshots
(Four screenshots attached.)

Software (please complete the following information):
Bazarr Version: 0.8.4.1
Sonarr Version: 2.0.0.5338
Radarr Version: 0.2.0.1450
Operating System: Linux-4.4.59+-x86_64-with (Docker)

@morpheus65535
Owner

If the hash matches, we don't look at any other criteria (except hearing impaired). That's the expected behavior, and it's the same in Sub-Zero (we share base code).

@ronaldheft
Author

So is that bad data on the subtitle provider? Some of these subtitles definitely do not match the release and are out of sync. Selecting the version with the correct release group returns subtitles in sync.

@morpheus65535
Owner

Unfortunately, some subtitle uploaders add a hash even if it doesn't match. We have no control over this.

@ronaldheft
Author

That’s understandable. Could logic be added so that, when multiple results return a matching hash, additional metadata fields are used instead of selecting the first subtitle result?

@ronaldheft
Author

Essentially, calculate the score again on the subset of results that match the hash, but ignore the hash and score on metadata only?

@morpheus65535
Owner

@pannal something that could be done?

@GermanG
Contributor

GermanG commented Feb 15, 2020

@morpheus65535 Didn't look at the actual code, but looks like the other matches impact sorting, I'll play with bsplayer (which also has hash matching) and I'll let you know.
EDIT: I can reproduce it with bsplayer.

@GermanG
Contributor

GermanG commented Feb 15, 2020

I've taken inspiration from subdivx for matching and modified subliminal_patch.score with:

--- a/libs/subliminal_patch/score.py
+++ b/libs/subliminal_patch/score.py
@@ -81,7 +81,7 @@ def compute_score(matches, subtitle, video, hearing_impaired=None):
                     matches -= {"hash"}
     elif 'hash' in matches:
         logger.debug('%r: Hash not verifiable for this provider. Keeping it', subtitle)
-        matches &= {'hash'}
+        matches |= {'hash'}
 
     # handle equivalent matches
     if is_episode:

Now I have the right preference, but with crazy scores:
image
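
To see why that one-character change produces scores above 100%, here is a minimal sketch of the set arithmetic. The weights and the `score` helper are hypothetical, not Bazarr's real values; the only assumption is that the hash weight equals the sum of all other weights, so a hash-only match counts as 100%.

```python
# Hypothetical weights: the hash is worth as much as all other fields combined.
WEIGHTS = {"hash": 100, "series": 50, "season": 15, "episode": 15,
           "release_group": 15, "format": 5}
MAX_SCORE = WEIGHTS["hash"]  # a hash-only match counts as 100%


def score(matches):
    """Sum the weights of the matched fields."""
    return sum(WEIGHTS[m] for m in matches)


found = {"hash", "series", "season", "episode", "release_group"}

# Original behaviour: intersect with {'hash'}, discarding every other match.
original = set(found)
original &= {"hash"}

# Patched behaviour: union with {'hash'}, keeping every other match.
patched = set(found)
patched |= {"hash"}

original_pct = 100 * score(original) / MAX_SCORE
patched_pct = 100 * score(patched) / MAX_SCORE
print(original_pct, patched_pct)
```

With `&=` every hash match collapses to the same score, so ties are broken by result order; with `|=` the extra fields separate the candidates but push the total past the maximum.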

@GermanG
Contributor

GermanG commented Feb 15, 2020

Yup, confusing.
Well, there are many alternatives:

  • Keep the current approach and treat this as a known bug.
  • Use my brutal approach and deal with >100% scoring when hashes are matched (as a new known bug).
  • Same as the previous one, but 'disguise' the >100% as 100% in the UI.
  • Return hash matching as an attribute rather than as part of the score, then order by (hash, score) descending.

EDIT: @morpheus65535 I'll leave the decision up to you, let me know if it's not the first one, so I can give it a try coding it.
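
The last alternative in the list could look roughly like the sketch below. The `Candidate` record and its fields are hypothetical and are not Bazarr or subliminal classes; the point is only that a tuple sort key keeps the hash out of the score while still ranking hash matches first.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record; not a Bazarr or subliminal class.
    name: str
    hash_matched: bool
    score: int  # metadata-only score, hash excluded


candidates = [
    Candidate("wrong-release", hash_matched=True, score=80),
    Candidate("exact-release", hash_matched=True, score=95),
    Candidate("no-hash", hash_matched=False, score=98),
]

# Order by (hash, score) descending: hash matches come first,
# and the best metadata score wins within each group.
ranked = sorted(candidates, key=lambda c: (c.hash_matched, c.score), reverse=True)
print([c.name for c in ranked])
```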

@morpheus65535
Owner

What about making hash optional? Something like use scenename?

@pannal
Collaborator

pannal commented Feb 15, 2020

Wait, there is already code in place to counter this, because OpenSubtitles had the same issue YEARS ago: https://github.com/pannal/Sub-Zero.bundle/blob/master/Contents/Libraries/Shared/subliminal_patch/score.py#L60

If the provider has the necessary metadata to support hash checking ("series", "season", "episode", "format" for TV, "video_codec", "format" for movies), just enable the hash_verifiable flag for that provider and the subtitle class, and this gets fixed automatically.
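
As a rough illustration of that check (not the actual `score.py` code), the sketch below uses the field sets quoted above. The function and constant names are made up, and it assumes that a trusted hash collapses the match set to hash-only, as the unverifiable branch does.

```python
# Field sets quoted from the comment above; all names here are illustrative.
EPISODE_HASH_VALID_IF = {"series", "season", "episode", "format"}
MOVIE_HASH_VALID_IF = {"video_codec", "format"}


def verify_hash(matches, is_episode, hash_verifiable):
    """Return a new match set with an untrustworthy hash removed."""
    matches = set(matches)
    if "hash" not in matches:
        return matches
    if not hash_verifiable:
        # Provider cannot back the hash with metadata: keep only the hash.
        return {"hash"}
    required = EPISODE_HASH_VALID_IF if is_episode else MOVIE_HASH_VALID_IF
    if required <= matches:
        # All required fields agree, so the hash is trusted (assumed behaviour).
        return {"hash"}
    # Metadata disagrees: drop the hash and score on the remaining fields.
    return matches - {"hash"}


print(verify_hash({"hash", "series", "season"}, is_episode=True, hash_verifiable=True))
```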

@GermanG
Contributor

GermanG commented Feb 15, 2020

@pannal that might be the case for bsplayer, but the OP is about OpenSubtitles, and it looks like {"series", "season", "episode", "format"} matches but won't pick the desired subtitle.

@pannal
Collaborator

pannal commented Feb 15, 2020

That's something to look into, then.
The scoring might not be ideal for such cases. Maybe we should ultimately revise it, but that's not an easy feat.

Edit: Well, when two subtitles have the same score, Bazarr could prioritize the one that matches the most metadata, which would be quite simple.

@GermanG
Contributor

GermanG commented Feb 15, 2020

@pannal but it's dropped when

matches &= {'hash'}

EDIT: ignore this comment, I think I got what you mean.

pannal added a commit to pannal/Sub-Zero.bundle that referenced this issue Feb 16, 2020
@pannal
Collaborator

pannal commented Feb 16, 2020

I've added a secondary scoring method to latest bazarr development, that changes the sorting of subtitles based on (score_with_hash, score_without_hash). This might fix the issue.
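
A minimal sketch of what such a two-part sort key might look like follows. It is not the committed code: the weights, `sort_key`, and the sample results are all hypothetical, and it assumes a hash match is worth the maximum score on its own.

```python
# Hypothetical weights; the hash alone is assumed to be worth the maximum score.
WEIGHTS = {"series": 50, "season": 15, "episode": 15, "release_group": 15, "format": 5}
MAX_SCORE = sum(WEIGHTS.values())


def sort_key(matches):
    """Return (score_with_hash, score_without_hash) for a set of matched fields."""
    without_hash = sum(WEIGHTS[m] for m in matches if m != "hash")
    with_hash = MAX_SCORE if "hash" in matches else without_hash
    return (with_hash, without_hash)


results = {
    "other-release": {"hash", "series", "season", "episode"},
    "exact-release": {"hash", "series", "season", "episode", "release_group"},
}

# Both candidates tie on the displayed score; the second element breaks the tie.
ranked = sorted(results, key=lambda name: sort_key(results[name]), reverse=True)
print(ranked)
```

The displayed score stays at 100% for both hash matches, and only their order changes.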

@ronaldheft
Author

Just pulled down the latest development release, and my results are way better! I'm now seeing the correct subtitle selected if there is an exact match.

I like the approach of doing a secondary sort and keeping the UI at 100% score. If you're considering a hash match a 100% match, then yeah, it makes sense to keep the score at 100% and then from there just pick the best of the bunch.

Thanks for the quick resolution!

@rigas40

rigas40 commented Feb 23, 2020

We could also add subsync: running it when a subtitle has a low score would help,
or it could be used to check whether the subs are good.
