Chain removal with extra flexibility for character "gaps" and no filters #22

Open
DanAykroyd256 opened this issue Jan 29, 2023 · 3 comments
Labels: enhancement (New feature or request), priotiry: low (Low priority)

@DanAykroyd256 commented Jan 29, 2023

Hi @DrKain, thanks first of all for your great tool! I'm using it with Bazarr and it works great!

I recently found out about the --nochains option and had a subtitle containing a chain to test it on. It removed part of the chain but stopped early: the "animation" changes two characters at once from one line to the next, so the tool didn't recognize that step as part of the chain. I think this is similar to the example in the original chain-removal request, where the OP mistyped the example and the removal didn't fully work.

Here is my chain. Using "mobile" as the ad text to remove, I got "[Match] Chain found at 9-24 (mobile - +919815899536)". As you can see, going from line 8 to line 9 changes more than one character, which is why the chain wasn't completely removed.
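For illustration only, here is a guess at a rule that reproduces the reported "Chain found at 9-24": starting from the node that matched the ad filter, walk backwards as long as each earlier node's text is a prefix of the node after it. This is an assumption, not subclean's actual code, and `findChainStart` is a hypothetical name. Under this rule the walk stops at node 9, because node 9 inserts a space before the "N" instead of appending to the end of node 8's text.

```typescript
// Hypothetical reconstruction of a prefix-based chain rule. It happens to
// reproduce the "Chain found at 9-24" log, but it is NOT subclean's real code.

// Text of nodes 1-24 from the example subtitles (node 24 has two lines).
const nodes: string[] = [
  "©", "©", "© P", "© P@", "© P@r", "© P@rM", "© P@rM!", "© P@rM!N",
  "© P@rM! Nd", "© P@rM! Nde", "© P@rM! NdeR", "© P@rM! NdeR",
  "© P@rM! NdeR M", "© P@rM! NdeR M@", "© P@rM! NdeR M@n", "© P@rM! NdeR M@nk",
  "© P@rM! NdeR M@nkÖ", "© P@rM! NdeR M@nkÖÖ", "© P@rM! NdeR M@nkÖÖ",
  "© P@rM! NdeR M@nkÖÖ ™", "© P@rM! NdeR M@nkÖÖ ™", "© P@rM! NdeR M@nkÖÖ ™",
  "© P@rM! NdeR M@nkÖÖ ™", "© P@rM! NdeR M@nkÖÖ ™\nMobile - +919815899536",
];

// Walk backwards from the node that matched an ad filter (0-based index)
// while each earlier node's text is a prefix of the node that follows it.
function findChainStart(texts: string[], matchIndex: number): number {
  let start = matchIndex;
  while (start > 0 && texts[start].startsWith(texts[start - 1])) {
    start--;
  }
  return start;
}

const matchIndex = nodes.length - 1; // node 24 is the one containing "Mobile"
const chainStart = findChainStart(nodes, matchIndex);
console.log(`Chain found at ${chainStart + 1}-${matchIndex + 1}`); // Chain found at 9-24
```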

In any case, I couldn't have caught this except manually, because there is no clear ad word to filter on. So, my questions are:

  • Would it be possible to remove ANY chain, as was also suggested in the original request, so that chains like this one are detected without needing a filter match? Of course, this could be enabled with an extra command-line parameter.
  • Would it be possible to have some kind of "threshold" so that a chain continues even when consecutive lines differ by more than one character? We could specify the maximum gap we're willing to accept, so that chains like this one are removed completely.
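Both requests can be sketched together under stated assumptions: treat two consecutive nodes as linked when the later one starts with the earlier one's text, or when their Levenshtein edit distance is at most a hypothetical `maxGap`, then report every run of at least `minLength` linked nodes, with no ad filter involved. `maxGap`, `minLength` and `findChains` are illustrative names, not existing subclean options.

```typescript
// Hypothetical sketch of filter-free chain detection with a configurable gap.
// maxGap, minLength and findChains are illustrative names, not subclean options.

/** Levenshtein edit distance, computed over Unicode code points. */
function editDistance(s: string, t: string): number {
  const a = Array.from(s);
  const b = Array.from(t);
  // row[j] holds the distance between the first i chars of a and the first j of b.
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for (i - 1, j - 1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i - 1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

/** A node continues the chain if it extends the previous text or is within maxGap edits of it. */
function isLinked(prev: string, next: string, maxGap: number): boolean {
  return next.startsWith(prev) || editDistance(prev, next) <= maxGap;
}

/** Returns [start, end] (0-based, inclusive) for every run of at least minLength linked nodes. */
function findChains(texts: string[], maxGap: number, minLength: number): Array<[number, number]> {
  const chains: Array<[number, number]> = [];
  let start = 0;
  for (let i = 1; i <= texts.length; i++) {
    // Close the current run at the end of the input or where the link breaks.
    if (i === texts.length || !isLinked(texts[i - 1], texts[i], maxGap)) {
      if (i - start >= minLength) chains.push([start, i - 1]);
      start = i;
    }
  }
  return chains;
}

// Text of nodes 1-24 from the example subtitles.
const example: string[] = [
  "©", "©", "© P", "© P@", "© P@r", "© P@rM", "© P@rM!", "© P@rM!N",
  "© P@rM! Nd", "© P@rM! Nde", "© P@rM! NdeR", "© P@rM! NdeR",
  "© P@rM! NdeR M", "© P@rM! NdeR M@", "© P@rM! NdeR M@n", "© P@rM! NdeR M@nk",
  "© P@rM! NdeR M@nkÖ", "© P@rM! NdeR M@nkÖÖ", "© P@rM! NdeR M@nkÖÖ",
  "© P@rM! NdeR M@nkÖÖ ™", "© P@rM! NdeR M@nkÖÖ ™", "© P@rM! NdeR M@nkÖÖ ™",
  "© P@rM! NdeR M@nkÖÖ ™", "© P@rM! NdeR M@nkÖÖ ™\nMobile - +919815899536",
];

console.log(findChains(example, 0, 5)); // two runs: [0, 7] and [8, 23], i.e. nodes 1-8 and 9-24
console.log(findChains(example, 2, 5)); // one run: [0, 23], i.e. nodes 1-24
```

With `maxGap` set to 0 the sketch behaves like a prefix-only rule and splits the example into nodes 1-8 and 9-24; with `maxGap` set to 2 it covers all 24 nodes. The catch is that a larger gap also links unrelated short lines: "Yes." and "No." are only three edits apart, so a `maxGap` of 3 would join them.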

Thanks again and have a great day!

Example Subtitles
1
00:00:02,340 --> 00:00:02,540
©

2
00:00:02,540 --> 00:00:02,740
©

3
00:00:02,740 --> 00:00:02,940
© P

4
00:00:02,940 --> 00:00:03,140
© P@

5
00:00:03,140 --> 00:00:03,340
© P@r

6
00:00:03,340 --> 00:00:03,540
© P@rM

7
00:00:03,540 --> 00:00:03,740
© P@rM!

8
00:00:03,740 --> 00:00:03,940
© P@rM!N

9
00:00:03,940 --> 00:00:04,140
© P@rM! Nd

10
00:00:04,140 --> 00:00:04,340
© P@rM! Nde

11
00:00:04,340 --> 00:00:04,540
© P@rM! NdeR

12
00:00:04,540 --> 00:00:04,740
© P@rM! NdeR

13
00:00:04,740 --> 00:00:04,940
© P@rM! NdeR M

14
00:00:04,940 --> 00:00:05,140
© P@rM! NdeR M@

15
00:00:05,140 --> 00:00:05,340
© P@rM! NdeR M@n

16
00:00:05,340 --> 00:00:05,540
© P@rM! NdeR M@nk

17
00:00:05,540 --> 00:00:05,740
© P@rM! NdeR M@nkÖ

18
00:00:05,740 --> 00:00:05,940
© P@rM! NdeR M@nkÖÖ

19
00:00:05,940 --> 00:00:06,140
© P@rM! NdeR M@nkÖÖ

20
00:00:06,140 --> 00:00:07,340
© P@rM! NdeR M@nkÖÖ ™

21
00:00:07,540 --> 00:00:08,340
© P@rM! NdeR M@nkÖÖ ™

22
00:00:08,540 --> 00:00:09,340
© P@rM! NdeR M@nkÖÖ ™

23
00:00:09,540 --> 00:00:10,340
© P@rM! NdeR M@nkÖÖ ™

24
00:00:10,340 --> 00:00:11,340
© P@rM! NdeR M@nkÖÖ ™
Mobile - +919815899536

@DanAykroyd256 added the enhancement and priotiry: low labels on Jan 29, 2023
@DanAykroyd256 changed the title from "Chain Removal with some flexibility" to "Chain removal with extra flexibility for character "gaps" and no filters" on Jan 29, 2023
@DrKain added a commit that referenced this issue on Jan 29, 2023: "Related to issue #22"
@DrKain (Owner) commented Jan 29, 2023

Hi, thanks for the feedback and detailed issue.

Would it be possible to do a removal of ANY chain; as was also suggested in the original request, so any crazy chain like this could be detected; without any match from the filters?

This is definitely possible and a good suggestion. I'm currently away from home and won't be able to add it anytime soon, but I'll leave this issue open in case someone wants to take a shot at it while I'm away. If not, I'll work on adding it when I'm next available.
It's worth noting that this would risk incorrect removals when a line is legitimately repeated by one or more people during a scene.
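The repeated-line risk can be made concrete: any filter-free detector that links identical or prefix-extended nodes will also link a character saying "Help!" several times in a row. One possible mitigation, sketched here as a hypothetical guard rather than a planned feature, is to require a minimum number of distinct texts in a candidate run before treating it as an animated chain. `looksLikeAnimatedChain` is an illustrative name, and the default of 4 is an arbitrary assumption.

```typescript
// Hypothetical guard against flagging legitimately repeated dialogue as a chain.
// minDistinct is an arbitrary, illustrative threshold, not a subclean option.
function looksLikeAnimatedChain(texts: string[], minDistinct = 4): boolean {
  // An animated chain keeps changing its text; repeated dialogue does not.
  return new Set(texts).size >= minDistinct;
}

const repeatedDialogue = ["Help!", "Help!", "Help!"];
const animatedNodes = ["©", "© P", "© P@", "© P@r", "© P@rM"];
console.log(looksLikeAnimatedChain(repeatedDialogue)); // false
console.log(looksLikeAnimatedChain(animatedNodes)); // true
```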

Would it be possible to have kind of a "threshold", to be able to continue the chain even if there is a difference of more than one character between the lines?

Also yes, but nodes 8 and 9 differ by 3 characters, and a fuzzy match that loose would also risk breaking valid lines. I try to keep the rules as strict as possible to avoid removing valid subtitles, so I'll need to look into this further.

For now, the nodes you have can be cleaned with a simple filter: /^©/
This matches lines starting with © and removes them. I've added it to the main filters, so you can run subclean --update to fetch it.
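The effect of the /^©/ filter on this file can be sketched as follows. This is a minimal illustration that assumes each subtitle node's text is tested as one string; it is not how subclean stores or applies its filters internally, `removeMatchingNodes` is a hypothetical helper, and node 25 is a made-up dialogue line. Without the `m` flag, `^` only anchors at the start of the node's text, so node 24 is still removed because its first line begins with ©.

```typescript
// Hypothetical illustration of applying the /^©/ filter to subtitle nodes.

interface SubtitleNode {
  index: number; // SRT sequence number
  text: string; // node text; multi-line nodes are joined with "\n"
}

const copyrightPrefix = /^©/; // no "g" flag, so .test() keeps no lastIndex state

function removeMatchingNodes(nodes: SubtitleNode[], pattern: RegExp): SubtitleNode[] {
  // Keep only the nodes whose text does NOT match the filter.
  return nodes.filter((node) => !pattern.test(node.text));
}

const sample: SubtitleNode[] = [
  { index: 8, text: "© P@rM!N" },
  { index: 24, text: "© P@rM! NdeR M@nkÖÖ ™\nMobile - +919815899536" },
  { index: 25, text: "A regular line of dialogue." }, // made-up example node
];

const kept = removeMatchingNodes(sample, copyrightPrefix);
console.log(kept.map((n) => n.index)); // [ 25 ]
```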

@DrKain (Owner) commented Jan 29, 2023

Linking this to #20 and #4, as they are related.

@DanAykroyd256 (Author)

Thanks for the quick reply, @DrKain, and for considering this improvement! I agree that handling all these edge cases could get crazy :)

Here's the .srt from my example, in case you want to use it when you look into this in the future.

Antiviral (2012).en.srt.txt

Have a great week!
