Chain removal with extra flexibility for character "gaps" and no filters #22

DanAykroyd256 · 2023-01-29T10:57:09Z

Hi @DrKain, thanks first of all for your great tool! I'm using it with Bazarr and it works great!

I recently found out about the --nochains option and had a subtitle file with a chain to try it on. Although it removed part of the chain, it stopped partway through: the "animation" the chain tries to pull off changed two characters at once from one line to the next, so the tool didn't treat it as part of the chain. I think this is similar to the example cited in the original request for chain removal, where the OP mistyped the example and the removal didn't completely work.

This is the chain I have. Using the "mobile" text as the ad to remove, I got the result "[Match] Chain found at 9-24 (mobile - +919815899536)". As you can see, lines 8 to 9 changed by more than one character, which is why the chain wasn't completely removed.
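
To make the failure concrete, here is a minimal TypeScript sketch of one possible strict chain rule, assuming a node continues a chain only when its text starts with the previous node's text. This is a hypothetical rule for illustration, not subclean's actual implementation; `findPrefixChains` and `minLength` are made-up names. Because it only compares consecutive nodes, it needs no ad word from the filters.

```typescript
// Hypothetical strict chain rule (illustration only, not subclean's code):
// node i continues the chain when its text starts with node i-1's text,
// i.e. the line either repeats or grows at the end.
function findPrefixChains(
  texts: string[], // one entry per subtitle node, multi-line text joined with "\n"
  minLength = 3, // assumed minimum number of nodes for a run to count as a chain
): Array<[number, number]> {
  const chains: Array<[number, number]> = [];
  let start = 0; // 0-based position where the current run began
  for (let i = 1; i <= texts.length; i++) {
    const continues = i < texts.length && texts[i].startsWith(texts[i - 1]);
    if (!continues) {
      // Close the run [start, i - 1] and report it with 1-based node numbers.
      if (i - start >= minLength) chains.push([start + 1, i]);
      start = i;
    }
  }
  return chains;
}
```

Assuming the SRT is numbered sequentially from 1 and the node texts are passed in order, running this on the 24 example subtitles in this issue returns [[1, 8], [9, 24]]: node 9 (`© P@rM! Nd`) does not start with node 8 (`© P@rM!N`), so a strict rule of this kind splits the animation in two. That split happens to line up with the 9-24 range reported above, but it does not show that subclean uses this exact rule.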

In any case, I couldn't have caught this other than manually, because there is no clear ad word to filter on. So, my questions are:

Would it be possible to remove ANY chain, as was also suggested in the original request, so that any crazy chain like this could be detected without a match from the filters? Of course, you could add an extra parameter to the command for that.
Would it be possible to have some kind of "threshold", so that the chain continues even if two lines differ by more than one character? We could specify the gap we are willing to accept somewhere, in order to completely remove chains like this one.
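
One way to express the proposed "threshold" is an edit-distance check. The sketch below is hypothetical TypeScript, not anything in subclean: a node continues the chain when its text starts with the previous node's text, or when its Levenshtein distance from the previous node is at most `maxGap` (an assumed parameter name). Distance is counted in code points via `Array.from`, so characters such as ©, Ö and ™ count as one each.

```typescript
// Levenshtein edit distance over code points: insertions, deletions and
// substitutions each cost 1. Only the previous DP row is kept in memory.
function levenshtein(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const cur = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[t.length];
}

// Hypothetical chain finder with a "gap" threshold (not subclean's code):
// node i continues the chain if it extends node i-1 (prefix rule) or is at
// most `maxGap` edits away from it.
function findChains(texts: string[], maxGap: number, minLength = 3): Array<[number, number]> {
  const chains: Array<[number, number]> = [];
  let start = 0; // 0-based start of the current run
  for (let i = 1; i <= texts.length; i++) {
    const continues =
      i < texts.length &&
      (texts[i].startsWith(texts[i - 1]) || levenshtein(texts[i - 1], texts[i]) <= maxGap);
    if (!continues) {
      if (i - start >= minLength) chains.push([start + 1, i]); // 1-based node numbers
      start = i;
    }
  }
  return chains;
}
```

On the example subtitles in this issue, `maxGap = 1` still splits the run into [[1, 8], [9, 24]], because nodes 8 and 9 are two edits apart (an inserted space and an inserted "d"); with `maxGap = 2` the whole animation, nodes 1-24, is found as one chain. Keeping the prefix check alongside the distance check means the final node, which appends the entire ad line, still continues the chain even though its edit distance from node 23 is large.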

Thanks again and have a great day!

Example Subtitles

1
00:00:02,340 --> 00:00:02,540
©

2
00:00:02,540 --> 00:00:02,740
©

3
00:00:02,740 --> 00:00:02,940
© P

4
00:00:02,940 --> 00:00:03,140
© P@

5
00:00:03,140 --> 00:00:03,340
© P@r

6
00:00:03,340 --> 00:00:03,540
© P@rM

7
00:00:03,540 --> 00:00:03,740
© P@rM!

8
00:00:03,740 --> 00:00:03,940
© P@rM!N

9
00:00:03,940 --> 00:00:04,140
© P@rM! Nd

10
00:00:04,140 --> 00:00:04,340
© P@rM! Nde

11
00:00:04,340 --> 00:00:04,540
© P@rM! NdeR

12
00:00:04,540 --> 00:00:04,740
© P@rM! NdeR

13
00:00:04,740 --> 00:00:04,940
© P@rM! NdeR M

14
00:00:04,940 --> 00:00:05,140
© P@rM! NdeR M@

15
00:00:05,140 --> 00:00:05,340
© P@rM! NdeR M@n

16
00:00:05,340 --> 00:00:05,540
© P@rM! NdeR M@nk

17
00:00:05,540 --> 00:00:05,740
© P@rM! NdeR M@nkÖ

18
00:00:05,740 --> 00:00:05,940
© P@rM! NdeR M@nkÖÖ

19
00:00:05,940 --> 00:00:06,140
© P@rM! NdeR M@nkÖÖ

20
00:00:06,140 --> 00:00:07,340
© P@rM! NdeR M@nkÖÖ ™

21
00:00:07,540 --> 00:00:08,340
© P@rM! NdeR M@nkÖÖ ™

22
00:00:08,540 --> 00:00:09,340
© P@rM! NdeR M@nkÖÖ ™

23
00:00:09,540 --> 00:00:10,340
© P@rM! NdeR M@nkÖÖ ™

24
00:00:10,340 --> 00:00:11,340
© P@rM! NdeR M@nkÖÖ ™
Mobile - +919815899536

DrKain · 2023-01-29T11:17:53Z

Hi, thanks for the feedback and detailed issue.

Would it be possible to do a removal of ANY chain; as was also suggested in the original request, so any crazy chain like this could be detected; without any match from the filters?

This is definitely possible and a good suggestion. I'm currently away from home and won't be able to add it anytime soon, but I'll leave this issue open in case someone wants to take a shot at it while I'm away. If not, I'll work on it when I'm next available.
It's worth noting that this would risk incorrect cleans when a line is repeated by one or more people during a scene.
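
One possible mitigation for that risk, sketched here as a hypothetical TypeScript helper (not part of subclean), is to require a candidate run to actually grow several times before treating it as a chain, so a line that is simply repeated never qualifies. `isLikelyChain` and `minGrowthSteps` are illustrative names, and the default of 3 is an arbitrary assumption.

```typescript
// Hypothetical guard (illustration only): a run of nodes counts as a chain
// only if every node extends or repeats the previous one AND the text grows
// at least `minGrowthSteps` times. Pure repetition never qualifies.
function isLikelyChain(run: string[], minGrowthSteps = 3): boolean {
  let growthSteps = 0;
  for (let i = 1; i < run.length; i++) {
    const prev = run[i - 1];
    const cur = run[i];
    if (!cur.startsWith(prev)) return false; // not a prefix run at all
    if (cur.length > prev.length) growthSteps++; // text got longer
  }
  return growthSteps >= minGrowthSteps;
}

// A line repeated by several characters in a scene: no growth, so not a chain.
isLikelyChain(["No!", "No!", "No!"]); // false
// The start of the example animation grows three times.
isLikelyChain(["©", "©", "© P", "© P@", "© P@r"]); // true
```

This only addresses verbatim repetition; dialogue that itself builds up over several nodes could still pass the check.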

Would it be possible to have kind of a "threshold", to be able to continue the chain even if there is a difference of more than one character between the lines?

Also yes, but in the case of nodes 8-9 there's a 3-character difference, so a fuzzy match would also risk breaking valid lines. I try to keep the rules as strict as possible to avoid removing valid subtitles, so I'll need to look into this more.

For now, the nodes you have can be cleaned simply with: /^©/
This looks for lines starting with © and removes them. I've added it to the main filters, so you can run subclean --update to fetch the updated filters.
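
For reference, applying a regex filter like /^©/ to subtitle nodes amounts to something like the following TypeScript sketch. `SubtitleNode` and `removeMatching` are hypothetical names; this is not how subclean is implemented internally, only an illustration of what the pattern does and does not match.

```typescript
// Hypothetical node shape for illustration.
interface SubtitleNode {
  index: number; // SRT sequence number
  text: string; // node text, multi-line text joined with "\n"
}

// Keep only the nodes that match none of the filters.
// The filters here have no "g" flag: RegExp.test with "g" is stateful
// (it advances lastIndex) and would give inconsistent results across nodes.
function removeMatching(nodes: SubtitleNode[], filters: RegExp[]): SubtitleNode[] {
  return nodes.filter((node) => !filters.some((re) => re.test(node.text)));
}

const kept = removeMatching(
  [
    { index: 1, text: "©" },
    { index: 2, text: "© P@rM! Nd" },
    { index: 3, text: "Copyright © 2012" },
    { index: 4, text: "Where are you going?" },
  ],
  [/^©/], // "^" anchors to the start of the text; without the "m" flag only the first line counts
);
// kept contains nodes 3 and 4: node 3 contains © but does not start with it.
```

Anchoring with "^" is what keeps the rule strict: a line that merely mentions © somewhere else survives.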

DrKain · 2023-01-29T11:22:41Z

Linking this to #20 and #4 as they are related

DanAykroyd256 · 2023-01-29T21:17:11Z

Thanks for your quick reply, @DrKain, and for considering this improvement! I agree that handling all these edge cases might get crazy :)

I'll leave the .srt of my example here in case you want to use it when you look into this in the future.

Antiviral (2012).en.srt.txt

Have a great week!

DanAykroyd256 added the labels "enhancement" (New feature or request) and "priotiry: low" (Low priotity) on Jan 29, 2023

DanAykroyd256 assigned DrKain Jan 29, 2023

DanAykroyd256 changed the title on Jan 29, 2023 (old: Chain Removal with some flexibility; new: Chain removal with extra flexibility for character "gaps" and no filters)

DrKain added a commit that referenced this issue on Jan 29, 2023: 28b82ae "1 filter" (Related to issue #22)