[Feature Request] Filter for chained/animated nodes #4

Closed
DrKain opened this issue Feb 24, 2021 · 3 comments · Fixed by #20
Labels
enhancement New feature or request priotiry: low Low priotity

Comments

@DrKain
Owner

DrKain commented Feb 24, 2021

This one is a bit trickier to handle and explain in text. Some subtitle uploaders have decided to add incredibly intrusive animated credits. They follow a similar format:

310
01:23:53,995 --> 01:23:54,470
S

311
01:23:54,470 --> 01:23:54,945
Su

312
01:23:54,945 --> 01:23:55,420
Sub

313
01:23:55,420 --> 01:23:55,895
Subt

314
01:23:55,895 --> 01:23:56,370
Subti

315
01:23:56,370 --> 01:23:56,845
Subtit

316
01:23:56,845 --> 01:23:57,320
Subtitl

317
01:23:57,320 --> 01:23:57,795
Subtitle

318
01:23:57,795 --> 01:23:58,270
Subtitles
U

319
01:23:58,270 --> 01:23:58,745
Subtitles By
Us

320
01:23:58,745 --> 01:23:59,220
Subtitles By
Use

321
01:23:59,220 --> 01:23:59,695
Subtitles By
User

322
01:23:59,695 --> 01:24:00,170
Subtitles By
Usern

323
01:24:00,170 --> 01:24:00,645
Subtitles By
Userna

324
01:24:00,645 --> 01:24:01,120
Subtitles By
Username

Right now subclean can handle nodes 319 to 324, but the preceding nodes remain. A special handler will need to be written that scans for these chained nodes. I'll probably end up doing it in one of two ways.

Option A:

  1. Advertising detected at node 319
  2. Check node 318 for partial match
  3. Continue checking (and removing) previous nodes until it's unable to match
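
The steps in Option A could be sketched roughly as follows. This is only an illustration, not subclean's code: it assumes nodes are reduced to a plain list of their text, that "partial match" means the earlier text is a shorter prefix of the later one, and the names `is_partial_match` and `remove_chain_before` are hypothetical.

```python
from typing import List


def is_partial_match(earlier: str, later: str) -> bool:
    """True if `earlier` is a non-empty, strictly shorter prefix of `later`."""
    return bool(earlier) and len(earlier) < len(later) and later.startswith(earlier)


def remove_chain_before(texts: List[str], ad_index: int) -> List[str]:
    """Drop the ad at `ad_index` and every preceding node that chains into it."""
    start = ad_index
    # Walk backwards while the previous node is a partial match of its successor.
    while start > 0 and is_partial_match(texts[start - 1], texts[start]):
        start -= 1
    return texts[:start] + texts[ad_index + 1:]


# Hypothetical node texts; the ad filter is assumed to have hit index 4.
nodes = ["Goodbye.", "Sub", "Subt", "Subtitles", "Subtitles By\nUs"]
print(remove_chain_before(nodes, 4))  # ['Goodbye.']
```

The walk stops at the first node that is not a prefix of its successor, so ordinary dialogue before the chain is left alone.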

Option B:

  1. Scan the entire file for these chained nodes
  2. Remove the entire chain regardless of the content
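
Option B could look something like the sketch below. It assumes the same simplified model (a list of node texts) and treats a chain as a run of consecutive nodes where each text is a strictly shorter prefix of the next. The `min_length` threshold and the names `find_chains` and `remove_chains` are assumptions made for illustration, not part of subclean.

```python
from typing import List


def find_chains(texts: List[str], min_length: int = 3) -> List[range]:
    """Return index ranges of runs where each text is a proper prefix of the next."""
    chains = []
    start = 0
    for i in range(1, len(texts) + 1):
        continues = (
            i < len(texts)
            and texts[i - 1] != ""
            and len(texts[i - 1]) < len(texts[i])
            and texts[i].startswith(texts[i - 1])
        )
        if not continues:
            # The run covering indices start..i-1 has ended; keep it if long enough.
            if i - start >= min_length:
                chains.append(range(start, i))
            start = i
    return chains


def remove_chains(texts: List[str], min_length: int = 3) -> List[str]:
    """Remove every node that belongs to a detected chain, whatever its content."""
    doomed = {i for chain in find_chains(texts, min_length) for i in chain}
    return [t for i, t in enumerate(texts) if i not in doomed]


print(remove_chains(["Hello.", "S", "Su", "Sub", "Bye."]))  # ['Hello.', 'Bye.']
```

A minimum chain length is one way to avoid deleting two ordinary lines that happen to share a prefix, such as "No" followed by "No way".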
@DrKain DrKain added the enhancement New feature or request label Feb 24, 2021
@DrKain
Owner Author

DrKain commented Feb 24, 2021

Here is an example of these chained nodes in a subtitle file.
This is a manually cleaned version of the file showing the ideal outcome.

@DrKain DrKain added the priotiry: low Low priotity label Feb 24, 2021
@Eytan414
Contributor

I decided to implement option A because B seemed a bit too brute-force and might cause undesired effects. I've made a PR, but there's a caveat: it doesn't work on the provided subtitle file example. I think that's due to node #318, which is probably a rare case or a typo.

I haven't really worked with subtitle files before, so you'll probably have a better assessment than I do. Here are the relevant details:

317
01:23:57,320 --> 01:23:57,795
Subtitle

318
01:23:57,795 --> 01:23:58,270
Subtitles
U

319
01:23:58,270 --> 01:23:58,745
Subtitles By
Us 

My code would work if node 318 either had " by" after "Subtitles" on its first line or had no second line ("U"). I believe the subtitle uploader made a mistake in their ad.
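
To make the node 318 problem concrete, here is a small check. It assumes the partial match is a plain prefix test on the node's full text, which is a guess at the PR's behavior rather than its actual code; `is_partial_match` is a hypothetical name.

```python
def is_partial_match(earlier: str, later: str) -> bool:
    """True if `earlier` is a non-empty prefix of `later` (assumed matching rule)."""
    return bool(earlier) and later.startswith(earlier)


node_319 = "Subtitles By\nUs"

print(is_partial_match("Subtitles\nU", node_319))     # False: node 318 as posted
print(is_partial_match("Subtitles By\nU", node_319))  # True: with " By" on the first line
print(is_partial_match("Subtitles", node_319))        # True: without the second line
```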

@DrKain
Owner Author

DrKain commented Aug 27, 2022

Node 318 was probably a mistake on my end. I had no examples on hand, so I wrote that one out.

Linking to #20

@DrKain DrKain linked a pull request Aug 27, 2022 that will close this issue
DrKain added a commit that referenced this issue Aug 27, 2022
remove chained ads in case of regex-filter hit
@DrKain DrKain closed this as completed Aug 27, 2022