Segment very long! #95
If you wish to split a subtitle into shorter ones, there is no built-in function for this, but it can easily be done:

```python
from pysubs2 import SSAFile, SSAEvent
from itertools import chain
from textwrap import wrap

input_srt = """\
1
00:00:00,000 --> 00:10:00,000
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Maecenas sollicitudin. Sed convallis magna eu sem.
Etiam bibendum elit eget erat. Proin mattis lacinia justo. Etiam posuere lacus quis dolor.

2
00:12:00,000 --> 00:20:00,000
Nulla turpis magna, cursus sit amet, suscipit a, interdum id, felis. Aenean placerat. Nullam rhoncus aliquam metus.
Curabitur vitae diam non enim vestibulum interdum. In laoreet, magna id viverra tincidunt, sem odio bibendum justo, vel imperdiet sapien wisi sed libero.

3
00:32:00,000 --> 00:35:00,000
Short subtitle.
"""


def split_event(e: SSAEvent) -> list[SSAEvent]:
    # Events with more than 10 words are cut into two halves
    words = e.plaintext.split()
    n = len(words)
    if n > 10:
        e1 = e.copy()
        e2 = e.copy()
        # Split time: midpoint of the event (pysubs2 times are in milliseconds)
        t = int((e.start + e.end) / 2)
        # First half of the words, re-wrapped at 60 characters per line
        e1.plaintext = "\n".join(wrap(" ".join(words[:n // 2]), 60))
        e1.end = t
        # Second half of the words
        e2.plaintext = "\n".join(wrap(" ".join(words[n // 2:]), 60))
        e2.start = t
        return [e1, e2]
    else:
        return [e]


subs = SSAFile.from_string(input_srt)
subs.events = list(chain(*map(split_event, subs.events)))
print(subs.to_string("srt"))
# 1
# 00:00:00,000 --> 00:05:00,000
# Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
# Maecenas sollicitudin. Sed convallis magna eu
#
# 2
# 00:05:00,000 --> 00:10:00,000
# sem. Etiam bibendum elit eget erat. Proin mattis lacinia
# justo. Etiam posuere lacus quis dolor.
#
# 3
# 00:12:00,000 --> 00:16:00,000
# Nulla turpis magna, cursus sit amet, suscipit a, interdum
# id, felis. Aenean placerat. Nullam rhoncus aliquam metus.
# Curabitur vitae diam
#
# 4
# 00:16:00,000 --> 00:20:00,000
# non enim vestibulum interdum. In laoreet, magna id viverra
# tincidunt, sem odio bibendum justo, vel imperdiet sapien
# wisi sed libero.
#
# 5
# 00:32:00,000 --> 00:35:00,000
# Short subtitle.
```
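Note that `split_event` halves an event at most once, so it limits the number of pieces rather than the number of words: subtitle 1 of the input has 29 words, and its second half (subtitle 2 of the output) still holds 15. Splitting n words into `n // 2` and `n - n // 2`, then repeating k times, leaves at most `ceil(n / 2**k)` words in the largest piece. The hypothetical helper below (plain Python, not part of pysubs2) is a small sketch of that arithmetic, counting how many rounds of halving a given word limit requires.

```python
def rounds_needed(n_words: int, max_words: int) -> int:
    """Rounds of halving needed before no piece exceeds max_words words.

    After k rounds the largest piece holds ceil(n_words / 2**k) words.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    k = 0
    # -(-a // b) is integer ceiling division, avoiding float rounding
    while -(-n_words // 2**k) > max_words:
        k += 1
    return k


print(rounds_needed(29, 10))  # subtitle 1 of the example: 2
print(rounds_needed(29, 3))   # 4
```

A single call to `split_event` corresponds to one round, so any limit that needs more than one round will leave some pieces too long.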
Yes, that's exactly what I did. Thank you! In your script I noticed that you assign to `.plaintext`. Which approach is better: that, or the one I used instead?
Unfortunately, the code you've provided doesn't guarantee that each subtitle ends up with 10 words or fewer (I tried lowering the limit to 3 words), but calling `split_event` repeatedly does. I've adjusted the code (in a very hacky way) to keep iterating until every subtitle meets the requirement:

```python
import pysubs2
from itertools import chain

# Uses split_event from the previous comment, with its limit changed from
# `if n > 10:` to `if n > 3:`. If the two limits differ, events with 4 to 10
# words are never split and the loop below never terminates.
subs = pysubs2.load("subs.srt")
meets_max_length = False
while not meets_max_length:
    meets_max_length = True
    for chunk in subs:
        if len(chunk.plaintext.split()) > 3:
            meets_max_length = False
    subs.events = list(chain(*map(split_event, subs.events)))
print(subs.to_string("srt"))
```
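A recursive variant avoids the outer `while` loop: keep halving a piece until it is short enough, so every piece satisfies the limit after a single pass. The sketch below uses a hypothetical `Cue` dataclass with millisecond times in place of pysubs2's `SSAEvent`, so it runs without the library; with pysubs2, the same recursion could presumably work on `e.copy()` and `e.plaintext` exactly as `split_event` does.

```python
from dataclasses import dataclass
from textwrap import wrap


@dataclass
class Cue:
    """Hypothetical stand-in for a subtitle event (times in milliseconds)."""
    start: int
    end: int
    text: str


def split_cue(cue: Cue, max_words: int = 3, width: int = 60) -> list[Cue]:
    """Recursively halve `cue` until no piece has more than `max_words` words.

    Like split_event, each split cuts the word list in half and the
    time span at its midpoint.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = cue.text.split()
    if len(words) <= max_words:
        return [cue]
    half = len(words) // 2
    mid = (cue.start + cue.end) // 2
    first = Cue(cue.start, mid, "\n".join(wrap(" ".join(words[:half]), width)))
    second = Cue(mid, cue.end, "\n".join(wrap(" ".join(words[half:]), width)))
    # Both halves have strictly fewer words than `cue`, so the recursion ends.
    return split_cue(first, max_words, width) + split_cue(second, max_words, width)


pieces = split_cue(Cue(0, 8000, "one two three four five six seven eight"))
for p in pieces:
    print(p.start, p.end, p.text)
```

Each call halves the word count, so the recursion depth grows only logarithmically and stays far below Python's recursion limit even for very long events.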
Yeah, my code was just meant to illustrate how to split a subtitle into multiple ones using the library API. The actual splitting logic can be as sophisticated as needed :)
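As one example of more sophisticated logic, the sketch below swaps in a different technique from the halving above: it cuts the text into consecutive chunks of at most `max_words` words, then gives each chunk a share of the original duration proportional to its character count, so longer chunks stay on screen longer. `chunk_words` and `allocate_times` are hypothetical helpers working on plain strings and millisecond integers; turning their output back into pysubs2 events (for instance with `e.copy()` and then setting `start`, `end` and `plaintext`) is left as an assumption.

```python
def chunk_words(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def allocate_times(start: int, end: int, chunks: list[str]) -> list[tuple[int, int, str]]:
    """Share [start, end] among chunks in proportion to their length in characters."""
    total_chars = sum(len(c) for c in chunks)
    result = []
    t = start
    acc = 0
    for c in chunks:
        acc += len(c)
        # End time derived from the running total, so rounding never drifts
        t_next = start + (end - start) * acc // total_chars
        result.append((t, t_next, c))
        t = t_next
    return result


chunks = chunk_words("a bb ccc dddd", 2)
print(allocate_times(0, 1200, chunks))
```

Computing each end time from the running character total, rather than summing rounded per-chunk durations, keeps the last chunk ending exactly at the original end time.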
Is it possible to divide the segment into smaller parts?