Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#4306 - Extract tags from SubRip subtitles, add support for alignment #4582

Merged
merged 5 commits into from
Oct 3, 2018

Conversation

szaboa
Copy link
Contributor

@szaboa szaboa commented Jul 26, 2018

Extracted valid tags from srt subtitles (so those won't be rendered anymore) and added support for different alignments tags (based on SSA v4+ specification) such as {\an2} for example.

@@ -154,6 +155,27 @@ public void testDecodeNoEndTimecodes() throws IOException {
.isEqualTo("Or to the end of the media.");
}

@Test
public void testDecodeCueWithTag() throws IOException{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't appear to be asserting anything other than that the tags were removed (or not) from the text. Shouldn't the test also be asserting that they were applied successfully (e.g. that the subtitles in the cues have the right alignment parameters)

assertThat(subtitle.getCues(subtitle.getEventTime(4)).get(0).text.toString())
.isEqualTo("This is the third subtitle.");

// Based on the SSA v4+ specs the curly bracket must be followed by a backslash, so this is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you link to this doc (just in this thread, not in the source). The doc I found has a bunch of examples where this is always true, but it's also always true that there's no whitespace inside the brackets either. Hence it seems strange to forbid { \\an2} but not {\ an2}. Unless you have counter examples, I think we should just require no whitespace in the bracketed region. Which I think you can do by using [^\s] instead of . in the Pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the docs that can be downloaded from here:
moodub.free.fr/video/ass-specs.doc

And it says: "All override codes are always preceded by a backslash "
So that's why I didn't allow a space after the bracket. Regarding spaces after the backslash I just followed how the VLC handles these cases.

We sure can require no whitespace in the bracketed region, is up to you :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess we can allow space, if there's an existing example of doing this (as there seems to be)!

Spanned text = Html.fromHtml(textBuilder.toString());
cues.add(new Cue(text));
// Extract tags
SubtitleTagResult tagResult = extractTags(textBuilder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given you wouldn't expect a tag to be separated over multiple lines, it seems cleaner to do this in the while loop above. You could replace the last line in the while loop with something like:

textBuilder.append(processLine(currentLine, tags));

Where tags is an Arraylist to which any new tags are added, and processLine is responsible for trimming the line and extracting tags. Note passing the list in as an argument also avoids having to define SubtitleTagResult.

You'd then make the tags list before the start of the while loop to pass in (you could even re-use the same single instance if you want, as is done for the StringBuilder, as long as you remember to clear it.

Cue cue = null;

// Check if tags are present
if (tagResult.tags.length > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - You can probably remove this. The loop will be a no-op if tags has length zero.

// Check if tags are present
if (tagResult.tags.length > 0) {

boolean alignTagFound = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just break; from the loop when you find the first align tag, and avoid the need for this?

Copy link
Contributor Author

@szaboa szaboa Aug 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I could, but I intended to wrote like this to be generic and to be able to extend the loop with other tag's "apply" logic. Note that this for loop isn't only for align tags (for now it is), instead goes over all the tags and applies them, I suppose it will have support for more tags in the future and it that case breaking after the first align tag would be incorrect behavior.

Taking these into consideration, if you still would like to just "break", then I can change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our preference (for small blocks of code; this doesn't really apply to larger scale design choices for obvious reasons) is to write it in the simplest way for current usage, and change it as and when the usage changes. So I think using break is still preferable for now.

if (alignTagFound) continue;
alignTagFound = true;

AlignmentResult alignmentResult = getAlignmentValues(tag);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you change getAlignmentValues(tag) to be buildCue(Spanned text, String alignmentTag) then you can get rid of AlignmentResult.

@szaboa
Copy link
Contributor Author

szaboa commented Aug 30, 2018

@ojw28 waiting for your opinion on the break; thing, meanwhile I will extend the test cases. The rest is resolved.

@ojw28
Copy link
Contributor

ojw28 commented Sep 3, 2018

Replied to the break; thing. Please push the updated revision when ready. Thanks!

@ojw28
Copy link
Contributor

ojw28 commented Sep 23, 2018

@szaboa - I think there's still an open comment on testDecodeCueWithTag. Perhaps you're aware of this already; just checking!

@szaboa
Copy link
Contributor Author

szaboa commented Sep 24, 2018

Yes, there is that and the break thing. I am on vacation right now, I will resolve those on the next Monday (first of October) when I am back, sorry for the delays!

@szaboa
Copy link
Contributor Author

szaboa commented Oct 1, 2018

@ojw28
Just pushed the changes for the remaining open comments. Everything should be OK now.

Copy link
Contributor

@ojw28 ojw28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks. I put a few final comments inline.

break;
}

return new Cue(text, Layout.Alignment.ALIGN_NORMAL, line, Cue.LINE_TYPE_FRACTION, lineAnchor, position, positionAnchor, Cue.DIMEN_UNSET);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use null for text alignment? Cue implies that's what you should use unless you know better.

line = DEFAULT_END_FRACTION;
position = DEFAULT_START_FRACTION;
positionAnchor = Cue.ANCHOR_TYPE_START;
lineAnchor = Cue.ANCHOR_TYPE_END;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to order the assignments as line,lineAnchor,position,positionAnchor so the line and position ones are grouped. Ignorable, but it might be even clearer if split into two switch blocks with three cases each, like:

float position;
@Cue.AnchorType int positionAnchor;
switch (alignmentTag) {
  case ALIGN_BOTTOM_LEFT:
  case ALIGN_MID_LEFT:
  case ALIGN_TOP_LEFT:
    position = DEFAULT_START_FRACTION;
    positionAnchor = Cue.ANCHOR_TYPE_START;
  case ALIGN_BOTTOM_MID:
  case ALIGN_MID_MID:
  case ALIGN_TOP_MID:
    ...
  case ALIGN_BOTTOM_RIGHT:
  case ALIGN_MID_RIGHT:
  case ALIGN_TOP_RIGHT:
    ...
}
... (ditto for line and lineAnchor) ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, much nicer!

*/
private Cue buildCue(Spanned text, String alignmentTag) {
// Default values used for positioning the subtitle in case of align tags
float line = DEFAULT_END_FRACTION, position = DEFAULT_MID_FRACTION;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to assign initial values to any of these (it's better not to, since that'll make the compiler ensure they're initialized along all code paths below). Add default: to the last case in the switch statement if needed to make the compiler happy.

…signments, setting Cue's textAlignment to def value - null
@szaboa
Copy link
Contributor Author

szaboa commented Oct 2, 2018

Pushed the changes according to the comments.

@ojw28 ojw28 merged commit 16fe67b into google:dev-v2 Oct 3, 2018
@ojw28
Copy link
Contributor

ojw28 commented Oct 3, 2018

Thanks! We made some further improvements during internal review, which were merged in 7849a5e. Let us know if we accidentally broken anything :).

@szaboa
Copy link
Contributor Author

szaboa commented Oct 4, 2018

It works fine and looks much better now, actually I am feeling that I did poor job after seeing your changes :)

Anyway, based on this I will try to pick up a better coding style in my next pull request.
Cheers!

@ojw28
Copy link
Contributor

ojw28 commented Oct 4, 2018

Not at all; this feature wouldn't have happened without you ;). Thanks for the contribution!

ojw28 added a commit that referenced this pull request Oct 20, 2018
#4306 - Extract tags from SubRip subtitles, add support for alignment
@google google locked and limited conversation to collaborators May 16, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants