[FIX] DVB Teletext subtitle incomplete #922 #926

BPYap · 2018-02-08T11:45:10Z

Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].

In raising this pull request, I confirm the following (please check boxes):

I have read and understood the contributors guide.
I have checked that another pull request for this purpose does not exist.
I have considered, and confirmed that this submission will be valuable to others.
I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

I have never used CCExtractor.
I have used CCExtractor just a couple of times.
I absolutely love CCExtractor, but have not contributed previously.
I am an active contributor to CCExtractor.

Attempt to solve issue #922 by replacing 0xB and 0xA in the middle of row with space character

Generated srt:

Playing the generated srt on windows 10 Movies & TV app:

attempt to solve issue #922 by replacing 0xB and 0xA in the middle of row with space character

mkver · 2018-02-13T10:18:46Z

I looked at your code and it seems that characters between the end of a box and the start of a new box (which are not intended to be displayed) nevertheless end up in the output file. In the case of my samples, all these characters are spaces so that this is fine with them, but there might be others.
How about the following code (disclaimer: I am no programmer and have no means to check whether the following code even compiles etc.):

                int box_open = 0; //indicates whether there is currently a box open.
                // anchors for string trimming purpose
		uint8_t col_start = 40;
		uint8_t col_stop = 39;
                
                for (int8_t col = 0; col <= 39; col++)
                {
                        if ((page->text[row][col] == 0xb) || (page->text[row][col] == 0xa))
                        {
                               box_open = (page->text[row][col]) & 0x1;
                               page->text[row][col] = 0x20;
                        }

                       if (box_open && (col_start == 40) && (page->text[row][col] > 0x20))
		       {
				col_start = col;
				line_count++;
				continue;
		       }
                       
                        if (!box_open && (col_start < 40) && (page->text[row][col] > 0x20))
		                page->text[row][col] = 0x20; //no character outside a box should be displayed
               }

		// line is empty
		if (col_start == 39)
			continue;
                
		while (page->text[row][col_stop] <= 0x20)
                {
                       col_stop--;        
                }

Some remarks:

The above is intended to replace the lines 637-667 of the current version (without BPYap's patch).
Just like the current version, simple 0xb suffice as a start code. It would be possible to change this, but since no one complained about thte current behaviour, I thought it best not to change it in this regard.
All characters in between two boxes end up as spaces.
Currently, if a line contains a 0xb, but no character >0x20 to the right of it, the line_count is incremented, but no subtitle is emitted for this row. The if-statement at lines 676-689 might introduce unwanted linebreaks in this situation. The above changes this behaviour: The line is only treated as nonempty if there is a character >0x20 inside a box. I don't know if this actually happens in the wild.
Colour information is not processed/touched above.
[Edit]: When I wrote this I was unaware of BPYap's third patch.

BPYap · 2018-02-13T12:09:31Z

@mkver , thanks for pointing out the issue that characters between 0xA and the next 0xB shouldn't be displayed. My third patch failed to address this situation and I think this is the main reason that caused regression test to fail just now. I will try incorporating your idea of setting characters between 0xA and the next 0xB to 0x20.

Provide checks to characters between 0xA and 0xB and set them to 0x20

Provide checks to characters between 0xA and 0xB and set them to 0x20 while maintaining color information

BPYap · 2018-02-13T15:21:25Z

@mkver @cfsmp3 I guess the issue is finally resolved, please have a look at my latest commit. 😆
For the failed Windows CI tests, I have compared the results with the other 2 pending PRs. It seems that all PRs failed for these regression IDs: 2, 19, 21, 238, 239, 182, 214, 73, 75, 78 82.

mkver · 2018-02-13T16:52:34Z

I read your commit and am wondering about something: Imagine you have a situation like the following: 0xB 0xB something 0xA (potentially something) 0xB 0xB something different than 0xA (i.e. the last box is not closed by a 0xa, but by the end of the line). The last_replacement_index will then point to the place of 0xA and at line 678 there will again be a 0xA in there. And the break in the next loop implies that col_stop will contain the position of the last character >0x20 before the 0xA. This in turn means that nothing of the last box will end up in output.
Apart from that I don't really see any need to use this last_replacement_index at all: All characters > 0x20 outside of boxes are replaced with 0x20 anyway so that the first if statement in the second loop is can only be entered if the current position is the one of a non-space character inside a box.

BPYap · 2018-02-13T17:23:30Z

Thank you again @mkver, for your patience and pointing out potential flaws in the code.

Here's what I changed in the latest commit:

removed last_replacement_index related statements
removed if (page->text[row][col] == 0xa) break; statement from second loop (since all 0xA characters are replaced with 0x20 in the first loop)
defined box_open as uint8_t type, and use YES / NO for its value

cfsmp3 · 2018-02-14T16:57:57Z

There's some errors according to the CI... @canihavesomecoffee can you check if this is caused by this specific PR?

thealphadollar · 2018-02-14T17:18:28Z

@cfsmp3 and @canihavesomecoffee Just to add in some information (maybe this could help), the same issues were present in my PR as well and I had changed only the python modules in the api folder. So, I do not think that these warnings and errors are specific to this PR :)

cfsmp3 · 2018-02-14T21:14:51Z

@mkver Are you able to test this PR with a number of files and see how it behaves?

mkver · 2018-02-15T05:06:37Z

If someone can make a build for me, yes.

cfsmp3 · 2018-02-15T05:21:17Z

For windows I assume?

…

On Wed, Feb 14, 2018 at 9:06 PM, mkver ***@***.***> wrote: If someone can make a build for me, yes. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#926 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFrJ2UNIpXNQZbKTv_99mZbkOi155pszks5tU7tdgaJpZM4R-N_p> .

mkver · 2018-02-15T05:27:39Z

Yes. Sorry that I forgot to mention it.

cfsmp3 · 2018-02-16T01:23:16Z

@mkver Here you are:
https://drive.google.com/open?id=1xxBwXeJlms_ogc5LhlOsqZP1ONW6T7xh

Just replace the existing binaries with those.

mkver · 2018-02-16T11:32:46Z

Is BPYap's patch really included in this build? It says that its version is "Git commit: 550d320" and the issue that should be fixed by #930 is indeed fixed, but the output of the samples I provided for #922 is unchanged.

cfsmp3 · 2018-02-16T18:50:31Z

@mkver you're right - I did everything except actually merge it. Sorry about that.
I've uploaded correct versions to the same location.

mkver · 2018-02-16T21:13:14Z

The new files don't have any missing text any more, i.e. the patch is good. I'll close #922. (There is just one line where the builds with the fix for #930 contain one space where libzvbi has two. Strange, but I can live with that. Maybe it contains a control character other than a color code or 0xA and 0xB.)

Thanks to everybody involved for your work on this.

BPYap added 3 commits February 8, 2018 19:29

solving [BUG] DVB Teletext subtitle incomplete #922

bcffe2a

attempt to solve issue #922 by replacing 0xB and 0xA in the middle of row with space character

replace all 0xA characters within startbox with 0x20

6295496

Fixing Bug #922

8ff8443

Fix Bug #922

9b0c12a

BPYap added 2 commits February 13, 2018 20:31

Fix Bug #922

2258ab2

Provide checks to characters between 0xA and 0xB and set them to 0x20

Fix Bug #922

9385929

Provide checks to characters between 0xA and 0xB and set them to 0x20 while maintaining color information

Fixing Bug #922

5d3e2cd

Fix code indentation

b7545e0

cfsmp3 merged commit b7545e0 into CCExtractor:master Feb 16, 2018

mkver mentioned this pull request Feb 25, 2018

[BUG] Code page problems #937

Closed

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FIX] DVB Teletext subtitle incomplete #922 #926

[FIX] DVB Teletext subtitle incomplete #922 #926

BPYap commented Feb 8, 2018 •

edited

Loading

mkver commented Feb 13, 2018 •

edited

Loading

BPYap commented Feb 13, 2018 •

edited

Loading

BPYap commented Feb 13, 2018

mkver commented Feb 13, 2018

BPYap commented Feb 13, 2018 •

edited

Loading

cfsmp3 commented Feb 14, 2018

thealphadollar commented Feb 14, 2018

cfsmp3 commented Feb 14, 2018

mkver commented Feb 15, 2018

cfsmp3 commented Feb 15, 2018 via email

mkver commented Feb 15, 2018

cfsmp3 commented Feb 16, 2018

mkver commented Feb 16, 2018 •

edited

Loading

cfsmp3 commented Feb 16, 2018

mkver commented Feb 16, 2018 •

edited

Loading

[FIX] DVB Teletext subtitle incomplete #922 #926

[FIX] DVB Teletext subtitle incomplete #922 #926

Conversation

BPYap commented Feb 8, 2018 • edited Loading

mkver commented Feb 13, 2018 • edited Loading

BPYap commented Feb 13, 2018 • edited Loading

BPYap commented Feb 13, 2018

mkver commented Feb 13, 2018

BPYap commented Feb 13, 2018 • edited Loading

cfsmp3 commented Feb 14, 2018

thealphadollar commented Feb 14, 2018

cfsmp3 commented Feb 14, 2018

mkver commented Feb 15, 2018

cfsmp3 commented Feb 15, 2018 via email

mkver commented Feb 15, 2018

cfsmp3 commented Feb 16, 2018

mkver commented Feb 16, 2018 • edited Loading

cfsmp3 commented Feb 16, 2018

mkver commented Feb 16, 2018 • edited Loading

BPYap commented Feb 8, 2018 •

edited

Loading

mkver commented Feb 13, 2018 •

edited

Loading

BPYap commented Feb 13, 2018 •

edited

Loading

BPYap commented Feb 13, 2018 •

edited

Loading

mkver commented Feb 16, 2018 •

edited

Loading

mkver commented Feb 16, 2018 •

edited

Loading