Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] DVB Teletext subtitle incomplete #922 #926

Merged
merged 8 commits into from
Feb 16, 2018
Merged

[FIX] DVB Teletext subtitle incomplete #922 #926

merged 8 commits into from
Feb 16, 2018

Conversation

BPYap
Copy link
Contributor

@BPYap BPYap commented Feb 8, 2018

Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Attempt to solve issue #922 by replacing 0xB and 0xA in the middle of row with space character

Generated srt:
fixed

Playing the generated srt on windows 10 Movies & TV app:
geeignet

BPYap added 3 commits February 8, 2018 19:29
attempt to solve issue #922 by replacing 0xB and 0xA in the middle of row with space character
@mkver
Copy link
Contributor

mkver commented Feb 13, 2018

I looked at your code and it seems that characters between the end of a box and the start of a new box (which are not intended to be displayed) nevertheless end up in the output file. In the case of my samples, all these characters are spaces so that this is fine with them, but there might be others.
How about the following code (disclaimer: I am no programmer and have no means to check whether the following code even compiles etc.):

                int box_open = 0; //indicates whether there is currently a box open.
                // anchors for string trimming purpose
		uint8_t col_start = 40;
		uint8_t col_stop = 39;
                
                for (int8_t col = 0; col <= 39; col++)
                {
                        if ((page->text[row][col] == 0xb) || (page->text[row][col] == 0xa))
                        {
                               box_open = (page->text[row][col]) & 0x1;
                               page->text[row][col] = 0x20;
                        }

                       if (box_open && (col_start == 40) && (page->text[row][col] > 0x20))
		       {
				col_start = col;
				line_count++;
				continue;
		       }
                       
                        if (!box_open && (col_start < 40) && (page->text[row][col] > 0x20))
		                page->text[row][col] = 0x20; //no character outside a box should be displayed
               }

		// line is empty
		if (col_start == 39)
			continue;
                
		while (page->text[row][col_stop] <= 0x20)
                {
                       col_stop--;        
                }

Some remarks:

  • The above is intended to replace the lines 637-667 of the current version (without BPYap's patch).
  • Just like the current version, simple 0xb suffice as a start code. It would be possible to change this, but since no one complained about thte current behaviour, I thought it best not to change it in this regard.
  • All characters in between two boxes end up as spaces.
  • Currently, if a line contains a 0xb, but no character >0x20 to the right of it, the line_count is incremented, but no subtitle is emitted for this row. The if-statement at lines 676-689 might introduce unwanted linebreaks in this situation. The above changes this behaviour: The line is only treated as nonempty if there is a character >0x20 inside a box. I don't know if this actually happens in the wild.
  • Colour information is not processed/touched above.
  • [Edit]: When I wrote this I was unaware of BPYap's third patch.

@BPYap
Copy link
Contributor Author

BPYap commented Feb 13, 2018

@mkver , thanks for pointing out the issue that characters between 0xA and the next 0xB shouldn't be displayed. My third patch failed to address this situation and I think this is the main reason that caused regression test to fail just now. I will try incorporating your idea of setting characters between 0xA and the next 0xB to 0x20.

Provide checks to characters between 0xA and 0xB and set them to 0x20
Provide checks to characters between 0xA and 0xB and set them to 0x20 while  maintaining color information
@BPYap
Copy link
Contributor Author

BPYap commented Feb 13, 2018

@mkver @cfsmp3 I guess the issue is finally resolved, please have a look at my latest commit. 😆
For the failed Windows CI tests, I have compared the results with the other 2 pending PRs. It seems that all PRs failed for these regression IDs: 2, 19, 21, 238, 239, 182, 214, 73, 75, 78 82.

@mkver
Copy link
Contributor

mkver commented Feb 13, 2018

I read your commit and am wondering about something: Imagine you have a situation like the following: 0xB 0xB something 0xA (potentially something) 0xB 0xB something different than 0xA (i.e. the last box is not closed by a 0xa, but by the end of the line). The last_replacement_index will then point to the place of 0xA and at line 678 there will again be a 0xA in there. And the break in the next loop implies that col_stop will contain the position of the last character >0x20 before the 0xA. This in turn means that nothing of the last box will end up in output.
Apart from that I don't really see any need to use this last_replacement_index at all: All characters > 0x20 outside of boxes are replaced with 0x20 anyway so that the first if statement in the second loop is can only be entered if the current position is the one of a non-space character inside a box.

@BPYap
Copy link
Contributor Author

BPYap commented Feb 13, 2018

Thank you again @mkver, for your patience and pointing out potential flaws in the code.

Here's what I changed in the latest commit:

  • removed last_replacement_index related statements
  • removed if (page->text[row][col] == 0xa) break; statement from second loop (since all 0xA characters are replaced with 0x20 in the first loop)
  • defined box_open as uint8_t type, and use YES / NO for its value

@cfsmp3
Copy link
Contributor

cfsmp3 commented Feb 14, 2018

There's some errors according to the CI... @canihavesomecoffee can you check if this is caused by this specific PR?

@thealphadollar
Copy link
Contributor

@cfsmp3 and @canihavesomecoffee Just to add in some information (maybe this could help), the same issues were present in my PR as well and I had changed only the python modules in the api folder. So, I do not think that these warnings and errors are specific to this PR :)

@cfsmp3
Copy link
Contributor

cfsmp3 commented Feb 14, 2018

@mkver Are you able to test this PR with a number of files and see how it behaves?

@mkver
Copy link
Contributor

mkver commented Feb 15, 2018

If someone can make a build for me, yes.

@cfsmp3
Copy link
Contributor

cfsmp3 commented Feb 15, 2018 via email

@mkver
Copy link
Contributor

mkver commented Feb 15, 2018

Yes. Sorry that I forgot to mention it.

@cfsmp3
Copy link
Contributor

cfsmp3 commented Feb 16, 2018

@mkver Here you are:
https://drive.google.com/open?id=1xxBwXeJlms_ogc5LhlOsqZP1ONW6T7xh

Just replace the existing binaries with those.

@mkver
Copy link
Contributor

mkver commented Feb 16, 2018

Is BPYap's patch really included in this build? It says that its version is "Git commit: 550d320" and the issue that should be fixed by #930 is indeed fixed, but the output of the samples I provided for #922 is unchanged.

@cfsmp3
Copy link
Contributor

cfsmp3 commented Feb 16, 2018

@mkver you're right - I did everything except actually merge it. Sorry about that.
I've uploaded correct versions to the same location.

@mkver
Copy link
Contributor

mkver commented Feb 16, 2018

The new files don't have any missing text any more, i.e. the patch is good. I'll close #922. (There is just one line where the builds with the fix for #930 contain one space where libzvbi has two. Strange, but I can live with that. Maybe it contains a control character other than a color code or 0xA and 0xB.)

Thanks to everybody involved for your work on this.

@cfsmp3 cfsmp3 merged commit b7545e0 into CCExtractor:master Feb 16, 2018
@mkver mkver mentioned this pull request Feb 25, 2018
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants