[BUG] Using -sc option crashes ccextractor #1115

Closed · rboy1 opened this issue Oct 17, 2019 · 22 comments

Comments

rboy1 (Contributor) commented Oct 17, 2019

Please prefix your issue with one of the following: [BUG], [PROPOSAL], [QUESTION].

CCExtractor version (preferably using the --version parameter): 0.88

In raising this issue, I confirm the following (please check boxes, eg [X] - and delete unchecked ones):

  • [X] I have read and understood the contributors guide.
  • [X] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • [X] I have checked that the issue I'm posting isn't already reported.
  • [X] I have checked that the issue I'm posting isn't already solved and no duplicates exist in closed issues and in opened issues
  • [X] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • [X] I have used the latest available version of CCExtractor to verify this issue exists.

My familiarity with the project is as follows (check one, eg [X] - and delete unchecked ones):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Necessary information

  • Is this a regression (did it work before)? [ ] NO | [ ] YES - please specify the last known working version
  • What platform did you use? [X] Windows - [ ] Linux - [ ] Mac
  • What were the used arguments? -sc

Additional information
Link to file which is causing the crash:
https://www.dropbox.com/s/4jiooj787e02kd3/CCExtractor%20crash.ts?dl=0

Output of cmd line:

C:\ccextractor.0.88-windows.binaries>ccextractorwinfull.exe -sc "CCExtractor crash.ts" -o output.srt
CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: CCExtractor crash.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: Yes, but only built-in words] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: CCExtractor crash.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
VBI/teletext stream ID 836 (0x344) for SID 1332 (0x534)
Notice: Teletext page with possible subtitles detected: 801
- No teletext page specified, first received suitable page is 801, not guaranteed
  6%  |  02:01
C:\ccextractor.0.88-windows.binaries>

sp2703 commented Nov 3, 2019

I'd like to work on this. Any leads on where to start?

cfsmp3 (Contributor) commented Nov 3, 2019 via email

CCExtractor deleted a comment from vikramcs Nov 7, 2019
rboy1 (Contributor, Author) commented Nov 11, 2019

Patched here:
#1122

rboy1 added a commit to rboy1/ccextractor that referenced this issue Nov 11, 2019
Sentence case crash (-sc)
cfsmp3 pushed a commit that referenced this issue Nov 12, 2019
Sentence case crash (-sc)
cfsmp3 closed this as completed Nov 12, 2019
rboy1 (Contributor, Author) commented Jul 17, 2020

@cfsmp3 Carlos, any chance of getting a new build? It's been a while, and many fixes have gone in since the last release.

cfsmp3 (Contributor) commented Jul 20, 2020

I'll bundle a new version after GSoC (in one month or so). These days I really don't have a lot of time on my hands I'm afraid.

rboy1 (Contributor, Author) commented Dec 8, 2020

@cfsmp3 do you think it’s ready for a release?

cfsmp3 (Contributor) commented Dec 8, 2020

Well, there are lots of bugs, but there's no one doing active work these days, so they're not going to go away magically.
I've set aside a bit of time this Sunday to do a release with what we currently have.

rboy1 (Contributor, Author) commented Dec 15, 2020

@cfsmp3 were you able to get the new build released, Carlos?

rboy1 (Contributor, Author) commented Jan 26, 2021

@cfsmp3 bump on release

rboy1 (Contributor, Author) commented May 22, 2021

@cfsmp3 can we expect a new release anytime soon?

cfsmp3 (Contributor) commented May 22, 2021 via email

rboy1 (Contributor, Author) commented Jun 17, 2021

@cfsmp3 Mid June is here, looking forward to it :)

canihavesomecoffee (Member) commented Jun 17, 2021

https://github.com/CCExtractor/ccextractor/releases/tag/v0.89 :)

Windows build is still WIP*. You can download the binaries here though (let me know if you can't): https://github.com/CCExtractor/ccextractor/suites/2983776538/artifacts/67339947

* We're working on a new installer and code signing; the latter is what's holding us back right now.

rboy1 (Contributor, Author) commented Jun 18, 2021

Awesome thanks. I noticed that it says it's compiled against tessdata 4.0alpha. Does it mean it won't work with tessdata 3?

canihavesomecoffee (Member)

> Awesome thanks. I noticed that it says it's compiled against tessdata 4.0alpha. Does it mean it won't work with tessdata 3?

For Windows the libraries are embedded, so you're indeed stuck with that specific version.

I noticed you had another comment a couple of minutes ago, but it seems to have vanished.

> Thanks. When I try to run the GUI it gives me an error, "code execution cannot proceed because pthreadVSE2.dll was not found". Looks like one of the DLL's is missing.

Was that when trying the standalone binary too, or caused by the GUI exe itself?

Anyway, looks like we should add that dll to the generated artifacts too.

rboy1 (Contributor, Author) commented Jun 18, 2021

Hmm, I tried using tessdata 3.04 and it seemed to work fine converting dvbsub to srt.

canihavesomecoffee (Member)

IIRC tessdata is not bound to the tesseract version, so that's indeed no problem :)

rboy1 (Contributor, Author) commented Jun 18, 2021

On a side note I have some files with multiple dvbsub tracks but when I run ccextractor it only extracts the first track. Is there a way to get it to extract all tracks or maybe specify the track number?

cfsmp3 (Contributor) commented Jun 18, 2021

> Awesome thanks. I noticed that it says it's compiled against tessdata 4.0alpha. Does it mean it won't work with tessdata 3?

> For Windows the libraries are embedded, so you're indeed stuck with that specific version.

Maybe we should also figure out a way to build those libraries from source again :-) @Izaron did that work a few years ago, and I think we haven't touched it since?

cfsmp3 (Contributor) commented Jun 18, 2021

> IIRC tessdata is not bound to the tesseract version, so that's indeed no problem :)

> Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0).

Since Tesseract 3 is no longer maintained at all, I think we should stick to 4 (which, as can be seen, supports the pattern recognition mode from v3, so there's no need to actually use v3).
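For readers unfamiliar with the flag, here is a minimal sketch of what selecting the legacy engine looks like on Tesseract 4's own command line. It only builds the argument list and does not run Tesseract. The mode descriptions follow the output of `tesseract --help-oem`; the helper name `tesseract_cmd` and the file names are hypothetical, and CCExtractor's own option for choosing the engine mode may be spelled differently.

```python
import shlex

# OCR Engine Mode (OEM) values accepted by Tesseract 4's `--oem` flag,
# as listed by `tesseract --help-oem`.
OEM_MODES = {
    0: "Legacy engine only",
    1: "Neural nets LSTM engine only",
    2: "Legacy + LSTM engines",
    3: "Default, based on what is available",
}


def tesseract_cmd(image_path, output_base, oem=0):
    """Build (but do not run) a Tesseract 4 command line.

    Hypothetical helper: oem=0 requests the legacy, Tesseract 3 style
    recognizer, which is how Tesseract 4 provides v3 compatibility.
    """
    if oem not in OEM_MODES:
        raise ValueError(f"unsupported --oem value: {oem}")
    # Usage form: tesseract imagename outputbase [options...]
    return ["tesseract", image_path, output_base, "--oem", str(oem)]


if __name__ == "__main__":
    cmd = tesseract_cmd("subtitle_frame.png", "subtitle_frame")
    print(shlex.join(cmd))
    # tesseract subtitle_frame.png subtitle_frame --oem 0
```

Passing `oem=3` instead leaves the choice to Tesseract, which is closer to the automatic behaviour rboy1 asks about below; the sketch rejects any value outside 0 to 3 rather than passing it through.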

rboy1 (Contributor, Author) commented Jun 18, 2021

@cfsmp3 are you saying that we need to explicitly add --oem 0 to get it to work with Tesseract 3? Because 0.89 is working (or am I missing something here?).

For future reference, wouldn't it be better if ccextractor automatically detected whether it's using Tesseract 3 or 4, with an option to override it using --oem?

cfsmp3 (Contributor) commented Jun 18, 2021

> @cfsmp3 are you saying that we need to explicitly add --oem 0 to get it to work with Tesseract 3 because 0.89 is working (or am I missing something here).

What I pasted comes from Tesseract's website. v4 supports v3's legacy engine, so there's no reason to actually have v3 around at all. If you want to use the old system, just use --oem (if I remember correctly, we do expose that argument in CCExtractor).

> For future ref, wouldn't be better if ccextractor automatically detects if it's using Tesseract 3 or 4 with an option to override using the --oem?

I don't want to support legacy versions of libraries. If the tesseract maintainers have decided to stop development of v3, what's the reason for us to bother supporting both? Just use v4 and use the legacy mode if it works better.
