Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Fix possible segfaults in hardsubx_classifier.c due to strdup #963

Merged
merged 1 commit into from
Mar 16, 2018

Conversation

saurabhshah0410
Copy link
Contributor

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

strdup will give a segmentation fault if the argument passed to it is
NULL. TessResultIteratorGetUTF8Text returns a char* which can be NULL
and we should not call strdup directly over it. Once we check if the
value returned is not NULL, then we can call strdup.

This also fixes #928.

if(word==NULL || strlen(word)==0)
continue;
word = strdup(word);
Copy link
Contributor

@cfsmp3 cfsmp3 Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem correct to me (but the previous one isn't either).

word = strdup(word);

We're losing the original pointer - assuming TessResultIteratorGetUTF8Text() allocates something (I assume it does, but I don't know for sure) we now have a leak.

Using strdup() seems OK to me IF we're going to be messing with the data later, but in this case we should immediately free the tesseract resource after making a copy, since we no longer need it.

Check this out:
http://tess4j.sourceforge.net/docs/docs-1.2/net/sourceforge/tess4j/TessAPI.html#TessBaseAPIGetIterator(net.sourceforge.tess4j.TessAPI.TessBaseAPI)

TessBaseAPIGetIterator

TessAPI.TessResultIterator TessBaseAPIGetIterator(TessAPI.TessBaseAPI handle)

Get a reading-order iterator to the results of LayoutAnalysis and/or Recognize. ****. WARNING! This class points to data held within the TessBaseAPI class, and therefore can only be used while the TessBaseAPI class still exists and has not been subjected to a call of Init, SetImage, Recognize, Clear, End, DetectOS, or anything else that changes the internal PAGE_RES.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, I didn't realize that I was creating a memory leak. So something like:

char *word_ = TessResultIteratorGetUTF8Text(it, level);
if(word_ == NULL || !strlen(word_))
    continue;
char *word = strdup(word_);
free(word_);

This should work fine right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No... you can't use free() to unallocate resources returned by TessResultIteratorGetUTF8Text() - take a look at the link I sent. Most likely (but check - I just browsed the API) it's TessResultIteratorDelete.

Copy link
Contributor

@cfsmp3 cfsmp3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment on the affected line.

strdup will give a segmentation fault if the argument passed to it is
NULL. TessResultIteratorGetUTF8Text returns a char* which can be NULL
and we should not call strdup directly over it. Once we check if the
value returned is not NULL, then we can call strdup.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Error while using "-hardsubx"
2 participants