Notice: iconv(): Detected an illegal character in input string #549

Closed
LucianoHanna opened this issue Aug 4, 2022 · 6 comments · Fixed by #580

Comments

@LucianoHanna
Contributor

  • PHP Version: 7.4.18
  • PDFParser Version: 2.2.1

Description:

I am getting this notice when calling $pdf->getPages()[0]->getDataTm():
Notice: iconv(): Detected an illegal character in input string

Stack Trace:

ErrorException: Notice: iconv(): Detected an illegal character in input string
#8 vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php(604): Smalot\PdfParser\Font::decodeContentByEncodingElement
#7 vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php(553): Smalot\PdfParser\Font::decodeContentByEncoding
#6 vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php(467): Smalot\PdfParser\Font::decodeContent
#5 vendor/smalot/pdfparser/src/Smalot/PdfParser/Page.php(467): Smalot\PdfParser\Page::extractDecodedRawData
#4 vendor/smalot/pdfparser/src/Smalot/PdfParser/Page.php(502): Smalot\PdfParser\Page::getDataCommands
#3 vendor/smalot/pdfparser/src/Smalot/PdfParser/Page.php(655): Smalot\PdfParser\Page::getDataTm
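
The trace shows an ErrorException even though iconv() only raises a notice, which usually means an error handler is converting notices into exceptions. A minimal sketch of that mechanism follows. The handler and the `probeIconv()` helper are hypothetical and not taken from pdfparser or any framework, and whether byte 0x81 counts as illegal in CP1252 is an assumption that depends on the iconv build:

```php
<?php
declare(strict_types=1);

// Convert every PHP notice/warning into an ErrorException. This handler is
// only an illustration of the mechanism, not any framework's actual code.
set_error_handler(
    static function (int $severity, string $message, string $file, int $line): bool {
        throw new ErrorException($message, 0, $severity, $file, $line);
    }
);

/**
 * Hypothetical probe: returns the exception message if iconv() raised a
 * notice or warning for $bytes, or null if the conversion ran silently.
 */
function probeIconv(string $bytes, string $fromEncoding): ?string
{
    try {
        iconv($fromEncoding, 'UTF-8', $bytes);

        return null;
    } catch (ErrorException $e) {
        return $e->getMessage();
    }
}

// Assumption: 0x81 is unassigned in this iconv build's CP1252 table.
var_dump(probeIconv("\x81", 'CP1252'));
// Plain ASCII should convert without any notice.
var_dump(probeIconv('abc', 'CP1252'));
```

Without such a handler the same call would only log the notice and return false, so the exception seen here likely comes from the application's error handling rather than from the library itself.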

@k00ni
Collaborator

k00ni commented Aug 5, 2022

Can you provide a PDF or example input which raises this error?

@LucianoHanna
Contributor Author

I will try to reproduce this bug with a PDF without sensitive information.

@samvidhik

I am facing the same issue with Laravel 10.0 and smalot/pdfparser 2.3.0.

I get the following exception:
iconv(): Detected an illegal character in input string

at C:\xampp\htdocs\desmis\vendor\smalot\pdfparser\src\Smalot\PdfParser\Font.php:606
602▕ // mb_convert_encoding does not support MacRoman/macintosh,
603▕ // so we use iconv() here
604▕ $iconvEncodingName = $this->getIconvEncodingNameOrNullByPdfEncodingName($pdfEncodingName);
605▕
➜ 606▕ return $iconvEncodingName ? iconv($iconvEncodingName, 'UTF-8', $text) : null;
607▕ }
608▕
609▕ /**
610▕ * Convert PDF encoding name to iconv-known encoding name.

Attaching pdf file being parsed.
tsclist.pdf

@k00ni
Collaborator

k00ni commented Mar 30, 2023

@samvidhik Please test if #580 fixes your problem. A note in the pull request would be appreciated.

@eapacheco

Issue

Hi @k00ni, I'm having the same issue here.
And I managed to find a public PDF that reproduces the issue: ufsc.pdf

  • PHP 8.2.4
  • laravel/framework 9.51.0
  • smalot/pdfparser 2.4.0

⚠️ The stack trace below is from a test I created to reproduce the issue, so it starts with PHPUnit

  • PHPUnit 9.6.3
ErrorException {#3804 // tests/sandbox.php:24
  #message: "iconv(): Detected an illegal character in input string"
  #code: 0
  #file: "./vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php"
  #line: 606
  #severity: E_NOTICE
  trace: {
    ./vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php:606 { …}
    ./vendor/laravel/framework/src/Illuminate/Foundation/Bootstrap/HandleExceptions.php:266 { …}
    Illuminate\Foundation\Bootstrap\HandleExceptions->Illuminate\Foundation\Bootstrap\{closure}() {}
    ./vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php:606 { …}
    ./vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php:555 { …}
    ./vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php:469 { …}
    ./vendor/smalot/pdfparser/src/Smalot/PdfParser/Page.php:450 { …}
    ./vendor/smalot/pdfparser/src/Smalot/PdfParser/Page.php:506 { …}
    ./vendor/smalot/pdfparser/src/Smalot/PdfParser/Page.php:657 { …}
    ./tests/sandbox.php:22 {
      Tests\sandbox->test(): void^
      › try {
      ›     $pdf->getPages()[0]->getDataTm();
      › } catch (Exception $e) {
    }
    ./vendor/phpunit/phpunit/src/Framework/TestCase.php:1608 { …}
    ./vendor/phpunit/phpunit/src/Framework/TestCase.php:1214 { …}
    ./vendor/phpunit/phpunit/src/Framework/TestResult.php:728 { …}
    ./vendor/phpunit/phpunit/src/Framework/TestCase.php:964 { …}
    ./vendor/phpunit/phpunit/src/Framework/TestSuite.php:684 { …}
    ./vendor/phpunit/phpunit/src/TextUI/TestRunner.php:653 { …}
    ./vendor/phpunit/phpunit/src/TextUI/Command.php:144 { …}
    ./vendor/phpunit/phpunit/src/TextUI/Command.php:97 { …}
    ./vendor/phpunit/phpunit/phpunit:98 { …}
  }
}

Fixes

The solution proposed in #580 does solve my issue. Before that, I had tried working with mb_convert_encoding instead, given that I do not need MACINTOSH support:

if (!$iconvEncodingName) {
    return null;
}
// mb_convert_encoding does not support MacRoman, so iconv() is still used for it
if ($iconvEncodingName === 'MACINTOSH') {
    return iconv($iconvEncodingName, 'UTF-8', $text);
}

return mb_convert_encoding($text, 'UTF-8', $iconvEncodingName);
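
For anyone who cannot patch the library, another option is to keep the notice from escaping in the first place. The following is a minimal sketch, not the change made in #580: `convertToUtf8OrNull()` is a hypothetical helper that is not part of smalot/pdfparser, and it simply gives up (returns null) when iconv() complains about the input.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of smalot/pdfparser): convert $text from
 * $fromEncoding to UTF-8 without letting an iconv() notice escape.
 * Returns null when the input cannot be converted cleanly.
 */
function convertToUtf8OrNull(string $text, string $fromEncoding): ?string
{
    $failed = false;

    // Temporarily intercept notices and warnings so that an outer handler
    // (for example one that throws ErrorException) never sees them.
    set_error_handler(
        static function () use (&$failed): bool {
            $failed = true;

            return true; // mark the error as handled
        },
        E_NOTICE | E_WARNING
    );

    try {
        $converted = iconv($fromEncoding, 'UTF-8', $text);
    } finally {
        // Always put the previous handler back, even if iconv() throws.
        restore_error_handler();
    }

    if ($failed || $converted === false) {
        return null;
    }

    return $converted;
}
```

A caller could then decide what to do with a null result, for example fall back to mb_convert_encoding or skip the text fragment. The trade-off is that any notice from iconv() makes the helper discard the whole string rather than just the offending bytes.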

@fbett

fbett commented Apr 19, 2023

> @samvidhik Please test if #580 fixes your problem. A note in the pull request would be appreciated.

I also had this error. #580 solved it completely.

k00ni added a commit that referenced this issue Apr 24, 2023
* fix to iconv() illegal character error (issue #549)

* display warnings in PHPUnit; added incomplete test to demonstrate fix

* revert --display-notices in Makefile

because it fails in versions < 10

* Fixed coding style problem

* FontTest.php: finalized test which triggers the notice when not using the fix

---------

Co-authored-by: Konrad Abicht <hi@inspirito.de>