Fix various corrupt PDF files (issue 9252, issue 9418) #9827

Merged 6 commits on Jun 21, 2018

Commits on Jun 20, 2018

  1. Ensure that line-breaks are *only* skipped after operators in `Lexer.getNumber` (PR 8359 follow-up)
    
    With the current code line-breaks are accepted not just after an operator, but after a decimal point as well. When looking at this again, the latter case seems prone to cause false positives and might also interfere with subsequent patches.
    
    Hence this code is adjusted to actually do what the original commit message says, and nothing more.
    Snuffleupagus committed Jun 20, 2018
    Commit df4799a
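
The rule described in this commit can be illustrated with a small standalone sketch. This is not the pdf.js code: `readNumberToken` is a hypothetical helper that works on a plain string, and it assumes that "operator" means a leading `+` or `-` sign. Line-breaks directly after the sign are skipped, while a line-break after the decimal point is not skipped and simply ends the token.

```javascript
// Hypothetical sketch, not the actual `Lexer.getNumber` implementation.
// Assumption: "operator" means a leading '+' or '-' sign.
function readNumberToken(input) {
  let pos = 0;
  let sign = 1;

  if (input[pos] === "+" || input[pos] === "-") {
    sign = input[pos] === "-" ? -1 : 1;
    pos++;
    // Line-breaks are skipped here, directly after the sign, and nowhere else.
    while (input[pos] === "\n" || input[pos] === "\r") {
      pos++;
    }
  }

  let text = "";
  let seenPoint = false;
  for (; pos < input.length; pos++) {
    const ch = input[pos];
    if (ch >= "0" && ch <= "9") {
      text += ch;
    } else if (ch === "." && !seenPoint) {
      // No line-break skipping after the decimal point.
      seenPoint = true;
      text += ch;
    } else {
      break; // Anything else, including a line-break, ends the token.
    }
  }

  if (!/[0-9]/.test(text)) {
    throw new Error("Invalid number");
  }
  return sign * parseFloat(text);
}
```

With this sketch, `readNumberToken("-\n5")` yields `-5`, whereas `readNumberToken("1.\n5")` yields `1` because the token stops at the line-break.
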
  2. Let `Lexer.getNumber` treat a single decimal point as zero (issue 9252)

    This is consistent with the behaviour in Adobe Reader.
    Snuffleupagus committed Jun 20, 2018
    Commit 6bbcafc
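
As an illustration only, the sketch below shows one way a number reader could accept a lone decimal point as zero. `parseNumberOrZero` is a hypothetical helper that takes an already-isolated token. The real `Lexer.getNumber` reads from a stream, and its exact conditions may differ.

```javascript
// Hypothetical sketch: treat a lone "." token as the number zero.
function parseNumberOrZero(token) {
  if (token === ".") {
    // A single decimal point with no digits is accepted as zero.
    return 0;
  }
  // Optional sign, then digits with an optional fraction, or a bare fraction.
  if (!/^[+-]?(\d+\.?\d*|\.\d+)$/.test(token)) {
    throw new Error(`Invalid number: ${token}`);
  }
  return parseFloat(token);
}
```

Here `parseNumberOrZero(".")` returns `0`, while a token such as `"abc"` still raises an error.
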
  3. Use the correct startPos when repeating the search for 'endobj' operators in `XRef.indexObjects` (PR 9288 follow-up)
    Snuffleupagus committed Jun 20, 2018
    Commit 30ad62a
  4. Commit e84813e
  5. Add basic validation of the 'Root' dictionary in XRef.parse and try to recover when possible
    
    Note that the `Catalog` constructor, and some of its methods, are already enforcing that the 'Root' dictionary is valid/well-formed. However, by doing additional validation already in `XRef.parse` there's a slightly larger chance that corrupt PDF files could be successfully parsed/rendered.
    Snuffleupagus committed Jun 20, 2018
    Commit 346810e
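
A rough sketch of the validate-then-recover idea follows. Everything in it is an assumption made for illustration: dictionaries are modelled as plain `Map` objects, a usable 'Root' is taken to be a dictionary with a 'Pages' entry, and `checkTrailer`, `recoveryMode` and `RecoverableParseError` are hypothetical names rather than the pdf.js API.

```javascript
// Hypothetical sketch; dictionaries are modelled as plain `Map` objects.
class RecoverableParseError extends Error {}

// Assumption: a usable 'Root' is a dictionary that has a 'Pages' entry.
function isValidRoot(root) {
  return root instanceof Map && root.has("Pages");
}

// Returns the trailer when its 'Root' looks valid. Otherwise it throws:
// a recoverable error on the first attempt, a plain error once the
// fallback path has already been tried.
function checkTrailer(trailer, recoveryMode) {
  if (isValidRoot(trailer.get("Root"))) {
    return trailer;
  }
  if (!recoveryMode) {
    // Signal the caller to retry using a fallback code-path.
    throw new RecoverableParseError("Invalid Root dictionary");
  }
  throw new Error("Invalid Root dictionary, recovery failed");
}
```

A caller could catch `RecoverableParseError`, rebuild its object index, and call `checkTrailer` again with `recoveryMode` set to `true`.
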
  6. Add basic validation of the 'trailer' dictionary candidates in `XRef.indexObjects` (issue 9418)
    
    This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway.
    Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.
    Snuffleupagus committed Jun 20, 2018
    Commit 56e3648
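
The candidate filtering described in the last commit might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the patch itself: dictionaries are modelled as plain `Map` objects, `pickTrailer` is a hypothetical name, and the validity test (a 'Root' dictionary that has a 'Pages' entry) is an assumed stand-in for whatever checks `XRef.parse` and the `Catalog` apply.

```javascript
// Hypothetical sketch; dictionaries are modelled as plain `Map` objects.
// Returns the first candidate that would survive later validation, or null.
function pickTrailer(candidates) {
  for (const dict of candidates) {
    if (!(dict instanceof Map)) {
      continue; // Not a dictionary at all.
    }
    const root = dict.get("Root");
    // Assumption: a usable 'Root' is a dictionary with a 'Pages' entry.
    if (!(root instanceof Map) || !root.has("Pages")) {
      continue; // Would be rejected later anyway, so skip it now.
    }
    return dict;
  }
  return null;
}
```

The extra look-ups run once per candidate, which matches the commit's point that the cost is small next to scanning the whole file.
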