Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account for broken outlines/annotations, where the destination dictionary contains an invalid /Dest entry #8825

Merged
merged 1 commit into from
Aug 26, 2017
Merged

Conversation

Snuffleupagus
Copy link
Collaborator

According to the specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=377, a Dest entry in an outline item should not contain a dictionary.
Unsurprisingly there's PDF generators that completely ignore this, treating is an A entry instead.

The patch also adds a little bit more validation code in Catalog.parseDestDictionary.

Fixes #5089 (comment).

@Snuffleupagus
Copy link
Collaborator Author

/botio unittest

@mozilla mozilla deleted a comment from pdfjsbot Aug 25, 2017
@mozilla mozilla deleted a comment from pdfjsbot Aug 25, 2017
@mozilla mozilla deleted a comment from pdfjsbot Aug 25, 2017
@mozilla mozilla deleted a comment from pdfjsbot Aug 25, 2017
@Snuffleupagus
Copy link
Collaborator Author

/botio unittest

@Rob--W
Copy link
Member

Rob--W commented Aug 26, 2017

/botio preview

src/core/obj.js Outdated
dest = destDict.get('Dest');
} else if (destDict.has('Dest')) { // Simple destination.
let simpleDest = destDict.get('Dest');
if (isDict(simpleDest)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about directly assigning to dest instead of a temporary variable? In case simpleDest is invalid, you are already returning early, so not using a temporary variable should be fine.

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Aug 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the patch to do the proper checks above, so that we'll no longer reach this branch in the isDict(destDict.get('Dest')) === true case. Hence these changes were simply reverted.

@@ -689,6 +689,34 @@ describe('annotation', function() {
expect(data.dest).toEqual([{ num: 17, gen: 0, }, { name: 'XYZ', },
0, 841.89, null]);
});

it('should correctly parse a Dest, which violates the specification ' +
'by containing a dictionary', function() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to the line where the invalid dictionary is constructed, for those who don't know the PDF spec by heart?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't completely follow what sort of comment you're asking for here. Are you simply asking for a reference to the particular section of the PDF spec, or something else?

[...] where the invalid dictionary is constructed,

I might be reading too much into this, but just to clarify:
Please note that there's no way for a Dest entry to contain a "valid" dictionary, the mere fact that one is even present makes it invalid according to the spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of a comment before line 703, something like this (edit as needed):

// Dest must be a Name or Array (PDF27000 section x.y.z), but there are PDFs that store a dictionary.

@Rob--W
Copy link
Member

Rob--W commented Aug 26, 2017

Looks good to me, can be merged after feedback on the above nits.

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Aug 26, 2017

@Rob--W Thanks for the review!
For easier reviewing, I'm including an interdiff of the changes below:

diff --git a/src/core/obj.js b/src/core/obj.js
index 2e1228de..95924a6f 100644
--- a/src/core/obj.js
+++ b/src/core/obj.js
@@ -651,7 +651,7 @@ var Catalog = (function CatalogClosure() {
     var docBaseUrl = params.docBaseUrl || null;
 
     var action = destDict.get('A'), url, dest;
-    if (!action && destDict.has('Dest')) {
+    if (!isDict(action) && destDict.has('Dest')) {
       // A /Dest entry should *only* contain a Name or an Array, but some bad
       // PDF generators ignore that and treat it as an /A entry.
       action = destDict.get('Dest');
@@ -764,12 +764,7 @@ var Catalog = (function CatalogClosure() {
           break;
       }
     } else if (destDict.has('Dest')) { // Simple destination.
-      let simpleDest = destDict.get('Dest');
-      if (isDict(simpleDest)) {
-        warn('parseDestDictionary: The Dest entry should not be a dictionary.');
-        return;
-      }
-      dest = simpleDest;
+      dest = destDict.get('Dest');
     }
 
     if (isString(url)) {
diff --git a/test/unit/annotation_spec.js b/test/unit/annotation_spec.js
index 024355a8..faf3293d 100644
--- a/test/unit/annotation_spec.js
+++ b/test/unit/annotation_spec.js
@@ -700,6 +700,8 @@ describe('annotation', function() {
       let annotationDict = new Dict();
       annotationDict.set('Type', Name.get('Annot'));
       annotationDict.set('Subtype', Name.get('Link'));
+      // The /Dest must be a Name or an Array, refer to ISO 32000-1:2008
+      // section 12.3.3, but there are PDF files where it's a dictionary.
       annotationDict.set('Dest', destDict);
 
       let annotationRef = new Ref(798, 0);

…nary contains an invalid `/Dest` entry

According to the specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=377, a `Dest` entry in an outline item should *not* contain a dictionary.
Unsurprisingly there's PDF generators that completely ignore this, treating is an `A` entry instead.

The patch also adds a little bit more validation code in `Catalog.parseDestDictionary`.
@mozilla mozilla deleted a comment from pdfjsbot Aug 26, 2017
@mozilla mozilla deleted a comment from pdfjsbot Aug 26, 2017
@mozilla mozilla deleted a comment from pdfjsbot Aug 26, 2017
@mozilla mozilla deleted a comment from pdfjsbot Aug 26, 2017
@Snuffleupagus
Copy link
Collaborator Author

/botio unittest

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/7da0bb68d4083eb/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/0b76ee4abc14969/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/7da0bb68d4083eb/output.txt

Total script time: 2.60 mins

  • Unit Tests: Passed

@Snuffleupagus
Copy link
Collaborator Author

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/17a533e255dfbe2/output.txt

@mozilla mozilla deleted a comment from pdfjsbot Aug 26, 2017
@mozilla mozilla deleted a comment from pdfjsbot Aug 26, 2017
@mozilla mozilla deleted a comment from pdfjsbot Aug 26, 2017
@mozilla mozilla deleted a comment from pdfjsbot Aug 26, 2017
@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/17a533e255dfbe2/output.txt

Total script time: 2.36 mins

Published

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/0b76ee4abc14969/output.txt

Total script time: 12.62 mins

  • Unit Tests: Passed

@Rob--W Rob--W merged commit a0eed97 into mozilla:master Aug 26, 2017
@Rob--W
Copy link
Member

Rob--W commented Aug 26, 2017

Thanks for the patch and review response!

@Snuffleupagus Snuffleupagus deleted the simpleDest-dictionary branch August 26, 2017 17:21
movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018
Account for broken outlines/annotations, where the destination dictionary contains an invalid `/Dest` entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants