"URI" in PDF attributes may be a string itself #31

theiostream · 2018-10-12T19:49:40Z

The URI value in an attribute object may itself be a string rather than a PDFObjRef. If this case is not handled, many URIs are silently ignored. The following patch fixed the issue for me, but a better solution may be desirable:

@@ -282,16 +279,22 @@ class PDFMinerBackend(ReaderBackend):
         if isinstance(obj_resolved, list):
             return [self.resolve_PDFObjRef(o) for o in obj_resolved]

         if "URI" in obj_resolved:
             if isinstance(obj_resolved["URI"], PDFObjRef):
                 return self.resolve_PDFObjRef(obj_resolved["URI"])
+            elif isinstance(obj_resolved["URI"], (str, unicode)):
+                if IS_PY2:
+                    ref = obj_resolved["URI"].decode("utf-8")
+                else:
+                    ref = obj_resolved["URI"]
+                return Reference(ref, self.curpage)
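
For illustration, here is a minimal Python 3 sketch of the same idea outside the backend class. `FakeObjRef` and `extract_uri` are hypothetical names invented for this example and are not part of this project or of pdfminer. The sketch assumes the parser may hand back the URI as an indirect reference, as `str`, or as `bytes`, and that UTF-8 is a reasonable first guess for the encoding (as in the patch above).

```python
class FakeObjRef:
    """Hypothetical stand-in for an indirect PDF object reference."""

    def __init__(self, target):
        self.target = target

    def resolve(self):
        return self.target


def extract_uri(action):
    """Return the URI of an action dictionary as text, or None if absent."""
    if not isinstance(action, dict) or "URI" not in action:
        return None
    value = action["URI"]
    # Follow indirect references until a direct value is reached.
    while isinstance(value, FakeObjRef):
        value = value.resolve()
    if isinstance(value, bytes):
        # Assumption: try UTF-8 first, then fall back to Latin-1,
        # which can decode any byte sequence.
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            return value.decode("latin-1")
    if isinstance(value, str):
        return value
    # Any other type is treated as "no usable URI".
    return None
```

The point of the sketch is that the direct-string and indirect-reference cases end up in the same place: both yield a text URI that can then be wrapped in whatever reference type the backend uses.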


morriscode · 2018-11-28T21:34:03Z

Thanks!
