This repository has been archived by the owner on Jun 15, 2023. It is now read-only.

"URI" in PDF attributes may be a string itself #31

Open
theiostream opened this issue Oct 12, 2018 · 1 comment
Comments

@theiostream

The URI value in an attribute object may itself be a string rather than a PDFObjRef. When this case is not handled, many URIs are silently ignored. The following patch fixed the issue for me, though a better solution may be desirable:

@@ -282,16 +279,22 @@ class PDFMinerBackend(ReaderBackend):
         if isinstance(obj_resolved, list):
             return [self.resolve_PDFObjRef(o) for o in obj_resolved]

         if "URI" in obj_resolved:
             if isinstance(obj_resolved["URI"], PDFObjRef):
                 return self.resolve_PDFObjRef(obj_resolved["URI"])
+            elif isinstance(obj_resolved["URI"], (str, unicode)):
+                if IS_PY2:
+                    ref = obj_resolved["URI"].decode("utf-8")
+                else:
+                    ref = obj_resolved["URI"]
+                return Reference(ref, self.curpage)
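
For anyone who wants to try the idea outside the backend, here is a rough, self-contained sketch of the same logic. It is not the project's code: `PDFObjRef` and `Reference` below are hypothetical stand-ins for the real classes, `uri_to_reference` is a made-up helper name, and it assumes Python 3 only, where the parser may hand the URI back as `bytes` rather than `str`.

```python
class PDFObjRef:
    """Hypothetical stand-in for the parser's indirect object reference."""

    def __init__(self, target):
        self.target = target

    def resolve(self):
        return self.target


class Reference:
    """Hypothetical stand-in for the backend's Reference(ref, page)."""

    def __init__(self, ref, page):
        self.ref = ref
        self.page = page


def uri_to_reference(attrs, page):
    """Return a Reference for attrs["URI"], or None if no usable URI exists."""
    if not isinstance(attrs, dict) or "URI" not in attrs:
        return None
    uri = attrs["URI"]
    # Follow indirect references until a direct value is reached.
    while isinstance(uri, PDFObjRef):
        uri = uri.resolve()
    # Assumption: a direct URI may arrive as bytes, so decode it to text.
    if isinstance(uri, bytes):
        uri = uri.decode("utf-8", errors="replace")
    if isinstance(uri, str):
        return Reference(uri, page)
    return None


# Usage: direct and indirect URI values are handled by the same call.
direct = uri_to_reference({"URI": b"https://example.com/a"}, page=1)
indirect = uri_to_reference({"URI": PDFObjRef("https://example.com/b")}, page=2)
```

Normalizing the value first and building the `Reference` in one place means the direct and indirect cases cannot drift apart, which is the bug class the patch above works around.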
@morriscode

Thanks!
