Crash in pdf2txt.py, due to recently added code #65

Closed
hughsw opened this issue May 25, 2017 · 1 comment

hughsw commented May 25, 2017

Running pdf2txt.py on the attached PDF crashes with an attribute error in recently added code, commit 82af7f0 (see #56).

175.pdf

bash-3.2$ python3 /usr/local/bin/pdf2txt.py 175.pdf
INFO:pdfminer.pdfdocument:xref found: pos=b'774066'
INFO:pdfminer.pdfdocument:read_xref_from: start=774066, token=/b'xref'
INFO:pdfminer.pdfdocument:xref objects: {2: (None, 9, 0), 3: (None, 400798, 0), 4: (None, 400895, 0), 5: (None, 773855, 0), 6: (None, 401082, 0), 7: (None, 773571, 0), 8: (None, 773668, 0), 9: (None, 773919, 0), 10: (None, 773970, 0)}
INFO:pdfminer.pdfdocument:trailer: {'Size': 10, 'Root': <PDFObjRef:8>, 'Info': <PDFObjRef:9>}
INFO:pdfminer.pdfdocument:trailer: {'Size': 10, 'Root': <PDFObjRef:8>, 'Info': <PDFObjRef:9>}
Traceback (most recent call last):
  File "/usr/local/bin/pdf2txt.py", line 129, in <module>
    if __name__ == '__main__': sys.exit(main())
  File "/usr/local/bin/pdf2txt.py", line 124, in main
    outfp = extract_text(**vars(A))
  File "/usr/local/bin/pdf2txt.py", line 64, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/usr/local/lib/python3.6/site-packages/pdfminer/high_level.py", line 81, in extract_text_to_fp
    check_extractable=True):
  File "/usr/local/lib/python3.6/site-packages/pdfminer/pdfpage.py", line 121, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "/usr/local/lib/python3.6/site-packages/pdfminer/pdfdocument.py", line 579, in __init__
    self.info.append(dict_value(trailer['Info']))
  File "/usr/local/lib/python3.6/site-packages/pdfminer/pdftypes.py", line 164, in dict_value
    x = resolve1(x)
  File "/usr/local/lib/python3.6/site-packages/pdfminer/pdftypes.py", line 84, in resolve1
    x = x.resolve(default=default)
  File "/usr/local/lib/python3.6/site-packages/pdfminer/pdftypes.py", line 71, in resolve
    return self.doc.getobj(self.objid)
  File "/usr/local/lib/python3.6/site-packages/pdfminer/pdfdocument.py", line 689, in getobj
    obj = self._getobj_parse(index, objid)
  File "/usr/local/lib/python3.6/site-packages/pdfminer/pdfdocument.py", line 655, in _getobj_parse
    while kwd is not self.KEYWORD_OBJ:
AttributeError: 'PDFDocument' object has no attribute 'KEYWORD_OBJ'
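
For readers unfamiliar with this failure class, here is a minimal sketch of what the last frame of the traceback suggests: a method compares a token against `self.KEYWORD_OBJ`, but nothing defines that attribute on the class or the instance. The names `Keyword`, `BrokenDocument`, `FixedDocument`, and `find_obj` are hypothetical stand-ins for illustration. This is not pdfminer's code, and it is not necessarily how b010db6 fixed the problem.

```python
class Keyword:
    """Hypothetical stand-in for a parser keyword token (not pdfminer's class)."""

    def __init__(self, name):
        self.name = name


class BrokenDocument:
    """Refers to self.KEYWORD_OBJ without ever defining it."""

    def find_obj(self, tokens):
        # Attribute lookup happens on the first comparison, so the error
        # only surfaces once this code path actually runs.
        for kwd in tokens:
            if kwd is self.KEYWORD_OBJ:
                return True
        return False


class FixedDocument(BrokenDocument):
    """One possible repair (an assumption): define the sentinel on the class."""

    KEYWORD_OBJ = Keyword(b"obj")


if __name__ == "__main__":
    try:
        BrokenDocument().find_obj([Keyword(b"xref")])
    except AttributeError as exc:
        print("broken:", exc)

    tokens = [Keyword(b"xref"), FixedDocument.KEYWORD_OBJ]
    print("fixed:", FixedDocument().find_obj(tokens))
```

Because the comparison uses `is`, the sketch matches only the exact sentinel object, not any keyword that happens to carry the same name. A missing attribute of this kind stays hidden until some input exercises the affected code path, which may be why only certain PDFs trigger the crash.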
goulu added a commit that referenced this issue Jul 20, 2017

goulu commented Jul 20, 2017

Solved in b010db6
(sorry, it was a stupid mistake, thanks for reporting it)

goulu closed this as completed Jul 20, 2017