This repository has been archived by the owner on Apr 15, 2024. It is now read-only.
@JSB97 I have also encountered the same error. The problematic snippet in cmapdb.py seems to be:
    def _load_data(klass, name):
        filename = '%s.pickle.gz' % name
        if klass.debug:
            print >>sys.stderr, 'loading:', name
        cmap_paths = (os.environ.get('CMAP_PATH', '/usr/share/pdfminer/'),
                      os.path.join(os.path.dirname(__file__), 'cmap'),)
        for directory in cmap_paths:
            path = os.path.join(directory, filename)
Printing the variable "filename" gives me: to-unicode-PDFXC30-Identity.pickle.gz
Printing "repr(filename)" yields: 'to-unicode-PDFXC30-Identity\x00\x00.pickle.gz'
Apparently, these \x00 (NUL) characters are causing the issue, since os.stat rejects any path that contains them. One fix that solved this issue for me was: filename = filename.replace('\0', '')
I am not sure what is causing this issue, though. @euske Is there a way to make a permanent fix for this?
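For anyone who wants to try the workaround outside of pdfminer first, here is a minimal sketch of the idea. The helper names strip_nul and cmap_filename are made up for illustration and are not part of pdfminer. It only shows the replace('\0', '') step applied before the filename is built, and it does not address where the NUL characters come from.

```python
import os


def strip_nul(name):
    """Return name with any embedded NUL characters removed."""
    return name.replace('\0', '')


def cmap_filename(name):
    """Build the pickle filename for a CMap name (hypothetical helper).

    NUL characters are stripped first so the result is a valid path component.
    """
    return '%s.pickle.gz' % strip_nul(name)


# A CMap name padded with NUL characters, matching the repr() output above.
raw_name = 'to-unicode-PDFXC30-Identity\x00\x00'

# repr() exposes the hidden characters that a plain print does not show.
print(repr('%s.pickle.gz' % raw_name))

filename = cmap_filename(raw_name)
print(repr(filename))

# The cleaned filename can now be joined into a path and checked without
# os.stat complaining about NUL bytes. The 'cmap' directory is only an example.
path = os.path.join('cmap', filename)
print(os.path.exists(path))
```

In pdfminer itself the same one-line replace would go right after filename is assigned in _load_data, as described above.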
I am trying to convert the following PDF to text:
http://www.kabupro.jp/edp/20140529/S1001UPO.pdf
using the following command:
pdf2txt.py -o text.txt S1001UPO.pdf
The document is encrypted, so I removed the encryption first; however, even after doing this I get the error below.
I suspect the issue is with "TypeError: must be encoded string without NULL bytes, not str", for which this seems to offer a solution:
http://stackoverflow.com/questions/18265084/typeerror-must-be-string-without-null-bytes-not-str
Could someone point me to a workaround? Thank you!!
Traceback (most recent call last):
  File "/Users/JB1/anaconda/bin/pdf2txt.py", line 115, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/Users/JB1/anaconda/bin/pdf2txt.py", line 109, in main
    interpreter.process_page(page)
  File "/Users/JB1/anaconda/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 833, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/Users/JB1/anaconda/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 844, in render_contents
    self.init_resources(resources)
  File "/Users/JB1/anaconda/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 348, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/Users/JB1/anaconda/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 196, in get_font
    font = self.get_font(None, subspec)
  File "/Users/JB1/anaconda/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 187, in get_font
    font = PDFCIDFont(self, spec)
  File "/Users/JB1/anaconda/lib/python2.7/site-packages/pdfminer/pdffont.py", line 668, in __init__
    self.unicode_map = CMapDB.get_unicode_map(self.cidcoding, self.cmap.is_vertical())
  File "/Users/JB1/anaconda/lib/python2.7/site-packages/pdfminer/cmapdb.py", line 276, in get_unicode_map
    data = klass._load_data('to-unicode-%s' % name)
  File "/Users/JB1/anaconda/lib/python2.7/site-packages/pdfminer/cmapdb.py", line 247, in _load_data
    if os.path.exists(path):
  File "/Users/JB1/anaconda/lib/python2.7/genericpath.py", line 18, in exists
    os.stat(path)
TypeError: must be encoded string without NULL bytes, not str