The latin2ascii.py file has not had any meaningful changes since 2010. I guess it is outdated and there are (much) better ways to achieve what you want. What is it that you want to do?
For your information: the latin2ascii command does not extract the content of the pdf. It just prints all the bytes from the file (in your case the pdf) in ascii notation.
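As a rough illustration of why the traceback appears and one way around it: on Python 3.6 under a C/POSIX locale, text-mode reads typically default to the ASCII codec, and PDF files commonly carry a comment line of high bytes right after the `%PDF-` header, which would match the `0xe2` at position 10. The sketch below is not the pdfminer code; `read_lines_tolerantly` is a hypothetical helper, and decoding as Latin-1 is an assumption chosen because it maps every byte to a character and so cannot raise.

```python
import fileinput


def read_lines_tolerantly(paths):
    """Yield text lines from the given files without ever raising a decode error.

    Hypothetical helper, not part of pdfminer or latin2ascii.
    """
    # mode="rb" makes fileinput yield raw bytes, so no implicit
    # locale-dependent (possibly ASCII) decoding takes place.
    with fileinput.input(files=paths, mode="rb") as stream:
        for raw in stream:
            # Latin-1 assigns a character to each of the 256 byte values,
            # so this decode succeeds for any input.
            yield raw.decode("latin-1")
```

Calling `list(read_lines_tolerantly(["Murray93DrunkAndDog.pdf"]))` should then return the file's lines as strings, although for a PDF most of them will be binary noise rather than readable text.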
Describe the bug
The decoder hits a byte it cannot decode and crashes with an unhandled UnicodeDecodeError traceback, rather than erroring gracefully.
To Reproduce
The target file that triggers the issue is at:
http://gallery.herrold.com/stuff/Murray93DrunkAndDog.pdf
CentOS 7 with EPEL
[herrold@localhost prices]$ rpm -V python36-pdfminer
[herrold@localhost prices]$ rpm -q python36-pdfminer
python36-pdfminer-20160614-5.el7.noarch
[herrold@localhost prices]$ /usr/bin/latin2ascii Murray93DrunkAndDog.pdf
Traceback (most recent call last):
  File "/usr/bin/latin2ascii", line 130, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/usr/bin/latin2ascii", line 125, in main
    for line in fileinput.input(args):
  File "/usr/lib64/python3.6/fileinput.py", line 250, in __next__
    line = self._readline()
  File "/usr/lib64/python3.6/fileinput.py", line 364, in _readline
    return self._readline()
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 10: ordinal not in range(128)
[herrold@localhost prices]$
Ask if you need more information to reproduce, but this should suffice.
Murray93DrunkAndDog.pdf
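To make "gracefully erroring" concrete, here is a minimal sketch of what a friendlier failure could look like. It is not the actual latin2ascii source: `main` below is a hypothetical stand-in that simply echoes ASCII input, and the message wording and exit codes are assumptions.

```python
import sys


def main(argv):
    """Echo ASCII files to stdout; report non-ASCII input with a short message.

    Hypothetical stand-in for a script entry point, not the pdfminer tool.
    Returns 0 on success and 1 if any file contains a non-ASCII byte.
    """
    for path in argv[1:]:
        try:
            # Request ASCII explicitly so the behavior does not depend on the locale.
            with open(path, encoding="ascii") as handle:
                for line in handle:
                    sys.stdout.write(line)
        except UnicodeDecodeError as exc:
            # exc.start indexes into the chunk being decoded, which matches
            # the file offset only while still inside the first chunk read.
            bad_byte = exc.object[exc.start]
            print(
                f"{path}: not ASCII text (byte 0x{bad_byte:02x} at position {exc.start})",
                file=sys.stderr,
            )
            return 1
    return 0
```

A script would typically end with `sys.exit(main(sys.argv))`, so a binary file such as a PDF would produce a one-line message on stderr and a non-zero exit status instead of a traceback.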