ValueError: unrecognized image mode #795

joaquimcampos · 2022-08-20T12:11:34Z

Bug report

This exception is thrown in _save_bytes(image) when saving an image with mode='8'.
It can be triggered with pdf2txt.py --output-dir [some_dir] [pdf_file] using the following file: pdfminer_bytes.pdf.

The cause is that Pillow (the maintained PIL fork) has no mode '8'; its 8-bit grayscale mode is 'L'.
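As a rough illustration of the mode mismatch, here is a minimal sketch that maps bits per component and channel count to Pillow mode names. The helper names `pillow_mode` and `image_from_raw` are hypothetical and not part of pdfminer.six, and this is not necessarily what the linked commit does.

```python
def pillow_mode(bits: int, channels: int) -> str:
    """Map bits per component and channel count to a Pillow mode name.

    Hypothetical helper for illustration only.
    """
    if bits == 1:
        return "1"      # 1-bit black and white
    if bits == 8 and channels == 1:
        return "L"      # 8-bit grayscale; Pillow has no mode named "8"
    if bits == 8 and channels == 3:
        return "RGB"
    if bits == 8 and channels == 4:
        return "CMYK"
    raise ValueError(f"unsupported image layout: bits={bits}, channels={channels}")


def image_from_raw(bits: int, channels: int, size: tuple, data: bytes):
    """Build a Pillow image from raw, already-decoded pixel data."""
    # Pillow is a third-party package; it is imported lazily so that the
    # mapping above can be used without it installed.
    from PIL import Image

    return Image.frombytes(pillow_mode(bits, channels), size, data, "raw")
```

With Pillow installed, `image_from_raw(8, 1, (2, 2), bytes(4))` should give a 2x2 grayscale image, whereas passing "8" as the mode to `Image.frombytes` leads to the ValueError reported here.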

Traceback:

/python3.10/site-packages/pdfminer/image.py:126 in export_image

  123             name = self._save_bmp(image, width, height, width, image.bits)
  124
  125         elif len(filters) == 1 and filters[0][0] in LITERALS_FLATE_DECODE:
❱ 126             name = self._save_bytes(image)
  127
  128         else:
  129             name = self._save_raw(image)

/python3.10/site-packages/pdfminer//python3.10/site-packages/pdfminer/image.py:241 in _save_bytes

  238             elif image.bits == 8 and channels == 4:
  239                 mode = "CMYK"
  240
❱ 241             img = Image.frombytes(mode, image.srcsize, image.stream.get_data(), "raw")
  242             img.save(fp)
  243
  244         return name

/python3.10/site-packages/pdfminer//python3.10/site-packages/PIL/Image.py:2842 in frombytes

  2839     if decoder_name == "raw" and args == ():
  2840         args = mode
  2841
❱ 2842     im = new(mode, size)
  2843     im.frombytes(data, decoder_name, args)
  2844     return im
  2845

/python3.10/site-packages/pdfminer//python3.10/site-packages/PIL/Image.py:2806 in new

  2803
  2804         im.palette = ImagePalette.ImagePalette()
  2805         color = im.palette.getcolor(color)
❱ 2806     return im._new(core.fill(mode, size, color))
  2807
  2808
  2809 def frombytes(mode, size, data, decoder_name="raw", *args):

Here is a commit that solves the issue:
master...joaquimcampos:pdfminer.six:master


pietermarsman · 2022-08-22T06:53:36Z

I cannot reproduce this bug.

$ python tools/pdf2txt.py ~/Downloads/pdfminer_bytes.pdf 
1% ΚΑΛΥΤΕΡΟΙ ΚΑΘΕ ΜΕΡΑ

Χειρότεροι κατά 1% καθημερινά επί ένα χρόνο. 

0,99365 = 00,03

Καλύτεροι κατά 1% καθημερινά επί ένα χρόνο. 

1,01365 = 37,78

ΑΠΟΤΕΛΕΣΜΑΤΑ

1% ΒΕΛΤΙΩΣΗ

1% ΕΠΙΔΕΙΝΩΣΗ

ΧΡΟΝΟΣ

ΕΙΚΟΝΑ  1:  Τα  αποτελέσματα  των  μικρών  συνηθειών  πολλα-
πλασιάζονται με το πέρασμα του χρόνου. Για παράδειγμα, αν 
βελτιώνεστε κατά 1% καθημερινά επί ένα χρόνο, στο τέλος του 
χρόνου θα καταλήξετε με ένα αποτέλεσμα 37 φορές καλύτερο.

Οι  συνήθειες  είναι  ο  ανατοκισμός  της  αυτοβελτίωσης  (17). 
Με τον ίδιο τρόπο που τα χρήματα πολλαπλασιάζονται μέσω του 
ανατοκισμού,  πολλαπλασιάζονται  και  οι  επιδράσεις  των  συνη-
θειών σας όσο τις επαναλαμβάνετε τακτικά. Μολονότι η διαφορά 

3434

Ένα τίποτα μπορεί ν’ αλλάξει τα πάντα

What version of pdfminer.six are you using?

Some work on this was done in #737, which was released in 20220506. Maybe you are using an older version?

joaquimcampos · 2022-08-22T09:48:36Z

You didn't use the right command, @pietermarsman. You need to add --output-dir [some_dir] to extract the images.

KunalGehlot · 2022-08-23T07:08:22Z

I confirmed and was able to replicate the issue.

(base) kunal@Kunals-MacBook-Pro pdfminer_tests % pdf2txt.py --output-dir . pdfminer_bytes.pdf 
1% ΚΑΛΥΤΕΡΟΙ ΚΑΘΕ ΜΕΡΑ

Χειρότεροι κατά 1% καθημερινά επί ένα χρόνο. 

0,99365 = 00,03

Καλύτεροι κατά 1% καθημερινά επί ένα χρόνο. 

1,01365 = 37,78

ΑΠΟΤΕΛΕΣΜΑΤΑ

1% ΒΕΛΤΙΩΣΗ

1% ΕΠΙΔΕΙΝΩΣΗ

ΧΡΟΝΟΣ

ΕΙΚΟΝΑ  1:  Τα  αποτελέσματα  των  μικρών  συνηθειών  πολλα-
πλασιάζονται με το πέρασμα του χρόνου. Για παράδειγμα, αν 
βελτιώνεστε κατά 1% καθημερινά επί ένα χρόνο, στο τέλος του 
χρόνου θα καταλήξετε με ένα αποτέλεσμα 37 φορές καλύτερο.

Οι  συνήθειες  είναι  ο  ανατοκισμός  της  αυτοβελτίωσης  (17). 
Με τον ίδιο τρόπο που τα χρήματα πολλαπλασιάζονται μέσω του 
ανατοκισμού,  πολλαπλασιάζονται  και  οι  επιδράσεις  των  συνη-
θειών σας όσο τις επαναλαμβάνετε τακτικά. Μολονότι η διαφορά 

3434

Ένα τίποτα μπορεί ν’ αλλάξει τα πάντα
Traceback (most recent call last):
  File "/Users/kunal/opt/anaconda3/bin/pdf2txt.py", line 313, in <module>
    sys.exit(main())
  File "/Users/kunal/opt/anaconda3/bin/pdf2txt.py", line 307, in main
    outfp = extract_text(**vars(parsed_args))
  File "/Users/kunal/opt/anaconda3/bin/pdf2txt.py", line 62, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/high_level.py", line 121, in extract_text_to_fp
    interpreter.process_page(page)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 992, in process_page
    self.device.end_page(page)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/converter.py", line 80, in end_page
    self.receive_layout(self.cur_item)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/converter.py", line 331, in receive_layout
    render(ltpage)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/converter.py", line 320, in render
    render(child)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/converter.py", line 320, in render
    render(child)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/converter.py", line 327, in render
    self.imagewriter.export_image(item)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/image.py", line 125, in export_image
    name = self._save_bytes(image)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/image.py", line 240, in _save_bytes
    img = Image.frombytes(mode, image.srcsize, image.stream.get_data(), "raw")
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/PIL/Image.py", line 2706, in frombytes
    im = new(mode, size)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/PIL/Image.py", line 2670, in new
    return im._new(core.fill(mode, size, color))
ValueError: unrecognized image mode

The proposed commit does fix the issue: I ran it locally and it exported the image without the error.

I will be creating a PR after running the tests.
Thank you @joaquimcampos

pietermarsman · 2022-10-15T07:31:07Z

@joaquimcampos Thanks for pointing that out 👍

* Fix #795
* Documentation updates (FAQ and others)
* New how-to for extracting coordinates
* Indent fix in documentation
* Revert "Fix #795". This reverts commit cac6217.
* Move description of iterating LTPage to the docstring of LTPage
* Remove adding how-to for extracting coordinates from this pr
* Add CHANGELOG.md
* Remove FAQ from this branch
* Only add one line to CHANGELOG.md

Co-authored-by: Kunal Gehlot <kunal.g@360hvpl.com>

pietermarsman added the type: bug label Aug 22, 2022

KunalGehlot pushed a commit to KunalGehlot/pdfminer.six that referenced this issue Aug 23, 2022

Fix pdfminer#795

cac6217

KunalGehlot mentioned this issue Aug 23, 2022

Fix #795 #798

Closed

5 tasks

pietermarsman added the status: accepted label Oct 15, 2022

pietermarsman pushed a commit that referenced this issue Nov 5, 2022

Fix #795

790ed3e

pietermarsman mentioned this issue Nov 5, 2022

fix-795 #827

Merged

5 tasks

pietermarsman closed this as completed in fa71062 Nov 5, 2022

pietermarsman added a commit that referenced this issue Nov 5, 2022

Revert "Fix #795"

f661b59

This reverts commit cac6217.

pietermarsman mentioned this issue Nov 5, 2022

Update documentation #828

Merged

5 tasks

MartinThoma mentioned this issue Dec 21, 2023

BUG: Handle IndirectObject as image filter py-pdf/pypdf#2355

Merged
