ValueError: unrecognized image mode #795

Closed
joaquimcampos opened this issue Aug 20, 2022 · 4 comments

joaquimcampos commented Aug 20, 2022

Bug report

This exception is thrown when saving an image in _save_bytes(image) with mode='8'.
It can be triggered with pdf2txt.py --output-dir [some_dir] [pdf_file] using the following file: pdfminer_bytes.pdf.

The cause is that Pillow (the maintained PIL fork) has no mode '8'; 8-bit single-channel (grayscale) images use mode 'L' instead.
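
A minimal sketch of the mode selection described above, assuming the raw image is characterized by its bits per component and its channel count (as the `image.bits` and `channels` names in the traceback suggest). The helper `pillow_mode` is hypothetical, not part of pdfminer.six, and the linked commit may differ in detail. The point is only that 8-bit single-channel data maps to Pillow's `"L"`, because `"8"` is not a Pillow mode.

```python
from typing import Optional


def pillow_mode(bits: int, channels: int) -> Optional[str]:
    """Return a Pillow mode name for raw image data, or None if unsupported.

    Hypothetical helper for illustration only.
    """
    if bits == 1:
        # 1-bit bilevel image
        return "1"
    if bits == 8:
        # 8 bits per channel: grayscale, RGB, or CMYK depending on channel count.
        # Note "L" (not "8") for the single-channel case.
        return {1: "L", 3: "RGB", 4: "CMYK"}.get(channels)
    return None


if __name__ == "__main__":
    for bits, channels in [(1, 1), (8, 1), (8, 3), (8, 4), (16, 1)]:
        print(bits, channels, pillow_mode(bits, channels))
```

The returned name is what would be passed as the first argument to `Image.frombytes`, the call that fails in the traceback when it receives `"8"`.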

Traceback:

/python3.10/site-packages/pdfminer/image.py:126 in       
│ export_image                                                                                    
│                                                                                                 
│   123 │   │   │   name = self._save_bmp(image, width, height, width, image.bits)                
│   124 │   │                                                                                      
│   125 │   │   elif len(filters) == 1 and filters[0][0] in LITERALS_FLATE_DECODE:                
│ ❱ 126 │   │   │   name = self._save_bytes(image)                                                 
│   127 │   │                                                                                      
│   128 │   │   else:                                                                              
│   129 │   │   │   name = self._save_raw(image)                                                   
│                                                                                                  
│ /python3.10/site-packages/pdfminer//python3.10/site-packages/pdfminer/image.py:241 in      
│ _save_bytes                                                                                      
│                                                                                                 
│   238 │   │   │   elif image.bits == 8 and channels == 4:                                       
│   239 │   │   │   │   mode = "CMYK"                                                             
│   240 │   │   │                                                                                 
│ ❱ 241 │   │   │   img = Image.frombytes(mode, image.srcsize, image.stream.get_data(), "raw")    
│   242 │   │   │   img.save(fp)                                                                  
│   243 │   │                                                                                     
│   244 │   │   return name                                                                       
│                                                                                                 
│ /python3.10/site-packages/pdfminer//python3.10/site-packages/PIL/Image.py:2842 in frombytes 
│                                                                                                  
│   2839 │   if decoder_name == "raw" and args == ():                                             
│   2840 │   │   args = mode                                                                      
│   2841 │                                                                                        
│ ❱ 2842 │   im = new(mode, size)                                                                 
│   2843 │   im.frombytes(data, decoder_name, args)                                               
│   2844 │   return im                                                                            
│   2845                                                                                          
│                                                                                                 
│/python3.10/site-packages/pdfminer//python3.10/site-packages/PIL/Image.py:2806 in new       
│                                                                                                 
│   2803 │   │                                                                                    
│   2804 │   │   im.palette = ImagePalette.ImagePalette()                                         
│   2805 │   │   color = im.palette.getcolor(color)                                               
│ ❱ 2806 │   return im._new(core.fill(mode, size, color))                                        
│   2807                                                                                          
│   2808                                                                                          
│   2809 def frombytes(mode, size, data, decoder_name="raw", *args):  

Here is a commit that solves the issue:
master...joaquimcampos:pdfminer.six:master

@pietermarsman
Member

I cannot reproduce this bug.

$ python tools/pdf2txt.py ~/Downloads/pdfminer_bytes.pdf 
1% ΚΑΛΥΤΕΡΟΙ ΚΑΘΕ ΜΕΡΑ

Χειρότεροι κατά 1% καθημερινά επί ένα χρόνο. 

0,99365 = 00,03

Καλύτεροι κατά 1% καθημερινά επί ένα χρόνο. 

1,01365 = 37,78

ΑΠΟΤΕΛΕΣΜΑΤΑ

1% ΒΕΛΤΙΩΣΗ

1% ΕΠΙΔΕΙΝΩΣΗ

ΧΡΟΝΟΣ

ΕΙΚΟΝΑ  1:  Τα  αποτελέσματα  των  μικρών  συνηθειών  πολλα-
πλασιάζονται με το πέρασμα του χρόνου. Για παράδειγμα, αν 
βελτιώνεστε κατά 1% καθημερινά επί ένα χρόνο, στο τέλος του 
χρόνου θα καταλήξετε με ένα αποτέλεσμα 37 φορές καλύτερο.

Οι  συνήθειες  είναι  ο  ανατοκισμός  της  αυτοβελτίωσης  (17). 
Με τον ίδιο τρόπο που τα χρήματα πολλαπλασιάζονται μέσω του 
ανατοκισμού,  πολλαπλασιάζονται  και  οι  επιδράσεις  των  συνη-
θειών σας όσο τις επαναλαμβάνετε τακτικά. Μολονότι η διαφορά 

3434

Ένα τίποτα μπορεί ν’ αλλάξει τα πάντα

What version of pdfminer.six are you using?

Some work on this was done in #737, which was released in 20220506, maybe you are using an older version?

@joaquimcampos
Author

You didn't use the right command, @pietermarsman. You need to add --output-dir [some_dir] to extract the images.
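
For reference, a rough Python equivalent of that command, as a sketch. It assumes pdfminer.six is installed and uses `extract_text_to_fp` with its `output_dir` argument, which is the call `pdf2txt.py` makes in the traceback further down. The helper name `extract_with_images` is hypothetical, and passing a default `LAParams()` is an assumption meant to mirror the command-line default.

```python
import io
from pathlib import Path


def extract_with_images(pdf_path: str, output_dir: str) -> str:
    """Extract text from a PDF and export its images into output_dir.

    Hypothetical helper; pdfminer.six is imported lazily so that this
    module can be loaded even where the library is not installed.
    """
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    # Make sure the image output directory exists before extraction.
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    text = io.StringIO()
    with open(pdf_path, "rb") as fp:
        # Setting output_dir is what enables image export; without it,
        # only text is extracted and the failing code path is never reached.
        extract_text_to_fp(fp, text, laparams=LAParams(), output_dir=output_dir)
    return text.getvalue()


if __name__ == "__main__":
    print(extract_with_images("pdfminer_bytes.pdf", "images"))
```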

@KunalGehlot
Contributor

I confirmed and was able to replicate the issue.

(base) kunal@Kunals-MacBook-Pro pdfminer_tests % pdf2txt.py --output-dir . pdfminer_bytes.pdf 
1% ΚΑΛΥΤΕΡΟΙ ΚΑΘΕ ΜΕΡΑ

Χειρότεροι κατά 1% καθημερινά επί ένα χρόνο. 

0,99365 = 00,03

Καλύτεροι κατά 1% καθημερινά επί ένα χρόνο. 

1,01365 = 37,78

ΑΠΟΤΕΛΕΣΜΑΤΑ

1% ΒΕΛΤΙΩΣΗ

1% ΕΠΙΔΕΙΝΩΣΗ

ΧΡΟΝΟΣ

ΕΙΚΟΝΑ  1:  Τα  αποτελέσματα  των  μικρών  συνηθειών  πολλα-
πλασιάζονται με το πέρασμα του χρόνου. Για παράδειγμα, αν 
βελτιώνεστε κατά 1% καθημερινά επί ένα χρόνο, στο τέλος του 
χρόνου θα καταλήξετε με ένα αποτέλεσμα 37 φορές καλύτερο.

Οι  συνήθειες  είναι  ο  ανατοκισμός  της  αυτοβελτίωσης  (17). 
Με τον ίδιο τρόπο που τα χρήματα πολλαπλασιάζονται μέσω του 
ανατοκισμού,  πολλαπλασιάζονται  και  οι  επιδράσεις  των  συνη-
θειών σας όσο τις επαναλαμβάνετε τακτικά. Μολονότι η διαφορά 

3434

Ένα τίποτα μπορεί ν’ αλλάξει τα πάντα
Traceback (most recent call last):
  File "/Users/kunal/opt/anaconda3/bin/pdf2txt.py", line 313, in <module>
    sys.exit(main())
  File "/Users/kunal/opt/anaconda3/bin/pdf2txt.py", line 307, in main
    outfp = extract_text(**vars(parsed_args))
  File "/Users/kunal/opt/anaconda3/bin/pdf2txt.py", line 62, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/high_level.py", line 121, in extract_text_to_fp
    interpreter.process_page(page)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 992, in process_page
    self.device.end_page(page)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/converter.py", line 80, in end_page
    self.receive_layout(self.cur_item)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/converter.py", line 331, in receive_layout
    render(ltpage)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/converter.py", line 320, in render
    render(child)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/converter.py", line 320, in render
    render(child)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/converter.py", line 327, in render
    self.imagewriter.export_image(item)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/image.py", line 125, in export_image
    name = self._save_bytes(image)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/pdfminer/image.py", line 240, in _save_bytes
    img = Image.frombytes(mode, image.srcsize, image.stream.get_data(), "raw")
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/PIL/Image.py", line 2706, in frombytes
    im = new(mode, size)
  File "/Users/kunal/opt/anaconda3/lib/python3.9/site-packages/PIL/Image.py", line 2670, in new
    return im._new(core.fill(mode, size, color))
ValueError: unrecognized image mode

The proposed commit does fix the issue: I ran it locally and it produced the following image:
Im0 1

I will be creating a PR after running the tests.
Thank you @joaquimcampos

KunalGehlot pushed a commit to KunalGehlot/pdfminer.six that referenced this issue Aug 23, 2022
@KunalGehlot KunalGehlot mentioned this issue Aug 23, 2022
@pietermarsman
Member

@joaquimcampos Thanks for pointing that out 👍

pietermarsman pushed a commit that referenced this issue Nov 5, 2022
@pietermarsman pietermarsman mentioned this issue Nov 5, 2022
pietermarsman added a commit that referenced this issue Nov 5, 2022
This reverts commit cac6217.
pietermarsman added a commit that referenced this issue Nov 5, 2022
* Fix #795

* Documentation updates (FAQ and others)

* New how-to for extracting coordinates

* Indent fix in documentation

* Revert "Fix #795"

This reverts commit cac6217.

* Move description of iterating LTPage to the docstring of LTPage

* Remove adding how-to for extracting coordinates from this pr

* Add CHANGELOG.md

* Remove FAQ from this branch

* Only add one line to CHANGELOG.md

Co-authored-by: Kunal Gehlot <kunal.g@360hvpl.com>