
DOC: Add comparison with pdfplumber #1837

Merged
merged 6 commits into from
May 20, 2023

@RitchieP RitchieP commented May 9, 2023

Added my take on the `pdfplumber` library compared to PyPDF.
@MartinThoma MartinThoma changed the title Update comparisons.md with pdfplumber DOC: Add comparison with pdfplumber May 18, 2023
@MartinThoma MartinThoma added the nf-documentation Non-functional change: Documentation label May 18, 2023
@MartinThoma
Member

I like it! I have two suggestions. What do you think about those?

@RitchieP
Contributor Author

Yeah, I think that would also work, since pdfplumber is built on top of pdfminer.six.

RitchieP and others added 2 commits May 19, 2023 09:21
Co-authored-by: Martin Thoma <info@martin-thoma.de>
Co-authored-by: Martin Thoma <info@martin-thoma.de>
@MartinThoma MartinThoma merged commit e4ef5b9 into py-pdf:main May 20, 2023
MartinThoma added a commit that referenced this pull request May 21, 2023
New Features (ENH)
-  Simplify metadata input (Document Information Dictionary) (#1851)
-  Extend cmap compatibility to GBK_EUC_H/V (#1812)

Bug Fixes (BUG)
-  Prevent infinite loop when no character follows after a comment (#1828)
-  get_contents does not return ContentStream (#1847)
-  Accept XYZ destination with zoom missing (default to zoom=0.0) (#1844)
-  Cope with 1 Bit images (#1815)

Robustness (ROB)
-  Handle missing /Type entry in Page tree (#1845)

Documentation (DOC)
-  Expand file size explanations (#1835)
-  Add comparison with pdfplumber (#1837)
-  Clarify that PyPDF2 is dead (#1827)
-  Add Hunter King as Contributor for #1806

Maintenance (MAINT)
-  Refactor internal Encryption class (#1821)
-  Add R parameter to generate_values (#1820)
-  Make encryption_key parameter of write_to_stream optional (#1819)
-  Prepare for adding AES encryption support (#1818)

Code Style (STY)
-  Iterate directly over the list instead of using range (#1839)
-  Minor refactorings in _encryption.py (#1822)

[Full Changelog](3.8.1...3.9.0)

[`pdfminer.six`](https://pypi.org/project/pdfminer.six/) is capable of
extracting the [font size](https://stackoverflow.com/a/69962459/562769)
/ font weight (bold-ness). It has no capabilities for writing PDF files.

## pdfrw / pdfminer / pdfplumber
[`pdfplumber`](https://pypi.org/project/pdfplumber/) is a library focused on extracting data from PDF documents. Since `pdfplumber` is built on top of `pdfminer.six`, it has **no capabilities for exporting or modifying a PDF file** (see [#440 (discussions)](https://github.com/jsvine/pdfplumber/discussions/440#discussioncomment-803880)). However, `pdfplumber` is capable of converting a PDF file into an image, [drawing lines and rectangles on the image](https://github.com/jsvine/pdfplumber#drawing-methods), and saving it as an image file.
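
As an illustration of the image workflow described above, here is a minimal sketch, assuming `pdfplumber` and whatever rendering backend the installed version relies on are available. The function name `render_annotated_page` and its parameters are hypothetical and not part of either library; the `pdfplumber` calls (`open`, `to_image`, `draw_rects`, `draw_lines`, `save`) are the ones documented in its README.

```python
def render_annotated_page(pdf_path, out_path, page_index=0, resolution=72):
    """Render one page to an image, outline its words and lines, and save it.

    Hypothetical helper for illustration only; not part of pdfplumber or pypdf.
    """
    # Imported lazily so this module can be loaded without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        # to_image() hands the actual rendering to an external backend.
        image = page.to_image(resolution=resolution)
        # extract_words() returns dicts with x0/top/x1/bottom keys,
        # which draw_rects() accepts as rectangles.
        image.draw_rects(page.extract_words())
        # page.lines holds the line objects pdfplumber detected on the page.
        image.draw_lines(page.lines)
        image.save(out_path, format="PNG")
    return out_path
```

Calling `render_annotated_page("example.pdf", "page1.png")` (with placeholder file names) would write an annotated PNG of the first page. The source PDF itself is never modified, which matches the limitation stated above.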
@mara004 mara004 May 21, 2023

> is capable of converting a PDF file into an image

From skimming the Readme, it looks like pdfplumber calls Wand for pdf rendering, which is a binding to ImageMagick, which in turn uses ghostscript, IIRC.
So this phrase is kinda misleading as pdfplumber is not an actual pdf rendering library (as opposed to mupdf/poppler/pdfium), but merely a rendering "wrapper-wrapper-wrapper".

Contributor Author

Yes, I agree! It is not a PDF rendering library, there's just one function to convert the PDF into an image with the tools you mentioned. I'm not experienced with Wand, ImageMagick, and ghostscript, so if you're an expert there, feel free to elaborate more on my changes.

Member

@RitchieP You could rephrase

> However, pdfplumber is capable of converting a PDF file into an image

to

> However, pdfplumber is capable of converting a PDF file into an image via ImageMagick

Contributor Author

Definitely! I'll make a PR in a bit.

Thanks!
