DOC: Expand file size explanations #1835

Merged
merged 9 commits into from
May 20, 2023

Conversation

DIvkov575
Contributor

@DIvkov575 DIvkov575 commented May 7, 2023

Closes #1786

Changing the viewboxes ("cropping") has no impact on file size
Removing complete pages only has an impact if the connected resources are also removed

@DIvkov575
Contributor Author

DIvkov575 commented May 7, 2023

I included a header about removing sources.
However, I had some questions:

  • Is there a way to specify a source to be removed in pypdf? (I didn't see anything in the docs.)
  • In order to address cropping, should I just create a new header along the lines of "don't crop to reduce file size"?

@pubpub-zz ?

@pubpub-zz
Collaborator

pubpub-zz commented May 7, 2023

I included a header about removing sources however, I had some questions:

* is there a way to specify a source to be removed in pypdf (I didn't see anything in the docs)

Any object can be used many times in the same document, so it is hard to decide up front whether an object can be removed.
To remove an image or other internal object, you have to delete the reference (IndirectObject) pointing to it wherever it is used; for an image, you also have to edit the content stream to remove the "Do" operation that draws it.
You may then need to write the PDF file (even to a BytesIO) and reload that interim result into a PdfWriter so that orphaned objects are not loaded.
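
To make the shared-object point concrete, here is a toy model in plain Python. It is only a sketch: the dictionaries, sizes, and the `reachable` and `total_size` helpers are hypothetical stand-ins for PDF objects and are not pypdf's API or internals. It shows that dropping a page frees a resource only when no other page still references it.

```python
# Toy model only: plain dicts stand in for PDF objects; this is not pypdf's API.
def reachable(objects, roots):
    """Return the ids of all objects reachable from the given root ids."""
    seen = set()
    stack = list(roots)
    while stack:
        obj_id = stack.pop()
        if obj_id in seen:
            continue
        seen.add(obj_id)
        stack.extend(objects[obj_id]["refs"])
    return seen


def total_size(objects, ids):
    """Sum the (made-up) byte sizes of the given object ids."""
    return sum(objects[obj_id]["size"] for obj_id in ids)


# Two pages share a logo; only page1 uses the large photo.
objects = {
    "page1": {"size": 100, "refs": ["logo", "photo"]},
    "page2": {"size": 100, "refs": ["logo"]},
    "logo": {"size": 5_000, "refs": []},
    "photo": {"size": 80_000, "refs": []},
}

both_pages = reachable(objects, ["page1", "page2"])
only_page1 = reachable(objects, ["page1"])  # page2 dropped
only_page2 = reachable(objects, ["page2"])  # page1 dropped

print(total_size(objects, both_pages))  # 85200
print(total_size(objects, only_page1))  # 85100: the shared logo must stay
print(total_size(objects, only_page2))  # 5100: the photo became an orphan
```

In this model, dropping page2 saves almost nothing because the logo is still referenced, while dropping page1 only pays off once the orphaned photo is actually left out of the output.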

For full pages, I would rather recommend not inserting them in the first place: provide a list of page numbers or PageObjects that does not include them.

* in order to address cropping, should I just create a new header: "don't crop" . . . to reduce filesize

Some comments in the relevant sections may be sufficient.

The most important point is that cropping only hides whatever lies outside the crop box; the content is still present in the file (which matters for file size but also for text extraction).

@DIvkov575
Contributor Author

@pubpub-zz
Is there anything else I should include?

@DIvkov575
Contributor Author

@pubpub-zz Is this good to merge? (I don't have merge access.)

Collaborator

@pubpub-zz pubpub-zz left a comment

You are talking about page deletion, which is not currently implemented. I'm preparing a PR to add that. This PR should wait until the new feature has been added.
Update: PR #1843 has been created and is pending merge.

DIvkov575 and others added 2 commits May 16, 2023 14:57
Co-authored-by: pubpub-zz <4083478+pubpub-zz@users.noreply.github.com>
Co-authored-by: pubpub-zz <4083478+pubpub-zz@users.noreply.github.com>
@MartinThoma MartinThoma changed the title updating file-size.md DOC: updating file-size.md May 18, 2023
@MartinThoma MartinThoma changed the title DOC: updating file-size.md DOC: Expand file size explanations May 18, 2023
DIvkov575 and others added 3 commits May 18, 2023 19:55
Co-authored-by: Martin Thoma <info@martin-thoma.de>
Co-authored-by: Martin Thoma <info@martin-thoma.de>
@MartinThoma MartinThoma merged commit bf56f16 into py-pdf:main May 20, 2023
@MartinThoma
Member

Thank you for the PR @DIvkov575 :-)

I've adjusted some of the wording. By the way: https://chat.openai.com/ is pretty good at improving the language. It might be a good idea to let it improve more parts of the docs 🤔

MartinThoma added a commit that referenced this pull request May 21, 2023
New Features (ENH)
-  Simplify metadata input (Document Information Dictionary) (#1851)
-  Extend cmap compatibility to GBK_EUC_H/V (#1812)

Bug Fixes (BUG)
-  Prevent infinite loop when no character follows after a comment (#1828)
-  get_contents does not return ContentStream (#1847)
-  Accept XYZ destination with zoom missing (default to zoom=0.0) (#1844)
-  Cope with 1 Bit images (#1815)

Robustness (ROB)
-  Handle missing /Type entry in Page tree (#1845)

Documentation (DOC)
-  Expand file size explanations (#1835)
-  Add comparison with pdfplumber (#1837)
-  Clarify that PyPDF2 is dead (#1827)
-  Add Hunter King as Contributor for #1806

Maintenance (MAINT)
-  Refactor internal Encryption class (#1821)
-  Add R parameter to generate_values (#1820)
-  Make encryption_key parameter of write_to_stream optional (#1819)
-  Prepare for adding AES encryption support (#1818)

Code Style (STY):
-  Iterate directly over the list instead of using range (#1839)
-  Minor refactorings in _encryption.py (#1822)

[Full Changelog](3.8.1...3.9.0)
Development

Successfully merging this pull request may close these issues.

Disproportionately large file sizes
3 participants