Merging of rotated pages leads to unexpected results #1280
Comments
When you check all the entries in the page, you have: |
Is there a way to "rotate" it correctly? If I merge this PDF with my stamp, the stamp ends up misplaced and rotated. Yet the PDF itself looks as it should: portrait orientation, without any rotation. |
From my point of view, your issue is with the stamp file. |
@Nikita7x Thank you for reporting this issue 🙏 I took the liberty of adjusting your code so that it uses the new syntax. You might want to check the deprecation warnings ( I've also adjusted the title of the issue as I think it describes the issue better. Do you agree, or did I get something wrong? If the new title is OK, maybe you can adjust the example code (I guess you're using something similar to https://pypdf2.readthedocs.io/en/latest/user/add-watermark.html ). I fully understand why the current behavior of PyPDF2 is not what you expected. At the moment, PyPDF2 essentially does the simplest thing. I think we could (and likely should) make PyPDF2 behave as you expect, but we need to make sure that we don't break existing code if we adjust it. |
By the way: I love your schematic image for the current stamp behavior 😍 Did you create them yourself? May we use them in the official docs? |
@pubpub-zz We could add an |
I thought so, but it turns out, it's not about the stamp, it's about the first document. What I've tried also is:
|
As far as I understand the origin of these files: they are put into a scanner, but with the wrong rotation. If you have ever scanned documents, you probably remember the guides on the scanning plate that ask you to lay the paper vertically. Some people lay the paper down horizontally, so after scanning it appears as landscape, and they then rotate it using software. After that the PDF “looks” normal, with vertical (portrait) orientation, but on a deeper level it is still a landscape one. When I use PyPDF2, it grabs information not from the “upper level” of this document but from the “deeper level”, which causes these problems. |
Yeah, I made it myself in Sketch. Of course, feel free to use it everywhere |
@MartinThoma Apparently the business I'm working at really relies on PyPDF2 and stamping ability. Maybe I can help somehow to resolve this issue? What can I do so that stamp will be applied in place in any document "rotation"? @pubpub-zz mentioned that
So, what can I do with this Rotate parameter? It seems like applying |
@Nikita7x Would you mind sharing the "Rotated PDF" (without the text) and the "Stamp" image in a bit higher resolution with me? I will create some examples + add them to the docs. I also get confused by this and I need to give it a try. But it's important to note that PDF has two mechanisms for rotating stuff: https://pypdf2.readthedocs.io/en/latest/user/cropping-and-transforming.html#page-rotation |
@MartinThoma to make a "Rotated PDF" I simply use |
I was talking about the sketch images :-) You could share them just as PNG; I can convert them to PDF and show the effects |
Nice! I think I have an issue related to this topic. So far, I understood that when I use |
It seems as if there is a bug and thus the solution I proposed doesn't work: #1301 |
Thank you! I wish I could contribute to solve the issue, but my python knowledge is not enough so far. But I really appreciate the effort you and all the contributors are putting in this project. |
@MartinThoma how's it going? Do you know how I can fix this issue myself? My work really relies on that functionality. Maybe I need to rotate it differently somehow? |
@Nikita7x |
@Nikita7x I'm out of ideas for this one (and currently I only have the energy for simple changes in PyPDF2; this one needs more than I can give at the moment) But you're lucky - @pubpub-zz is the top contributor to PyPDF2 :-) |
Error detected during analysis of py-pdf#1280
These are the test files I've used: Page to be stamped the use of
With the following code, I get:
import PyPDF2
# Load the stamp page and the page to be stamped
reader = PyPDF2.PdfReader("e:/Downloads/MyStampFin.pdf")
st = reader.pages[0]
writer = PyPDF2.PdfWriter()
org = PyPDF2.PdfReader("e:/Downloads/0004.pdf")
page = org.pages[0]
# Rotate the stamp by 90 degrees around its center, then shift it back to the origin
st.add_transformation(
    PyPDF2.Transformation()
    .translate(-float(st.mediabox.getWidth()) / 2, -float(st.mediabox.getHeight()) / 2)
    .rotate(90)
    .translate(float(st.mediabox.getHeight()) / 2, float(st.mediabox.getWidth()) / 2),
    True,
)
page.merge_page(st)
writer.add_page(page)
with open("e:/Downloads/MyStamp0a.pdf", "wb") as fo:
    writer.write(fo)
This can also be applied to the stamp with /Rotate=90 (note that the rotation is reversed):
import PyPDF2
# Load the stamp page (here with /Rotate=90) and the page to be stamped
reader = PyPDF2.PdfReader("e:/Downloads/MyStampFin90a.pdf")
st = reader.pages[0]
writer = PyPDF2.PdfWriter()
org = PyPDF2.PdfReader("e:/Downloads/998719.pdf")
page = org.pages[0]
# Same transformation as above, but rotating by -90 degrees
st.add_transformation(
    PyPDF2.Transformation()
    .translate(-float(st.mediabox.getWidth()) / 2, -float(st.mediabox.getHeight()) / 2)
    .rotate(-90)
    .translate(float(st.mediabox.getHeight()) / 2, float(st.mediabox.getWidth()) / 2),
    True,
)
page.merge_page(st)
writer.add_page(page)
with open("e:/Downloads/MyStamp90a.pdf", "wb") as fo:
    writer.write(fo)
So @Nikita7x, you should be able to get a quick solution.
@MartinThoma, @MasterOdin Your opinion |
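For readers following along, here is a minimal sketch of the geometry behind the translate / rotate / translate chain used above. It is plain Python and does not call PyPDF2; the helper names are made up for illustration, and the counter-clockwise rotation convention is an assumption of this sketch. It only shows that a stamp of size width x height ends up occupying a height x width rectangle anchored at the origin.

```python
import math


def translate(point, tx, ty):
    """Shift a point by (tx, ty)."""
    x, y = point
    return (x + tx, y + ty)


def rotate(point, degrees):
    """Rotate a point around the origin (counter-clockwise, assumed convention)."""
    x, y = point
    rad = math.radians(degrees)
    c, s = math.cos(rad), math.sin(rad)
    return (x * c - y * s, x * s + y * c)


def rotate_and_realign(point, width, height, degrees):
    """Mirror the chain from the thread: center the box, rotate, shift back."""
    p = translate(point, -width / 2, -height / 2)
    p = rotate(p, degrees)
    return translate(p, height / 2, width / 2)


def transformed_bbox(width, height, degrees):
    """Bounding box of a width x height rectangle after the chain (quarter turns)."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    moved = [rotate_and_realign(c, width, height, degrees) for c in corners]
    xs = [round(x, 6) for x, _ in moved]
    ys = [round(y, 6) for _, y in moved]
    return (min(xs), min(ys), max(xs), max(ys))
```

Both `transformed_bbox(200, 100, 90)` and `transformed_bbox(200, 100, -90)` give a 100 x 200 box starting at the origin; only the orientation of the content inside the box differs, which is why the sign of the rotation has to match the page being stamped.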
Error detected during analysis of #1280
@pubpub-zz thanks a lot for your effort! I'll try this on my examples and see if that will fix it |
First of all: VERY nice analysis and explanation @pubpub-zz ! Thank you so much for that 🙏
I would prefer a
That is interesting! I was just wondering if there are other "standardizations" that people typically want to do. That function would return Sounds reasonable to me 👍 |
I then propose to use rotation with a getter and a setter.
More likely returning self to allow cascading |
Wow! This is going, people. Thank you so much for your time on this problem. The last modification almost solved my issue. I did this based on @pubpub-zz's example:
import PyPDF2
a4_file = PyPDF2.PdfReader('a4_file.pdf')
a4_page = a4_file.pages[0]
blue_file = PyPDF2.PdfReader('blue_file.pdf')
blue_page = blue_file.pages[0]
pink_file = PyPDF2.PdfReader('pink_file.pdf')
pink_page = pink_file.pages[0]
# Rotate the pink page by 90 degrees around its center, shift it back to the
# origin, then move it to the right by the width of the blue page
pink_page.add_transformation(
    PyPDF2.Transformation()
    .translate(-float(pink_page.mediabox.getWidth()) / 2, -float(pink_page.mediabox.getHeight()) / 2)
    .rotate(90)
    .translate(float(pink_page.mediabox.getHeight()) / 2, float(pink_page.mediabox.getWidth()) / 2)
    .translate(float(blue_page.mediabox.getWidth()), 0),
    True,
)
a4_page.merge_page(blue_page)
a4_page.merge_page(pink_page)
writer = PyPDF2.PdfWriter()
writer.add_page(a4_page)
with open('result.pdf', 'wb') as final:
    writer.write(final)
Considering I have these files:
This is the result I'd like to get:
Instead, this is the output I'm getting with the code above:
Upon further inspection in several PDF editing programs, I can see the transformation is happening, but somehow there is a clipping mask on the
At this point, I feel like I'm asking too much. Sorry for that. But if someone could take some time on this issue, it would be much appreciated. Anyway, thank you all for your time on this package. It is awesome, and I love it! |
People, I think I found a solution! 😄 At least now I know what the issue is: when transformations are applied, the page boxes
Here is an example where I successfully centered an A4 page inside an A3 page on the X axis while keeping it aligned at the bottom. Code:
from PyPDF2 import PdfReader, Transformation, PdfWriter
from PyPDF2.generic import RectangleObject
a3_file = PdfReader('A3.pdf')
a3_page = a3_file.pages[0]
a4_file = PdfReader('A4.pdf')
a4_page = a4_file.pages[0]
# Rotate the A4 page by 90 degrees and center it on the A3 page:
a4_page.add_transformation(
    Transformation()
    .translate(-float(a4_page.mediabox.getWidth()) / 2, -float(a4_page.mediabox.getHeight()) / 2)
    .rotate(90)
    .translate(float(a4_page.mediabox.getHeight()) / 2, float(a3_page.mediabox.getLowerLeft_y() + a4_page.mediabox.getWidth() / 2))
)
# Manually update the page boxes (width and height are swapped by the rotation):
new_width = a4_page.mediabox.getHeight()
new_height = a4_page.mediabox.getWidth()
new_y = a3_page.mediabox.getLowerLeft_y()
a4_page.update({'/ArtBox': RectangleObject([0, new_y, new_width, new_y + new_height])})
a4_page.update({'/BleedBox': RectangleObject([0, new_y, new_width, new_y + new_height])})
a4_page.update({'/CropBox': RectangleObject([0, new_y, new_width, new_y + new_height])})
a4_page.update({'/MediaBox': RectangleObject([0, new_y, new_width, new_y + new_height])})
a4_page.update({'/TrimBox': RectangleObject([0, new_y, new_width, new_y + new_height])})
a3_page.merge_page(a4_page)
writer = PdfWriter()
writer.add_page(a3_page)
with open('merged_a3.pdf', 'wb') as fo:
    writer.write(fo)
|
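The manual box update in the comment above repeats the same rectangle five times. As a rough sketch (plain Python, no PyPDF2 calls; the function name and the returned dictionary layout are hypothetical), the rectangle for a page rotated by a quarter turn could be computed once and reused:

```python
BOX_NAMES = ("/ArtBox", "/BleedBox", "/CropBox", "/MediaBox", "/TrimBox")


def boxes_after_quarter_turn(width, height, lower_left_y=0):
    """Return one rectangle per page box for a page rotated by 90 degrees.

    A quarter turn swaps width and height; the rectangle keeps x = 0 and
    starts at lower_left_y, as in the snippet from the thread.
    """
    new_width, new_height = height, width
    rect = [0, lower_left_y, new_width, lower_left_y + new_height]
    # Each box gets its own copy so that later edits do not affect the others.
    return {name: list(rect) for name in BOX_NAMES}
```

With PyPDF2, each value would presumably be wrapped in a `RectangleObject` before being passed to `page.update(...)`, as the code in the thread does.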
refer to py-pdf#1280 for history
@pubpub-zz thank you! Your solution with rotating the stamp file works perfectly! |
@Nikita7x |
See #1280 for the context of this change
I updated PyPDF2 to a newer version which supports method
The exact file: rot_file.pdf |
@Nikita7x , |
I've checked and the version is 2.10.9, Python 3.9. |
I've noticed I can't reproduce this problem while debugging. But using Django trace I was able to get local variables.
|
Maybe this is some kind of system bug, because this exception is raised only sometimes when running the same code on the same file. I've tried rebooting the system, but it didn't help; it's really strange. Django Version: 4.1.1 |
Understood, will come back with a fix tonight |
@Nikita7x,
|
Yeah, it seems like it resolves the issue! |
PR delivered and issued |
When I try to get the width and height of the page, I encounter a problem: width and height are swapped in SOME (but not all!) of the PDFs. The page looks like a completely normal A4 portrait document, but using
reader.pages[0].mediabox
returns the wrong dimensions [0, 0, 841.68, 595.2]
instead of [0, 0, 595.2, 841.68]
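A minimal sketch of why the two sets of numbers can both be "right", assuming the page carries a /Rotate entry that is a multiple of 90 (the helper name `displayed_size` is made up for illustration and is not part of PyPDF2): the media box describes the unrotated page, while a viewer applies /Rotate on top of it.

```python
def displayed_size(mediabox, rotate=0):
    """Return (width, height) of a page as a viewer would display it.

    mediabox: (llx, lly, urx, ury) in PDF user-space units.
    rotate:   value of the page's /Rotate entry; must be a multiple of 90.
    """
    if rotate % 90 != 0:
        raise ValueError("/Rotate must be a multiple of 90")
    llx, lly, urx, ury = mediabox
    width, height = urx - llx, ury - lly
    # An odd number of quarter turns swaps width and height on screen.
    if (rotate // 90) % 2 == 1:
        return (height, width)
    return (width, height)
```

For the media box reported above, `displayed_size((0, 0, 841.68, 595.2), 90)` gives a portrait size, while the raw media box alone describes a landscape page.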
Environment
Which environment were you using when you encountered the problem?
Code + PDF
This is a minimal, complete example that shows the issue:
dim.pdf
I removed sensitive data.
Output