Merging of rotated pages leads to unexpected results #1280

Nikita7x · 2022-08-26T09:12:21Z

When I try to get width and height of the page, I encounter a problem: width and height are misplaced in SOME (but not all!) of the PDFs. The page looks like completely normal A4 portrait document, but using reader.pages[0].mediabox returns wrong dimensions [0, 0, 841.68, 595.2] instead of [0, 0, 595.2, 841.68]

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
# macOS-12.5.1-arm64-arm-64bit
# and also on every system on docker (Ubuntu, CentOS)

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
# 2.10.3

Code + PDF

This is a minimal, complete example that shows the issue:

from PyPDF2 import PdfReader

reader = PdfReader(path)
dimensions = reader.pages[0].mediabox

width = dimensions.width
height = dimensions.height

dim.pdf

I removed sensitive data.

Output

The text was updated successfully, but these errors were encountered:

pubpub-zz · 2022-08-26T10:40:51Z

When you check all the entries in the page, you have:
{'/Type': '/Page', '/Parent': IndirectObject(2, 0, 2357934091920), '/Resources': IndirectObject(4, 0, 2357934091920), '/Contents': IndirectObject(3, 0, 2357934091920), '/MediaBox': [0, 0, 841.68, 595.2], '/Rotate': 270}
The page is rotated at display. the MediaBox reported is correct

Nikita7x · 2022-08-26T11:37:49Z

The page is rotated at display. the MediaBox reported is correct
@pubpub-zz

Is there a way to "rotate" it correctly? Because If I merge this PDF with my stamp, it will be misplaced and rotated. But still, the PDF itself is looking as it should, portrait orientation, no any rotation

Nikita7x · 2022-08-26T11:45:39Z

Because this is what happens now with this PDF

pubpub-zz · 2022-08-27T07:45:44Z

From my point of view you issue is with the stamp file.
My first thoughts :
a) try to change your paper orientation at a different position (at 'printer' level for example)
b) create a small page just to fit the stamp with a non standard file, and then merge it using translations (https://pypdf2.readthedocs.io/en/latest/user/cropping-and-transforming.html)
c) use your existing file and find the adequate transformation before merging/rotating

MartinThoma · 2022-08-27T10:05:23Z

@Nikita7x Thank you for reporting this issue 🙏

I took the freedom to adjust your code so that it uses the new syntax. You might want to check the deprecation warnings (python -Wall yourscript.py).

I've also adjusted the title of the issue as I think it describes the issue better. Do you think so as well or did I get something wrong? If the new title is ok, maybe you can adjust the example code (I guess your're using something similar to https://pypdf2.readthedocs.io/en/latest/user/add-watermark.html )

I fully understand why the current behavior of PyPDF2 is not as expected. At the moment, PyPDF2 is essentially doing the simplest way. I think we could (and likely should) make PyPDF2 behave as you expect, but we need to make sure that we don't break existing code if we adjust it.

MartinThoma · 2022-08-27T10:06:35Z

By the way: I love your schematic image for the current stamp behavior 😍 Did you create them yourself? May we use them in the official docs?

MartinThoma · 2022-08-27T10:19:01Z

@pubpub-zz We could add an orientation property to the boxes (mediabox, cropbox, bleedbox, trimbox, artbox). That would simply be the orientation of the page itself. Adding this property would come with almost no computational / memory / maintenance cost, but it might make this error source easier to spot for users. I'm undecided what I think about the idea myself... what do you think about it?

Nikita7x · 2022-08-27T11:25:33Z

From my point of view you issue is with the stamp file.

I thought so, but it turns out, it's not about the stamp, it's about the first document. What I've tried also is:

Rotating main document (using .rotate, .rotateCounterClockwise)
Rotating the stamp file (using .rotate, .rotateCounterClockwise)
Rotating main document and the stamp file

The result is always the same.

Nikita7x · 2022-08-27T12:02:16Z

As far as I understand the origin of these files: they are being put into a scanner, but with wrong rotation. If you had an experience scanning documents, then you probably remember the guides on the scanning plate, that asks to lay paper vertically. So, some people lay paper down horizontally and after the scanning procedure it appears to be landscape. So they rotate it using software. And after that PDF “looks” normal: vertical (portrait) orientation, but on deeper lever it is still a landscape one. And when I use PyPDF2, it grabs information not from “upper level” of this document, but from “deeper level”, what causes these problems.
All of this is only my suspicion and deduction.

Nikita7x · 2022-08-27T12:04:39Z

Did you create them yourself? May we use them in the official docs?

Yeah, I made it myself in Sketch. Of course, feel free to use it everywhere

Nikita7x · 2022-08-27T18:29:46Z

@MartinThoma Apparently the business I'm working at really relies on PyPDF2 and stamping ability. Maybe I can help somehow to resolve this issue? What can I do so that stamp will be applied in place in any document "rotation"? @pubpub-zz mentioned that

'/Rotate': 270
The page is rotated at display.

So, what can I do with this Rotate parameter? It seems like applying .rotate() (.rotateClockwise) rotates it, but after .merge everything is as messed up as it was, just rotated.

MartinThoma · 2022-08-27T19:08:43Z

@Nikita7x Would you mind sharing the "Rotated PDF" (without the text) and the "Stamp" image in a bit higher resolution with me? I will create some examples + add them to the docs.

I also get confused by this and I need to give it a try. But it's important to note that PDF has two mechanisms for rotating stuff: https://pypdf2.readthedocs.io/en/latest/user/cropping-and-transforming.html#page-rotation

Nikita7x · 2022-08-27T19:48:59Z

@MartinThoma to make a "Rotated PDF" I simply use .rotate() function. So you can download original PDF and apply .rotate() function.
And here comes the Stamp, this is the exact file I use. I just removed private data
stamp.pdf

MartinThoma · 2022-08-27T20:17:15Z

And here comes the Stamp, this is the exact file I use. I just removed private data

I was talking about the sketch images :-) You could share them just as PNG; I can convert them to PDF and show the effects

Nikita7x · 2022-08-27T20:58:54Z

archive.zip

carecavoador · 2022-08-29T12:32:13Z

Nice! I think I have an issue related to this topic. So far, I understood that when I use .rotate() only the page display is rotated. That's the reason it doesn't work when merged. But I haven't figured out how to make it work.

MartinThoma · 2022-08-29T17:08:36Z

It seems as if there is a bug and thus the solution I proposed doesn't work: #1301

carecavoador · 2022-08-29T18:04:56Z

Thank you! I wish I could contribute to solve the issue, but my python knowledge is not enough so far. But I really appreciate the effort you and all the contributors are putting in this project.

Nikita7x · 2022-09-06T13:40:18Z

@MartinThoma how's it going? Do you know maybe how can I fix this issue myself? My work really relies on that functionality. Maybe I need to rotate it differently somehow?

pubpub-zz · 2022-09-08T17:03:08Z

@Nikita7x
Under Analysis

MartinThoma · 2022-09-08T18:11:47Z

@Nikita7x I'm out of ideas for this one (and currently I only have the energy for simple changes in PyPDF2; this one needs more than I can give at the moment)

But you're lucky - @pubpub-zz is the top contributor to PyPDF2 :-)

Error detected during analysis of py-pdf#1280

pubpub-zz · 2022-09-10T15:53:24Z

this are the test files I've used:
Stamp files
MyStampFin.pdf : vertical page with no /Rotate
MyStampFin90a.pdf : vertical displayed, landscape layout with /Rotate =90

Page to be stamped
0004.pdf : Layout page
998719.pdf : Vertical page

the use of page.rotate() is not adequate as the flag is not taken into account in the merging.
we have to apply transformation before merging. Many points have to be considered for that:

an error has been found in in transformation.ctm (fixed in FIX : Fix Error in transformations #1341)
be carefull about signs /Rotate is clockwise positive where as transformation are trigonometric (counterclockwise positive)
rotate transformation center is at 0,0 therefore you need to translate before and after
the (visual) origin 0,0 is normally on lower/left corner but when /Rotate exists the center moves to upper/left (Rotate=90), upper/right(Rotate =180), lower/right (Rotate=270) and be aware that the axis are turning simultaneously
when applying a rotation, the cropping area is introduced in content. To prevent that the easiest is to set expand to True

with the following code, I get:

import PyPDF2

reader = PyPDF2.PdfReader("e:/Downloads/MyStampFin.pdf")
st = r.pages[0]
writer = PyPDF2.PdfWriter()
org = PyPDF2.PdfReader("e:/Downloads/0004.pdf")
page = org.pages[0]
st.add_transformation(
    PyPDF2.Transformation()
    .translate(-float(st.mediabox.getWidth()) / 2, -float(st.mediabox.getHeight()) / 2)
    .rotate(90)
    .translate(float(st.mediabox.getHeight()) / 2, float(st.mediabox.getWidth()) / 2),
    True,
)
page.merge_page(st)
writer.add_page(page)
with open("e:/Downloads/MyStamp0a.pdf", "wb") as fo:
    writer.write(fo)

This can also be applied with the stamp with /Rotate=90 (note the rotation with is reversed:

import PyPDF2

reader = PyPDF2.PdfReader("e:/Downloads/MyStampFin90a.pdf")
st = r.pages[0]
writer = PyPDF2.PdfWriter()
org = PyPDF2.PdfReader("e:/Downloads/998719.pdf")
page = org.pages[0]
st.add_transformation(
    PyPDF2.Transformation()
    .translate(-float(st.mediabox.getWidth()) / 2, -float(st.mediabox.getHeight()) / 2)
    .rotate(-90)
    .translate(float(st.mediabox.getHeight()) / 2, float(st.mediabox.getWidth()) / 2),
    True,
)
page.merge_page(st)
writer.add_page(page)
with open("e:/Downloads/MyStamp90a.pdf", "wb") as fo:
    writer.write(fo)

So @Nikita7x you should be able to get a quick solution

@MartinThoma, @MasterOdin
In order to improve the API I propose 2 new functions:
a) add a page.get_rotate() function : to retrieve the page /Rotate value
b) add a page.transfer_rotate_to_content() function : to transfer the /Rotate flag into content (apply the good translation/rotations, change MediaBox/... and reset the translation field

Your opinion

Error detected during analysis of #1280

Nikita7x · 2022-09-13T08:02:20Z

@pubpub-zz thanks a lot for your effort! I'll try this on my examples and see if that will fix it

MartinThoma · 2022-09-14T04:33:18Z

First of all: VERY nice analysis and explanation @pubpub-zz ! Thank you so much for that 🙏

a) add a page.get_rotate() function : to retrieve the page /Rotate value

I would prefer a rotation property as this feels overall more consistent / cleaner to me. Especially if we just use the /Rotate value. We could make that property even writable.

b) add a page.transfer_rotate_to_content() function

That is interesting! I was just wondering if there are other "standardizations" that people typically want to do. That function would return None, right?

Sounds reasonable to me 👍

pubpub-zz · 2022-09-14T06:04:25Z

I would prefer a rotation property as this feels overall more consistent / cleaner to me. Especially if we just use the /Rotate value. We could make that property even writable.

I then propose to use rotation with a getter and a setter . .rotate would be possible with .rotation += 90. Just wanting to avoid slightly different wording for getting and setting.

b) add a page.transfer_rotate_to_content() function

That is interesting! I was just wondering if there are other "standardizations" that people typically want to do. That function would return None, right?

Morelickely returning self to allow cascading

carecavoador · 2022-09-14T14:42:51Z

Wow! This is going, people. Thank you so much for your time on this problem. The last modification almost solved my issue.

I did this based on @pubpub-zz example:

import PyPDF2

a4_file = PyPDF2.PdfReader('a4_file.pdf')
a4_page = a4_file.pages[0]

blue_file = PyPDF2.PdfReader('blue_file.pdf')
blue_page = blue_file.pages[0]

pink_file = PyPDF2.PdfReader('pink_file.pdf')
pink_page = pink_file.pages[0]

pink_page.add_transformation(
    PyPDF2.Transformation()
    .translate(-float(pink_page.mediabox.getWidth()) / 2, -float(pink_page.mediabox.getHeight()) / 2)
    .rotate(90)
    .translate(float(pink_page.mediabox.getHeight()) / 2, float(pink_page.mediabox.getWidth()) / 2)
    .translate(float(blue_page.mediabox.getWidth()), 0),
    True,
)
a4_page.merge_page(blue_page)
a4_page.merge_page(pink_page)

writer = PyPDF2.PdfWriter()
writer.add_page(a4_page)

with open('result.pdf', 'wb') as final:
    writer.write(final)

Considering I have these files:

This is the result I'd like to get:

Insted, this is the output I'm getting with the code above:

Upon further inspection on several PDF editig software, I can see the transformation is happening, but somehow there is a clipping mask on the pink_page that cuts it. We can observe it on the file wireframe:

At this point, I feel like I'm asking too much. Sorry for that. But, if somehow, someone could take some time on this issue, it would be much appreciated.

Anyway, thank you all for your time on this package. It is awesome, and I love it!

carecavoador · 2022-09-15T16:20:54Z

People, I think I found a solution! 😄

At least now I know what the issue is: when transformations are applied, the page boxes '/ArtBox', '/BleedBox', '/CropBox', '/MediaBox', '/TrimBox' are not being updated acordingly. So you have to do it manually.

Here is an example where I successfully centered an A4 page inside an A3 page on the X axis while keeping it aligned on the botton.

Example files:
A3.pdf
A4.pdf

Code:

from PyPDF2 import PdfReader, Transformation, PdfWriter
from PyPDF2.generic import RectangleObject

a3_file = PdfReader('A3.pdf')
a3_page = a3_file.pages[0]

a4_file = PdfReader('A4.pdf')
a4_page = a4_file.pages[0]

# Rotate 'a4_page.pdf' 90 degrees and centers it on 'a3_page.pdf':
a4_page.add_transformation(
    Transformation()
    .translate(-float(a4_page.mediaBox.getWidth()) / 2, -float(a4_page.mediaBox.getHeight()) / 2)
    .rotate(90)
    .translate(float(a4_page.mediaBox.getHeight()) / 2, float(a3_page.mediaBox.getLowerLeft_y() + a4_page.mediaBox.getWidth() / 2))
)

# Manually update page boxes:
new_width = a4_page.mediaBox.getHeight()
new_heigth = a4_page.mediaBox.getWidth()
new_y = a3_page.mediaBox.getLowerLeft_y()

a4_page.update({'/ArtBox': RectangleObject([0, new_y, new_width, new_y + new_heigth])})
a4_page.update({'/BleedBox': RectangleObject([0, new_y, new_width, new_y + new_heigth])})
a4_page.update({'/CropBox': RectangleObject([0, new_y, new_width, new_y + new_heigth])})
a4_page.update({'/MediaBox': RectangleObject([0, new_y, new_width, new_y + new_heigth])})
a4_page.update({'/TrimBox': RectangleObject([0, new_y, new_width, new_y + new_heigth])})

a3_page.merge_page(a4_page)

writer = PdfWriter()
writer.add_page(a3_page)
with open('merged_a3.pdf', 'wb') as fo:
    writer.write(fo)

refer to py-pdf#1280 for history

Nikita7x · 2022-09-16T08:42:51Z

@pubpub-zz thank you! Your solution with rotating the stamp file works perfectly!

pubpub-zz · 2022-09-16T11:25:01Z

@pubpub-zz thank you! Your solution with rotating the stamp file works perfectly!

@Nikita7x
Have you applied the change from the PR ?

See #1280 for the context of this change

Nikita7x · 2022-09-19T13:25:34Z

I updated PyPDF2 to a newer version which supports method **transfer_rotation_to_content** and now I'm getting Exception while trying to use this method on pages with rotation

first_page = original_pdf.getPage(0)
first_page.transfer_rotation_to_content()

transfer_rotation_to_content

    pt1 = trsf.apply_on(cast(RectangleObject, self[b]).lower_left)
AttributeError: 'ArrayObject' object has no attribute 'lower_left'

The exact file: rot_file.pdf

pubpub-zz · 2022-09-19T16:58:09Z

@Nikita7x ,
I've just checked with version 2.10.9 and I'm not getting any issue.
using pdb.pm(), can you display the content of self, b and type(b)
Can you indicate also with version of Python are you using

Nikita7x · 2022-09-20T07:34:11Z

@Nikita7x ,
I've just checked with version 2.10.9 and I'm not getting any issue.
using pdb.pm(), can you display the content of self, b and type(b)
Can you indicate also with version of Python are you using

I've checked and the version is 2.10.9, Python 3.9.
Now I see. When I cover sensitive data on this PDF, it is then processed normally. But original PDF is still raises Exception. I unfortunately can't send this PDF here, because it has a lot of sensitive data, but I can send it privately and directly to you so you can see it yourself, @pubpub-zz

Nikita7x · 2022-09-20T08:00:20Z

I've noticed I can't reproduce this problem while debugging. But using Django trace I was able to get local variables.
This Exception occurs when b is '/CropBox' while iterating through line 402 in file _page.py.

b | '/CropBox'
mb | RectangleObject([0, 0, 841.68, 595.08])
pt1 | (595.0800000000002, 0.0)
pt2 | (0.0, 841.6799999999998)
r | -270
self | {'/Annots': [],  '/Contents': {},  '/CropBox': [0, 0, 841.68, 595.08],  '/MediaBox': RectangleObject([0, 0, 595.080000000000154614099301397800445556640625, 841.67999999999983629095368087291717529296875]),  '/Parent': IndirectObject(1, 0, 4419751216),  '/Resources': {'/ProcSet': ['/PDF', '/ImageC'],                 '/XObject': {'/Im0': IndirectObject(6, 0, 4419751216)}},  '/Rotate': 0,  '/Type': '/Page'}

Nikita7x · 2022-09-20T08:09:53Z

Maybe this is some kind of a system bug, because this Exception raises sometimes while running the same code and the same file. I've tried rebooting the system, but it didn't help, it's really strange.

Django Version: 4.1.1
Python Version: 3.9.5

pubpub-zz · 2022-09-20T11:37:18Z

Understood, will come back with a fix tonight

pubpub-zz · 2022-09-20T16:50:25Z

@Nikita7x,
can you try this patch in _page.py, at about line 400, can you replace the for look with this code and confirm the error goes away:

        for b in ["/MediaBox", "/CropBox", "/BleedBox", "/TrimBox", "/ArtBox"]:
            if b in self:
                r=RectangleObject(self[b])
                pt1 = trsf.apply_on(r.lower_left)
                pt2 = trsf.apply_on(r.upper_right)
                self[NameObject(b)] = RectangleObject(
                    (
                        min(pt1[0], pt2[0]),
                        min(pt1[1], pt2[1]),
                        max(pt1[0], pt2[0]),
                        max(pt1[1], pt2[1]),
                    )
                )

py-pdf#1280 (comment)

See #1280 (comment)

Nikita7x · 2022-09-22T09:07:32Z

can you try this patch in _page.py, at about line 400, can you replace the for look with this code and confirm the error goes away:

Yeah, it seems like it resolves the issue!

pubpub-zz · 2022-09-26T17:39:51Z

PR delivered and issued

MartinThoma added the Has MCVE A minimal, complete and verifiable example helps a lot to debug / understand feature requests label Aug 27, 2022

MartinThoma changed the title ~~Wrong page dimensions in MediaBox~~ Merging of rotated pages leads to unexpected results Aug 27, 2022

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Sep 10, 2022

FIX : Fix Error in transformations

a5224c0

Error detected during analysis of py-pdf#1280

pubpub-zz mentioned this issue Sep 10, 2022

FIX : Fix Error in transformations #1341

Merged

MartinThoma pushed a commit that referenced this issue Sep 10, 2022

BUG: Fix Error in transformations (#1341)

95012cc

Error detected during analysis of #1280

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Sep 15, 2022

ENH : add rotation and transfer_rotate_to_content

94b8c60

refer to py-pdf#1280 for history

pubpub-zz mentioned this issue Sep 15, 2022

ENH: Add rotation property and transfer_rotate_to_content #1348

Merged

MartinThoma pushed a commit that referenced this issue Sep 17, 2022

ENH: Add rotation property and transfer_rotate_to_content (#1348)

71de6c8

See #1280 for the context of this change

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Sep 20, 2022

FIX : errors in transfer_rotation_to_content()

fbd2d05

py-pdf#1280 (comment)

pubpub-zz mentioned this issue Sep 20, 2022

BUG: Errors in transfer_rotation_to_content() #1356

Merged

MartinThoma pushed a commit that referenced this issue Sep 22, 2022

BUG: Errors in transfer_rotation_to_content() (#1356)

7803a72

See #1280 (comment)

MartinThoma added the is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF label Sep 24, 2022

pubpub-zz closed this as completed Sep 26, 2022

Merging of rotated pages leads to unexpected results #1280

Merging of rotated pages leads to unexpected results #1280

Comments

Nikita7x commented Aug 26, 2022 • edited by MartinThoma Loading

Environment

Code + PDF

Output

pubpub-zz commented Aug 26, 2022

Nikita7x commented Aug 26, 2022

Nikita7x commented Aug 26, 2022 • edited Loading

pubpub-zz commented Aug 27, 2022

MartinThoma commented Aug 27, 2022

MartinThoma commented Aug 27, 2022 • edited Loading

MartinThoma commented Aug 27, 2022

Nikita7x commented Aug 27, 2022

Nikita7x commented Aug 27, 2022

Nikita7x commented Aug 27, 2022

Nikita7x commented Aug 27, 2022 • edited Loading

MartinThoma commented Aug 27, 2022

Nikita7x commented Aug 27, 2022

MartinThoma commented Aug 27, 2022

Nikita7x commented Aug 27, 2022

carecavoador commented Aug 29, 2022

MartinThoma commented Aug 29, 2022

carecavoador commented Aug 29, 2022

Nikita7x commented Sep 6, 2022

pubpub-zz commented Sep 8, 2022

MartinThoma commented Sep 8, 2022

pubpub-zz commented Sep 10, 2022 • edited by MartinThoma Loading

Nikita7x commented Sep 13, 2022

MartinThoma commented Sep 14, 2022

pubpub-zz commented Sep 14, 2022

carecavoador commented Sep 14, 2022 • edited Loading

carecavoador commented Sep 15, 2022

Nikita7x commented Sep 16, 2022

pubpub-zz commented Sep 16, 2022 • edited by MartinThoma Loading

Nikita7x commented Sep 19, 2022 • edited Loading

pubpub-zz commented Sep 19, 2022 • edited Loading

Nikita7x commented Sep 20, 2022

Nikita7x commented Sep 20, 2022

Nikita7x commented Sep 20, 2022 • edited Loading

pubpub-zz commented Sep 20, 2022

pubpub-zz commented Sep 20, 2022

Nikita7x commented Sep 22, 2022

pubpub-zz commented Sep 26, 2022

Nikita7x commented Aug 26, 2022 •

edited by MartinThoma

Loading

Nikita7x commented Aug 26, 2022 •

edited

Loading

MartinThoma commented Aug 27, 2022 •

edited

Loading

Nikita7x commented Aug 27, 2022 •

edited

Loading

pubpub-zz commented Sep 10, 2022 •

edited by MartinThoma

Loading

carecavoador commented Sep 14, 2022 •

edited

Loading

pubpub-zz commented Sep 16, 2022 •

edited by MartinThoma

Loading

Nikita7x commented Sep 19, 2022 •

edited

Loading

pubpub-zz commented Sep 19, 2022 •

edited

Loading

Nikita7x commented Sep 20, 2022 •

edited

Loading