apply_redactions causes part of the page content to be hidden / transparent #3751

Open
beeing opened this issue Aug 5, 2024 · 27 comments
Labels
bug · fix developed (release schedule to be determined) · upstream bug (bug outside this package)

Comments

@beeing

beeing commented Aug 5, 2024

Description of the bug

I'm adding a redaction region to a part of the PDF, but after calling apply_redactions(), one side of the entire page goes missing (opened in macOS Preview app or Safari).

Further inspection reveals that the text is not missing: it is still selectable and can be copied out properly. It seems the text has been masked or hidden, but I could not find out how to check further (sorry, my knowledge of PDF structure is limited).

[Image: Untitled]

The media, crop, art, bleed, and trim boxes all look fine before and after the redactions. I also tried to check whether other paths or objects might be causing it, but found nothing.
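
One way to check further is to diff the page's content stream before and after redaction, since text that is selectable but invisible may point at a changed clip path or transformation matrix rather than at the text itself. The following is a minimal sketch, not a confirmed diagnosis: `diff_streams` and `page_stream` are hypothetical helper names, and the only PyMuPDF call assumed is `Page.read_contents()`.

```python
import difflib


def diff_streams(before: bytes, after: bytes) -> list[str]:
    """Return a unified diff of two PDF content streams, line by line."""
    # latin-1 maps every byte to a character, so decoding cannot fail.
    old = before.decode("latin-1").splitlines()
    new = after.decode("latin-1").splitlines()
    return list(difflib.unified_diff(old, new, "before", "after", lineterm=""))


def page_stream(path: str, page_number: int = 0) -> bytes:
    """Read the concatenated content stream of one page (needs PyMuPDF)."""
    import pymupdf  # imported lazily; older releases expose the same API as `fitz`

    with pymupdf.open(path) as doc:
        return doc[page_number].read_contents()
```

For two hypothetical files `before.pdf` and `after.pdf`, `print("\n".join(diff_streams(page_stream("before.pdf"), page_stream("after.pdf"))))` lists the operators that differ. Content streams are sometimes written as very long lines, so the diff can be coarse.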

Note that I'm not able to share the actual PDF but it was generated from Puppeteer / Chromium (PDF ver 1.7).

Thanks in advance for looking into this.

How to reproduce the bug

  1. Generate PDF from Chromium / Puppeteer
  2. Add a redaction of any size, e.g. (0,0,1,1), and call apply_redactions()
  3. Open the PDF in Preview App.
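
The steps above can be sketched as a short script. This is an illustration under assumptions, not the reporter's actual code: `redact_tiny_area` is a hypothetical helper name, the source path stands for any Chromium/Puppeteer-generated PDF, and the default rectangle is the example from step 2.

```python
def redact_tiny_area(src_path: str, dst_path: str, rect=(0, 0, 1, 1)) -> str:
    """Apply one small redaction to page 0 and return the page text afterwards."""
    import pymupdf  # PyMuPDF; older releases expose the same API as `fitz`

    doc = pymupdf.open(src_path)
    try:
        page = doc[0]
        page.add_redact_annot(rect)  # step 2: mark the area ...
        page.apply_redactions()      # ... and apply the redaction
        text = page.get_text()       # text outside the rect should still be extractable
        doc.save(dst_path)           # step 3: open this file in Preview
    finally:
        doc.close()
    return text
```

The returned text only confirms that the content is still extractable; whether it is visible has to be checked in the viewer itself.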

PyMuPDF version

1.24.9

Operating system

macOS

Python version

3.9

@Kyrylo-Hrytsenko

Kyrylo-Hrytsenko commented Aug 5, 2024

I faced the same problem.
Note: this problem does not exist in version 1.23.9; all higher versions have it.

There is also an interesting thing: if you open the redacted PDF in Chrome or LibreOffice, it looks as expected. But the issue is reproducible at least with Mac Preview and the react-pdf-viewer library.

@JorjMcKie
Collaborator

As always: please provide an example PDF! There is no way to otherwise deal with this post.

@beeing
Author

beeing commented Aug 6, 2024

I've just tested, and it works on older versions (up to pymupdf-1.23.26).

It is perhaps easiest to compare the commits from a868c0a to HEAD.

@Kyrylo-Hrytsenko

Kyrylo-Hrytsenko commented Aug 6, 2024

@JorjMcKie
Here is an example
original.pdf
redacted.pdf

Please, open the redacted file with the Preview app, it will look like this
Screenshot 2024-08-06 at 3 38 40 PM

The code for redaction looks like this:

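from io import BytesIO

import fitz  # PyMuPDF

# Placeholder paths; the files attached above are original.pdf and redacted.pdf.
input_file = "original.pdf"
output_file = "redacted.pdf"

pdfIn = fitz.open(input_file)
out_buffer = BytesIO()

page = pdfIn[0]

# Mark four black-filled redaction areas along the page diagonal.
page.add_redact_annot([0, 0, 100, 100], text=None, fill=(0, 0, 0))
page.add_redact_annot([100, 100, 200, 200], text=None, fill=(0, 0, 0))
page.add_redact_annot([200, 200, 300, 300], text=None, fill=(0, 0, 0))
page.add_redact_annot([300, 300, 400, 400], text=None, fill=(0, 0, 0))

page.apply_redactions()

# Save to memory first, then write the buffer to disk.
pdfIn.save(out_buffer, garbage=3, deflate=True)
pdfIn.close()

# The with-block closes the file, so no explicit f.close() is needed.
with open(output_file, mode="wb") as f:
    f.write(out_buffer.getbuffer())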

@Kyrylo-Hrytsenko

@JorjMcKie
May I ask if you have received the PDF for reproducing the issue?

@JorjMcKie
Collaborator

@Kyrylo-Hrytsenko Thanks, I did.

I executed the script and found no problem at all using v1.24.9. I modified the script somewhat so redaction rectangles are visible and erased areas are not filled:

import pymupdf

print(pymupdf.version)
pdfIn = pymupdf.open("original.pdf")

page = pdfIn[0]
rects = (
    [0, 0, 100, 100],
    [100, 100, 200, 200],
    [200, 200, 300, 300],
    [300, 300, 400, 400],
)
for r in rects:
    page.draw_rect(r, color=(1, 0, 0))
    page.add_redact_annot(r)

page.apply_redactions()

pdfIn.ez_save("output.pdf")

Gives this correct result:
output.pdf

@Kyrylo-Hrytsenko

@JorjMcKie
Your result file looks like this for me in the Preview app:
Screenshot 2024-08-16 at 1 59 20 PM

Notes:

  • I highlighted some text with the mouse to show that it is still present in the PDF, but for some reason, it's hidden.
  • There is no issue when I open this file in Chrome or any other app.
  • I checked version 1.23.9, and there was no such issue. It seems to have started happening in version 1.24.0.

Does the output.pdf look normal when you open it in the 'Preview' application?

@JorjMcKie
Collaborator

I do not use or have Preview.
My file is displayed correctly in all the PDF viewers I tried: Adobe Acrobat, Foxit, Nitro, PDF XChange, and evince (Linux). So all authoritative applications behave correctly.
No idea what is wrong with Preview.

JorjMcKie added the "not a bug" label (not a bug / user error / unable to reproduce) on Aug 16, 2024
@jamie-lemon
Collaborator

I can confirm that I also see this problem in Preview. However, it is fine when I open the file in Adobe Acrobat. To me this feels like a Preview rendering bug. I would submit a bug report to Apple if that is possible!

@JorjMcKie
Collaborator

@jamie-lemon absolutely correct! I was about to write a similar comment.
We will now close this issue.

@Kyrylo-Hrytsenko

Kyrylo-Hrytsenko commented Aug 16, 2024

@jamie-lemon @JorjMcKie
I don't think it's only a Preview bug. Here is why:

  • For me, it happens not only with Preview but also with the react-pdf-viewer library at least
  • With an older version of your library everything works fine, which means something was changed and caused this issue
  • Original files (before redaction) render correctly with Preview and with react-pdf-viewer, which means something in the redaction process causes this issue.

@yuhuang-cst

@JorjMcKie @jamie-lemon
In addition to Mac Preview, Safari, UPDF, and PDF Expert also fail to display output.pdf correctly.

jamie-lemon reopened this on Aug 16, 2024
jamie-lemon added the "bug" and "Waiting for information" labels and removed the "not a bug" label (not a bug / user error / unable to reproduce) on Aug 16, 2024
@jamie-lemon
Collaborator

jamie-lemon commented Aug 16, 2024

This is a strange bug. I thought it might be related to the content on page 1, but if I simplify things and target the 2nd page with a single area redaction:

import pymupdf

print(pymupdf.version)
pdfIn = pymupdf.open("original.pdf")

page = pdfIn[1]  # 2nd page
rects = (
    [0, 0, 100, 100],
)
for r in rects:
    page.draw_rect(r, color=(1, 0, 0))
    page.add_redact_annot(r)

page.apply_redactions()

pdfIn.ez_save("redacted.pdf")

Then I get:
Screenshot 2024-08-16 at 18 05 59

I also noticed that it doesn't matter how big the redaction area is. I could do this:

rects = (
    [0, 0, 0, 0],
)

And get the same problem with the left-hand side of the page. I could also put that rect anywhere on the page; it didn't have to be in the top left.

Testing with other documents, redacting and viewing in Preview I don't find this issue at all, so I think there must be something very specific to this document which will need further research.

@Kyrylo-Hrytsenko

@jamie-lemon

"so I think there must be something very specific to this document which will need further research."

Totally agree. Only a small number of my documents have this bug. I didn't even plan to write to you but then noticed that this is happening not only to me and the bug was already created, so I added my example as well.

@jamie-lemon
Collaborator

@Kyrylo-Hrytsenko Much appreciated!

@jamie-lemon
Collaborator

This is the simplest case I could find - I made this PDF in Adobe Acrobat, then took it into Preview and then did "Export" as a new PDF.

preview-made.pdf

When you redact with PyMuPDF the logo disappears when you view it in Preview, e.g.

Screenshot 2024-08-16 at 21 24 35

@jamie-lemon
Collaborator

So it seems if the PDF is made in Preview then this might have something to do with the problem.
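
To triage which documents might be affected, one could inspect the producer string in the PDF metadata. This is only a heuristic sketch: the thread has not established that the producer determines the problem, `classify_producer` and `producer_of` are hypothetical helpers, and the substrings matched ("Quartz PDFContext" for Preview/macOS exports, "Skia/PDF" for Chromium) are assumptions about typical producer values.

```python
def classify_producer(producer: str) -> str:
    """Map a PDF producer string to a coarse origin label (heuristic)."""
    lowered = (producer or "").lower()
    if "quartz pdfcontext" in lowered:  # assumed marker of macOS/Preview exports
        return "macos-quartz"
    if "skia/pdf" in lowered:           # assumed marker of Chromium/Puppeteer output
        return "chromium-skia"
    return "other"


def producer_of(path: str) -> str:
    """Read the producer entry from a PDF's metadata (needs PyMuPDF)."""
    import pymupdf  # imported lazily; older releases expose the same API as `fitz`

    with pymupdf.open(path) as doc:
        return (doc.metadata or {}).get("producer") or ""
```

Running `classify_producer(producer_of(path))` over a folder of PDFs would show whether the failing documents share an origin.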

@yuhuang-cst

The issue I am encountering is that if apply_redactions is used, the vector graphics on the page all move to the bottom left corner in Preview, Safari, UPDF, and PDF Expert, whereas they display correctly in Chrome, Adobe Acrobat Reader, and WPS. Here is the code:

import fitz
doc = fitz.open('origin.pdf')
page = doc.load_page(0)
page.add_redact_annot((0, 0, 0, 0), fill=False)
page.apply_redactions()
doc.ez_save('apply_redaction.pdf')
doc.close()

origin.pdf
apply_redaction.pdf
[image]

The origin.pdf is from the second page of the AlphaGo paper: https://www.nature.com/articles/nature16961

@yuhuang-cst

yuhuang-cst commented Aug 17, 2024

The PDF generated with PyMuPDF version 1.23.26 displays the vector graphics correctly in Preview (although the image in the top right corner is partially missing). However, starting from version 1.24.0, there is a bug where the vector graphics are moved to the bottom left corner.
apply_redaction_1.23.26.pdf
apply_redaction_1.24.0.pdf
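
The version reports collected so far in this thread narrow the regression to a single window. A minimal sketch of that reasoning follows; `parse_version` and `regression_window` are hypothetical helpers, and the good/bad labels are simply the observations reported above (1.23.9 and 1.23.26 display the graphics correctly, 1.24.0 and 1.24.9 do not).

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.23.26' into (1, 23, 26) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(reports: dict[str, bool]) -> tuple[str, str]:
    """Return (last good version, first bad version) from {version: is_good}."""
    ordered = sorted(reports, key=parse_version)
    good = [v for v in ordered if reports[v]]
    bad = [v for v in ordered if not reports[v]]
    if not good or not bad:
        raise ValueError("need at least one good and one bad report")
    if parse_version(good[-1]) > parse_version(bad[0]):
        raise ValueError("reports are not consistent with a single regression")
    return good[-1], bad[0]


# Observations reported in this thread (True = displays correctly in Preview).
reports = {"1.23.9": True, "1.23.26": True, "1.24.0": False, "1.24.9": False}
print(regression_window(reports))  # ('1.23.26', '1.24.0')
```

Parsing into integer tuples matters here: a plain string sort would place "1.23.26" before "1.23.9".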

@JorjMcKie
Collaborator

It seems that primarily Mac-based tools have problems with redacted PDFs that have been created with Preview.
I am experimenting with the MuPDF development version 1.25.0.
The current PyMuPDF v1.24.9 uses MuPDF v1.24.8.

When creating and applying annotations using PyMuPDF 1.24.9 with MuPDF 1.25.0, I no longer see the error in the Firefox browser, which otherwise behaves as awkwardly as all those Mac apps.

I am attaching the produced output.pdf inviting Mac users to access it with their Preview on Mac:
output.pdf

@yuhuang-cst

[image]
It seems that this bug still exists in Mac Preview.

@JorjMcKie
Collaborator

@yuhuang-cst thanks for the feedback anyway

@jamie-lemon
Collaborator

Can also confirm that the bug doesn't exist in PyMuPDF version 1.23.9

@JorjMcKie
Collaborator

I have submitted a problem report in MuPDF's system here: https://bugs.ghostscript.com/show_bug.cgi?id=707966

@K8S666

K8S666 commented Sep 27, 2024

Reporting the same issue.

julian-smith-artifex-com added the "upstream bug" label (bug outside this package) and removed the "Waiting for information" label on Oct 28, 2024
@julian-smith-artifex-com
Collaborator

Apparently the MuPDF master branch has a fix, so PyMuPDF itself will be fixed when we make a release with MuPDF 1.25.0.

However, I don't currently know when MuPDF 1.25.0 will be released.
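
Until such a release exists, a script can check which MuPDF version its PyMuPDF build bundles. The sketch below assumes, as the comment above anticipates, that the fix first ships in MuPDF 1.25.0; `has_upstream_fix` and `installed_mupdf_version` are hypothetical helper names, and `pymupdf.version` is the same tuple printed by the scripts earlier in this thread.

```python
import re

FIXED_IN = (1, 25, 0)  # assumption taken from this thread, not from release notes


def has_upstream_fix(mupdf_version: str) -> bool:
    """True if the given MuPDF version is at least the assumed fixed release."""
    numbers = [int(n) for n in re.findall(r"\d+", mupdf_version)[:3]]
    numbers += [0] * (3 - len(numbers))  # pad '1.25' to (1, 25, 0)
    return tuple(numbers) >= FIXED_IN


def installed_mupdf_version() -> str:
    """Return the MuPDF version bundled with the installed PyMuPDF."""
    import pymupdf  # imported lazily; older releases use `import fitz`

    # pymupdf.version is (PyMuPDF version, MuPDF version, build timestamp).
    return pymupdf.version[1]
```

With PyMuPDF installed, `has_upstream_fix(installed_mupdf_version())` reports whether the bundled MuPDF is at or beyond the assumed fixed release.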

julian-smith-artifex-com added the "fix developed" label (release schedule to be determined) on Oct 28, 2024
@jamie-lemon
Collaborator

jamie-lemon commented Nov 6, 2024

@julian-smith-artifex-com @JorjMcKie Let's release soon to hopefully fix this one! Related I think: #4029
