Piximap program crash #3848

Open
JordanGarske opened this issue Sep 5, 2024 · 3 comments

@JordanGarske

Description of the bug

When I read a PDF, I use the rectangle information to collect color data. Recently I ran into a problem with one collection of PDFs: while collecting the color information, the program crashes without any error message. I traced the problem to the call to color_topusage; the crash seems to occur along color_count -> JM_color_count -> oldpix = read_sample(pm, s, n).

How to reproduce the bug

Call color_topusage() on a pixmap clipped to a drawing's rectangle. The crash appears to occur along color_count -> JM_color_count -> oldpix = read_sample(pm, s, n).

PyMuPDF version

1.24.10

Operating system

Windows

Python version

3.12

@JorjMcKie
Collaborator

Please provide a reproducing file and code.

@JordanGarske
Author

case.pdf

import pymupdf
from pymupdf import Page, table, Rect, Pixmap, sRGB_to_rgb
from pymupdf.utils import get_pixmap, get_page_pixmap

# Similar file with error; it loops on a 4-page document and crashes on page 2 of 4.
pdf = pymupdf.open('./case.pdf')
for i in range(0, len(pdf)):
    page = pdf.load_page(i)
    print(page)
    for annot in page.get_drawings():
        if page.get_textbox(annot['rect']):
            rect = annot['rect']
            pix: Pixmap = get_pixmap(page, clip=rect)
            colorBytes = pix.color_topusage()
print('done')

@JorjMcKie
Collaborator

This script works:

import pymupdf

doc = pymupdf.open("case.pdf")
page = doc[1]
paths = page.get_drawings()
for i, p in enumerate(paths):
    rect = p["rect"]  # best check for emptiness already here!
    text = page.get_textbox(rect)
    if text:
        pix = page.get_pixmap(clip=rect)
        if pymupdf.IRect(pix.irect).is_empty:
            print(i, "pixmap has empty area - skipping")
            continue
        print("path", i, pix.color_topusage())

So the reason for the crash is a pixmap that covers no area; the color_topusage() method does not protect itself against such nonsense input.
This must of course be fixed: we will raise an exception in such cases.
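
Until such a check exists in the library, a caller-side guard along the following lines may help. This is only a sketch, not the actual fix: the name checked_color_topusage is hypothetical, and the function merely assumes that its argument behaves like a pymupdf.Pixmap, i.e. exposes width, height and color_topusage().

```python
def checked_color_topusage(pix):
    """Return pix.color_topusage(), refusing pixmaps that cover no area.

    Hypothetical helper: `pix` is assumed to behave like a pymupdf.Pixmap,
    i.e. to expose `width`, `height` and `color_topusage()`.
    """
    if pix.width <= 0 or pix.height <= 0:
        # Raise instead of handing an empty pixmap to the sampling code.
        raise ValueError(
            f"pixmap has empty area ({pix.width} x {pix.height})"
        )
    return pix.color_topusage()
```

In a loop over drawings, the ValueError can be caught and the offending path skipped, which keeps the rest of the page processing running.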

Otherwise please take note of the following comments:

  • You are iterating over vector graphics, which can consist of just a line. A line obviously cannot contain anything, including text, so you could / should ignore those right away. Lines may also not be parallel to the x- or the y-axis. In that case p["rect"] is not empty, so everything appears to work, but the underlying line still contains no text even though p["rect"] may have text inside. I can't imagine you want those false alarms.
  • Depending on the circumstances (please read the relevant documentation about Rect / IRect objects and their relationship), a non-empty rectangle may be wrapped by an empty IRect (!).
  • Your use of the term "annotation" may lead to misconceptions: in PDF, this term is reserved for something entirely different. Best stay with line-art, vector graphic or drawing.
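
To illustrate the first two points, here is a pure-Python model of how a non-empty rectangle can end up wrapped by an empty integer rectangle. It is a sketch under assumptions: rectangles are plain (x0, y0, x1, y1) tuples rather than PyMuPDF objects, the 1e-3 tolerance is the value mentioned in the documentation of Rect.round(), and the exact arithmetic inside MuPDF may differ. The names model_round, is_empty and worth_rendering are hypothetical.

```python
import math

TOLERANCE = 1e-3  # assumed value, taken from the Rect.round() documentation


def model_round(rect):
    """Model of rounding a float rectangle to integer coordinates.

    The top-left corner is floored and the bottom-right corner is ceiled,
    each after shifting by the tolerance (assumption, see lead-in).
    """
    x0, y0, x1, y1 = rect
    return (
        math.floor(x0 + TOLERANCE),
        math.floor(y0 + TOLERANCE),
        math.ceil(x1 - TOLERANCE),
        math.ceil(y1 - TOLERANCE),
    )


def is_empty(rect):
    """A rectangle is empty if it has no positive width or height."""
    x0, y0, x1, y1 = rect
    return x0 >= x1 or y0 >= y1


def worth_rendering(rect):
    """Keep only rectangles that stay non-empty after integer rounding."""
    return not is_empty(rect) and not is_empty(model_round(rect))


# A nearly horizontal hairline: non-empty as floats, empty after rounding.
hairline = (100.0, 100.0, 200.0, 100.0005)
print(is_empty(hairline))         # False
print(model_round(hairline))      # (100, 100, 200, 100)
print(worth_rendering(hairline))  # False
```

A check like worth_rendering can be applied before any pixmap is created, so that degenerate paths are dropped early instead of being detected after rendering.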
