Segmentation Fault When Updating PDF Form Field Value #4004
Comments
You are incorrectly accessing widgets after end-of-life of the owning page. I understand your intention, but you must modify your approach. |
Addresses issue #4004. Calling CheckParent should raise ReferenceError, respectively ValueError, if an object's parent is no longer available.
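To illustrate the lifetime problem described above, here is a minimal pure-Python sketch of a child object that refers to its owning page only weakly and raises ReferenceError once that page is gone. The `Page` and `Widget` classes and the `check_parent` and `make_orphan` names are hypothetical stand-ins made up for this illustration; this is not PyMuPDF's actual implementation of CheckParent.

```python
import weakref


class Page:
    """Hypothetical stand-in for a page that owns widgets (not PyMuPDF's class)."""

    def __init__(self, number):
        self.number = number


class Widget:
    """Hypothetical widget that refers to its owning page only weakly."""

    def __init__(self, page, name):
        self.name = name
        self._parent = weakref.ref(page)  # does not keep the page alive

    def check_parent(self):
        """Return the owning page, or raise ReferenceError if it is gone."""
        page = self._parent()
        if page is None:
            raise ReferenceError(f"widget {self.name!r} has no live parent page")
        return page


def make_orphan():
    """Create a widget whose page goes out of scope when this function returns."""
    page = Page(0)
    return Widget(page, "Text1")
```

Calling `make_orphan().check_parent()` raises ReferenceError, while a widget whose page is still referenced returns that page. The practical takeaway is the same as in the comment above: keep the page object alive for as long as its widgets are in use.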
@JorjMcKie Thank you so much for responding to my issue and for sketching a proper approach. I see that you are working on a fix that would have caused my code to throw an error rather than to segfault. I'll now read through the docs to translate your hints into code. |
Using the feedback from @JorjMcKie, here's what I came up with, with help from some code-writing AI:

import pymupdf as pmp
from collections import defaultdict
from typing import Dict, List, Any, Optional


def get_widgets_info(doc: pmp.Document) -> Dict[str, List[Dict[str, Any]]]:
    """
    Extracts and returns a dictionary of widget information indexed by their names.

    Args:
        doc: PyMuPDF document object

    Returns:
        Dictionary mapping field names to lists of widget information dictionaries
    """
    widgets_by_name = defaultdict(list)
    for page_num in range(len(doc)):
        page = doc.load_page(page_num)
        for widget in page.widgets():
            # Store plain data only, never the widget object itself
            widgets_by_name[widget.field_name].append({
                "page_num": page_num,
                "xref": widget.xref,
                "field_type": widget.field_type,
                "field_value": widget.field_value,
                "rect": widget.rect
            })
    return widgets_by_name


def update_widget_value(doc: pmp.Document, page_num: int, xref: int, new_value: str) -> bool:
    """
    Safely updates a widget's value by reloading the page and widget.

    Args:
        doc: PyMuPDF document object
        page_num: Page number containing the widget
        xref: Cross-reference number of the widget
        new_value: New value to set for the widget

    Returns:
        True if widget was successfully updated, False otherwise
    """
    try:
        page = doc.load_page(page_num)
        for widget in page.widgets():
            if widget.xref == xref:
                widget.field_value = new_value
                widget.update()
                return True
        return False
    except Exception as e:
        print(f"Error updating widget: {e}")
        return False


def main():
    """Main function to process the PDF form"""
    try:
        # Open document and get widgets info
        doc = pmp.open("simple_form.pdf")
        widgets_info = get_widgets_info(doc)

        # Print widget information
        for name, widgets in widgets_info.items():
            print(f"Widget Name: {name}")
            for widget_info in widgets:
                print(f"  Page: {widget_info['page_num'] + 1}, "
                      f"Type: {widget_info['field_type']}, "
                      f"Value: {widget_info['field_value']}, "
                      f"Rect: {widget_info['rect']}")

        # Update field value safely
        if "Text1" in widgets_info and widgets_info["Text1"]:
            widget_info = widgets_info["Text1"][0]
            success = update_widget_value(
                doc,
                widget_info["page_num"],
                widget_info["xref"],
                "1234567890"
            )
            if success:
                print("Widget updated successfully")
                doc.save("simple_form_filled.pdf", garbage=4, deflate=True)
            else:
                print("Failed to update widget")
    except Exception as e:
        print(f"Error processing PDF: {e}")
    finally:
        if 'doc' in locals():
            doc.close()


if __name__ == "__main__":
    main()
Fast reaction! |
@JorjMcKie Fast reaction because I'm so excited that you responded to my cry for help so quickly -- and you got me unstuck! I ran into the segfault almost two weeks ago and only just got around to posting the issue yesterday. I'm so happy to be able to use PyMuPDF (along with PyPDF). Thanks also for telling me that I can load the widget directly.
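For anyone following along, loading the widget directly might look roughly like the sketch below. It assumes `doc` is an open PyMuPDF document and that `Page.load_widget(xref)` returns the widget with that xref; the helper name `set_field_value` is made up for this example, and the function is written against whatever object is passed in, so treat it as a sketch rather than the recommended implementation.

```python
def set_field_value(doc, page_num, xref, new_value):
    """Set one form field's value; `doc` is assumed to be an open PyMuPDF document."""
    # Hold the page in a local variable so it outlives every use of the widget.
    page = doc.load_page(page_num)
    # Assumption: Page.load_widget(xref) returns the widget identified by xref.
    widget = page.load_widget(xref)
    widget.field_value = new_value
    widget.update()
    return widget.field_value
```

Compared with looping over `page.widgets()` and comparing xrefs, this skips the scan, while still keeping the page referenced for the whole time the widget is touched.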
My only question is: what do you still need pypdf for (🤷♂️😉)? |
BTW thanks for the report: it pointed us to an open problem! |
@JorjMcKie I started with PyMuPDF because I had read that it was the modern, fast library. After I ran into the segfault, I turned to PyPDF with the hope of eventually returning to PyMuPDF. So here I am. One issue I couldn't get working with PyPDF is renaming widgets tied to the same field name into different names. I still haven't been able to successfully delete widgets using PyPDF. I'm hoping that I'll be able to use PyMuPDF to solve this problem. |
…et page from annot. The fix requires MuPDF >= 1.25, specifically this MuPDF commit: When annotation is deleted from page, remove link from annotation to page.
You can delete widgets with PyMuPDF. |
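A rough sketch of what deleting widgets by field name could look like is below. It assumes `doc` is an open PyMuPDF document and relies on `Page.widgets()`, `Page.load_widget(xref)` and `Page.delete_widget(widget)`; the helper name `delete_widgets_by_name` is made up for this example, and it has not been run against the PDF from this issue.

```python
def delete_widgets_by_name(doc, field_name):
    """Delete every widget named `field_name`; return how many were removed."""
    removed = 0
    for page_num in range(len(doc)):
        # The page stays referenced for the whole loop body.
        page = doc.load_page(page_num)
        # Collect xrefs first so the widget list is not modified while iterating.
        xrefs = [w.xref for w in page.widgets() if w.field_name == field_name]
        for xref in xrefs:
            page.delete_widget(page.load_widget(xref))
            removed += 1
    return removed
```

Collecting plain xrefs before deleting follows the same idea as the rest of this thread: store simple data, and only hold widget objects while their page is still in scope. Remember to save the document afterwards.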
@JorjMcKie I'm a newbie when it comes to programmatically manipulating PDF files. One unpleasant surprise for me has been how fragile Adobe Acrobat Pro has been for editing form elements. I've been changing names and adding widgets and suddenly, the resulting file is corrupted and I lose all my edits. How can Adobe Acrobat, software that should be the closest to the canonical software for working with PDFs, be so junky? I started out trying to use the JS programmatic interface in Acrobat to manipulate the PDF but have abandoned that approach. Happy to be digging into PyMuPDF now. |
In PyMuPDF git, we now have a fix for the underlying SEGV. If an annotation is unbound from its parent page (for example if the […]. Unfortunately, the fix requires a new release of MuPDF. So depending on MuPDF release timescales, it might not be in the next release of PyMuPDF. |
Description of the bug
When attempting to update a PDF form field value using widget.update(), the application crashes with a segmentation fault. The crash occurs specifically in the PDF annotation rectangle handling code.
How to reproduce the bug
simple_form.pdf
Run the following code:
output of program:
Current Behavior
The program crashes with a segmentation fault when calling field.update(). The crash occurs in the PDF annotation rectangle handling code.
Crash Details
Stack trace from fault.log:
The crash trace indicates the following call chain:
widget.update()
_save_widget()
JM_set_widget_properties()
pdf_set_annot_rect()
Additional Context
Rect(172.80099487304688, 117.16400146484375, 322.8009948730469, 139.16400146484375)
See detailed crash from Console.app:
PyMuPDF version
1.24.13
Operating system
MacOS
Python version
3.12