py-pdf · MartinThoma · Dec 23, 2023 · Jul 5, 2023 · Jul 5, 2023 · Jul 6, 2023
diff --git a/docs/user/add-watermark.md b/docs/user/add-watermark.md
@@ -81,3 +81,44 @@ def watermark(
 ```
 
 ![watermark.png](watermark.png)
+
+## Stamping images not in PDF format
+
+The above code only works for stamps that are already in PDF format. However, you can easily convert an image to a PDF using [Pillow](https://pypi.org/project/Pillow/).
+
+```python
+from io import BytesIO
+from pathlib import Path
+from typing import List, Literal, Union
+
+from PIL import Image
+from pypdf import PdfReader, PdfWriter, Transformation
+
+
+def stamp_img(
+    content_pdf: Path,
+    stamp_img: Path,
+    pdf_result: Path,
+    page_indices: Union[Literal["ALL"], List[int]] = "ALL",
+):
+    # Convert the image to a single-page PDF held in memory
+    img = Image.open(stamp_img)
+    img_as_pdf = BytesIO()
+    img.save(img_as_pdf, "pdf")
+    stamp_pdf = PdfReader(img_as_pdf)
+
+    # Then use the same stamp code from above
+    stamp_page = stamp_pdf.pages[0]
+
+    writer = PdfWriter()
+
+    reader = PdfReader(content_pdf)
+    if page_indices == "ALL":
+        page_indices = list(range(len(reader.pages)))
+    for index in page_indices:
+        content_page = reader.pages[index]
+        # Transformation() is the identity: the stamp is merged unscaled and unshifted
+        content_page.merge_transformed_page(
+            stamp_page,
+            Transformation(),
+        )
+        writer.add_page(content_page)
+
+    with open(pdf_result, "wb") as fp:
+        writer.write(fp)
+```
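
The example above merges the stamp with an identity `Transformation()`, so a large image would cover the page from its origin. As a rough sketch of how a scale and offset could be chosen instead, the helper below computes them in plain Python. The name `fit_stamp` and its `max_fraction` and `margin` parameters are hypothetical and not part of pypdf, and the sketch assumes the page origin is at the lower-left corner.

```python
from typing import Tuple


def fit_stamp(
    stamp_size: Tuple[float, float],
    page_size: Tuple[float, float],
    max_fraction: float = 0.25,
    margin: float = 20.0,
) -> Tuple[float, float, float]:
    """Return (scale, tx, ty) placing the stamp in the top-right corner.

    Hypothetical helper: all sizes are in PDF user-space units, with the
    origin assumed to be at the lower-left corner of the page.
    """
    stamp_w, stamp_h = stamp_size
    page_w, page_h = page_size
    # Uniform scale so the stamp takes at most `max_fraction` of the page
    # in either dimension; the stamp is never enlarged.
    scale = min(
        1.0,
        max_fraction * page_w / stamp_w,
        max_fraction * page_h / stamp_h,
    )
    # Offset of the scaled stamp's lower-left corner from the page origin.
    tx = page_w - margin - stamp_w * scale
    ty = page_h - margin - stamp_h * scale
    return scale, tx, ty


# A 400x200 stamp on a US Letter page (612x792 units)
scale, tx, ty = fit_stamp((400, 200), (612, 792))
print(scale, tx, ty)
```

With pypdf, the result could then be passed as `Transformation().scale(scale, scale).translate(tx, ty)` in place of the bare `Transformation()`, with the sizes read from each page's `mediabox.width` and `mediabox.height`.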
diff --git a/docs/user/forms.md b/docs/user/forms.md
@@ -8,6 +8,9 @@ from pypdf import PdfReader
 reader = PdfReader("form.pdf")
 fields = reader.get_form_text_fields()
 fields == {"key": "value", "key2": "value2"}
+
+# Or get Field objects instead of just text values:
+fields = reader.get_fields()
 ```
 
 ## Filling out forms
@@ -27,7 +30,38 @@ writer.update_page_form_field_values(
     writer.pages[0], {"fieldname": "some filled in text"}
 )
 
+# If you want to fill out *all* pages, it is also safe to do this:
+data = {"fieldname": "some filled in text", "othername": "more text for an input on a different page"}
+for page in writer.pages:
+    writer.update_page_form_field_values(page, data)
+
 # write "output" to pypdf-output.pdf
 with open("filled-out.pdf", "wb") as output_stream:
     writer.write(output_stream)
 ```
+
+## A note about form fields and annotations
+
+PDF forms store form fields as annotations with the subtype "/Widget". This means that the following two blocks of code give fairly similar results:
+
+```python
+from pypdf import PdfReader
+reader = PdfReader("form.pdf")
+fields = reader.get_fields()
+```
+
+```python
+from pypdf import PdfReader
+from pypdf.constants import AnnotationDictionaryAttributes
+
+reader = PdfReader("form.pdf")
+fields = []
+for page in reader.pages:
+    # page.annotations is None for pages without any annotations
+    for annot in page.annotations or []:
+        annot = annot.get_object()
+        if annot[AnnotationDictionaryAttributes.Subtype] == "/Widget":
+            fields.append(annot)
+```
+
+However, there are some important differences between the two blocks. Most importantly, the first block returns a dictionary that maps field names to Field objects, whereas the second builds a list of more generic dictionary-like objects. The two collections will *mostly* reference the same objects in the underlying PDF, meaning you'll find that `obj_taken_from_first_block.indirect_reference == obj_taken_from_second_block.indirect_reference`. Field objects are generally more ergonomic, as the exposed data can be accessed via clearly named properties. The more generic dictionary-like objects, on the other hand, contain data that the Field object does not expose, such as the Rect (the widget's position on the page). Which one to use depends on your use case.
+
+Note also that the two do not *always* refer to the same underlying PDF objects. For example, if the form contains radio buttons, `reader.get_fields()` returns the parent object (the group of radio buttons), whereas `page.annotations` returns all the child objects (the individual radio buttons).
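
To illustrate the radio-button case, here is a minimal sketch of how widgets could be mapped back to the field they belong to by following `/Parent` until a `/T` (partial field name) entry is found. The helpers `widget_field_name` and `group_widgets` are hypothetical, not part of pypdf, and the demo uses plain dictionaries as stand-ins for annotation objects. It assumes that indexing a pypdf dictionary object with `["/Parent"]` yields the resolved parent object.

```python
from typing import Any, Dict, List, Mapping, Optional


def widget_field_name(widget: Mapping[str, Any]) -> Optional[str]:
    """Return the /T of the widget, or of its nearest ancestor that has one."""
    node: Optional[Mapping[str, Any]] = widget
    while node is not None:
        if "/T" in node:
            return str(node["/T"])
        # Radio button widgets usually carry no /T; the name lives on the parent.
        node = node["/Parent"] if "/Parent" in node else None
    return None


def group_widgets(
    widgets: List[Mapping[str, Any]],
) -> Dict[Optional[str], List[Any]]:
    """Group widget rectangles (/Rect) by the name of the owning field."""
    grouped: Dict[Optional[str], List[Any]] = {}
    for widget in widgets:
        rect = widget["/Rect"] if "/Rect" in widget else None
        grouped.setdefault(widget_field_name(widget), []).append(rect)
    return grouped


# Plain-dict stand-ins: one radio group with two buttons, plus one text field.
radio_group = {"/T": "color", "/FT": "/Btn"}
widgets = [
    {"/Subtype": "/Widget", "/Parent": radio_group, "/Rect": [10, 10, 20, 20]},
    {"/Subtype": "/Widget", "/Parent": radio_group, "/Rect": [30, 10, 40, 20]},
    {"/Subtype": "/Widget", "/T": "name", "/Rect": [10, 40, 200, 60]},
]
print(group_widgets(widgets))
```

The sketch returns only the nearest `/T`, so for nested fields the result may not match the keys of `reader.get_fields()`.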