Commit bc48fa9

Arya-A-Nair authored and OpenNingia committed
ENH: Enhance XMP metadata handling with creation and setter methods (py-pdf#3410)
Closes py-pdf#3394. Closes py-pdf#3395.
1 parent 661baac commit bc48fa9

4 files changed: +1143 -203 lines changed

docs/user/metadata.md

Lines changed: 135 additions & 0 deletions
@@ -121,6 +121,141 @@ if meta:
    print(meta.xmp_create_date)
```

## Creating XMP metadata

You can create XMP metadata using the `XmpInformation.create()` method:

```python
from pypdf import PdfWriter
from pypdf.xmp import XmpInformation

# Create a new XMP metadata object
xmp = XmpInformation.create()

# Set metadata fields
xmp.dc_title = {"x-default": "My Document Title"}
xmp.dc_creator = ["Author One", "Author Two"]
xmp.dc_description = {"x-default": "Document description"}
xmp.dc_subject = ["keyword1", "keyword2", "keyword3"]
xmp.pdf_producer = "pypdf"

# Create a writer and add the metadata
writer = PdfWriter()
writer.add_blank_page(612, 792)  # Add a US Letter page (612 x 792 points)
writer.xmp_metadata = xmp
writer.write("output.pdf")
```

## Setting XMP metadata fields

The `XmpInformation` class provides property-based access for all supported metadata fields:

### Dublin Core fields

```python
from datetime import datetime
from pypdf.xmp import XmpInformation

xmp = XmpInformation.create()

# Single value fields
xmp.dc_coverage = "Global coverage"
xmp.dc_format = "application/pdf"
xmp.dc_identifier = "unique-id-123"
xmp.dc_source = "Original Source"

# Array fields (bags - unordered)
xmp.dc_contributor = ["Contributor One", "Contributor Two"]
xmp.dc_language = ["en", "fr", "de"]
xmp.dc_publisher = ["Publisher One"]
xmp.dc_relation = ["Related Doc 1", "Related Doc 2"]
xmp.dc_subject = ["keyword1", "keyword2"]
xmp.dc_type = ["Document", "Text"]

# Sequence fields (ordered arrays)
xmp.dc_creator = ["Primary Author", "Secondary Author"]
xmp.dc_date = [datetime.now()]

# Language alternative fields
xmp.dc_title = {"x-default": "Title", "en": "English Title", "fr": "Titre français"}
xmp.dc_description = {"x-default": "Description", "en": "English Description"}
xmp.dc_rights = {"x-default": "All rights reserved"}
```

### XMP fields

```python
from datetime import datetime

# Continues with the `xmp` object created in the Dublin Core example above

# Date fields accept both datetime objects and strings
xmp.xmp_create_date = datetime.now()
xmp.xmp_modify_date = "2023-12-25T10:30:45Z"
xmp.xmp_metadata_date = datetime.now()

# Text field
xmp.xmp_creator_tool = "pypdf"
```

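The string assigned to `xmp_modify_date` above is an ISO 8601 timestamp in UTC. As a minimal sketch using only the standard library, such a string can be produced from a `datetime`. The helper name `to_xmp_date_string` is hypothetical and not part of pypdf, and treating naive datetimes as UTC is an assumption made here for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_xmp_date_string(moment: datetime) -> str:
    """Format a datetime as an ISO 8601 UTC string such as 2023-12-25T10:30:45Z.

    Hypothetical helper for illustration; not part of pypdf.
    """
    if moment.tzinfo is None:
        # Assumption: naive datetimes are treated as already being in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    # Convert to UTC, then emit the "Z" suffix used in the example above.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


stamp = to_xmp_date_string(datetime(2023, 12, 25, 10, 30, 45, tzinfo=timezone.utc))
print(stamp)  # 2023-12-25T10:30:45Z

# A datetime with a +02:00 offset is converted to UTC before formatting.
local = datetime(2023, 12, 25, 12, 30, 45, tzinfo=timezone(timedelta(hours=2)))
print(to_xmp_date_string(local))  # 2023-12-25T10:30:45Z
```
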
### PDF fields

```python
xmp.pdf_keywords = "keyword1, keyword2, keyword3"
xmp.pdf_pdfversion = "1.4"
xmp.pdf_producer = "pypdf"
```

### XMP Media Management fields

```python
xmp.xmpmm_document_id = "uuid:12345678-1234-1234-1234-123456789abc"
xmp.xmpmm_instance_id = "uuid:87654321-4321-4321-4321-cba987654321"
```

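The identifiers above are hard-coded for illustration. A minimal sketch of generating fresh values with the standard `uuid` module follows. The `uuid:` prefix simply mirrors the examples above, and `new_xmp_uuid` is a hypothetical helper, not a pypdf function.

```python
import uuid


def new_xmp_uuid() -> str:
    """Return a random identifier in the ``uuid:<uuid4>`` form used above.

    Hypothetical helper for illustration; not part of pypdf.
    """
    return f"uuid:{uuid.uuid4()}"


document_id = new_xmp_uuid()
instance_id = new_xmp_uuid()
print(document_id)
print(instance_id)
```

The resulting strings could then be assigned to `xmp.xmpmm_document_id` and `xmp.xmpmm_instance_id` in the same way as the literal values above.
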
### PDF/A fields

```python
xmp.pdfaid_part = "1"
xmp.pdfaid_conformance = "B"
```

### Clearing metadata fields

You can clear any field by assigning `None`:

```python
xmp.dc_title = None
xmp.dc_creator = None
xmp.pdf_producer = None
```

### Incrementally updating XMP metadata fields

When modifying existing XMP metadata, it is often necessary to add or update individual entries while preserving existing values. The XMP properties return standard Python data structures that can be manipulated directly:

```python
from pypdf.xmp import XmpInformation

xmp = XmpInformation.create()

# Language alternative fields return dictionaries
title = xmp.dc_title or {}
title["en"] = "English Title"
title["fr"] = "Titre français"
xmp.dc_title = title

# Bag fields (unordered collections) return lists
subjects = xmp.dc_subject or []
subjects.append("new_keyword")
xmp.dc_subject = subjects

# Sequence fields (ordered collections) return lists
creators = xmp.dc_creator or []
creators.append("New Author")
xmp.dc_creator = creators
```

This approach provides direct control over the data structures while maintaining the property-based interface.

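Because the properties return plain lists and dictionaries, small helpers can keep such read-modify-write updates tidy. The sketch below works on ordinary Python values only, so it runs without pypdf. `append_unique` and `merge_lang_alt` are hypothetical names introduced here for illustration.

```python
from typing import Optional


def append_unique(values: Optional[list[str]], item: str) -> list[str]:
    """Return a new list with ``item`` appended unless it is already present."""
    result = list(values or [])
    if item not in result:
        result.append(item)
    return result


def merge_lang_alt(
    current: Optional[dict[str, str]], updates: dict[str, str]
) -> dict[str, str]:
    """Return a new dict with ``updates`` applied over the existing entries."""
    merged = dict(current or {})
    merged.update(updates)
    return merged


subjects = append_unique(["keyword1", "keyword2"], "keyword2")
print(subjects)  # ['keyword1', 'keyword2']

title = merge_lang_alt({"x-default": "Title"}, {"fr": "Titre français"})
print(title)  # {'x-default': 'Title', 'fr': 'Titre français'}
```

With an `XmpInformation` object, the result would be assigned back to the property, for example `xmp.dc_subject = append_unique(xmp.dc_subject, "new_keyword")`.
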
## Modifying XMP metadata

Modifying XMP metadata is a bit more complicated.

pypdf/errors.py

Lines changed: 4 additions & 0 deletions
@@ -68,3 +68,7 @@ class EmptyImageDataError(PyPdfError):

class LimitReachedError(PyPdfError):
    """Raised when a limit is reached."""


class XmpDocumentError(PyPdfError, RuntimeError):
    """Raised when the XMP XML document context is invalid or missing."""
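
Since `XmpDocumentError` derives from both `PyPdfError` and `RuntimeError`, handlers written for either base class will catch it. The sketch below demonstrates this with local stand-in classes so that it runs without pypdf installed. The `load_xmp` function is hypothetical and exists only to raise the error.

```python
class PyPdfError(Exception):
    """Stand-in for pypdf.errors.PyPdfError (redefined here for illustration)."""


class XmpDocumentError(PyPdfError, RuntimeError):
    """Stand-in mirroring the class added in this commit."""


def load_xmp() -> None:
    # Hypothetical function that fails the way an XMP operation might.
    raise XmpDocumentError("XMP document context is missing")


caught_by = []
for base in (PyPdfError, RuntimeError):
    try:
        load_xmp()
    except base as exc:
        # Record which handler type caught which concrete exception.
        caught_by.append((base.__name__, type(exc).__name__))

print(caught_by)
# [('PyPdfError', 'XmpDocumentError'), ('RuntimeError', 'XmpDocumentError')]
```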
