How to Maintain PDF Links
This page demonstrates the new Page methods that maintain a PDF page's set of links.
Links are pointers to other places, including:
- other locations in the same document
- locations in another PDF file
- other general files (executed when the hot area is clicked)
- internet addresses (accessed when the hot area is clicked)
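As a rough sketch, the four destination categories above correspond to the `kind` values found in PyMuPDF's link dictionaries. The numeric constants below are assumed to mirror `fitz.LINK_GOTO`, `fitz.LINK_GOTOR`, `fitz.LINK_LAUNCH` and `fitz.LINK_URI` (hard-coded only to keep the sketch dependency-free; real code should use the `fitz` constants), and `describe_link` is a hypothetical helper, not part of PyMuPDF.

```python
# Numeric values assumed to mirror the fitz.LINK_* constants; in real code
# use the fitz constants themselves.
LINK_GOTO = 1    # location in the same document
LINK_URI = 2     # internet address
LINK_LAUNCH = 3  # general file, executed on click
LINK_GOTOR = 5   # location in another PDF file


def describe_link(lnk: dict) -> str:
    """Return a short description of a link dictionary (hypothetical helper)."""
    kind = lnk.get("kind")
    if kind == LINK_GOTO:
        return f"go to page {lnk.get('page')} of this document"
    if kind == LINK_GOTOR:
        return f"go to page {lnk.get('page')} of {lnk.get('file')}"
    if kind == LINK_LAUNCH:
        return f"launch file {lnk.get('file')}"
    if kind == LINK_URI:
        return f"open {lnk.get('uri')}"
    return "unsupported link kind"


print(describe_link({"kind": LINK_URI, "uri": "https://en.wikipedia.org/wiki/MuPDF"}))
# open https://en.wikipedia.org/wiki/MuPDF
```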
A page's links can usually be recognized by a change in cursor appearance, from an arrow to a pointing hand. The spot where this happens is often called the "hot area": a rectangle surrounding, for example, a piece of text or an image.
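The hot area is the link's rectangle, which `getLinks()` reports under the `"from"` key as a `fitz.Rect`. The sketch below finds which link, if any, covers a given point. To stay dependency-free it assumes plain `(x0, y0, x1, y1)` tuples in place of `fitz.Rect` and treats the rectangle's border as inside; `link_at` is a hypothetical helper.

```python
from typing import Optional


def link_at(links: list[dict], x: float, y: float) -> Optional[dict]:
    """Return the first link whose hot area contains (x, y), else None.

    Assumes each link's "from" entry is an (x0, y0, x1, y1) tuple, standing in
    for the fitz.Rect that PyMuPDF actually returns.
    """
    for lnk in links:
        x0, y0, x1, y1 = lnk["from"]
        if x0 <= x <= x1 and y0 <= y <= y1:
            return lnk
    return None


links = [
    {"kind": 2, "from": (249.714, 142.312, 295.942, 154.063),
     "uri": "https://github.com/rk700/PyMuPDF"},
    {"kind": 2, "from": (383.579, 526.211, 430.582, 537.961),
     "uri": "https://en.wikipedia.org/wiki/MuPDF"},
]
hit = link_at(links, 400, 530)
print(hit["uri"] if hit else "no link here")  # https://en.wikipedia.org/wiki/MuPDF
```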
As always when dealing with document changes, these methods are restricted to PDF files.
MuPDF handles link annotations differently from other annotation types. Because of the technical implications, we decided to let PyMuPDF handle links differently as well. The implementation works as follows:
- There are three methods dealing with links: insertLink(), deleteLink() and updateLink().
- Each method accepts one parameter: a dictionary of values describing the link destination.
- The dictionary has the same format as the entries of getLinks() and the fourth component of the entries in getToC(simple = False). It contains almost the same information as the linkDest object, only in a more readable and disambiguated format.
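For illustration, here is how such dictionaries might be assembled for some of the destination types listed earlier. The key names (`kind`, `from`, `page`, `to`, `file`, `uri`) are taken from the `getLinks()` output in the session below; the helper functions are hypothetical, the kind values are assumed to mirror the `fitz.LINK_*` constants, and plain tuples stand in for the `fitz.Rect` and `fitz.Point` objects real code should pass.

```python
# Kind values assumed to mirror fitz.LINK_GOTO, fitz.LINK_URI, fitz.LINK_LAUNCH.
LINK_GOTO, LINK_URI, LINK_LAUNCH = 1, 2, 3


def goto_link(rect, page, to=(0, 0)):
    """Link to a point on another page of the same document."""
    return {"kind": LINK_GOTO, "from": rect, "page": page, "to": to}


def uri_link(rect, uri):
    """Link that opens an internet address."""
    return {"kind": LINK_URI, "from": rect, "uri": uri}


def launch_link(rect, file):
    """Link that opens (executes) a general file."""
    return {"kind": LINK_LAUNCH, "from": rect, "file": file}


hot_area = (383.579, 526.211, 430.582, 537.961)  # stands in for a fitz.Rect
for lnk in (goto_link(hot_area, 1, (100, 200)),
            uri_link(hot_area, "https://en.wikipedia.org/wiki/MuPDF"),
            launch_link(hot_area, "some.file")):
    print(lnk)
```

In a real session, a dictionary like these would be passed to `page.insertLink()`.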
In the following session we read a PDF page and all of its links. We then delete the last link, change the first link, and finally insert a new link. For a fully functional GUI supporting these operations, have a look at this example, which provides an interactive interface for them.
>>> import fitz
>>> doc = fitz.open("pymupdf.pdf")
>>> page = doc[6] # this page contains 4 links
>>> lnks = page.getLinks()
>>> for l in lnks:
...     print(l)
...
{'kind': 2, 'xref': 864, 'from': fitz.Rect(249.714, 142.312,
295.942, 154.063), 'type': 'uri', 'uri': 'https://github.com/rk700/PyMuPDF'}
{'kind': 2, 'xref': 1090, 'from': fitz.Rect(255.626, 257.481,
301.854, 269.231), 'type': 'uri',
'uri': 'https://github.com/JorjMcKie/PyMuPDF-optional-material'}
{'kind': 2, 'xref': 978, 'from': fitz.Rect(183.562, 325.227,
206.773, 336.977), 'type': 'uri',
'uri': 'https://pypi.python.org/pypi?:action=display&name=PyMuPDF&version=1.10.0'}
{'kind': 2, 'xref': 1059, 'from': fitz.Rect(383.579, 526.211,
430.582, 537.961), 'type': 'uri', 'uri': 'https://en.wikipedia.org/wiki/MuPDF'}
>>>
>>> #-------------------------------------------------------------------------------------------------
>>> # delete last link on page
>>> #-------------------------------------------------------------------------------------------------
>>> l = lnks[-1]
>>> page.deleteLink(l)
>>> # retrieve all links again to show it is gone
>>> for l in page.getLinks():
...     print(l)
...
{'kind': 2, 'xref': 864, 'from': fitz.Rect(249.714, 142.312,
295.942, 154.063), 'type': 'uri', 'uri': 'https://github.com/rk700/PyMuPDF'}
{'kind': 2, 'xref': 1090, 'from': fitz.Rect(255.626, 257.481,
301.854, 269.231), 'type': 'uri',
'uri': 'https://github.com/JorjMcKie/PyMuPDF-optional-material'}
{'kind': 2, 'xref': 978, 'from': fitz.Rect(183.562, 325.227,
206.773, 336.977), 'type': 'uri',
'uri': 'https://pypi.python.org/pypi?:action=display&name=PyMuPDF&version=1.10.0'}
>>>
>>> #-------------------------------------------------------------------------------------------------
>>> # now change first link to point to somewhere on page 1 of same file
>>> #-------------------------------------------------------------------------------------------------
>>> l = lnks[0]
>>> l["kind"] = fitz.LINK_GOTO
>>> l["page"] = 1
>>> l["to"] = fitz.Point(100, 200)
>>> page.updateLink(l)
>>> # again demonstrate what happened
>>> for l in page.getLinks():
...     print(l)
...
{'kind': 1, 'xref': 864, 'from': fitz.Rect(249.714, 142.312,
295.942, 154.063), 'type': 'goto', 'page': 1, 'to': fitz.Point(100.0, 200.0), 'zoom': 0.0}
{'kind': 2, 'xref': 1090, 'from': fitz.Rect(255.626, 257.481,
301.854, 269.231), 'type': 'uri',
'uri': 'https://github.com/JorjMcKie/PyMuPDF-optional-material'}
{'kind': 2, 'xref': 978, 'from': fitz.Rect(183.562, 325.227,
206.773, 336.977), 'type': 'uri',
'uri': 'https://pypi.python.org/pypi?:action=display&name=PyMuPDF&version=1.10.0'}
>>>
>>> #-------------------------------------------------------------------------------------------------
>>> # now recreate the deleted link to open another file
>>> #-------------------------------------------------------------------------------------------------
>>> l = lnks[3]
>>> l["kind"] = fitz.LINK_LAUNCH
>>> l["file"] = "some.file"
>>> page.insertLink(l)
>>> for l in page.getLinks():
...     print(l)
...
{'kind': 1, 'xref': 864, 'from': fitz.Rect(249.714, 142.312,
295.942, 154.063), 'type': 'goto', 'page': 1, 'to': fitz.Point(100.0, 200.0), 'zoom': 0.0}
{'kind': 2, 'xref': 1090, 'from': fitz.Rect(255.626, 257.481,
301.854, 269.231), 'type': 'uri',
'uri': 'https://github.com/JorjMcKie/PyMuPDF-optional-material'}
{'kind': 2, 'xref': 978, 'from': fitz.Rect(183.562, 325.227,
206.773, 336.977), 'type': 'uri',
'uri': 'https://pypi.python.org/pypi?:action=display&name=PyMuPDF&version=1.10.0'}
{'kind': 3, 'xref': 1251, 'from': fitz.Rect(383.579, 526.211,
430.582, 537.961), 'type': 'launch', 'file': 'some.file'}
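The session above reuses the dictionaries returned by getLinks() and only overwrites some of their keys, so stale entries such as 'uri' stay in the dict (PyMuPDF presumably ignores them, as the output suggests). A slightly tidier approach, sketched below, builds a fresh dictionary instead. It assumes that only "xref" (which identifies the link for updateLink()) and "from" (the hot area) need to be carried over; `retarget` is a hypothetical helper, the kind values are assumed to mirror the `fitz.LINK_*` constants, and tuples stand in for `fitz.Rect`/`fitz.Point`.

```python
LINK_GOTO, LINK_LAUNCH = 1, 3  # assumed to mirror fitz.LINK_GOTO / fitz.LINK_LAUNCH

# Keys assumed worth carrying over from the old link dictionary.
KEPT_KEYS = ("xref", "from")


def retarget(old: dict, kind: int, **dest) -> dict:
    """Return a new link dict with the old hot area but a new destination.

    Destination-specific keys of the old link (e.g. "uri", "type") are dropped.
    """
    new = {k: old[k] for k in KEPT_KEYS if k in old}
    new["kind"] = kind
    new.update(dest)
    return new


first = {"kind": 2, "xref": 864, "from": (249.714, 142.312, 295.942, 154.063),
         "type": "uri", "uri": "https://github.com/rk700/PyMuPDF"}
print(retarget(first, LINK_GOTO, page=1, to=(100, 200)))
# {'xref': 864, 'from': (249.714, 142.312, 295.942, 154.063), 'kind': 1, 'page': 1, 'to': (100, 200)}
```

The result would then be handed to `page.updateLink()`, or to `page.insertLink()` when creating a new link.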