Add support for /Kids and /Limits in page labels #2560

Closed
stefan6419846 opened this issue Mar 30, 2024 · 2 comments · Fixed by #2562
Labels
is-feature A feature request

Comments

@stefan6419846
Collaborator

Currently, /Kids and /Limits are not supported for page labels.

Some examples to guide the actual implementation may be found in #1519.

stefan6419846 added a commit that referenced this issue Mar 30, 2024
Maintaining/validating example images inside a PR is complicated. Rather use the existing issue #2560 if there are new findings.
@stefan6419846
Collaborator Author

For an example file and the corresponding data see #2561 (comment)

Corresponding docs are "Table 37 – Entries in a number tree node dictionary" and "Table 159 – Entries in a page label dictionary".
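
For orientation, here is a rough sketch of the shape those two tables describe, written as plain Python dicts rather than pypdf objects. The data is made up for illustration and is not taken from the example file; `PAGE_LABELS` and `leaf_key_range` are hypothetical names.

```python
from typing import Any, Dict, Tuple

# Hypothetical number tree: the root only has /Kids, each leaf has
# /Limits (least and greatest key in the leaf) and /Nums (key/value pairs).
# Each value is a page label dictionary with the optional entries
# /S (numbering style), /P (prefix) and /St (start value, default 1).
PAGE_LABELS: Dict[str, Any] = {
    "/Kids": [
        {
            "/Limits": [0, 4],
            "/Nums": [0, {"/S": "/r"}, 4, {"/S": "/D", "/St": 1}],
        },
        {
            "/Limits": [20, 20],
            "/Nums": [20, {"/S": "/D", "/P": "A-", "/St": 1}],
        },
    ]
}


def leaf_key_range(leaf: Dict[str, Any]) -> Tuple[int, int]:
    """Return the smallest and largest key stored in a leaf's /Nums array."""
    keys = leaf["/Nums"][0::2]  # even positions hold the keys, odd ones the values
    return keys[0], keys[-1]


# In a well-formed tree, /Limits should match the keys actually present.
for kid in PAGE_LABELS["/Kids"]:
    assert leaf_key_range(kid) == tuple(kid["/Limits"])
```

Here page index 0 starts a lowercase roman range, index 4 starts a decimal range, and index 20 starts a decimal range with the prefix "A-".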

@stefan6419846
Collaborator Author

I just gave it a try and it seems like the following patch is sufficient to generate the correct page numbers for the aforementioned document:

diff --git a/pypdf/_page_labels.py b/pypdf/_page_labels.py
index 6f41067..3a43f2a 100644
--- a/pypdf/_page_labels.py
+++ b/pypdf/_page_labels.py
@@ -57,7 +57,7 @@ a       Lowercase letters (a to z for the first 26 pages,
                            aa to zz for the next 26, and so on)
 """
 
-from typing import Iterator, Optional, Tuple, cast
+from typing import Iterator, List, Optional, Tuple, cast
 
 from ._protocols import PdfCommonDocProtocol
 from ._utils import logger_warning
@@ -131,7 +131,8 @@ def index2label(reader: PdfCommonDocProtocol, index: int) -> str:
     if "/PageLabels" not in root:
         return str(index + 1)  # Fallback
     number_tree = cast(DictionaryObject, root["/PageLabels"].get_object())
-    if "/Nums" in number_tree:
+
+    def handle_nums(dictionary_object: DictionaryObject) -> str:
         # [Nums] shall be an array of the form
         #   [ key 1 value 1 key 2 value 2 ... key n value n ]
         # where each key_i is an integer and the corresponding
@@ -139,7 +140,7 @@ def index2label(reader: PdfCommonDocProtocol, index: int) -> str:
         # The keys shall be sorted in numerical order,
         # analogously to the arrangement of keys in a name tree
         # as described in 7.9.6, "Name Trees."
-        nums = cast(ArrayObject, number_tree["/Nums"])
+        nums = cast(ArrayObject, dictionary_object["/Nums"])
         i = 0
         value = None
         start_index = 0
@@ -165,16 +166,18 @@ def index2label(reader: PdfCommonDocProtocol, index: int) -> str:
         start = value.get("/St", 1)
         prefix = value.get("/P", "")
         return prefix + m[value.get("/S")](index - start_index + start)
-    if "/Kids" in number_tree or "/Limits" in number_tree:
-        logger_warning(
-            (
-                "/Kids or /Limits found in PageLabels. "
-                "Please share this PDF with pypdf: "
-                "https://github.com/py-pdf/pypdf/pull/1519"
-            ),
-            __name__,
-        )
-    # TODO: Implement /Kids and /Limits for number tree
+
+    if "/Nums" in number_tree:
+        return handle_nums(number_tree)
+
+    if "/Kids" in number_tree:
+        kids: List[DictionaryObject] = number_tree["/Kids"]
+        for kid in kids:
+            limits: List[int] = kid["/Limits"]
+            if limits[0] <= index <= limits[1]:
+                return handle_nums(kid)
+
+    logger_warning(f"Could not reliably determine page label for {index}.")
     return str(index + 1)  # Fallback if /Nums is not in the number_tree
 
 

This is more or less the same as before. The only addition is that it checks the /Limits of each entry in /Kids to see which child node covers the current index, and then processes that node's /Nums as usual.
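
To make the lookup easier to reason about outside of pypdf, here is a minimal standalone sketch of the same idea on plain Python dicts. It is not the pypdf implementation: `TREE`, `find_leaf` and `start_entry` are hypothetical names, the data is made up, and indirect objects are assumed to be resolved already. Unlike the patch, it also descends into nested /Kids.

```python
from typing import Any, Dict, Optional, Tuple

Node = Dict[str, Any]

# Hypothetical tree with two leaves and a gap between their key ranges.
TREE: Node = {
    "/Kids": [
        {"/Limits": [0, 3], "/Nums": [0, {"/S": "/r"}, 3, {"/S": "/D", "/St": 1}]},
        {"/Limits": [10, 10], "/Nums": [10, {"/S": "/D", "/P": "A-"}]},
    ]
}


def find_leaf(node: Node, index: int) -> Optional[Node]:
    """Return the node holding /Nums whose /Limits contain index, if any."""
    if "/Nums" in node:
        return node
    for kid in node.get("/Kids", []):
        low, high = kid["/Limits"]
        if low <= index <= high:
            return find_leaf(kid, index)
    return None  # no child claims this index


def start_entry(leaf: Node, index: int) -> Optional[Tuple[int, Dict[str, Any]]]:
    """Return the (key, label dict) pair with the greatest key <= index."""
    nums = leaf["/Nums"]
    best = None
    for i in range(0, len(nums) - 1, 2):
        if nums[i] > index:
            break  # keys are sorted, so no later key can match
        best = (nums[i], nums[i + 1])
    return best


leaf = find_leaf(TREE, 2)
print(start_entry(leaf, 2) if leaf else None)
print(find_leaf(TREE, 5))
```

Note that index 5 lies between the two /Limits ranges, so no leaf is found, which corresponds to the warning and fallback at the end of the patch. As far as I read the tables, /Limits describe the keys stored in a node rather than the pages a range covers, so such an index may still belong to the range started by key 3. That case might need extra handling.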

pubpub-zz pushed a commit that referenced this issue Mar 30, 2024
Maintaining/validating example images inside a PR is complicated. Rather use the existing issue #2560 if there are new findings.
@MartinThoma MartinThoma added the is-feature A feature request label Mar 31, 2024