Remove reference/access to .figures

Per the diagram and explanation in [1], LTFigure is just a layout-estimated collation of LTCurve objects. For consistency with the rest of pdfplumber, this commit removes it from the list of parsed/accessible objects; the curves within each figure are still processed.

[1] https://github.com/pdfminer/pdfminer.six/blob/develop/docs/source/topic/converting_pdf_to_text.rst
jsvine · Aug 26, 2020 · 8e74cb9
1 parent a74d3bc
commit 8e74cb9
Showing 4 changed files with 6 additions and 15 deletions.
diff --git a/README.md b/README.md
@@ -90,7 +90,7 @@ The `pdfplumber.Page` class is at the core of `pdfplumber`. Most things you'll d
 |`.page_number`| The sequential page number, starting with `1` for the first page, `2` for the second, and so on.|
 |`.width`| The page's width.|
 |`.height`| The page's height.|
-|`.objects` / `.chars` / `.lines` / `.rects` / `.curves` / `.figures` / `.images`| Each of these properties is a list, and each list contains one dictionary for each such object embedded on the page. For more detail, see "[Objects](#objects)" below.|
+|`.objects` / `.chars` / `.lines` / `.rects` / `.curves` / `.images`| Each of these properties is a list, and each list contains one dictionary for each such object embedded on the page. For more detail, see "[Objects](#objects)" below.|
 
 ... and these main methods:
 
@@ -113,7 +113,6 @@ Each instance of `pdfplumber.PDF` and `pdfplumber.Page` provides access to four
 - `.rects`, each representing a single 2-dimensional rectangle.
 - `.curves`, each representing a series of connected points.
 - `.images`, each representing an image.
-- `.figures`, each representing a figure.
 - `.annots`, each representing a single PDF annotation (cf. Section 8.4 of the [official PDF specification](https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf) for details)
 - `.hyperlinks`, each representing a single PDF annotation of the subtype `Link` and having an `URI` action attribute
 
@@ -198,10 +197,6 @@ Additionally, both `pdfplumber.PDF` and `pdfplumber.Page` provide access to two
 
 [To be completed.]
 
-#### `figure` properties
-
-[To be completed.]
-
 ## Visual debugging
 
 __Note:__ To use `pdfplumber`'s visual-debugging tools, you'll also need to have two additional pieces of software installed on your computer:

diff --git a/pdfplumber/container.py b/pdfplumber/container.py
@@ -27,10 +27,6 @@ def curves(self):
     def images(self):
         return self.objects.get("image", [])
 
-    @property
-    def figures(self):
-        return self.objects.get("figure", [])
-
     @property
     def chars(self):
         return self.objects.get("char", [])

diff --git a/pdfplumber/page.py b/pdfplumber/page.py
@@ -161,6 +161,11 @@ def str_conv(x):
         CONVERSIONS_KEYS = set(CONVERSIONS.keys())
 
         def process_object(obj):
+            if hasattr(obj, "_objs"):
+                for child in obj._objs:
+                    process_object(child)
+                return
+
             attr = dict(
                 (k, CONVERSIONS[k](resolve_all(v)))
                 for k, v in obj.__dict__.items()
@@ -191,10 +196,6 @@ def process_object(obj):
                 objects[kind] = []
             objects[kind].append(attr)
 
-            if hasattr(obj, "_objs"):
-                for child in obj._objs:
-                    process_object(child)
-
         for obj in self.layout._objs:
             process_object(obj)
 

diff --git a/tests/test_ca_warn_report.py b/tests/test_ca_warn_report.py
@@ -33,7 +33,6 @@ def test_page_limiting(self):
     def test_objects(self):
         assert len(self.pdf.chars)
         assert len(self.pdf.rects)
-        assert len(self.pdf.figures)
         assert len(self.pdf.images)
 
     def test_parse(self):
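
The reordering in `pdfplumber/page.py` above moves the `_objs` check to the top of `process_object` and returns early, so a container is never recorded itself while its children still are. The following is a minimal, standalone sketch of that pattern, not the library's actual code: `Leaf`, `Container`, and `collect_objects` are hypothetical stand-ins invented for illustration, and the attribute conversion step from the real function is omitted.

```python
from collections import defaultdict


class Leaf:
    """Hypothetical stand-in for a leaf layout object (e.g. a char or curve)."""

    def __init__(self, kind, **attrs):
        self.kind = kind
        self.__dict__.update(attrs)


class Container:
    """Hypothetical stand-in for a container such as a figure; children live in _objs."""

    def __init__(self, *children):
        self._objs = list(children)


def collect_objects(layout_objs):
    """Group leaf objects by kind, descending into containers without recording them."""
    objects = defaultdict(list)

    def process_object(obj):
        # Containers are skipped: recurse into their children, then return early
        # so the container itself never lands in `objects`.
        if hasattr(obj, "_objs"):
            for child in obj._objs:
                process_object(child)
            return

        # Leaves are recorded as plain dictionaries under their kind.
        attr = {k: v for k, v in vars(obj).items() if k != "kind"}
        objects[obj.kind].append(attr)

    for obj in layout_objs:
        process_object(obj)
    return dict(objects)


layout = [
    Leaf("char", text="A"),
    Container(
        Leaf("curve", points=[(0, 0), (1, 1)]),
        Container(Leaf("rect", x0=0, x1=5)),  # nested containers are flattened too
    ),
]
result = collect_objects(layout)
print(sorted(result))  # ['char', 'curve', 'rect']
```

In this sketch no `"figure"` key is ever created, which mirrors why the `.figures` property and its test assertion could be dropped while objects nested inside figures remain reachable through their own lists.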