Remove reference/access to .figures
Per diagram and explanation in [1], LTFigure is just a layout-estimated
collation of LTCurve objects. For consistency with the rest of
pdfplumber, removing it from the list of parsed/accessible objects,
although we still do process the curves within each figure.

[1] https://github.com/pdfminer/pdfminer.six/blob/develop/docs/source/topic/converting_pdf_to_text.rst
jsvine committed Aug 26, 2020
1 parent a74d3bc commit 8e74cb9
Showing 4 changed files with 6 additions and 15 deletions.
7 changes: 1 addition & 6 deletions README.md
@@ -90,7 +90,7 @@ The `pdfplumber.Page` class is at the core of `pdfplumber`. Most things you'll d
 |`.page_number`| The sequential page number, starting with `1` for the first page, `2` for the second, and so on.|
 |`.width`| The page's width.|
 |`.height`| The page's height.|
-|`.objects` / `.chars` / `.lines` / `.rects` / `.curves` / `.figures` / `.images`| Each of these properties is a list, and each list contains one dictionary for each such object embedded on the page. For more detail, see "[Objects](#objects)" below.|
+|`.objects` / `.chars` / `.lines` / `.rects` / `.curves` / `.images`| Each of these properties is a list, and each list contains one dictionary for each such object embedded on the page. For more detail, see "[Objects](#objects)" below.|

 ... and these main methods:

@@ -113,7 +113,6 @@ Each instance of `pdfplumber.PDF` and `pdfplumber.Page` provides access to four
 - `.rects`, each representing a single 2-dimensional rectangle.
 - `.curves`, each representing a series of connected points.
 - `.images`, each representing an image.
-- `.figures`, each representing a figure.
 - `.annots`, each representing a single PDF annotation (cf. Section 8.4 of the [official PDF specification](https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf) for details)
 - `.hyperlinks`, each representing a single PDF annotation of the subtype `Link` and having an `URI` action attribute

@@ -198,10 +197,6 @@ Additionally, both `pdfplumber.PDF` and `pdfplumber.Page` provide access to two

 [To be completed.]

-#### `figure` properties
-
-[To be completed.]
-
 ## Visual debugging

 __Note:__ To use `pdfplumber`'s visual-debugging tools, you'll also need to have two additional pieces of software installed on your computer:
4 changes: 0 additions & 4 deletions pdfplumber/container.py
@@ -27,10 +27,6 @@ def curves(self):
     def images(self):
         return self.objects.get("image", [])

-    @property
-    def figures(self):
-        return self.objects.get("figure", [])
-
     @property
     def chars(self):
         return self.objects.get("char", [])
9 changes: 5 additions & 4 deletions pdfplumber/page.py
@@ -161,6 +161,11 @@ def str_conv(x):
         CONVERSIONS_KEYS = set(CONVERSIONS.keys())

         def process_object(obj):
+            if hasattr(obj, "_objs"):
+                for child in obj._objs:
+                    process_object(child)
+                return
+
             attr = dict(
                 (k, CONVERSIONS[k](resolve_all(v)))
                 for k, v in obj.__dict__.items()
@@ -191,10 +196,6 @@ def process_object(obj):
                 objects[kind] = []
             objects[kind].append(attr)

-            if hasattr(obj, "_objs"):
-                for child in obj._objs:
-                    process_object(child)
-
         for obj in self.layout._objs:
             process_object(obj)
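
The effect of moving the `_objs` check to the top of `process_object` can be sketched in isolation. The example below is an illustrative approximation, not pdfplumber's actual code: `Leaf`, `Container`, `collect`, and the `kind` attribute are hypothetical stand-ins, and only the `_objs` attribute name is taken from the diff. It shows that any object exposing `_objs` is now traversed for its children but is no longer recorded itself.

```python
# Hypothetical stand-ins for layout objects; these are NOT pdfminer's classes.
class Leaf:
    def __init__(self, kind):
        self.kind = kind


class Container:
    def __init__(self, kind, children):
        self.kind = kind
        self._objs = children  # same attribute name the diff checks for


def collect(layout_objs):
    """Group leaf objects by kind, descending into containers without recording them."""
    objects = {}

    def process_object(obj):
        # Anything with `_objs` is treated as a container:
        # recurse into its children, then return without storing it.
        if hasattr(obj, "_objs"):
            for child in obj._objs:
                process_object(child)
            return
        objects.setdefault(obj.kind, []).append(obj)

    for obj in layout_objs:
        process_object(obj)
    return objects


layout = [
    Leaf("char"),
    Container("figure", [Leaf("curve"), Container("figure", [Leaf("curve")])]),
    Leaf("rect"),
]
result = collect(layout)
counts = {kind: len(items) for kind, items in result.items()}
print(counts)  # {'char': 1, 'curve': 2, 'rect': 1}
```

In this sketch no `"figure"` key is ever created, while the curves nested inside figures (including a figure within a figure) still appear under `"curve"`. That matches the commit message's stated intent of dropping figures from the accessible objects while still processing the curves within them.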

1 change: 0 additions & 1 deletion tests/test_ca_warn_report.py
@@ -33,7 +33,6 @@ def test_page_limiting(self):
     def test_objects(self):
         assert len(self.pdf.chars)
         assert len(self.pdf.rects)
-        assert len(self.pdf.figures)
         assert len(self.pdf.images)

     def test_parse(self):