Add char['matrix'], ctm submodule, and CTM class
jsvine committed May 10, 2022
1 parent 3191c25 commit ae6f99e
Showing 6 changed files with 98 additions and 2 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,13 @@

All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](http://keepachangelog.com/).

## [unreleased]

### Added

- Add `"matrix"` property to `char` objects, representing the current transformation matrix.
- Add `pdfplumber.ctm` submodule with class `CTM`, to calculate scale, skew, and translation of the current transformation matrix.

## [0.6.2] - 2022-05-06

### Added
10 changes: 10 additions & 0 deletions README.md
@@ -145,8 +145,18 @@ Each object is represented as a simple Python `dict`, with the following propert
|`top`| Distance of top of character from top of page.|
|`bottom`| Distance of bottom of the character from top of page.|
|`doctop`| Distance of top of character from top of document.|
|`matrix`| The "current transformation matrix" for this character. (See below for details.)|
|`object_type`| "char"|

__Note__: A character’s `matrix` property represents the “current transformation matrix,” as described in Section 4.2.2 of the [PDF Reference](https://ghostscript.com/~robin/pdf_reference17.pdf) (6th Ed.). The matrix controls the character’s scale, skew, and positional translation. Rotation is a combination of scale and skew, but in most cases can be considered equal to the x-axis skew. The `pdfplumber.ctm` submodule defines a class, `CTM`, that assists with these calculations. For instance:

```python
from pdfplumber.ctm import CTM
my_char = pdf.pages[0].chars[3]
my_char_ctm = CTM(*my_char["matrix"])
my_char_rotation = my_char_ctm.skew_x
```
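
As a rough illustration of the note above, the sketch below groups characters by approximate rotation. It assumes only that each char is a `dict` with a six-number `"matrix"` entry; `rotation_degrees` is a hypothetical helper (not part of pdfplumber) that applies the same formula as `CTM.skew_x`, and `sample_chars` is hand-written stand-in data rather than output from a real PDF.

```python
import math
from collections import defaultdict


def rotation_degrees(matrix):
    """Hypothetical helper: x-axis skew in degrees, same formula as CTM.skew_x."""
    a, b, c, d, e, f = matrix
    return math.degrees(math.atan2(d, c)) - 90


# Stand-in data (an assumption for this sketch): one upright character and
# two characters rotated by +45 and -45 degrees.
s = math.sqrt(0.5)
sample_chars = [
    {"text": "a", "matrix": (1, 0, 0, 1, 45.83, 660.69)},
    {"text": "r", "matrix": (s, s, -s, s, 126.0, 519.0)},
    {"text": "r", "matrix": (s, -s, s, s, 372.0, 562.0)},
]

# Bucket characters by rotation, rounded to the nearest whole degree.
by_rotation = defaultdict(list)
for char in sample_chars:
    by_rotation[round(rotation_degrees(char["matrix"]))].append(char["text"])
```

Rounding before grouping keeps small floating-point differences from splitting characters that share a visual rotation into separate buckets.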

#### `line` properties

| Property | Description |
38 changes: 38 additions & 0 deletions pdfplumber/ctm.py
@@ -0,0 +1,38 @@
import math
from typing import NamedTuple

# For more details, see the PDF Reference, 6th Ed., Section 4.2.2 ("Common
# Transformations")


class CTM(NamedTuple):
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    @property
    def scale_x(self) -> float:
        return math.sqrt(pow(self.a, 2) + pow(self.b, 2))

    @property
    def scale_y(self) -> float:
        return math.sqrt(pow(self.c, 2) + pow(self.d, 2))

    @property
    def skew_x(self) -> float:
        return (math.atan2(self.d, self.c) * 180 / math.pi) - 90

    @property
    def skew_y(self) -> float:
        return math.atan2(self.b, self.a) * 180 / math.pi

    @property
    def translation_x(self) -> float:
        return self.e

    @property
    def translation_y(self) -> float:
        return self.f
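
One way to sanity-check these properties without a PDF is to feed the class matrices whose rotation and scale are known in advance. The sketch below uses a trimmed local copy of `CTM` so that it runs without pdfplumber installed; `rotation_ctm` is a hypothetical helper introduced only for this illustration, building the standard counterclockwise rotation matrix `[cos θ, sin θ, -sin θ, cos θ, e, f]`.

```python
import math
from typing import NamedTuple


class CTM(NamedTuple):
    """Trimmed local copy of the class above, so the sketch is self-contained."""

    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    @property
    def scale_x(self) -> float:
        return math.sqrt(pow(self.a, 2) + pow(self.b, 2))

    @property
    def scale_y(self) -> float:
        return math.sqrt(pow(self.c, 2) + pow(self.d, 2))

    @property
    def skew_x(self) -> float:
        return (math.atan2(self.d, self.c) * 180 / math.pi) - 90

    @property
    def skew_y(self) -> float:
        return math.atan2(self.b, self.a) * 180 / math.pi


def rotation_ctm(degrees: float, e: float = 0.0, f: float = 0.0) -> CTM:
    """Hypothetical helper: CTM for a pure counterclockwise rotation."""
    t = math.radians(degrees)
    return CTM(math.cos(t), math.sin(t), -math.sin(t), math.cos(t), e, f)


# A pure rotation should report the same angle on both axes and unit scale.
rotated = rotation_ctm(30, e=10, f=20)

# A pure scaling should report no skew on either axis.
scaled = CTM(2, 0, 0, 3, 0, 0)
```

For a pure rotation, `skew_x` and `skew_y` agree, which is consistent with the README's remark that rotation can usually be read from the x-axis skew. Comparisons are best made with a tolerance (for example `math.isclose`), since the trigonometric round trip is not guaranteed to be exact.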
2 changes: 2 additions & 0 deletions pdfplumber/page.py
@@ -37,6 +37,7 @@
    "y0",
    "y1",
    "bits",
    "matrix",
    "upright",
    "font",
    "fontname",
@@ -200,6 +201,7 @@ def process_attr(item: Tuple[str, Any]) -> Optional[Tuple[str, Any]]:
            gs = obj.graphicstate
            attr["stroking_color"] = gs.scolor
            attr["non_stroking_color"] = gs.ncolor
            attr["matrix"] = attr["matrix"]

        if isinstance(obj, LTCurve) and not isinstance(obj, (LTRect, LTLine)):

6 changes: 4 additions & 2 deletions tests/test_convert.py
@@ -57,7 +57,8 @@ def test_csv(self):
        c = self.pdf.to_csv(precision=3)
        assert c.split("\r\n")[9] == (
            "char,1,45.83,58.826,656.82,674.82,117.18,117.18,135.18,12.996,"
            '18.0,12.996,,,,,,TimesNewRomanPSMT,,,"(1, 0, 0, 1, 45.83, 660.69)"'
            ',,"(0, 0, 0)",,,18.0,,,,,Y,,1,'
        )

        io = StringIO()
@@ -111,7 +112,8 @@ def test_cli_csv(self):

        assert res.decode("utf-8").split("\r\n")[9] == (
            "char,1,45.83,58.826,656.82,674.82,117.18,117.18,135.18,12.996,"
            '18.0,12.996,,,,,,TimesNewRomanPSMT,,,"(1, 0, 0, 1, 45.83, 660.69)"'
            ',,"(0, 0, 0)",,,18.0,,,,,Y,,1,'
        )

    def test_page_to_dict(self):
37 changes: 37 additions & 0 deletions tests/test_ctm.py
@@ -0,0 +1,37 @@
#!/usr/bin/env python
import os
import unittest

import pdfplumber
from pdfplumber.ctm import CTM

HERE = os.path.abspath(os.path.dirname(__file__))


class Test(unittest.TestCase):
    def test_pdffill_demo(self):
        path = os.path.join(HERE, "pdfs/pdffill-demo.pdf")
        pdf = pdfplumber.open(path)
        left_r = pdf.pages[3].chars[97]
        right_r = pdf.pages[3].chars[105]

        left_ctm = CTM(*left_r["matrix"])
        right_ctm = CTM(*right_r["matrix"])

        assert round(left_ctm.translation_x) == 126
        assert round(right_ctm.translation_x) == 372

        assert round(left_ctm.translation_y) == 519
        assert round(right_ctm.translation_y) == 562

        assert left_ctm.skew_x == 45
        assert right_ctm.skew_x == -45

        assert left_ctm.skew_y == 45
        assert right_ctm.skew_y == -45

        assert round(left_ctm.scale_x, 3) == 1
        assert round(right_ctm.scale_x, 3) == 1

        assert round(left_ctm.scale_y, 3) == 1
        assert round(right_ctm.scale_y, 3) == 1
