Commit

Use charset-normalizer instead of chardet (#744)
* Use charset-normalizer instead of chardet

* Ignore charset_normalizer type stub

* Add CHANGELOG.md
pietermarsman authored Apr 20, 2022
1 parent 617e4c8 commit 1bf3c42
Showing 4 changed files with 13 additions and 6 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -19,6 +19,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

 - Exporting images without any specific encoding ([#737](https://github.com/pdfminer/pdfminer.six/pull/737))

+### Changed
+
+- Using charset-normalizer instead of chardet for less restrictive license ([#744](https://github.com/pdfminer/pdfminer.six/pull/744))
+
 ## [20220319]

 ### Added
7 changes: 5 additions & 2 deletions mypy.ini
@@ -23,8 +23,11 @@ ignore_missing_imports = True
 [mypy-pytest.*]
 ignore_missing_imports = True

-[mypy-setuptools]
+[mypy-setuptools.*]
 ignore_missing_imports = True

-[mypy-nox]
+[mypy-nox.*]
 ignore_missing_imports = True
+
+[mypy-charset_normalizer.*]
+ignore_missing_imports = True
4 changes: 2 additions & 2 deletions pdfminer/utils.py
@@ -28,7 +28,7 @@
 if TYPE_CHECKING:
     from .layout import LTComponent

-import chardet # For str encoding detection
+import charset_normalizer # For str encoding detection

 # from sys import maxint as INF doesn't work anymore under Python3, but PDF
 # still uses 32 bits ints
@@ -75,7 +75,7 @@ def make_compat_bytes(in_str: str) -> bytes:
 def make_compat_str(o: object) -> str:
     """Converts everything to string, if bytes guessing the encoding."""
     if isinstance(o, bytes):
-        enc = chardet.detect(o)
+        enc = charset_normalizer.detect(o)
         try:
             return o.decode(enc["encoding"])
         except UnicodeDecodeError:
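The `make_compat_str` hunk above indexes the detector result with `enc["encoding"]`. chardet-style `detect` functions return a dictionary whose `encoding` value can be `None` when no guess is made, so a more defensive variant might look like the sketch below. The helper name `decode_guessing`, the injectable `detect` parameter, and the `repr` fallback are assumptions made for illustration; none of them are part of this commit.

```python
from typing import Any, Callable, Dict

# A chardet-style detector: takes bytes, returns a dict with an "encoding" key.
Detector = Callable[[bytes], Dict[str, Any]]


def decode_guessing(data: bytes, detect: Detector) -> str:
    """Decode bytes with a guessed encoding, tolerating a failed guess."""
    guess = detect(data)
    encoding = guess.get("encoding")
    if encoding is None:
        # The detector could not decide; fall back to the bytes repr
        # (an assumed fallback, chosen only for this sketch).
        return repr(data)
    try:
        return data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong guess or unknown codec name: use the same fallback.
        return repr(data)
```

With the dependency introduced by this commit installed, the call would be `decode_guessing(data, charset_normalizer.detect)`. Passing the detector as a parameter keeps the sketch runnable and testable without either library.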
4 changes: 2 additions & 2 deletions setup.py
@@ -17,8 +17,8 @@
     packages=["pdfminer"],
     package_data={"pdfminer": ["cmap/*.pickle.gz", "py.typed"]},
     install_requires=[
-        'chardet ; python_version > "3.0"',
-        "cryptography",
+        "charset-normalizer~=2.0.0",
+        "cryptography~=36.0.0",
     ],
     extras_require={
         "dev": ["pytest", "nox", "black", "mypy == 0.931"],
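The `~=` operator in the new `install_requires` entries is the compatible-release clause from PEP 440: `~=2.0.0` accepts `2.0.x` releases at or above `2.0.0` and rejects `2.1.0`. The following is a simplified sketch of that rule, assuming plain dotted numeric versions only (no pre-releases, epochs, or local tags). The function names are hypothetical, and real tooling should rely on pip or the `packaging` library instead.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted numeric version such as "2.0.12"."""
    return tuple(int(part) for part in version.split("."))


def is_compatible_release(candidate: str, spec: str) -> bool:
    """Approximate `candidate ~= spec` for plain numeric versions."""
    base = parse_version(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two version segments")
    cand = parse_version(candidate)
    # Pad the candidate with zeros so "2.0" compares like "2.0.0".
    cand = cand + (0,) * (len(base) - len(cand))
    # Every segment of the spec except the last must match exactly,
    # and the candidate must not be older than the spec.
    prefix = base[:-1]
    return cand[: len(prefix)] == prefix and cand >= base
```

Under this rule the pins in this commit allow patch releases such as `charset-normalizer` 2.0.12 or `cryptography` 36.0.2, but not 2.1.0 or 37.0.0.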
