Commit 2ed84f5

ashb and potiuk authored
Switch LGPL'd chardet for MIT licensed charset_normalizer (#5797)

Although using the (non-vendored) chardet library is fine for requests itself, the story is a lot less clear for downstream projects that inherit an LGPL dependency, particularly ones that might like to bundle requests (and thus chardet) into a single binary; think of something similar to what docker-compose is doing. By including an LGPL'd module it is no longer clear whether the resulting artefact must also be LGPL'd.

By changing out this dependency for one under MIT we remove all license ambiguity.

As an "escape hatch" I have made the code so that it will use chardet first if it is installed, but we no longer depend upon it directly, although there is a new extra added, `requests[lgpl]`. This should minimize the impact to users, and give them an escape hatch if charset_normalizer turns out to be not as good. (In my non-exhaustive tests it detects the same encoding as chardet in every case I threw at it.)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
1 parent 33d448e commit 2ed84f5

10 files changed, with 119 additions and 27 deletions

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -23,6 +23,12 @@ env/
 
 .workon
 
+# in case you work with IntelliJ/PyCharm
+.idea
+*.iml
+.python-version
+
+
 t.py
 
 t2.py

HISTORY.md

Lines changed: 16 additions & 1 deletion
@@ -6,6 +6,22 @@ dev
 
 - \[Short description of non-trivial change.\]
 
+**Dependencies**
+
+- Instead of `chardet`, use the MIT-licensed `charset_normalizer` for Python3
+  to remove license ambiguity for projects bundling requests. If `chardet`
+  is already installed on your machine it will be used instead of `charset_normalizer`
+  to keep backwards compatibility.
+
+  You can also install `chardet` while installing requests by
+  specifying `[use_chardet_on_py3]` extra as follows:
+
+    ```shell
+    pip install "requests[use_chardet_on_py3]"
+    ```
+
+  Python2 still depends upon the `chardet` module.
+
 2.25.1 (2020-12-16)
 -------------------
 
@@ -1707,4 +1723,3 @@ This is not a backwards compatible change.
 
 - Frustration
 - Conception
-

docs/user/advanced.rst

Lines changed: 16 additions & 4 deletions
@@ -697,10 +697,22 @@ Encodings
 When you receive a response, Requests makes a guess at the encoding to
 use for decoding the response when you access the :attr:`Response.text
 <requests.Response.text>` attribute. Requests will first check for an
-encoding in the HTTP header, and if none is present, will use `chardet
-<https://pypi.org/project/chardet/>`_ to attempt to guess the encoding.
-
-The only time Requests will not do this is if no explicit charset
+encoding in the HTTP header, and if none is present, will use
+`charset_normalizer <https://pypi.org/project/charset_normalizer/>`_
+or `chardet <https://github.com/chardet/chardet>`_ to attempt to
+guess the encoding.
+
+If ``chardet`` is installed, ``requests`` uses it; however, for Python 3
+``chardet`` is no longer a mandatory dependency. The ``chardet``
+library is an LGPL-licensed dependency and some users of requests
+cannot depend on mandatory LGPL-licensed dependencies.
+
+When you install ``requests`` without specifying the ``[use_chardet_on_py3]`` extra,
+and ``chardet`` is not already installed, ``requests`` uses ``charset-normalizer``
+(MIT-licensed) to guess the encoding. For Python 2, ``requests`` uses only
+``chardet``, which is a mandatory dependency there.
+
+The only time Requests will not guess the encoding is if no explicit charset
 is present in the HTTP headers **and** the ``Content-Type``
 header contains ``text``. In this situation, `RFC 2616
 <https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1>`_ specifies
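
The header-first rule described in this documentation hunk can be sketched with the standard library alone. This is a minimal illustration, not the code requests ships: `header_encoding` is a hypothetical helper name, and the `ISO-8859-1` default is the RFC 2616 section 3.7.1 default for `text/*` media types that the truncated sentence above refers to.

```python
from email.message import Message


def header_encoding(content_type):
    """Return the encoding implied by a Content-Type value, or None.

    Hypothetical helper for illustration; it is not part of requests.
    """
    msg = Message()
    msg['Content-Type'] = content_type
    charset = msg.get_content_charset()  # lowercased charset parameter, or None
    if charset:
        return charset
    if 'text' in content_type:
        # Default for text/* media types per RFC 2616 section 3.7.1.
        return 'ISO-8859-1'
    # No charset and not a text type: the caller would fall back to
    # content-based detection (chardet or charset_normalizer).
    return None


print(header_encoding('text/html; charset=UTF-8'))   # utf-8
print(header_encoding('text/plain'))                  # ISO-8859-1
print(header_encoding('application/octet-stream'))    # None
```

Only the last case, where the helper returns `None`, is the one in which a detection library would be consulted.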

requests/__init__.py

Lines changed: 26 additions & 11 deletions
@@ -41,12 +41,20 @@
 """
 
 import urllib3
-import chardet
 import warnings
 from .exceptions import RequestsDependencyWarning
 
+try:
+    from charset_normalizer import __version__ as charset_normalizer_version
+except ImportError:
+    charset_normalizer_version = None
 
-def check_compatibility(urllib3_version, chardet_version):
+try:
+    from chardet import __version__ as chardet_version
+except ImportError:
+    chardet_version = None
+
+def check_compatibility(urllib3_version, chardet_version, charset_normalizer_version):
     urllib3_version = urllib3_version.split('.')
     assert urllib3_version != ['dev']  # Verify urllib3 isn't installed from git.
 
@@ -62,12 +70,19 @@ def check_compatibility(urllib3_version, chardet_version):
     assert minor >= 21
     assert minor <= 26
 
-    # Check chardet for compatibility.
-    major, minor, patch = chardet_version.split('.')[:3]
-    major, minor, patch = int(major), int(minor), int(patch)
-    # chardet >= 3.0.2, < 5.0.0
-    assert (3, 0, 2) <= (major, minor, patch) < (5, 0, 0)
-
+    # Check charset_normalizer for compatibility.
+    if chardet_version:
+        major, minor, patch = chardet_version.split('.')[:3]
+        major, minor, patch = int(major), int(minor), int(patch)
+        # chardet_version >= 3.0.2, < 5.0.0
+        assert (3, 0, 2) <= (major, minor, patch) < (5, 0, 0)
+    elif charset_normalizer_version:
+        major, minor, patch = charset_normalizer_version.split('.')[:3]
+        major, minor, patch = int(major), int(minor), int(patch)
+        # charset_normalizer >= 2.0.0 < 3.0.0
+        assert (2, 0, 0) <= (major, minor, patch) < (3, 0, 0)
+    else:
+        raise Exception("You need either charset_normalizer or chardet installed")
 
 def _check_cryptography(cryptography_version):
     # cryptography < 1.3.4
@@ -82,10 +97,10 @@ def _check_cryptography(cryptography_version):
 
 # Check imported dependencies for compatibility.
 try:
-    check_compatibility(urllib3.__version__, chardet.__version__)
+    check_compatibility(urllib3.__version__, chardet_version, charset_normalizer_version)
 except (AssertionError, ValueError):
-    warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
-                  "version!".format(urllib3.__version__, chardet.__version__),
+    warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
+                  "version!".format(urllib3.__version__, chardet_version, charset_normalizer_version),
                   RequestsDependencyWarning)
 
 # Attempt to enable urllib3's fallback for SNI support
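
The version gate in this hunk compares a `(major, minor, patch)` tuple against a half-open range, and checks chardet first when both libraries are present. A standalone sketch of that logic follows; `in_range` and `detector_is_supported` are hypothetical names introduced here, and the sketch returns a boolean where the commit uses `assert`.

```python
def in_range(version, low, high):
    """True if low <= (major, minor, patch) < high.

    Raises ValueError for versions with fewer than three parts or with
    non-numeric parts, the same exception type the commit catches.
    """
    major, minor, patch = (int(part) for part in version.split('.')[:3])
    return low <= (major, minor, patch) < high


def detector_is_supported(chardet_version, charset_normalizer_version):
    """Mirror the if/elif/else order of the diff: chardet takes precedence."""
    if chardet_version:
        return in_range(chardet_version, (3, 0, 2), (5, 0, 0))
    if charset_normalizer_version:
        return in_range(charset_normalizer_version, (2, 0, 0), (3, 0, 0))
    raise RuntimeError("neither chardet nor charset_normalizer is installed")


print(detector_is_supported('4.0.0', None))     # True
print(detector_is_supported(None, '2.0.3'))     # True
print(detector_is_supported('5.0.0', '2.0.3'))  # False
```

The last call shows a consequence of the ordering: an out-of-range chardet is reported even when a supported charset_normalizer is also installed, because the second branch is never reached.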

requests/compat.py

Lines changed: 4 additions & 1 deletion
@@ -8,7 +8,10 @@
 Python 3.
 """
 
-import chardet
+try:
+    import chardet
+except ImportError:
+    import charset_normalizer as chardet
 
 import sys
 
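
The import fallback above is the "escape hatch" from the commit message: prefer chardet, otherwise bind charset_normalizer to the same name. The same preference order can be sketched with `importlib` so that it works for any list of candidates; `first_importable` is a hypothetical helper, and the sketch assumes both libraries expose a `detect(bytes)` function returning a dict with an `'encoding'` key.

```python
import importlib


def first_importable(*names):
    """Return the first module in `names` that can be imported, or None."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Same preference order as the diff: chardet first, then charset_normalizer.
detector = first_importable('chardet', 'charset_normalizer')
if detector is not None:
    guess = detector.detect('café au lait, s\'il vous plaît'.encode('utf-8'))
    print(detector.__name__, guess['encoding'])
else:
    print('no detection library installed')
```

Printing `detector.__name__` is a quick way to see which of the two libraries an environment will actually use.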

requests/help.py

Lines changed: 18 additions & 2 deletions
@@ -8,10 +8,19 @@
 
 import idna
 import urllib3
-import chardet
 
 from . import __version__ as requests_version
 
+try:
+    import charset_normalizer
+except ImportError:
+    charset_normalizer = None
+
+try:
+    import chardet
+except ImportError:
+    chardet = None
+
 try:
     from urllib3.contrib import pyopenssl
 except ImportError:
@@ -71,7 +80,12 @@ def info():
 
     implementation_info = _implementation()
     urllib3_info = {'version': urllib3.__version__}
-    chardet_info = {'version': chardet.__version__}
+    charset_normalizer_info = {'version': None}
+    chardet_info = {'version': None}
+    if charset_normalizer:
+        charset_normalizer_info = {'version': charset_normalizer.__version__}
+    if chardet:
+        chardet_info = {'version': chardet.__version__}
 
     pyopenssl_info = {
         'version': None,
@@ -99,9 +113,11 @@ def info():
         'implementation': implementation_info,
         'system_ssl': system_ssl_info,
         'using_pyopenssl': pyopenssl is not None,
+        'using_charset_normalizer': chardet is None,
         'pyOpenSSL': pyopenssl_info,
         'urllib3': urllib3_info,
         'chardet': chardet_info,
+        'charset_normalizer': charset_normalizer_info,
         'cryptography': cryptography_info,
         'idna': idna_info,
         'requests': {

requests/models.py

Lines changed: 3 additions & 3 deletions
@@ -731,7 +731,7 @@ def next(self):
 
     @property
     def apparent_encoding(self):
-        """The apparent encoding, provided by the chardet library."""
+        """The apparent encoding, provided by the charset_normalizer or chardet libraries."""
         return chardet.detect(self.content)['encoding']
 
     def iter_content(self, chunk_size=1, decode_unicode=False):
@@ -845,7 +845,7 @@ def text(self):
         """Content of the response, in unicode.
 
         If Response.encoding is None, encoding will be guessed using
-        ``chardet``.
+        ``charset_normalizer`` or ``chardet``.
 
         The encoding of the response content is determined based solely on HTTP
         headers, following RFC 2616 to the letter. If you can take advantage of
@@ -893,7 +893,7 @@ def json(self, **kwargs):
         if not self.encoding and self.content and len(self.content) > 3:
             # No encoding set. JSON RFC 4627 section 3 states we should expect
             # UTF-8, -16 or -32. Detect which one to use; If the detection or
-            # decoding fails, fall back to `self.text` (using chardet to make
+            # decoding fails, fall back to `self.text` (using charset_normalizer to make
             # a best guess).
             encoding = guess_json_utf(self.content)
             if encoding is not None:

requests/packages.py

Lines changed: 13 additions & 1 deletion
@@ -1,14 +1,26 @@
 import sys
 
+try:
+    import chardet
+except ImportError:
+    import charset_normalizer as chardet
+    import warnings
+
+    warnings.filterwarnings('ignore', 'Trying to detect', module='charset_normalizer')
+
 # This code exists for backwards compatibility reasons.
 # I don't like it either. Just look the other way. :)
 
-for package in ('urllib3', 'idna', 'chardet'):
+for package in ('urllib3', 'idna'):
     locals()[package] = __import__(package)
     # This traversal is apparently necessary such that the identities are
     # preserved (requests.packages.urllib3.* is urllib3.*)
     for mod in list(sys.modules):
         if mod == package or mod.startswith(package + '.'):
             sys.modules['requests.packages.' + mod] = sys.modules[mod]
 
+target = chardet.__name__
+for mod in list(sys.modules):
+    if mod == target or mod.startswith(target + '.'):
+        sys.modules['requests.packages.' + target.replace(target, 'chardet')] = sys.modules[mod]
 # Kinda cool, though, right?
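
The new loop aliases whichever detector was imported under the `requests.packages.chardet` name by writing into `sys.modules`. Note that `target.replace(target, 'chardet')` always evaluates to `'chardet'`, so as written every matching module, submodules included, is stored under the single key `requests.packages.chardet`. The sketch below shows one way to alias a package while keeping each submodule's suffix. It uses `json` as a stand-in package and a made-up `demo_packages` prefix, and `alias_package` is a hypothetical helper, not code from this commit.

```python
import sys
import json  # stand-in for the detector package; importing it also loads json.decoder


def alias_package(target, alias):
    """Register `target` and its already-loaded submodules under `alias`."""
    for mod in list(sys.modules):
        if mod == target or mod.startswith(target + '.'):
            # Keep the suffix after the package name, e.g. '.decoder'.
            sys.modules[alias + mod[len(target):]] = sys.modules[mod]


alias_package('json', 'demo_packages.chardet')

# Module identities are preserved: both names refer to the same object.
print(sys.modules['demo_packages.chardet'] is json)                  # True
print(sys.modules['demo_packages.chardet.decoder'] is json.decoder)  # True
```

Only modules already present in `sys.modules` at the time of the call are aliased, which is why the traversal in the diff runs after the package has been imported.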

setup.py

Lines changed: 3 additions & 1 deletion
@@ -41,7 +41,8 @@ def run_tests(self):
 packages = ['requests']
 
 requires = [
-    'chardet>=3.0.2,<5',
+    'charset_normalizer~=2.0.0; python_version >= "3"',
+    'chardet>=3.0.2,<5; python_version < "3"',
     'idna>=2.5,<3',
     'urllib3>=1.21.1,<1.27',
     'certifi>=2017.4.17'
@@ -103,6 +104,7 @@ def run_tests(self):
         'security': ['pyOpenSSL >= 0.14', 'cryptography>=1.3.4'],
         'socks': ['PySocks>=1.5.6, !=1.5.7'],
         'socks:sys_platform == "win32" and python_version == "2.7"': ['win_inet_pton'],
+        'use_chardet_on_py3': ['chardet>=3.0.2,<5']
     },
     project_urls={
         'Documentation': 'https://requests.readthedocs.io',

tox.ini

Lines changed: 14 additions & 3 deletions
@@ -1,7 +1,18 @@
 [tox]
-envlist = py27,py35,py36,py37,py38
+envlist = py{27,35,36,37,38}-{default,use_chardet_on_py3}
 
 [testenv]
-
+deps = -rrequirements-dev.txt
+extras =
+    security
+    socks
 commands =
-    python setup.py test
+    pytest tests
+
+[testenv:default]
+
+[testenv:use_chardet_on_py3]
+extras =
+    security
+    socks
+    use_chardet_on_py3
