urllib.parse space handling CVE-2023-24329 appears unfixed #102153

Closed
AdrianBunk opened this issue Feb 22, 2023 · 37 comments
Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error), type-security (A security issue)

Comments

@AdrianBunk

AdrianBunk commented Feb 22, 2023

Everyone (including the submitter of the now public exploit who submitted the issue half a year ago to security@python.org and the NVD) seems to think that #99421 "accidentally fixed" CVE-2023-24329.

Did the Python Security Response Team verify that this vulnerability that was reported to them and that is now public was fixed by #99421?

The PoC from the submitter still works for me with the Debian package 3.11.2-4, which surprised me and makes me wonder whether the fix had any effect at all on the stripping of leading blanks issue in the CVE.


@AdrianBunk AdrianBunk added the type-bug An unexpected behavior, bug, or error label Feb 22, 2023
@hugovk hugovk added the type-security A security issue label Feb 22, 2023
@ned-deily
Member

@pablogsal

@pablogsal
Member

The backport was merged in #99446, no?

@AdrianBunk
Author

@pablogsal #99446 is a backport of #99421 that does not seem to fix CVE-2023-24329:

$ cat test.py 
import urllib.request
from urllib.parse import urlparse
def safeURLOpener(inputLink):
    block_host = ["instagram.com", "youtube.com", "tiktok.com", "example.com"]
    input_hostname = urlparse(inputLink).hostname
    if input_hostname in block_host:
        print("input hostname is forbidden")
        return
    target = urllib.request.urlopen(inputLink)
    content = target.read()
    print(content)

safeURLOpener("https://example.com")
safeURLOpener(" https://example.com")  # CVE-2023-24329
safeURLOpener("+https://example.com")  # 99421
$ python3.10 test.py 
input hostname is forbidden
b'<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8" />\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href="https://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>\n'
input hostname is forbidden
$ python3.11 test.py 
input hostname is forbidden
b'<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8" />\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href="https://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>\n'
Traceback (most recent call last):
  File "/tmp/test.py", line 15, in <module>
    safeURLOpener("+https://example.com")  # 99421
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/test.py", line 9, in safeURLOpener
    target = urllib.request.urlopen(inputLink)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 541, in _open
    return self._call_chain(self.handle_open, 'unknown',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 1419, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: +https>
$

@pablogsal
Member

CC: @gpshead

@gpshead gpshead self-assigned this Feb 24, 2023
@arhadthedev
Member

arhadthedev commented Feb 27, 2023

@CharlieZhao95
Contributor

BTW, does the fix for CVE-2023-24329 require a backport to 3.10, 3.9 and older branches?

I noticed that the fix is currently only backported to the 3.11 branch, but the bug actually affects all versions prior to 3.11.

@RSAlderman

@CharlieZhao95 that's what I was asking for in #102293 - backport of the security vulnerability fix for CVE-2023-24329 to all in-service releases (3.7-3.10).

The request for the backports has been closed as a duplicate of this issue by @gpshead

@CharlesBryant-G

Maybe it's worth taking a step back and looking at the problem in a wider context.

In the PoC, the vulnerability arises not because urlparse() returns the wrong answer, but because it interprets the URL differently from urlopen(). If they were both wrong in the same way it would be harmless. Why is there more than one piece of code which parses URLs? The DRY principle should apply.

Closely related, note that urlparse() does not have a vulnerability at all - any vulnerability is in code which relies on it and does so in a way in which it creates a vulnerability. In the PoC, the vulnerability is in the code for safeURLOpener().

The way in which urlparse() is implemented is fragile and bug prone. As a general principle, parsing code should not look ahead for known delimiters, it should systematically work from the start, advancing over characters tested to be legitimate. So
urlparse('example.com@!$%^&*()_+-={}[]:;"\\|?query#frag')
should stop parsing at the '%' as that is not a legal character when not followed by two hex digits. It may return that "example.com@!$" is the path and there are extra characters after the URL (this style of parsing is often convenient when parsing items which may contain things to be parsed), or report failure due to an invalid URL. Instead, an early stage of processing skips ahead to the '?' and '#', so it claims there is a path, query, and fragment. While it could then validate these pieces and realise that the path is invalid, this can be forgotten and makes it unnecessarily difficult and dangerous to make the parser accept a valid URL followed by other characters (because it would need to reliably undo any parsing of anything past the valid part).
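One way to avoid the parser mismatch described above is to parse once, validate the parsed pieces, and hand only the re-serialized result to whatever opens the URL. The following is a minimal sketch under stated assumptions: `checked_url`, `ALLOWED_SCHEMES` and `BLOCKED_HOSTS` are hypothetical names and policies, not part of the PoC or of urllib.

```python
from urllib.parse import urlsplit, urlunsplit

ALLOWED_SCHEMES = {"http", "https"}   # hypothetical policy for this sketch
BLOCKED_HOSTS = {"example.com"}       # hypothetical policy for this sketch


def checked_url(raw: str) -> str:
    """Return a re-serialized URL, or raise ValueError if it fails the policy."""
    parts = urlsplit(raw)
    # Fail closed: an empty or unexpected scheme means the parser did not
    # see the URL the way a fetcher might, so refuse it.
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise ValueError("missing hostname")
    if host in BLOCKED_HOSTS:
        raise ValueError(f"host is blocked: {host!r}")
    # Pass on what was validated, not the raw input string.
    return urlunsplit(parts)
```

With this shape, `" https://example.com"` is rejected whether or not the parser strips the leading space: either the scheme comes back empty, or the host matches the blocklist. It is not an exhaustive filter (for example, it does nothing about alternative spellings of a host).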

@gpshead gpshead changed the title Is CVE-2023-24329 still unfixed in 3.11.2? urllib.parse CVE-2023-24329 appears unfixed Mar 1, 2023
@gpshead
Member

gpshead commented Mar 1, 2023

We will backport something that makes sense if we determine this is a security issue; that's why I duped the other issue here. Backporting the existing commit further does not make sense to me until the leading space issue, if present as reported here, is resolved. (I haven't taken the time to look. This is not an emergency.)

@xiaoge1001

xiaoge1001 commented Mar 6, 2023

>>> from urllib.parse import urlparse
>>> urlparse(" https://example.com")
ParseResult(scheme='', netloc='', path=' https://example.com', params='', query='', fragment='')

I tested it and the problem doesn't seem to be fixed. When I execute urlparse(" https://example.com"), the output is the same before and after merging #99421.

@xiaoge1001

xiaoge1001 commented Mar 6, 2023

CVE-2023-24329 says that supplying a URL that starts with blank characters is bad.

If a URL-scheme is " https", it will jump out of the loop in the following code:

if c not in scheme_chars:

After #99421 is merged, it will exit early:

if i > 0 and url[0].isascii() and url[0].isalpha():

The code at line 468 is executed neither before nor after the modification, so the subsequent code path does not change.

When the input is a URL that starts with blank characters, #99421 doesn't seem to have any effect.
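The reasoning in this comment can be checked with a simplified model of the scheme detection. This is only a sketch built around the two conditions quoted above, not the stdlib source; `split_scheme`, `SCHEME_CHARS` and the `require_alpha_start` switch (which toggles the check added by #99421) are names made up for illustration.

```python
import string

# Characters accepted inside a scheme, in the spirit of urllib.parse's scheme_chars.
SCHEME_CHARS = string.ascii_letters + string.digits + "+-."


def split_scheme(url: str, require_alpha_start: bool = True):
    """Simplified model of scheme detection; not the stdlib implementation."""
    i = url.find(":")
    starts_ok = url[:1].isascii() and url[:1].isalpha()
    if i > 0 and (starts_ok or not require_alpha_start):
        # Mirrors the "if c not in scheme_chars" loop: any invalid
        # character means the text before ":" is not treated as a scheme.
        if all(c in SCHEME_CHARS for c in url[:i]):
            return url[:i].lower(), url[i + 1:]
    return "", url
```

For `"+https://example.com"` the switch changes the result, because `+` is a valid scheme character but not a valid first character. For `" https://example.com"` the result is an empty scheme either way, since the space already fails the character check.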

@xiaoge1001

@gpshead Hello, can you review this pull #102470 ?

@xiaoge1001

xiaoge1001 commented Mar 7, 2023

https://nvd.nist.gov/vuln/detail/CVE-2023-24329

Base Score: [7.5 HIGH]
vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N

This is a high-score vulnerability. Can we fix it as soon as possible? If we don't think it's a vulnerability, can we reject it?

@illia-v
Contributor

illia-v commented Mar 7, 2023

BTW, URL parsers in JS remove both leading and trailing C0 control and space characters.

https://url.spec.whatwg.org/#url-parsing
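As a rough illustration of that WHATWG rule, a caller could strip the "C0 control or space" set (U+0000 through U+0020) before parsing. This is a sketch only; `urlsplit_stripped` and `C0_CONTROL_OR_SPACE` are hypothetical names, and stripping both ends is an assumption taken from the rule as described above.

```python
from urllib.parse import urlsplit

# U+0000..U+001F plus U+0020: the WHATWG "C0 control or space" set.
C0_CONTROL_OR_SPACE = "".join(chr(c) for c in range(0x21))


def urlsplit_stripped(url: str):
    """Strip leading and trailing C0 control/space characters, then parse."""
    return urlsplit(url.strip(C0_CONTROL_OR_SPACE))
```

For example, `urlsplit_stripped(" https://example.com").hostname` gives `"example.com"`, so a hostname blocklist sees the same host that a space-tolerant fetcher would open.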

@xiaoge1001

xiaoge1001 commented Mar 7, 2023

For URLs with leading whitespace, Ruby throws InvalidURIError.

illia-v added a commit to illia-v/cpython that referenced this issue Mar 7, 2023
illia-v added a commit to illia-v/cpython that referenced this issue Mar 8, 2023
@gpshead gpshead added the stdlib Python modules in the Lib dir label Mar 9, 2023
@gpshead gpshead changed the title urllib.parse CVE-2023-24329 appears unfixed urllib.parse space handling CVE-2023-24329 appears unfixed Mar 9, 2023
@gpshead
Member

gpshead commented Mar 9, 2023

Adding some historical context - caution, this is long, but worth understanding:

In August 2022 a discussion about this issue was spawned in private on the Python security response team mailing list after the initial report from Yebo Cao came in. Our intent was to file a public issue about it and continue discussion from there, but that never happened. So let's consider this issue to be that one... thanks for filing it!

The private discussions are longer than I want to paste, so I'll summarize some points from them and paste bits of other parts:

  • For security fixes we value stability very highly. We are very cautious about changing the behavior of APIs in ways that may break existing code on already deployed Python versions that receive security patch backports. https://www.hyrumslaw.com/ very much applies.
  • We agree that this urlparse behavior is unexpected vs modern sensibilities.
  • The standard library urllib functions are inconsistent in their behavior. For example urlopen and urlparse do not always behave the same / urlopen does not always call urlparse. (uhoh)
  • urllib and related modules are VERY old crufty code. If you dig, expect that much of it comes from the mid 1990s, before RFCs on the subject were well defined and long before WHATWG was even a thing let alone widely accepted as the real standard borne of practical experience.
  • From looking over old Python issues it's been observed that: "users do not (and never really) expect RFC behavior here. What they expect is an implementation of the WHATWG url parsing standard (which is to say, they expect urlparse to behave like a browser does): https://url.spec.whatwg.org/"
  • It is fair to say that this part of the standard library is unowned and undermaintained. Nobody wants to own it and this post should make it apparent as to why: Compatibility constraints mean it remains a legacy behavior mess.

cc: @PaulMcMillan who did a lot of the above and below analysis last year.

Paul came up with a nice looking list of urlparse potential test cases and demonstrated their current behavior as of August 2022 here https://gist.github.com/PaulMcMillan/70618ca857a0519379af704d88a1c9af as part of the analysis. (even if some of those have changed with other fixes since, the ones with the spaces in what should've been the scheme do not appear to have - I haven't checked all of those behaviors across time and versions).

URL Schemes:

RFC 1808 2.1 specifies that schemes are named as per RFC1738 section 2.1 which says:

   Scheme names consist of a sequence of characters. The lower case
   letters "a"--"z", digits, and the characters plus ("+"), period
   ("."), and hyphen ("-") are allowed.

This seems to reasonably imply that "scheme" should raise a ValueError if other characters (e.g. spaces) are included.

BUT... The tricky question for the Python stdlib is "do we have a scheme with invalid characters?" vs. our fallback of "Otherwise the input is presumed to be a relative URL and thus to start with a path component." per our long standing documented and implemented urllib.parse API. (pause for audience groans...) But the documented API is talking about the netloc there, not the scheme.

What users think they want is: "A string containing :// will try to parse everything to the left of : as the scheme."

However, this doesn't work. If you look at actual user behavior, they have a tendency to do things like put unencoded urls in the query, or possibly even the path, and it doesn't seem to make sense to break schema-less parsing in that case.

Existing user code depends on that API behavior.

Another past issue demonstrating similar problems with changing the behavior of the urlparse() APIs is the past joy of netlocs containing port numbers. See #661 (and its linked issues).

That leaves it inconclusive whether there's much we can do about this.

@PaulMcMillan had one suggestion, perhaps a more heuristic approach: If a URL contains a ://, split on that; if the left hand side of that does not have a / in it, assume it was supposed to be a scheme and raise a ValueError if it contains any invalid scheme characters. Essentially adding a "what you have looks enough like a url that we're pretty sure your schema is invalid" check here:

if c not in scheme_chars:

We're not sure if this edge case is worth the backwards compatibility change since it doesn't cover a myriad of other ways urllib.parse.urlparse() differs from browsers.
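The heuristic can be sketched as a standalone check. This is one illustrative reading of the suggestion, not code from the discussion; `check_scheme_heuristic` and `VALID_SCHEME_CHARS` are hypothetical names, and accepting upper case letters is an assumption.

```python
import string

# Scheme characters per the RFC text quoted above, plus upper case
# letters (an assumption of this sketch).
VALID_SCHEME_CHARS = frozenset(string.ascii_letters + string.digits + "+-.")


def check_scheme_heuristic(url: str) -> None:
    """Raise ValueError if the text before '://' looks like a malformed scheme."""
    left, sep, _rest = url.partition("://")
    if not sep or "/" in left:
        # No "://" at all, or it appears after a path segment: leave it alone.
        return
    bad = sorted(set(left) - VALID_SCHEME_CHARS)
    if bad:
        raise ValueError(f"invalid characters in scheme {left!r}: {bad!r}")
```

This rejects `" https://example.com"` and accepts `"/redirect?to=https://example.com"`. It also rejects a relative reference such as `"page?next=http://example.com"`, which shows the kind of backwards compatibility cost weighed in this comment.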

Further discussion covered questions such as "can't we just rstrip() or lstrip() or strip() always?", noting other use patterns and reasons why change here is complicated in practice:

  • rstrip vs trailing spaces isn't so easy as those can be meaningful in path names.
  • current behavior in path-less URLs such as "scheme://foo.example ": urlopen() works on these today, but when parsing them with urlparse() the trailing space winds up in the netloc, which will defeat similar blocklist style string matches.
  • people are likely to use urlparse(), but then use requests or other libraries instead of urlopen(). requests accepts a prepended space, but raises an error in the prior trailing space in netloc example...
  • urlopen()'s stripping behavior is possibly unintentional, a side effect of its own very messy internals.
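For the trailing-space case in the list above, a blocklist check can fail closed instead of comparing a hostname that may carry stray whitespace. A minimal sketch, assuming a hypothetical `host_is_blocked` helper and `BLOCKED_HOSTS` set:

```python
from urllib.parse import urlsplit

BLOCKED_HOSTS = {"foo.example"}  # hypothetical blocklist for this sketch


def host_is_blocked(url: str) -> bool:
    """Return True for blocked hosts; raise ValueError on suspicious hostnames."""
    host = urlsplit(url).hostname or ""
    # Whitespace or control characters in a hostname would make a plain
    # string comparison miss, so refuse such input outright.
    if any(ch.isspace() or ord(ch) < 0x20 for ch in host):
        raise ValueError(f"hostname contains whitespace or control characters: {host!r}")
    return host in BLOCKED_HOSTS
```

A caller would treat the ValueError the same way as a blocked host, so `"scheme://foo.example "` cannot slip past the check just because of the trailing space.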

At a minimum we should document what happens with invalid schemes. Perhaps we should be recommending better fully WHATWG compliant PyPI maintained libraries. Ideally we'd have shipped one. But we don't today and changing our existing urlparse API to behave that way will break existing users so it is not a security fix... it'd need to be a thoughtful breaking change API transition with a behavior deprecation period and recommendations for code needing any of the old behaviors. Not a security fix.

(The bulk of the above analysis and a bunch of the words come from Paul and some from Guido. I'm opening them up here for a wider audience, I added or rephrased or editorialized and emphasized a few things along the way.)

@xiaoge1001

xiaoge1001 commented Mar 9, 2023

@gpshead Thank you for the reply.

So is there a plan to fix this on the main branch at the moment? Because this is a high-score vulnerability, I hope it can be fixed as soon as possible.

@AdrianBunk
Author

@gpshead Thanks for the explanations. In my opinion Python has in recent years already broken or removed far too much existing functionality, and I am pleasantly surprised by your awareness of the need to maintain backwards compatibility.

In addition to the technical side you explained, there is also a process problem you should discuss (perhaps not in public) in case you aren't already doing this:

Right now there is a CVE with a high score that links to a description of a vulnerability with a PoC and to a merge request with a one-line fix, but the PoC still works with the fix. Something went wrong that resulted in people wrongly thinking that #99421 would fix CVE-2023-24329, and for that it is not even relevant whether the final resolution will be a code fix or a documentation update stating that this is not considered a bug.

> Our intent was to file a public issue about it and continue discussion from there, but that never happened.

Apparently:

  • someone (or no one?) was supposed to do this, but
  • this never happened, and
  • there might be a lack of internal tracking that would detect such overdue tasks?

potiuk added a commit to potiuk/airflow that referenced this issue Jun 7, 2023
The latest releases of Python 3.8 and 3.9 have just been published
and contain the fix for a security vulnerability backported to
those versions:

python/cpython#102153

Release notes:
* https://www.python.org/downloads/release/python-3817/
* https://www.python.org/downloads/release/python-3917/

The fix improved sanitizing of the URLs and until Python 3.10 and
3.11 get released, we need to add the sanitization ourselves to
pass tests on all versions.

In order to improve security of airflow users and make the tests
work regardless whether the users have latest Python versions
released, we add extra sanitisation step to the URL to apply
the standard WHATWG specification.
encukou added a commit to encukou/cpython that referenced this issue Jun 7, 2023
pythongh-102153: Start stripping C0 control and space chars in `urlsplit` (pythonGH-102508)
`urllib.parse.urlsplit` has already been respecting the WHATWG spec a bit pythonGH-25595.
This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/#url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329).
Backported from Python 3.12
potiuk added a commit to apache/airflow that referenced this issue Jun 7, 2023
eladkal pushed a commit to apache/airflow that referenced this issue Jun 8, 2023

(cherry picked from commit 87c5c9f)
carlosroman added a commit to DataDog/cpython that referenced this issue Jun 22, 2023
* Post 3.8.16

* [3.8] Update copyright years to 2023. (pythongh-100852)

* [3.8] Update copyright years to 2023. (pythongh-100848).
(cherry picked from commit 11f9932)

Co-authored-by: Benjamin Peterson <benjamin@python.org>

* Update additional copyright years to 2023.

Co-authored-by: Ned Deily <nad@python.org>

* [3.8] Update copyright year in README (pythonGH-100863) (pythonGH-100867)

(cherry picked from commit 30a6cc4)

Co-authored-by: Ned Deily <nad@python.org>
Co-authored-by: HARSHA VARDHAN <75431678+Thunder-007@users.noreply.github.com>

* [3.8] Correct CVE-2020-10735 documentation (pythonGH-100306) (python#100698)

(cherry picked from commit 1cf3d78)
(cherry picked from commit 88fe8d7)

Co-authored-by: Jeremy Paige <ucodery@gmail.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>

* [3.8] Bump Azure Pipelines to ubuntu-22.04 (pythonGH-101089) (python#101215)

(cherry picked from commit c22a55c)

Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>

* [3.8] pythongh-100180: Update Windows installer to OpenSSL 1.1.1s (pythonGH-100903) (python#101258)

* pythongh-101422: (docs) TarFile default errorlevel argument is 1, not 0 (pythonGH-101424)

(cherry picked from commit ea23271)

Co-authored-by: Owain Davies <116417456+OTheDev@users.noreply.github.com>

* [3.8] pythongh-95778: add doc missing in some places (pythonGH-100627) (python#101630)

(cherry picked from commit 4652182)

* [3.8] pythongh-101283: Improved fallback logic for subprocess with shell=True on Windows (pythonGH-101286) (python#101710)

Co-authored-by: Oleg Iarygin <oleg@arhadthedev.net>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>

* [3.8] pythongh-101981: Fix Ubuntu SSL tests with OpenSSL (3.1.0-beta1) CI i… (python#102095)

[3.8] pythongh-101981: Fix Ubuntu SSL tests with OpenSSL (3.1.0-beta1) CI issue (pythongh-102079)

* [3.8] pythonGH-102306 Avoid GHA CI macOS test_posix failure by using the appropriate macOS SDK (pythonGH-102307)

[3.8] Avoid GHA CI macOS test_posix failure by using the appropriate macOS SDK.

* [3.8] pythongh-101726: Update the OpenSSL version to 1.1.1t (pythonGH-101727) (pythonGH-101752)

Fixes CVE-2023-0286 (High) and a couple of Medium security issues.
https://www.openssl.org/news/secadv/20230207.txt

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Ned Deily <nad@python.org>

* [3.8] pythongh-102627: Replace address pointing toward malicious web page (pythonGH-102630) (pythonGH-102667)

(cherry picked from commit 61479d4)

Co-authored-by: Blind4Basics <32236948+Blind4Basics@users.noreply.github.com>
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>

* [3.8] pythongh-101997: Update bundled pip version to 23.0.1 (pythonGH-101998). (python#102244)

(cherry picked from commit 89d9ff0)

* [3.8] pythongh-102950: Implement PEP 706 – Filter for tarfile.extractall (pythonGH-102953) (python#104548)

Backport of c8c3956

* [3.8] pythongh-99889: Fix directory traversal security flaw in uu.decode() (pythonGH-104096) (python#104332)

(cherry picked from commit 0aeda29)

Co-authored-by: Sam Carroll <70000253+samcarroll42@users.noreply.github.com>

* [3.8] pythongh-104049: do not expose on-disk location from SimpleHTTPRequestHandler (pythonGH-104067) (python#104121)

Do not expose the local server's on-disk location from `SimpleHTTPRequestHandler` when generating a directory index. (unnecessary information disclosure)

(cherry picked from commit c7c3a60)

Co-authored-by: Ethan Furman <ethan@stoneleaf.us>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>

* [3.8] pythongh-103935: Use `io.open_code()` when executing code in trace and profile modules (pythonGH-103947) (python#103954)

Co-authored-by: Tian Gao <gaogaotiantian@hotmail.com>

* [3.8] pythongh-68966: fix versionchanged in docs (pythonGH-105299)

* [3.8] Update GitHub CI workflow for macOS. (pythonGH-105302)

* [3.8] pythongh-105184: document that marshal functions can fail and need to be checked with PyErr_Occurred (pythonGH-105185) (python#105222)

(cherry picked from commit ee26ca1)

Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>

* [3.8] pythongh-102153: Start stripping C0 control and space chars in `urlsplit` (pythonGH-102508) (pythonGH-104575) (pythonGH-104592) (python#104593) (python#104895)

`urllib.parse.urlsplit` has already been respecting the WHATWG spec a bit pythonGH-25595.

This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/#url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329).

I simplified the docs by eliding the state of the world explanatory
paragraph in this security release only backport.  (people will see
that in the mainline /3/ docs)

(cherry picked from commit d7f8a5f)
(cherry picked from commit 2f630e1)
(cherry picked from commit 610cc0a)
(cherry picked from commit f48a96a)

Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
Co-authored-by: Illia Volochii <illia.volochii@gmail.com>
Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>

* [3.8] pythongh-103142: Upgrade binary builds and CI to OpenSSL 1.1.1u (pythonGH-105174) (pythonGH-105200) (pythonGH-105205) (python#105370)

Upgrade builds to OpenSSL 1.1.1u.

Also updates _ssl_data_111.h from OpenSSL 1.1.1u, _ssl_data_300.h from 3.0.9.

Manual edits to the _ssl_data_300.h file prevent it from removing any
existing definitions in case those exist in some peoples builds and were
important (avoiding regressions during backporting).

(cherry picked from commit ede89af)
(cherry picked from commit e15de14)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Ned Deily <nad@python.org>

* Python 3.8.17

* Post 3.8.17

* Updated CI to build 3.8.17

---------

Co-authored-by: Łukasz Langa <lukasz@langa.pl>
Co-authored-by: Benjamin Peterson <benjamin@python.org>
Co-authored-by: Ned Deily <nad@python.org>
Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
Co-authored-by: HARSHA VARDHAN <75431678+Thunder-007@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Jeremy Paige <ucodery@gmail.com>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Steve Dower <steve.dower@python.org>
Co-authored-by: Owain Davies <116417456+OTheDev@users.noreply.github.com>
Co-authored-by: Éric <earaujo@caravan.coop>
Co-authored-by: Oleg Iarygin <oleg@arhadthedev.net>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Co-authored-by: Dong-hee Na <donghee.na@python.org>
Co-authored-by: Blind4Basics <32236948+Blind4Basics@users.noreply.github.com>
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Pradyun Gedam <pradyunsg@gmail.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Sam Carroll <70000253+samcarroll42@users.noreply.github.com>
Co-authored-by: Ethan Furman <ethan@stoneleaf.us>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
Co-authored-by: stratakis <cstratak@redhat.com>
Co-authored-by: Illia Volochii <illia.volochii@gmail.com>
MeggyCal added a commit to MeggyCal/bleach that referenced this issue Jun 29, 2023
snarfed added a commit to snarfed/webutil that referenced this issue Aug 15, 2023
…rlsplit

https://docs.python.org/release/3.9.17/whatsnew/changelog.html#changelog

> gh-102153: urllib.parse.urlsplit() now strips leading C0 control and space characters following the specification for URLs defined by WHATWG in response to CVE-2023-24329. Patch by Illia Volochii.

python/cpython#102153
hroncok pushed a commit to fedora-python/cpython that referenced this issue Oct 6, 2023
00399 #

* pythongh-102153: Start stripping C0 control and space chars in `urlsplit` (pythonGH-102508)

`urllib.parse.urlsplit` has already been respecting the WHATWG spec a bit pythonGH-25595.

This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/#url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329).

Backported to Python 2 from Python 3.12.

Co-authored-by: Illia Volochii <illia.volochii@gmail.com>
Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
Co-authored-by: Lumir Balhar <lbalhar@redhat.com>
ahidalgob pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Nov 7, 2023

(cherry picked from commit 87c5c9fa629317090ce65ec4c686596a2c4cd148)

GitOrigin-RevId: 5b41ed8209d965402c7f593afb85c1e13afeb23a
hroncok pushed a commit to fedora-python/cpython that referenced this issue Nov 28, 2023
pythongh-102153: Start stripping C0 control and space chars in `urlsplit` (pythonGH-102508)

`urllib.parse.urlsplit` has already been respecting the WHATWG spec a bit pythonGH-25595.

This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/#url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329).

Backported from Python 3.12

(cherry picked from commit f48a96a)

Co-authored-by: Illia Volochii <illia.volochii@gmail.com>
Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
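
One way to check whether a given interpreter includes this change is to parse a URL with a leading space and inspect the result. This is only a sketch; the function name `urlsplit_strips_leading_space` is hypothetical, and the probe tests a single leading space rather than every C0 control character.

```python
from urllib.parse import urlsplit


def urlsplit_strips_leading_space():
    """Return True if this interpreter's urlsplit ignores a leading space."""
    parts = urlsplit(" https://example.com/")
    # Without the fix, the leading space prevents the scheme from being
    # recognized, so the scheme is empty and the hostname is None.
    return parts.scheme == "https" and parts.hostname == "example.com"


if urlsplit_strips_leading_space():
    print("urlsplit strips the leading space")
else:
    print("urlsplit keeps the leading space; sanitize URLs before parsing")
```

The result depends on the Python build in use, which is why distribution backports such as the ones referenced in this thread matter.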
stratakis pushed a commit to stratakis/cpython that referenced this issue Feb 22, 2024
stratakis pushed a commit to stratakis/cpython that referenced this issue Feb 27, 2024
hroncok pushed a commit to fedora-python/cpython that referenced this issue Mar 7, 2024
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 11, 2024
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 11, 2024
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 20, 2024
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 20, 2024
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 20, 2024
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 20, 2024
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 25, 2024
hroncok pushed a commit to fedora-python/cpython that referenced this issue Mar 26, 2024
mcepl pushed a commit to openSUSE-Python/cpython that referenced this issue Apr 2, 2024
ahidalgob pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue May 15, 2024
GitOrigin-RevId: 87c5c9fa629317090ce65ec4c686596a2c4cd148
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Sep 19, 2024
GitOrigin-RevId: 87c5c9fa629317090ce65ec4c686596a2c4cd148
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Nov 8, 2024
GitOrigin-RevId: 87c5c9fa629317090ce65ec4c686596a2c4cd148