-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CVE-2015-2104] Urlparse insufficient validation leads to open redirect #67693
Comments
The module urlparse lacks proper validation of the input leading to open redirect vulnerability. The issue is that URLs do not survive the round-trip through This can be practically exploited this way : http://example.com/login?next=/////evil.com The for fix this would be for |
Perhaps you actually meant four input slashes, producing two output slashes. That seems more of a bug to me: >>> urlparse("////foo.com")
ParseResult(scheme='', netloc='', path='//foo.com', params='', query='', fragment='')
>>> urlunparse(_)
'//foo.com' Solving bpo-22852, which proposes some flags including “has_netloc” on the ParseResult object, might help with this. |
Yes! forgive my mistake I meant four slashes. |
For your information, this security issue has been assigned a CVE ID : CVE-2015-2104 |
Can you please elaborate on the "exploit" part? In Firefox, the "////etc/passwd" link shows me my local file /etc/passwd. Ok, but how is it an issue? "//etc/passwd" also shows me file:////etc/passwd. The OWASP article on Open Redirect shows example to redirect to a different website. Can you should an example how redirect to a website and not a file:// URL? |
Yes, exploiting this bug an attacker may redirect a specific vitim to a malicious website, in our case evil.com
///evil.com will be parsed as relative-path URL which is the correct expected behaviour
As you see two slashes are removed and it is marked as a relative-path URL but when we reconstruct the URL using urlunparse() function, the URL is treated as an absolute URL to which you will be redirected. >>> x = urlunparse(urlparse("////evil.com"))
>>> urlparse(x)
ParseResult(scheme='', netloc='evil.com', path='', params='', query='', fragment='') |
>>> urlparse("//evil.com")
ParseResult(scheme='', netloc='evil.com', path='', params='', query='', fragment='') I see evil.com in the netloc field, ok. But Firefox doesn't use Python to parse and url, and typing //evil.com in the address bar converts the address to file:////evil.com. Not a website, but a local file. So I don't understand the redirection part. Could you maybe write a vulnerable CGI script to demonstrate the bug? I wrote the following HTML file to try to understand the bug, but I was only able to show the content of my local file /etc/issue: <head> |
When you directly type //evil.com or ////evil.com in Firefox URL bar you will be redirect to evil.com and that is very known, read this : http://homakov.blogspot.com/2014/01/evolution-of-open-redirect-vulnerability.html Here is a video demonstration of the vulnerability : http://youtu.be/l0uDAqpRPpo |
Do you think it would be enough to ensure the urlparse() result remembers whether the empty “//” was present or not? In other words, something like the following mockup (based on the bpo-22852 proposal). An example vunerable program would help me understand this as well. >>> urlparse("////evil.com")
ParseResult(scheme="", netloc="", has_netloc=True, path="//evil.com", ...)
>>> urlunparse(_)
"////evil.com" Or would we still need special handling of a path that starts with a double slash despite that; either URL-encoding the second slash, or maybe just raising an exception? Consider that the components are already supposed to be URL-encoded, and you can still generate unexpected valid URLs by giving other invalid components, such as >>> urlunparse(("", "netloc/with/path", "/more/path", "", "", ""))
'//netloc/with/path/more/path' |
I am not quiet sure about the first proposal but I strongly believe the appropriate method to fix this is by checking if the path starts with double slashes and then URL encoding the two leading slashes. |
While some websites may use urlunparse(urlparse(url)) to validate a url, this is far from standard practice. Django, for instance, does not use this method. While I agree we should clean this behavior up, this is not a vulnerability in core python, and we need to invalidate the assigned CVE. |
"Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component." https://docs.python.org/2/library/urlparse.html 2015-03-03 22:16 GMT+00:00 Paul McMillan <>:
(https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlunparse) %40mcmillan.ws |
From: cve-assign () mitre org -----BEGIN PGP SIGNED MESSAGE----- We think that the issue reduces to the question of whether it's https://docs.python.org/2/library/urlparse.html says: urlparse.urlparse(urlstring[, scheme[, allow_fragments]])
Parse a URL into six components, returning a 6-tuple. This
corresponds to the general structure of a URL:
scheme://netloc/path;parameters?query#fragment.
urlparse.urlunparse(parts)
Construct a URL from a tuple as returned by urlparse(). The parts
argument can be any six-item iterable. This may result in a
slightly different, but equivalent URL, if the URL that was parsed
originally had unnecessary delimiters (for example, a ? with an
empty query; the RFC states that these are equivalent). The first issue is that the urlunparse documentation is ambiguous. We So, our expectation is that urlunparse(urlparse(original_url)) should The actual behavior is: >>> from urlparse import urlparse, urlunparse
>>> print urlparse("////example.com")
ParseResult(scheme='', netloc='', path='//example.com', params='', query='', fragment='')
>>> print urlparse(urlunparse(urlparse("////example.com")))
ParseResult(scheme='', netloc='example.com', path='', params='', query='', fragment='')
>>> print urlparse(urlunparse(urlparse(urlunparse(urlparse("////example.com")))))
ParseResult(scheme='', netloc='example.com', path='', params='', query='', fragment='') Here, urlparse(urlunparse(original_url)) does have a significant The next question is, if there is a CVE for a report of a
Again, this is imperfect. It works best in the relatively common case CVE assignment team, MITRE CVE Numbering Authority |
From: Amos Jeffries <squid3 () treenet co nz> On 6/03/2015 10:42 a.m., cve-assign () mitre org wrote:
urlparse.urlparse(urlstring[, scheme[, allow_fragments]])
Parse a URL into six components, returning a 6-tuple. This
corresponds to the general structure of a URL:
scheme://netloc/path;parameters?query#fragment. My 2c ... no it does not. There are 7 parts in a URL. What is called "netloc" in that description The userinfo field is very much alive and well in non-HTTP schemes. Ignoring the userinfo field leaves implementations open to attacks of AYJ |
Any updates concerning this issue ? is it going to be fixed or at least modify the documentation in order to warn developers about this behaviour ? |
FYI I posted a patch at bpo-22852 to retain the empty netloc “//” when appropriate. But even if there is interest in that patch, I guess it can still only be applied to the next version of Python (3.5 or whatever), being a new feature. Maybe you could suggest some wording or a patch to the documentation that could be applied to bugfix releases as well. |
As Martin said, backporting a change for this wouldn't be appropriate |
What's the verdict on this bug? It's been dangling for almost one and half year. |
It is not clear what Yassine’s bug is. Maybe it is about round-tripping from urlparse() → urlunparse(). If so, it could be solved by fixing either of the following two problems:
When considering the second problem of validation, you have to be aware that urlunparse() is documented to handle schemes like “mailto:” not listed in “uses_netloc”. According to RFC 6068, mailto://evil.com is valid syntax, and is decoded to netloc="", path="//evil.com". In this case, netloc="evil.com" would probably be invalid instead. |
The problem is in
where group 3 is the //netloc, and 4 is netloc substring, of an url http://netloc/foobar - and this is what Python should use to parse an URI using RFC 2396... but: >>> pat.fullmatch('////netloc').group(4)
''
>>> pat.fullmatch('/relative').group(4)
>>> Someone took the shortcut. no netloc is different from netloc being the empty string '', but >>> urlparse('////netloc')
ParseResult(scheme='', netloc='', path='//netloc', params='', query='', fragment='')
>>> urlparse('/netloc')
ParseResult(scheme='', netloc='', path='/netloc', params='', query='', fragment='') In the latter parsing result netloc should be *None*. Unfortunately I believe it is too late to change this either way. |
*I mean the problem exists in |
bpo-34276 was opened about a similar case for “file:” URLs. I believe both “file:” scheme and no-scheme cases are a regression and could be fixed by adding another pair of slashes (an empty “netloc” part): >>> urlparse("////foo.com") # No change
ParseResult(scheme='', netloc='', path='//foo.com', params='', query='', fragment='')
>>> urlunparse(_) # Knows to escape slashes with another double-slash
'////foo.com' |
#113563 fixes this issue. If the path starts with It is interesting, that f3963b1 was almost correct fix, but it used wrong boolean operator, so it broke a common case of |
…ng with multiple slashes and no authority (GH-113563)
…starting with multiple slashes and no authority (pythonGH-113563) (cherry picked from commit e237b25) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…starting with multiple slashes and no authority (pythonGH-113563) (cherry picked from commit e237b25) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…h path starting with multiple slashes and no authority (pythonGH-113563) (cherry picked from commit e237b25) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…h path starting with multiple slashes and no authority (pythonGH-113563) (cherry picked from commit e237b25) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
… path starting with multiple slashes and no authority (pythonGH-113563) (cherry picked from commit e237b25) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…h path starting with multiple slashes and no authority (pythonGH-113563) (cherry picked from commit e237b25) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…starting with multiple slashes and no authority (pythonGH-113563)
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: