Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding an invalid UTF-8 sequence in the URL make the server to crash. #2725

Closed
wants to merge 1 commit into from

Conversation

Natim
Copy link

@Natim Natim commented Jul 26, 2016

I've got a few website done using Pyramid, Cornice and lately Kinto (the two last ones helps to make great HTTP API's on top of Pyramid)

One of them has got pentested using OpenVAD and we found an error that can make any Pyramid website to crash.

How to reproduce?

Add the string '%82%AC' in the URL of your Pyramid project and see it fails.

@Natim Natim force-pushed the 2725-invaid-unicode-crashes branch from c474841 to 1463435 Compare July 26, 2016 12:11
@Natim Natim changed the title How to make any pyramid website to crash? Adding an invalid UTF-8 sequence in the URL make the server to crash. Jul 26, 2016
@chrisrossi
Copy link
Member

On Tue, Jul 26, 2016 at 8:02 AM, Rémy HUBSCHER notifications@github.com
wrote:

%82%AC

True. WebOb tries to read the path_info as UTF-8 and it results in a
UnicodeDecodeError.

It's not clear to me how this leads to a DDoS attack, though. I wouldn't
think these requests would be more expensive to process than valid
requests. On the contrary, these should be cheaper, since the framework
bails out almost immediately and never hits any business logic.

I'm open to the idea that I'm missing something, but on the face of it,
this seems to be true, but not interesting.

Chris

@Natim
Copy link
Author

Natim commented Jul 26, 2016

I wouldn't think these requests would be more expensive to process than valid requests.

Ok, in any case having a 500 is not something we want right?

@mmerickel
Copy link
Member

As @chrisrossi said, this is not a crash nor an exploit. This is a 500 error which you can catch in your application via an exception view or any WSGI middleware just like any other error in your application.

@view_config(context=UnicodeDecodeError, renderer='nope.jinja2')
def bad_char_view(request):
    request.response.status = 500
    return {}

@mmerickel mmerickel closed this Jul 26, 2016
@mmerickel
Copy link
Member

For reference here is a traceback while using the starter scaffold.

2016-07-26 09:28:29,583 ERROR [waitress:341][waitress] Exception when serving /
Traceback (most recent call last):
  File "/Users/michael/work/oss/pyramid/foo/env/lib/python3.5/site-packages/waitress/channel.py", line 338, in service
    task.service()
  File "/Users/michael/work/oss/pyramid/foo/env/lib/python3.5/site-packages/waitress/task.py", line 169, in service
    self.execute()
  File "/Users/michael/work/oss/pyramid/foo/env/lib/python3.5/site-packages/waitress/task.py", line 399, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "/Users/michael/work/oss/pyramid/foo/env/lib/python3.5/site-packages/pyramid/router.py", line 236, in __call__
    response = self.invoke_subrequest(request, use_tweens=True)
  File "/Users/michael/work/oss/pyramid/foo/env/lib/python3.5/site-packages/pyramid/router.py", line 211, in invoke_subrequest
    response = handle_request(request)
  File "/Users/michael/work/oss/pyramid/foo/env/lib/python3.5/site-packages/pyramid_debugtoolbar/toolbar.py", line 201, in toolbar_tween
    default_active_panels)
  File "/Users/michael/work/oss/pyramid/foo/env/lib/python3.5/site-packages/pyramid_debugtoolbar/toolbar.py", line 66, in __init__
    panel_inst = panel_class(request)
  File "/Users/michael/work/oss/pyramid/foo/env/lib/python3.5/site-packages/pyramid_debugtoolbar/panels/request_vars.py", line 110, in __init__
    'get': [(k, request.GET.getall(k)) for k in request.GET],
  File "/Users/michael/work/oss/pyramid/foo/env/lib/python3.5/site-packages/webob/request.py", line 834, in GET
    vars = GetDict(data, env)
  File "/Users/michael/work/oss/pyramid/foo/env/lib/python3.5/site-packages/webob/multidict.py", line 287, in __init__
    MultiDict.__init__(self, data)
  File "/Users/michael/work/oss/pyramid/foo/env/lib/python3.5/site-packages/webob/multidict.py", line 38, in __init__
    items = list(args[0])
  File "/Users/michael/work/oss/pyramid/foo/env/lib/python3.5/site-packages/webob/compat.py", line 113, in parse_qsl_text
    yield (name.decode(encoding), value.decode(encoding))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 0: invalid start byte

@Natim
Copy link
Author

Natim commented Jul 26, 2016

Does this means that you wouldn't consider it to be something to fix in webob?

@mmerickel
Copy link
Member

This would be considered a duplicate of Pylons/webob#115 and Pylons/webob#161 which will still end up raising an exception. There is nothing more for webob to do here as it does not know the charset of the request if it's not utf-8. It is possible to catch the exception and try other encodings, but this is not something we would do automatically.

@tseaver
Copy link
Member

tseaver commented Jul 26, 2016

We could choose to make it a "400 Bad Request" error response, rather than a 500: it is an error on the part of the client.

@mmerickel
Copy link
Member

Yes, unfortunately we cannot add a default error view for this in Pyramid until the issues are resolved in webob. We cannot assume that any UnicodeDecodeError is a result of a bad request, but we could with a URLDecodeError.

@mmerickel
Copy link
Member

Pyramid defines its own URLDecodeError but I'm not sure if we could use it to hook our usage of webob. It might be possible if we overrode request.GET, request.POST and request.__init__ to catch UnicodeDecodeError and translate it. No one has attempted to do this afaik.

See #2047 as well.

@Natim
Copy link
Author

Natim commented Jul 26, 2016

Pyramid defines its own URLDecodeError but I'm not sure if we could use it to hook our usage of webob

Let me try that then.

@olemoign
Copy link

@mmerickel Regarding the exception view you described a few messages ago, this sadly does not work, the exceptions are not caught by Pyramid and rendered as wanted.
I guess the exception views are not yet hooked while parsing the request ?

@digitalresistor
Copy link
Member

@olemoign Do you have the debugtoolbar enabled? According to ptweens on a project of mine:

Implicit Tween Chain

Position    Name                                                             
--------    ----                                                             
-           INGRESS                                                          
0           pyramid_debugtoolbar.toolbar_tween_factory                       
1           pyramid.tweens.excview_tween_factory                             
2           pyramid_tm.tm_tween_factory                                      
-           MAIN                                                  

This would mean when the debugtoolbar causes the exception by accessing .GET (the stacktrace that @mmerickel posted above shows this too) the excview tween hasn't even executed, and it is what allows exception views to work.

I'd be interested in seeing a stacktrace from an app that doesn't have debugtoolbar, and where you can't use an exception view to change the error displayed.

@digitalresistor
Copy link
Member

WSGI middleware could catch this though. I am working on fixes in WebOb...

@olemoign
Copy link

@bertjwregeer You are perfectly right, this is caused by the debugtoolbar.

2016-08-19 14:30:50,437 ERROR [waitress][waitress] Exception when serving /¿
Traceback (most recent call last):
  File "/Users/olemoign/.pyenv/versions/rta/lib/python3.5/site-packages/pyramid_debugtoolbar/toolbar.py", line 173, in toolbar_tween
    p = request.path_info
  File "/Users/olemoign/.pyenv/versions/rta/lib/python3.5/site-packages/webob/descriptors.py", line 68, in fget
    return req.encget(key, encattr=encattr)
  File "/Users/olemoign/.pyenv/versions/rta/lib/python3.5/site-packages/webob/request.py", line 166, in encget
    return bytes_(val, 'latin-1').decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 1: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/olemoign/.pyenv/versions/rta/lib/python3.5/site-packages/waitress/channel.py", line 338, in service
    task.service()
  File "/Users/olemoign/.pyenv/versions/rta/lib/python3.5/site-packages/waitress/task.py", line 169, in service
    self.execute()
  File "/Users/olemoign/.pyenv/versions/rta/lib/python3.5/site-packages/waitress/task.py", line 399, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "/Users/olemoign/.pyenv/versions/3.5.1/envs/rta/lib/python3.5/site-packages/paste/translogger.py", line 69, in __call__
    return self.application(environ, replacement_start_response)
  File "/Users/olemoign/.pyenv/versions/rta/lib/python3.5/site-packages/pyramid/router.py", line 236, in __call__
    response = self.invoke_subrequest(request, use_tweens=True)
  File "/Users/olemoign/.pyenv/versions/rta/lib/python3.5/site-packages/pyramid/router.py", line 211, in invoke_subrequest
    response = handle_request(request)
  File "/Users/olemoign/.pyenv/versions/rta/lib/python3.5/site-packages/pyramid_debugtoolbar/toolbar.py", line 175, in toolbar_tween
    raise URLDecodeError(e.encoding, e.object, e.start, e.end, e.reason)
pyramid.exceptions.URLDecodeError: 'utf-8' codec can't decode byte 0xbf in position 1: invalid start byte

When disabled, Pyramid catches the exception as it should.

@ztane
Copy link
Contributor

ztane commented Aug 20, 2016

Unfortunately this is not the first time that debugtoolbar messes up; unfortunately that's more like a rule, not an exception :D

@Natim
Copy link
Author

Natim commented Aug 22, 2016

It is not only about debugtoolbar, you will have the same problem with any other plugin trying to read request.GET or request.path.

@digitalresistor
Copy link
Member

@Natim That's not true unless they install themselves as a tween BEFORE the execview. Otherwise you can capture the UnicodeDecodeError as an exception view without issues.

@Natim
Copy link
Author

Natim commented Aug 22, 2016

unless they install themselves as a tween BEFORE the execview.

How can I check if it is the case?

Here is the code: https://github.com/Kinto/kinto/blob/eadd6e23f44ef0b4584823db49347e7114f8b0eb/kinto/core/initialization.py#L334-L341

@digitalresistor
Copy link
Member

That's a NewRequest subscriber. If that correctly fires, then you are already in the router, past Tween ingress

You can list tweens from the command line by using:

ptweens config/file/path.ini

This will list the Tweens and the order in which they are executed.

@Natim
Copy link
Author

Natim commented May 18, 2017

Another way of fixing it that might be interesting: https://github.com/mozilla-services/mozservices/pull/40/files

@jenstroeger
Copy link
Contributor

Thanks for the helpful discussion. I’t been a while since the latest update and just yesterday this arrived in our log file:

2019-08-16 15:00:13,207 ERROR [srv.views.exceptions][MainThread] Internal server error detected!

Traceback (most recent call last):
  File "/…/pyramid/urldispatch.py", line 88, in __call__
    path = decode_path_info(environ['PATH_INFO'] or '/')
  File "/…/pyramid/compat.py", line 276, in decode_path_info
    return path.encode('latin-1').decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 15: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/…/pyramid/tweens.py", line 41, in excview_tween
    response = handler(request)
  File "/…/pyramid/router.py", line 77, in handle_request
    info = routes_mapper(request)
  File "/…/pyramid/urldispatch.py", line 93, in __call__
    e.encoding, e.object, e.start, e.end, e.reason
pyramid.exceptions.URLDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 15: invalid start byte

@Natim’s initial suggestion to add a %82%AC worked; other invalid sequences cause the same exception of course.

I’m curious though: is catching a UnicodeDecodeError exception (as suggested in this comment by @mmerickel above) a strong enough indicator that the request contained corrupted data, i.e. is BadRequest the appropriate server response for this kind of exception?

@jenstroeger
Copy link
Contributor

One more thought. In order to log/inspect the faulty URL, request.url won’t work because it attempts to turn the URL string into a Unicode string. Instead, would it make sense to have a request.asciiurl property or some such which returns an ASCII version of the URL, e.g.

>>> request.asciiurl
'http://localhost:6543/api/ping/\xc1\xc1'

@digitalresistor
Copy link
Member

You can pull the PATH_INFO directly from the request.environ dictionary to be able to log the value that was received without parsing.

@jenstroeger
Copy link
Contributor

Thanks @bertjwregeer! I noticed something odd here, though. When I send

> http get http://localhost:6543/api/ping/%c1%c1

and log as you said

_log.info("Failed to decode url: %s", f"{env['REQUEST_METHOD']} {env['SERVER_NAME']}:{env['SERVER_PORT']}{env['PATH_INFO']} {env['SERVER_PROTOCOL']}")

Then the server logs the following:

Failed to decode url: GET 0.0.0.0:6543/api/ping/ÁÁ HTTP/1.1

which, upon closer inspection is:

> echo '/api/ping/ÁÁ' | hexdump -C
00000000  2f 61 70 69 2f 70 69 6e  67 2f c3 81 c3 81 0a     |/api/ping/.....|
0000000f

Notice how the initial /%c1%c1 are now c3 81 c3 81. However, this one works: 'RAW_URI': '/api/ping/%C1%C1'.

Question: the other dictionary keys (SERVER_NAME, SERVER_PORT, …) will always be available here, or should I better use get() here?

@digitalresistor
Copy link
Member

Ah, yeah, PATH_INFO gets manipulated by your WSGI server. It is no longer percent encoded. This is unfortunately a part of the WSGI spec.

That being said, RAW_URI is not likely to exist for all WSGI servers (for example, waitress doesn't set that in the WSGI environment), and is non-standard.

You can see the values that NEED to exist in the WSGI environ documented in PEP3333: https://www.python.org/dev/peps/pep-3333/#environ-variables

@digitalresistor
Copy link
Member

Sorry, I misspoke, PATH_INFO manipulation is not due to the WSGI standard, but the CGI specification that it inherits from. PATH_INFO is automatically url decoded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants