Parsing is extremely slow if there are a lot of CR LF characters in the stream #67
At first glance, after investigating a little, it looks like this line is being called multiple times by your example code: https://github.com/andrew-d/python-multipart/blob/master/multipart/multipart.py#L1327

Adding a small change:

    if boundary[index] == c:
        # If we found a match for our boundary, we send the
        # existing data.
        if index == 0:
            data_callback('part_data')
        # The current character matches, so continue!
        index += 1
    elif c in (CR, LF):
        index += 1
    else:
        index = 0

Still, in some cases you would have lots of calls...
This may justify a CVE, as it can be used to block CPU resources and deny service for other users with a manipulated file upload. Even accidentally, when uploading a real file with just
While benchmarking different multipart implementations, I stumbled across a second (likely related) problem: throughput drops drastically if a file upload consists of characters that are also present in the boundary. Here are the results for a 1 MB upload with different content, and a boundary of
Noticed a slowdown in the code when trying to parse objects with a lot of '\r\n' characters in a row.
It seems that the code makes too many callbacks to `on_part_data`, causing potentially many operations that are not necessary at all (see example and output below).

Python version: 3.9.2
python-multipart version: 0.0.6
Here's a demonstration
and here's the output that I get:
In our use case this can lead to some really slow upload times. For example, a 9MB file filled with CR LF can take 2000s+, and in applications based on FastAPI or Starlette (web frameworks), this may lead to blocking the main thread for seconds and up to a minute at a time (depending on size of chunks in the request stream).
This is an edge case, so to say, but because of it, we are forced to use a different formdata parser.
(If any python webserver uses this to parse incoming data, one can quite easily DoS the service)
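For anyone who wants to reproduce the behaviour described above, here is a minimal sketch (not the original demonstration from this issue). It assumes python-multipart 0.0.6, where `MultipartParser` is importable from `multipart.multipart` and accepts an `on_part_data(data, start, end)` callback. The helper names `build_body` and `count_part_data_calls`, the field name, and the filename are made up for illustration.

```python
import time


def build_body(boundary: bytes, payload: bytes) -> bytes:
    """Wrap `payload` in a single-part multipart/form-data body."""
    return b"".join([
        b"--" + boundary + b"\r\n",
        # Field name and filename are arbitrary placeholders.
        b'Content-Disposition: form-data; name="file"; filename="blob.bin"\r\n',
        b"Content-Type: application/octet-stream\r\n",
        b"\r\n",
        payload,
        b"\r\n--" + boundary + b"--\r\n",
    ])


def count_part_data_calls(body: bytes, boundary: bytes) -> tuple[int, float]:
    """Return (number of on_part_data calls, seconds spent parsing)."""
    # Imported lazily so build_body stays usable without the library.
    # Assumption: python-multipart 0.0.6 module layout.
    from multipart.multipart import MultipartParser

    calls = 0

    def on_part_data(data, start, end):
        nonlocal calls
        calls += 1

    parser = MultipartParser(boundary, callbacks={"on_part_data": on_part_data})
    started = time.perf_counter()
    parser.write(body)
    parser.finalize()
    return calls, time.perf_counter() - started


if __name__ == "__main__":
    boundary = b"boundary"
    size = 100_000  # bytes of file content; kept small so the run finishes quickly
    for label, payload in [
        ("plain", b"a" * size),
        ("CR LF", b"\r\n" * (size // 2)),
    ]:
        calls, seconds = count_part_data_calls(build_body(boundary, payload), boundary)
        print(f"{label}: {calls} on_part_data calls in {seconds:.3f}s")
```

Comparing the two callback counts for payloads of equal size should make it visible whether the parser hands data over in large slices or in many tiny ones.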