Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with MultiPartParser class #1059

Closed
2 tasks done
Tracked by #1888
aretasg opened this issue Sep 21, 2020 · 14 comments
Closed
2 tasks done
Tracked by #1888

Issue with MultiPartParser class #1059

aretasg opened this issue Sep 21, 2020 · 14 comments
Labels

Comments

@aretasg
Copy link

aretasg commented Sep 21, 2020

Checklist

  • The bug is reproducible against the latest release and/or master.
  • There are no similar issues or pull requests to fix it yet.

Describe the bug

MultiPartParser class fails to parse a request body produced by the client but not Postman (or similar).

To reproduce

from starlette.formparsers import MultiPartParser, FormParser
from starlette.datastructures import Headers
import asyncio


client_body = b'--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="fasta"; filename="sequences.fasta"\r\nContent-Type: multipart/form-data\r\n\r\n>P23A07_IgM-1047:H:Q10C92:17/22:NID2093\nCAGGTTTCAGTAG\n--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="attributes"; filename="attributes.tsv"\r\nContent-Type: multipart/form-data\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n--xoxoxoxoxoxoxoxoxoxo_1600710720987--\r\n'
client_headers = Headers({'accept-encoding': 'gzip,deflate', 'content-length': '39940', 'content-type': 'multipart/form-data; boundary=xoxoxoxoxoxoxoxoxoxo_1600710720987', 'host': '10.0.5.13:80', 'connection': 'Keep-Alive', 'user-agent': 'KNIME Uploader'})

postman_body = b'----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="attributes"; filename="test-attribute_5.tsv"\r\nContent-Type: text/tab-separated-values\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="fasta"; filename="test-sequence_correct_5.fasta"\r\nContent-Type: application/octet-stream\r\n\r\n>P23G01_IgG1-1411:H:Q10C3:1/1:NID18\r\nCAGGTATTGAA\r\n\r\n----------------------------850116600781883365617864--\r\n'
postman_headers = Headers({'content-type': 'multipart/form-data; boundary=--------------------------850116600781883365617864', 'user-agent': 'PostmanRuntime/7.26.0', 'accept': '*/*', 'cache-control': 'no-cache', 'host': '10.0.5.13:80', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '2455'})


async def stream(message):

    body = message.get("body", b"")
    if body:
        yield body

async def client_test_multipart():

    parser = MultiPartParser(client_headers, stream({"body": client_body}))
    form = await parser.parse()
    print([(k, v) for k, v in form.items()])
    return form

async def postman_test_multipart():

    parser = MultiPartParser(postman_headers, stream({"body": postman_body}))
    form = await parser.parse()
    print([(k, v) for k, v in form.items()])
    return form

loop = asyncio.get_event_loop()
loop.run_until_complete(client_test_multipart())
loop.run_until_complete(postman_test_multipart())

Expected behavior

client_test_multipart should read and return two files from the client_body as with postman_test_multipart and postman_body.

Actual behavior

client_test_multipart reads only one file from the request body produced by the client.

Debugging material

[('fasta', <starlette.datastructures.UploadFile object at 0x7f091c0b1a20>)]
[('attributes', <starlette.datastructures.UploadFile object at 0x7f091c0b1a58>), ('fasta', <starlette.datastructures.UploadFile object at 0x7f091c0b1828>)]

Environment

  • OS: Linux
  • Python version: 3.8
  • Starlette version: 0.13.6
@aretasg
Copy link
Author

aretasg commented Sep 21, 2020

I have found the issue.

In the client_body, after the first file (Content-Type: multipart/form-data\r\n\r\n>P23A07_IgM-1047:H:Q10C92:17/22:NID2093\nCAGGTTTCAGTAG\n) there is a \n character instead of \r\n.

As per HTTP convention, \r\n should be used. However, this is a standard and not a rule and it would be nice if the MultiPartParser would parse both types of line breaks (as some other frameworks do) or at least throw an appropriate exception instead of an ambiguous 422 when using UploadFile.

In the meantime this is what I have managed to write to allow the both files to be parsed from the request:

@app.post("/upload")
async def upload(request: Request):

    body = await request.body()
    request._body = body.replace(b'\r\n', b'\n').replace(b'\n', b'\r\n')

    inp = await request.form()
    fasta = inp["fasta"]
    attributes = inp["attributes"]

@tomchristie
Copy link
Member

Okay, so a useful point to start with would be double checking how other frameworks handle the malformed input here.
Eg. what does Flask do? (I think you could try posting the request to https://httpbin.org/post to find out?)

What client are you using?

@aretasg
Copy link
Author

aretasg commented Sep 22, 2020

Hi @tomchristie ! Thank you for replying.

So before Starlette the service was using Flask and there were no issues - so Flask can handle the 'malformed' input.

I have also tried issuing a POST with the 'malformed' body to https://httpbin.org/post using this:

import http.client
import mimetypes


conn = http.client.HTTPSConnection("httpbin.org")

client_boundary = 'xoxoxoxoxoxoxoxoxoxo_1600710720987'
postman_boundary = '--------------------------850116600781883365617864'

client_payload = '--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="fasta"; filename="sequences.fasta"\r\nContent-Type: multipart/form-data\r\n\r\n>P23A07_IgM-1047:H:Q10C92:17/22:NID2093\nCAGGTTTCAGTAG\n--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="attributes"; filename="attributes.tsv"\r\nContent-Type: multipart/form-data\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n--xoxoxoxoxoxoxoxoxoxo_1600710720987--\r\n'
postman_payload = '----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="attributes"; filename="test-attribute_5.tsv"\r\nContent-Type: text/tab-separated-values\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="fasta"; filename="test-sequence_correct_5.fasta"\r\nContent-Type: application/octet-stream\r\n\r\n>P23G01_IgG1-1411:H:Q10C3:1/1:NID18\r\nCAGGTATTGAA\r\n\r\n----------------------------850116600781883365617864--\r\n'

client_headers = {
   'Content-type': 'multipart/form-data; boundary={}'.format(client_boundary)
}
postman_headers = {
   'Content-type': 'multipart/form-data; boundary={}'.format(postman_boundary)
}

conn.request("POST", "/post", client_payload, client_headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))

conn.request("POST", "/post", postman_payload, postman_headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))

It returned both files and does recognise non-standard line breaks in the body.

The client is a KNIME pipeline and throughout execution sends HTTP calls using Palladian nodes. I have no access to the client code so I had to look for ways to overcome the issue using Starlette instead.

@aretasg
Copy link
Author

aretasg commented Sep 23, 2020

I would be interested to make a PR on this but can't decide whether it should be here. Knowing that Starlette uses python-multipart library for parsing the request perhaps a PR there would be more appropriate?

@hall-b
Copy link

hall-b commented Jun 2, 2022

@tomchristie any update on this ? python-multipart doesn't seem to address this issue, however, this is quite blocking when using the UploadFile from FastAPI.

I'm using the requests library from python and I cannot manage to upload files, I always get a 422 Unprocessable Entity as response.

#1514 seemed to address this issue nicely given that the baize library seems much more active than python-multipart library. If this solution doesn't suits you, what else are you planning to do ?

Does anyone currently have a workaround to use UploadFile from FastAPI and python requests as client ?

Thanks in advance for your reply

@aretasg
Copy link
Author

aretasg commented Jun 5, 2022

@tomchristie any update on this ? python-multipart doesn't seem to address this issue, however, this is quite blocking when using the UploadFile from FastAPI.

I'm using the requests library from python and I cannot manage to upload files, I always get a 422 Unprocessable Entity as response.

#1514 seemed to address this issue nicely given that the baize library seems much more active than python-multipart library. If this solution doesn't suits you, what else are you planning to do ?

Does anyone currently have a workaround to use UploadFile from FastAPI and python requests as client ?

Thanks in advance for your reply

I still find it strange that after all this time since the issue has been raised the maintainers of python-multipart do not to even comment on the issue.

@hall-b python-multipart is only used for uploading multiple files at once. Perhaps you can do it one file at a time? If not, do try modifying the body of the request to make it standard as I did here.

@h0rn3t

This comment was marked as spam.

@artyom-ace

This comment was marked as spam.

@h0rn3t

This comment was marked as spam.

@h0rn3t

This comment was marked as spam.

@Kludex Kludex added this to the Version 1.0 milestone Oct 22, 2022
@Kludex Kludex mentioned this issue Oct 22, 2022
11 tasks
@Kludex
Copy link
Member

Kludex commented Jan 31, 2023

We've decided to vendor pyhon-multipart into Starlette - only the code that makes sense.

@Kludex Kludex modified the milestones: Version 1.0, Version 0.25.0 Feb 4, 2023
@Kludex Kludex modified the milestones: Version 0.25.0, Version 0.26.0 Feb 13, 2023
@Kludex
Copy link
Member

Kludex commented Feb 23, 2023

We've decided to vendor pyhon-multipart into Starlette - only the code that makes sense.

This decision was postponed.

@Kludex Kludex added the good first issue Good for beginners label Feb 27, 2023
@Kludex
Copy link
Member

Kludex commented Feb 27, 2023

PR welcome on python-multipart to solve this. Please refer to Kludex/python-multipart#31.

@Kludex Kludex closed this as not planned Won't fix, can't repro, duplicate, stale Feb 27, 2023
@martin-kokos
Copy link

I've come across this issue when writing a test for endpoint that accepts a file. Because I am in control of the payload in this case, this is a client side workaround that worked for me - using requests to format the request body:

import requests

with open(file_path, "rb") as f:
    body, content_type = requests.PreparedRequest()._encode_files(files={"file": f}, data={})
response = test_client.post("/document/new", content=body, headers={"Content-Type": content_type})

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants