Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Form data support #185

Closed
wants to merge 10 commits into from

Conversation

Michae1Weiss
Copy link
Contributor

@Michae1Weiss Michae1Weiss commented Dec 13, 2021

Hi everyone!

This PR shows an example of support for uploading files via Swagger.

Usage example with werkzeug.datastructures.FileStorage (Flask):

from werkzeug.datastructures import FileStorage
from pydantic import BaseModel
from spectree import models

class File(BaseModel):
    uid: str
    file: models.BaseFile

@api.validate(form_data=File)
def with_file():
    file_dict = request.context.form_data.file.dict()
    file = FileStorage(**file_dict)
    return jsonify()
...

Issues:

  1. If uploading files using the Falcon framework plugin - content_length is always 0.

Questions:

  1. How to get content_length from falcon.cyutil.reader.BufferedReader object?
  2. Initially SpecTree.validate only accepts json parameter to define request body. Would it be correct to separate request body schemas for multipart/form-data and application/json content types with different parameters (json and from_data) like I did or is there a cleaner way to do it?

    For example, the 'drf-yasg' module uses a generic request_body parameter for both content types.
    To define the content type 'drf-yasg' parses the view class object and if a view include only form parsers - swagger will support file upload.
    drf-yasg implementation

    Wouldn't it make sense to make a generic request_body parameter like in 'drf-yasg' and understand expected content type of the request body by some flag for example?

Comment on lines 3 to 4
import falcon

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never been used?

spectree/spec.py Outdated
Comment on lines 193 to 201
query = func.__annotations__.get("query", query)
nonlocal json
json = func.__annotations__.get("json", json)
nonlocal form_data
form_data = func.__annotations__.get("form_data", form_data)
nonlocal headers
headers = func.__annotations__.get("headers", headers)
nonlocal cookies
cookies = func.__annotations__.get("cookies", cookies)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can combine all the nonlocal in one line.

spectree/spec.py Outdated
Comment on lines 147 to 150
if json and form_data:
# TODO: how to resolve conflict between json and form_data?
# TODO: Maybe replace 'form_data' with 'request_body' as it is done in 'drf-yasg'?
raise ValueError("You should provide either 'json' or 'form_data'.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK to raise exceptions here. Explicit is better than implicit.

I agree that combining these two parameters should be better. But I'm not sure if we should make some breaking changes now.

Comment on lines 201 to 202
# TODO: add support for other media types?
# TODO: OR handle unsupported media types?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refs: https://www.iana.org/assignments/media-types/media-types.xhtml

How about the following?

  • font
  • video
  • message
  • model

I wonder what's the major difference among these types for file handlers?

Comment on lines 195 to 199
# "content_length": len(part.data), # FIXME: raises error if file is too big
# sadly, BufferedReader don't have 'tell' method
"content_length": 0, # TODO: replace placeholder with something that will work
"content_type": part.content_type,
"stream": part.stream.read(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already consume the stream buffer, why not just use part.data to access the data and content length?

I'm not sure if it's possible to keep the buffer and let the user consume it later.

@@ -182,12 +182,32 @@ def request_validation(self, req, query, json, headers, cookies):
media = None
if json:
req.context.json = json.parse_obj(media)
elif form_data and media is not None:
data = {}
for part in media: # TODO: add support for falcon 2.0 (media is always None for 'multipart/form-data')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, async falcon should also be in the TODO list.

request.context = Context(
query.parse_obj(request.query_params) if query else None,
json.parse_raw(await request.body() or "{}") if json else None,
form_data.parse_obj(data or "{}") if form_data and data else None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
form_data.parse_obj(data or "{}") if form_data and data else None,
form_data.parse_obj(data or {}) if form_data and data else None,

@kemingy
Copy link
Member

kemingy commented Dec 14, 2021

Thanks for your great work!

You can fix the lint by running make format and test with make lint.

@Michae1Weiss Michae1Weiss marked this pull request as draft December 15, 2021 07:35
@lgtm-com
Copy link
Contributor

lgtm-com bot commented Dec 20, 2021

This pull request introduces 1 alert when merging bad2b28 into 0f24b49 - view on LGTM.com

new alerts:

  • 1 for Module is imported with 'import' and 'import from'

@Michae1Weiss
Copy link
Contributor Author

@kemingy Thanks for review! I’ve added form-data support for async falcon and implemented various optimizations and fixes based on your comments. It's now time to write some tests for it :)
I will try to write them next week.
I'd be grateful for a second review and feedback. I'm not quite sure about some of the fixes (still need to learn more about asgi and cgi)

@kemingy
Copy link
Member

kemingy commented Dec 21, 2021

Hi, @Michae1Weiss. Thanks for your contribution. It may take me some time to review the code. I will try to give you some feedback by the end of this week.

@@ -38,9 +39,27 @@ def register_route(self, app):
)

async def request_validation(self, request, query, json, form_data, headers, cookies):
data = {}
if request.headers.get('content-type', '').split(";")[0] in self.FORM_MIMETYPE:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if get the default '' and split here index[0] will raise an error

Copy link
Contributor Author

@Michae1Weiss Michae1Weiss Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback!
You're right, splitting an empty string with a None separator returns [] and ''.split()[0] will raise an error.
But in this case split() uses a separator ;. So ''.split(';')[0] will return an empty string
From Python documentation:
Splitting an empty string with a specified separator returns [''].
Refs: https://docs.python.org/3/library/stdtypes.html#str.split

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got your point, you are right, cool~

form_dict.update(**request.files.to_dict())
if form:
req_form = form_dict
elif json:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe if here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elsewhere in this PR there is a condition that will raise exception if both json and form exist, so it seems like they are mutually exclusive

Copy link
Member

@kemingy kemingy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your great work. Feel free to leave some comments. Especially for the cgi.FileStorage part. I'm not very familiar with it. I'm not sure if I understand your code correctly.

@@ -13,6 +13,7 @@

class StarlettePlugin(BasePlugin):
ASYNC = True
FORM_MIMETYPE = ("application/x-www-form-urlencoded", "multipart/form-data")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused?

req_headers = request.headers or {}
req_cookies = request.cookies or {}

if request.mimetype in self.FORM_MIMETYPE:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assume that the user should provide form instead of json here?

req.env.setdefault("QUERY_STRING", "")

try:
_form = cgi.FieldStorage(fp=req.stream, environ=req.env) # TODO: rename
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UNSAFE behavior: req.env. This may cause the leak of the machine information.


try:
_form = cgi.FieldStorage(fp=req.stream, environ=req.env) # TODO: rename
except ValueError: # Invalid boundary?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will falcon.Request.bounded_stream be better to avoid this exception?

encoding, filename = encoded.split("''")
field.filename = filename
field.file = BytesIO(field.file.read().encode(encoding))
if getattr(field, "filename", False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think getattr(field, "filename", False) will always return None since the type of field is always cgi.FieldStorage.

return [self.parse_field(subfield) for subfield in field]

# When file name isn't ascii FieldStorage will not consider it.
encoded = field.disposition_options.get("filename*")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is filename* a regex?

# When file name isn't ascii FieldStorage will not consider it.
encoded = field.disposition_options.get("filename*")
if encoded:
encoding, filename = encoded.split("''")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any example for this one? I'm not sure when will the field.filename contains ''.

req_headers = request.headers or {}
req_cookies = request.cookies or {}

if request.mimetype in self.FORM_MIMETYPE:
body = form or json or None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is or None for?

@@ -1,6 +1,9 @@
import cgi
Copy link
Collaborator

@yedpodtrzitko yedpodtrzitko May 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cgi is deprecated since Python 3.11

@kemingy
Copy link
Member

kemingy commented Aug 4, 2022

@kemingy kemingy closed this Aug 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants