-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for multipart/form-data in the request body #2321
Support for multipart/form-data in the request body #2321
Conversation
… this test will be easily modified now once the correct behavior is implemented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for submitting this PR!
The main issue we see with this PR is the test harness tests don't actually send multipart form data as expected.
Each of those tests needs to use the -F
flag instead of --data
in the curl command.
packages/http-server/src/__tests__/body-params-validation.spec.ts
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,28 @@ | |||
====test==== | |||
If I have an operation with a parameter with the allowEmptyValue set to false to false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I have an operation with a parameter with the allowEmptyValue set to false to false | |
If I have an operation with a parameter with the allowEmptyValue set to false |
@@ -0,0 +1,28 @@ | |||
====test==== | |||
If I have an operation with a parameter with the allowEmptyValue set to false to false | |||
and I send a request with a correcly encoded parameter but with an empty key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and I send a request with a correcly encoded parameter but with an empty key | |
and I send a request with a correctly encoded parameter but with an empty key |
@@ -0,0 +1,30 @@ | |||
====test==== | |||
If I have an operation with a parameter with the allowEmptyValue set to false to false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I have an operation with a parameter with the allowEmptyValue set to false to false | |
If I have an operation with a parameter with the allowEmptyValue set to true |
@@ -0,0 +1,30 @@ | |||
====test==== | |||
If I have an operation with a parameter with the allowEmptyValue set to false to false | |||
and I send a request with a correcly encoded parameter and with the parameter empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and I send a request with a correcly encoded parameter and with the parameter empty | |
and I send a request with a correctly encoded parameter and with the parameter empty |
====server==== | ||
mock -p 4010 ${document} | ||
====command==== | ||
curl -i -X POST http://localhost:4010/path -H "Content-Type: multipart/form-data" --data "reserved=:/?#[]@!$&'()*+,;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all of these curl requests you need to use -F
instead of --data
as that will actually pass form data as multipart.
@chohmann All requested changes are now implemented. Thank you for pointing out that I needed to pass data with the -F flag rather than the --data flag! I initially misunderstood that setting "Content-type: multipart/form-data" was not sufficient. And apologies about the typos in tests I added to the test harness -- I copied these tests from existing files in the application/x-www-form-urlencoded test suite and didn't proofread the original files. I have now fixed all the typos in these as well as in the original files. I also added some regex parsing logic for multipart form data to handle additional cases. Please let me know if you'd like any further changes! Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really appreciate your update, but I think there are some cases the parseMultipartFormDataParams()
function doesn't yet handle regarding the boundary between multipart entities.
@@ -54,6 +54,15 @@ export function splitUriParams(target: string) { | |||
}, {}); | |||
} | |||
|
|||
export function parseMultipartFormDataParams(target: string) { | |||
const pairs = target.split(/--.*\s*.*="/).filter(element => element); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code assumes that the boundary between multipart entities is --
followed by any string. However, the boundary between multipart entities is defined, in each individual request, by the request's Content-Type
header's Boundary
parameter. This is because the client must choose a value that doesn't occur within any multipart entity that it will transmit. For example, consider the effect of the following example on your implementation.
curl -v -F a=b -F c='--="' <hostname>
Also consider any form value that includes \r\n\r\n
.
RFC 2046 § 5.1.1 describes the boundary as...
a line consisting entirely of two hyphen characters ("-", decimal value 45) followed by the boundary parameter value from the Content-Type header field, optional linear whitespace, and a terminating CRLF.
I think you'll need to either:
- implement this as the standard describes, with unit tests (not
test-harness
integration tests) to validate that it works with a variety of input, OR - rely on a library which parses multipart data according to the standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ed, thank you so much for your feedback. I discovered a library called parse-multiform-data which I'm installing as an npm package that seemed to be the most reliable method. I made changes according to this, such as parsing the boundary parameter from the Content-Type header field in order to pass it to the library for parsing the body. I left some comments on the recent changes that hopefully help clarify some of the decisions I made when implementing this. Thanks again and please let me know if you'd like any other changes!
packages/http/src/validator/index.ts
Outdated
// parse boundary string from content-type in case media type is multipart/form-data | ||
const multipart = require('parse-multipart-data'); | ||
const contentTypeHeader = headers.get('content-type'); | ||
const multipartBoundary = contentTypeHeader ? multipart.getBoundary(contentTypeHeader) : ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we parse the boundary parameter value from the Content-Type header field as specified by https://datatracker.ietf.org/doc/html/rfc2046#section-5.1.1. We do so using the parse-multipart-data library to ensure it's parsed correctly according to the standard. The boundary string is later passed into the parse-multipart-data library to correctly parse the body in the function parseMultipartFormDataParams.
packages/http/src/mocker/index.ts
Outdated
|
||
// parse boundary string from content-type in case media type is multipart/form-data | ||
const multipart = require('parse-multipart-data'); | ||
const multipartBoundary = multipart.getBoundary(contentTypeHeader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment in packages/http/src/validator/index.ts
packages/http/src/validator/index.ts
Outdated
const multipart = require('parse-multipart-data'); | ||
const contentTypeHeader = headers.get('content-type'); | ||
const multipartBoundary = contentTypeHeader ? multipart.getBoundary(contentTypeHeader) : ""; | ||
const mediaType = contentTypeHeader ? contentTypeHeader.replace(new RegExp(";\\s*boundary=" + multipartBoundary), "") : contentTypeHeader; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After end-end and unit testing, the content-type header appeared in both formats: "multipart/form-data;boundary=--...." and "multipart/form-data; boundary=--...." (with the space after the semicolon). Thus, I'm handling possible whitespace between the media type and the boundary string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I'm concerned about your use of a regular expression here. Notice that the boundary may contain any 7-bit ASCII character, which I think includes many characters with special meanings inside a regular expression (e.g.,
.
,*
,(
). Consider using a library to parse the Content-Type header in its entirety. Here's a very simple example using whatwg-mimetype:import MIMEType from 'whatwg-mimetype'; const t = new MIMEType('multipart/form-data; boundary=.*.?.+zzz'); console.log(t?.essence); console.log(t.parameters.get('boundary'));
-
It looks like this block of code is a duplicate of what you added in
mocker/index.ts
. Could it be factored out?
…g are parsed from content-type header
packages/http/src/mocker/index.ts
Outdated
// parse boundary string from content-type in case media type is multipart/form-data | ||
const multipart = require('parse-multipart-data'); | ||
const multipartBoundary = multipart.getBoundary(contentTypeHeader); | ||
const mediaType = contentTypeHeader.replace(new RegExp(";\\s*boundary=" + multipartBoundary), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment in packages/http/src/validator/index.ts
const multipart = require('parse-multipart-data'); | ||
const bufferBody = Buffer.from(target, "utf-8"); | ||
|
||
// the parse-multipart-data package requires that the body is passed in as a buffer, not a string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See: hovissimo's comment on nachomazzara/parse-multipart-data#14 (I was getting the same error as him)
And also the source code: https://github.com/nachomazzara/parse-multipart-data/blob/master/src/multipart.ts#L34C80-L34C80
const parts = multipart.parse(bufferBody, multipartBoundary); | ||
|
||
return parts.reduce((result: Dictionary<string>, pair: string) => { | ||
result[pair['name']] = pair['data'].toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then convert back from Buffer to String for further processing by Prism
@@ -54,6 +55,30 @@ export function splitUriParams(target: string) { | |||
}, {}); | |||
} | |||
|
|||
export function parseMultipartFormDataParams(target: string, multipartBoundary?: string) { | |||
if(!multipartBoundary) { | |||
const error = "Boundary parameter for multipart/form-data is not defined or generated in the request header. Try removing manually defined content-type from your request header if it exists."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The boundary string MUST be non-empty if this is multipart/form-data (which it is if we're in this function). Thus, we return with error. According to https://bobbyhadz.com/blog/error-multipart-boundary-not-found-in-express-js, postmanlabs/postman-app-support#191, https://stackoverflow.com/questions/40351429/formdata-how-to-get-or-set-boundary-in-multipart-form-data-angular, this can occur when the user is manually setting the content-type, which means the server is not generating it (and the boundary string). Thus, the possible fix is suggested in the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Returning an error seemed to me a better solution than just returning an empty dictionary as it keeps the user more informed. Thus, the return type is Either).
packages/http/src/validator/index.ts
Outdated
@@ -75,17 +76,22 @@ const validateInputBody = ( | |||
checkRequiredBodyIsProvided(requestBody, body), | |||
E.map(b => [...b, caseless(headers || {})] as const), | |||
E.chain(([requestBody, body, headers]) => { | |||
// parse boundary string from content-type in case media type is multipart/form-data | |||
const multipart = require('parse-multipart-data'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd prefer to use import
statements rather than dynamic require
whenever possible.
packages/http/src/validator/index.ts
Outdated
const multipart = require('parse-multipart-data'); | ||
const contentTypeHeader = headers.get('content-type'); | ||
const multipartBoundary = contentTypeHeader ? multipart.getBoundary(contentTypeHeader) : ""; | ||
const mediaType = contentTypeHeader ? contentTypeHeader.replace(new RegExp(";\\s*boundary=" + multipartBoundary), "") : contentTypeHeader; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I'm concerned about your use of a regular expression here. Notice that the boundary may contain any 7-bit ASCII character, which I think includes many characters with special meanings inside a regular expression (e.g.,
.
,*
,(
). Consider using a library to parse the Content-Type header in its entirety. Here's a very simple example using whatwg-mimetype:import MIMEType from 'whatwg-mimetype'; const t = new MIMEType('multipart/form-data; boundary=.*.?.+zzz'); console.log(t?.essence); console.log(t.parameters.get('boundary'));
-
It looks like this block of code is a duplicate of what you added in
mocker/index.ts
. Could it be factored out?
…empty boundary string in multipart data, modularize MIME header parsing
@@ -47,11 +49,35 @@ export function deserializeFormBody( | |||
} | |||
|
|||
export function splitUriParams(target: string) { | |||
return target.split('&').reduce((result: Dictionary<string>, pair: string) => { | |||
return E.right(target.split('&').reduce((result: Dictionary<string>, pair: string) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrapping in Either monad in order to match the return type of parseMultipartFormDataParams, which is of type Either as it may return an error.
|
||
return pipe( | ||
validateAgainstReservedCharacters(encodedUriParams, encodings, prefix), | ||
content.mediaType === "multipart/form-data" ? parseMultipartFormDataParams(target, multipartBoundary) : splitUriParams(target), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since parseMultipartFormDataParams returns an error wrapped in E.left if the boundary string is empty, the return type is Either, and I wrapped the result of splitUriParams in E.Right to match the return type (Either) of parseMultipartFormDataParams. Thus, we move this statement into the expression pipeline.
|
||
const requestBody = request.body as string; | ||
const encodedUriParams = pipe( | ||
mediaType === "multipart/form-data" ? parseMultipartFormDataParams(requestBody, multipartBoundary) : splitUriParams(requestBody), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseMultipartFormDataParams returns an error wrapped in E.Left if the boundary sting is empty. Here I convert that to a default value (i.e. empty dictionary) if that occurs. I'm not sure if I'm supposed to be doing anything else here -- this same behavior (unwrapping the monad using defaults) is implemented for findContentByMediaTypeOrFirst just below on line 169, so I attempted to emulate that.
Hi Ed, thanks for your feedback, I am now doing all Content-Type header parsing using the whatwg-mimetype library you recommended, and the dynamic require statements were changed to imports. As for the duplicate code block, I moved it to a small utility function called parseMIMEHeader I created in validators/headers.ts. Finally, I also attempted to improve the error reporting for empty multipart boundary strings by wrapping parseMultipartFormDataParams in the Either monad and returning a descriptive error (which meant I also needed to wrap splitUriParams in the Either monad to maintain equivalent types). |
…ub.com/ilanashapiro/prism into feature/multipart-form-data-validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution and for working through our reviews with us!
This is included in version 5.2.0 of Prism. |
Hooray, thanks Ilana, Ed, and Chelsea! |
Addresses #2008
Summary
Currently, validation always fails whenever a multipart/form-data content type is used in the request body. This prevents the normal usage of prism without any other proxy server (like nginx) on which such requests can be filtered out. Now, users can successfully use multipart/form-data content type in the request body.
Checklist
Screenshots
Given the following file called examples/testapi.yaml:
Writing a non-empty post request to http://127.0.0.1/path with included form-data in the request body now produces the following, successfully returning the response defined in the example:
![image](https://private-user-images.githubusercontent.com/28958079/252963186-fdba4800-0446-49c9-b74f-0f7243f256bb.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0OTIwMDcsIm5iZiI6MTczOTQ5MTcwNywicGF0aCI6Ii8yODk1ODA3OS8yNTI5NjMxODYtZmRiYTQ4MDAtMDQ0Ni00OWM5LWI3NGYtMGY3MjQzZjI1NmJiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE0VDAwMDgyN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTYyM2UyZWNkZjViZjFhZWE4OTY5Y2RjN2ViNTg3Mzg2YjNhYjAzOWYxYTI2NjZkNjk2MTg1YTExNmIwMzQ2ZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.2RFMk2vEimNWpFGOfE-aMc9CoMbO5iCF9KCHAfO10HE)
![image](https://private-user-images.githubusercontent.com/28958079/255588817-8f650b74-3b86-4d18-8997-da6b3862b2c1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0OTIwMDcsIm5iZiI6MTczOTQ5MTcwNywicGF0aCI6Ii8yODk1ODA3OS8yNTU1ODg4MTctOGY2NTBiNzQtM2I4Ni00ZDE4LTg5OTctZGE2YjM4NjJiMmMxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE0VDAwMDgyN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY1YWVlZTMzNjFjNTM2NWY0YTgyNmVjOWFlNTQ0NjJhODc3YmUwZjJhYjUwZjM2ZDFiYTMzYWQ1ZjU4YjczNDEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.QY2CUzyhkw03YA50CiwwkUuTizmqCcsJUeWPWEefXK4)
In contrast, running this same command on the old version of Prism gives a 422 error:
![image](https://private-user-images.githubusercontent.com/28958079/252964217-70088729-f0cb-48ed-aeae-9b77f9bb5528.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0OTIwMDcsIm5iZiI6MTczOTQ5MTcwNywicGF0aCI6Ii8yODk1ODA3OS8yNTI5NjQyMTctNzAwODg3MjktZjBjYi00OGVkLWFlYWUtOWI3N2Y5YmI1NTI4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE0VDAwMDgyN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQxNzljYjhhYzQ0NTVkMzRiZmQzMzdiZTQxMTdmMGU3MDk5N2FjZjhjN2NkNzA5NGFiYjAyZGQ3YTQ1ZWU0YWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.1ZpehTUKhiQbuLU5CGQJKqg1Li0gCwAKgaDIDx7wnz4)
![image](https://private-user-images.githubusercontent.com/28958079/255589001-2afaef08-5dfe-4bdc-8f30-2bcc5e545288.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0OTIwMDcsIm5iZiI6MTczOTQ5MTcwNywicGF0aCI6Ii8yODk1ODA3OS8yNTU1ODkwMDEtMmFmYWVmMDgtNWRmZS00YmRjLThmMzAtMmJjYzVlNTQ1Mjg4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE0VDAwMDgyN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWU2MTE3ZDJhMjQ1ZGRiN2FlZWE2NWYzYmE2Y2M0NDAzZDg5ODExMDNkOTNmZjI0YjQ2ODY5MDc4ZDBiOGM4MjYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.H0SbYHNGciAPPMR1jCzkUZXZJh2EkttYdJajh1cW9JE)