Implement multipart parsing #974
Comments
I'm not sure who would send multipart/form-data from a server, or why, so I'd appreciate some concrete requests for this. It doesn't seem to be a very useful feature. |
Hey, it seems like node-fetch is going to have a multipart parser soon, if this PR gets merged. Maybe this would be helpful for you somehow? |
I think it seems useful. I mean, https://www.npmjs.com/package/form-data has 4.5M downloads/week, plus it's a spec'ed thing. I have long been wanting a formData method in node-fetch. |
Hello. The Telegram Bot API accepts only FormData for files. |
Correct me if I'm wrong, but isn't the bigger use case the server parsing multipart form data from the client? The example from the node-fetch PR impl is:

```js
import { Response } from 'node-fetch'
import http from 'node:http'

http.createServer(async function (req, res) {
const formData = await new Response(req, { headers: req.headers }).formData()
const file = formData.get('avatar')
var arrayBuffer = await file.arrayBuffer()
var text = await file.text()
var whatwgReadableStream = file.stream()
});
```

It would be great if undici had the same support for multipart form data. |
the example should actually be something like this:

```js
http.createServer(async function (req, res) {
const url = `${req.protocol}://${req.get('host')}${req.originalUrl}`
const request = new Request(url, { body: req, method: req.method, headers: req.headers })
const formData = await request.formData()
const file = formData.get('avatar')
var arrayBuffer = await file.arrayBuffer()
var text = await file.text()
var whatwgReadableStream = file.stream()
});
```

But it's easier/quicker to construct a Response, which doesn't need a URL or any extra options.
It's not about responding with a multipart/form-data response from a server, like in the first comment. I have long been wishing for a formData method. But then again, for a server to respond with multipart/form-data isn't all that bad. It's a great thing to use for sending multiple files back to the client, now that fetch also has formData():

```js
// server.js
import http from 'node:http'
import stream from 'node:stream'

http.createServer(async function (req, res) {
const fd = new FormData()
fd.set('/uploads/logo.png', new File(...))
fd.set('/uploads/movie-introduction.mp4', new File(...))
const { body } = new Response(fd)
stream.Readable.from(body).pipe(res)
})
```

```js
// client.js
const res = await fetch('/get-files')
const fd = await res.formData()
const files = [...fd]
// use the files for something
```

This is much easier than bringing in a zip library for packing and unpacking on both the server and the client, if you for any reason need to get multiple files from the server. I have done this once before myself. But this is unheard of, cuz never once have we had this available before. I also added it cuz it's part of the spec, which might be the biggest reason to implement it, so folks get what they expect. |
@jimmywarting: I wouldn't mind adding support for it. Would you be ok if someone ported your parser? |
Sure, be my guest, take what u want. The source is based on a quite old version of formidable that i patched quite a bit in 2017 (to get rid of some core nodejs parts like …).

Also take a look into busboy, to see if their new parser is any faster. I haven't tested it against node-fetch's new parser. I think the owner of busboy said that it does some extra stuff that doesn't cohere with the FormData parser spec, like supporting non-UTF8 encodings.

I also have another question... would this formData() parser make it into stream consumers as well somehow?

```js
import { formData } from 'node:stream/consumers';
const fd = await formData(iterable, {boundary: req.boundary})
fd // is an instance of FormData
```

Think it would also be cool to have something more low level, like:

```js
import { formData, blob } from 'node:stream/consumers'
// returns another asyncIterable (maybe the stream consumer name `formData` is the wrong name for it)
const asyncIterable = formData(req, {boundary: req.boundary})
for await (const [name, entry, meta] of asyncIterable) {
if (typeof entry === 'string') {
// it's normal regular string, should this also be streamable, what if the payload is really large?
} else {
// it's either a File class or something that is streamable/iterable as well.
const entryBlob = await blob(entry, { type: meta.contentType || meta.type }) // wish for a type option to exist here (it's missing)
new File([entryBlob], meta.fileName, { type: entryBlob.type })
}
}
```

Another solution could be to treat every entry the same, where the meta only changes depending on whether it's a file vs a normal field:

```js
import { multipart, blob, text } from 'node:stream/consumers'
// returns another asyncIterable
const asyncIterable = multipart(req, {boundary: req.boundary})
for await (const [entry, meta] of asyncIterable) {
const headers = meta.headers // an instance of the fetch `Headers` class
const type = headers.get('content-type')
const name = meta.fieldName
// parses the filename and the utf8 filename variant as well out of content-disposition header
const fileName = meta.fileName // undefined || string
const item = await (fileName ? blob(entry) : text(entry))
}
```

Think i like this ☝️ solution more than the previous 2 examples. |
We'd like to have multipart support for Shopify Hydrogen, where our API route handlers are just functions that take undici Request objects. |
I have a working parser that just needs tests written before a PR (no code used from other parsers, which should help with license issues), although the tests keep stalling tap (same issue as …).

```js
import { createServer } from 'node:http'
import { readFileSync } from 'node:fs'
import assert from 'node:assert'
import { Blob } from 'node:buffer'
import { Response, FormData } from 'undici'

const server = createServer(async (req, res) => {
const chunks = []
for await (const chunk of req) chunks.push(chunk)
const resp = new Response(Buffer.concat(chunks), { headers: req.headers })
const form = await resp.formData()
const same = form.get('c')
const hope = readFileSync('./index.html', 'utf-8')
assert.strictEqual(same.name, 'index.html')
assert(hope === await same.text())
assert(form.get('a') === 'b')
res.end('lol')
server.close()
}).listen(0)
const form = new FormData()
form.set('a', 'b')
form.set('c', new Blob([readFileSync('./index.html')]), 'index.html')
// send POST req with form body here...
```

I'm just worried, as it's less than 100 lines, that there may be edge cases I've missed, although I've hopefully made it fool-proof 😅. |
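For completeness, the elided POST might look like this (a sketch reusing the snippet's `form` and `server`, with undici's `request`):

```js
import { request } from 'undici'

await request(`http://localhost:${server.address().port}`, {
  method: 'POST',
  body: form // undici sets the multipart content-type and boundary for FormData bodies
})
```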
It needs to be a streaming parser - the code in busboy shows more or less how complex that is. Might be interesting to try implementing it using async iterators. |
Another thing i have been thinking about doing in node-fetch: when parsing FormData and encountering very large files, somehow offload the data into some temporary file instead of allocating everything in memory... that might be something that's next on my agenda...
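A rough sketch of that idea (`offloadPart` is a hypothetical helper; the part stream would come from the multipart parser):

```js
import { createWriteStream } from 'node:fs'
import { mkdtemp } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { pipeline } from 'node:stream/promises'

// write a parsed file part to a temp file and keep only its path,
// instead of holding the whole file contents in memory
async function offloadPart (partStream, filename) {
  const dir = await mkdtemp(join(tmpdir(), 'formdata-'))
  const path = join(dir, filename)
  await pipeline(partStream, createWriteStream(path))
  return path
}
```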
|
I added the ability to use async iterators, which each yield an object close to:

```js
{
headers: [Object: null prototype] {
'Content-Disposition': 'form-data',
name: 'c',
filename: 'index.html',
'Content-Type': 'application/octet-stream'
},
content: '... file contents...'
}
```

...although I loop over the entire buffer body before parsing, so I need to combine that logic 🤔 |
Buffering the full body is not going to work well for Node.js, unfortunately; it can end up using quite a lot of memory. |
I agree with @mcollina |
I think I can do that, although using formidable/busboy/dicer, etc. is probably a better choice in this situation. The fetch spec does require parsing formdata responses, though. |
I'll add SvelteKit to the list of frameworks that would definitely benefit from this; our best workaround is to monkey-patch undici. Obviously this is sub-optimal, since it doesn't permit streaming parsing and it adds bloat to apps, but it'll do for the time being! |
Essentially we need to support .formData(). I'm ok to implement this here, even if it would have some major caveats for big files. |
Would it be OK to write files to a temp directory? This feature would kind of be blocked by getting Files backed up by the filesystem. |
Is it safe to assume the availability of a tmp directory in cloud environments? Immutable containers are not widely used? |
I would prefer not to implement that in here, but rather to provide some hook that an external module can plug into to implement that behavior. There are likely so many different ways this can be done that I do not want to support them all in this module. One note: you want the files to be removed anyway on process end; take a look at https://github.com/mcollina/on-exit-leak-free as it allows you to have a finalizer that is run on process exit.
Indeed, that should be implemented first. |
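A minimal sketch of that cleanup pattern, assuming on-exit-leak-free's `register(obj, finalizer)` signature and a hypothetical temp-file resource:

```js
import { register } from 'on-exit-leak-free'
import { rmSync } from 'node:fs'

function trackTempFile (path) {
  const resource = { path }
  // the finalizer runs on process exit without keeping `resource` alive,
  // so the temp file is removed even if it was never explicitly cleaned up
  register(resource, (res) => rmSync(res.path, { force: true }))
  return resource
}
```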
What we need is somebody with time to implement this. Can somebody from Shopify, Cloudflare, or Vercel (just to name a few of the companies that +1'd here) help? |
Added `wrangler init --from-dash <worker-name>`, which allows initializing a wrangler project from a pre-existing worker in the dashboard. Resolves #1624 Discussion: #1623 Notes: `multipart/form-data` parsing is [not currently supported in Undici](nodejs/undici#974), so a temporary workaround to slice off top and bottom boundaries is in place.
We will be picking this up if no one else has already, either myself or @cameron-robey |
I have a WIP implementation of a FormData parser at https://github.com/KhafraDev/undici/tree/formdata-parser

It's surprisingly small for its capabilities (basic parsing, even with mismatched chunks) and should be easy enough to maintain. It has an API that is a mix between formidable (writing chunks to the parser directly) and busboy (using an EventEmitter). Currently it buffers the body in one big chunk but, similarly to busboy, will be moved to its own EventEmitter. For fetch's use case this doesn't matter, as the body is buffered anyway. I did write it to be used outside of fetch, though.

Usage currently works along the lines of:

```js
import { FormDataParser, FormData, request } from 'undici'
import { createServer } from 'http'
import { once } from 'events'
const server = createServer(async (req, res) => {
const parser = new FormDataParser(req.headers['content-type'])
parser.on('header', (header) => {
/* do something with header */
})
parser.on('body', (chunk) => {
/* do something with body */
})
for await (const chunk of req) {
parser.write(chunk)
}
res.end('got formdata')
}).listen(0)
await once(server, 'listening')
const fd = new FormData()
fd.set('key', 'and value')
const blob = new Blob([/* some data */])
fd.set('file', blob, 'readme.md')
await request(`http://localhost:${server.address().port}`, {
method: 'POST',
body: fd
})
```

tl;dr I made a small async FormData parser that needs a lot more work. Also, busboy is a better solution for now, regardless of my opinion on adding dependencies for a niche feature. |
I think i would prefer busboy. It's a very fast multipart parser that scans using streamsearch (Boyer-Moore-Horspool), which is very efficient (it uses bytes instead of string search). Plus it also handles different kinds of encodings (utf8 and 7bit). |
This is irrelevant in regards to .formData. Also note that the experimental/WIP FormData parser I worked on does use bytes to compare buffers (albeit in the form of …). However, as I recommended in the comment before, using busboy for now is fine. |
Can you clarify this statement? Is it impossible to implement it efficiently, or would doing so require significantly more effort? |
It is impossible to implement it efficiently - at the time of writing this - for the reasons outlined here. Essentially, even if you parse it asynchronously & chunk the body data, you will still end up with everything in memory. Without blobs being backed up by the filesystem, memory utilization is quite poor (see nodejs/node#37340 (comment)). I know @jimmywarting has made some effort to have this implemented in node core, but unfortunately no one has tackled it just yet. On the server it doesn't seem to matter; there are better ways of parsing FormData in node and Deno. |
so now that …
|
I would recommend using busboy or @fastify/busboy directly, which suits nodejs more than .formData (which I have a strong disdain for). |
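For reference, a minimal busboy sketch of that recommendation (the handlers are illustrative, not from the thread):

```js
import { createServer } from 'node:http'
import busboy from 'busboy'

createServer((req, res) => {
  const bb = busboy({ headers: req.headers })
  bb.on('file', (name, stream, info) => {
    // info: { filename, encoding, mimeType }
    stream.resume() // consume the file stream (or pipe it to disk/storage)
  })
  bb.on('field', (name, value) => {
    console.log(`field ${name} = ${value}`)
  })
  bb.on('close', () => res.end('done'))
  req.pipe(bb)
}).listen(3000)
```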
i think i would have done it more like this:

```js
import { createServer } from 'node:http'
import fs from 'node:fs'

// Response is global in modern Node (or can be imported from undici)
createServer(async (request, response) => {
const resp = new Response(request, { headers: request.headers })
const fd = await resp.formData()
const file = fd.get('a-file')
await fs.promises.writeFile(file.name, file.stream())
response.end('done')
}).listen(3000)
```

There is no need to buffer the whole request body like this:

```ts
const chunks: Buffer[] = [];
for await (const chunk of request) chunks.push(chunk);
const resp = new Response(Buffer.concat(chunks), { headers: request.headers });
```

...and likewise there is no need to read the whole file data into an arrayBuffer():

```ts
const fileArrayBuffer = await file.arrayBuffer();
await writeFile(file.name, Buffer.from(fileArrayBuffer));
```

i think that undici / fetch / formData has the potential to parse the multipart upload, write all the files to a temporary location on disc, and then populate the FormData with file-backed blobs:

```js
// kind of doing this:
const fd = new FormData()
const blob = await fs.openAsBlob(path, { type })
fd.append('a-file', blob, filename)
// later resolve `res.formData()` with the FormData
return fd
```

This was something that i had in the pipeline for node-fetch, but i haven't gotten around to implementing it yet. This would match more closely how chrome operates when using .formData(). |
I disagree. It's an awful solution that should have never landed in undici. I regret not blocking the PR that implemented it until we could find common ground.
For large files, a streaming implementation doesn't matter whatsoever. It's all in memory anyway, as you can't opt in for node to save the blobs FormData creates as file-based blobs. For small files, a streaming implementation doesn't make sense, for obvious reasons.
Which is easier said than done. How do we decide which blobs use the file-based solution? Is it even faster? Do we leave it up to the users (who would have to install undici anyway, and could rather have just installed @fastify/busboy) to configure it themselves? How do we handle permissions, filesystem errors, cleaning up files, etc.?
Chrome's .formData parser is also synchronous, as it should be according to the spec, and as it is in every other browser and server environment, so comparing undici/node to any other implementation is not a good comparison. I don't see a reason why we should follow chrome in how it offloads files while we are completely ignoring the spec. |
I'd also like to mention that, unlike busboy or @fastify/busboy, there is no way to limit the size of a file, the number of files or fields, the file name length, or apply any other limit that these packages have. It's a giant security risk to use it. |
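For comparison, a sketch of the limits busboy accepts (the values here are illustrative, and `req` is assumed to be an incoming request):

```js
import busboy from 'busboy'

const bb = busboy({
  headers: req.headers,
  limits: {
    fileSize: 10 * 1024 * 1024, // max bytes per file
    files: 5,                   // max number of file parts
    fields: 20,                 // max number of non-file fields
    fieldNameSize: 100          // max field name length, in bytes
  }
})
// when fileSize is hit, busboy truncates the file stream and emits 'limit' on it
bb.on('file', (name, stream) => {
  stream.on('limit', () => console.warn(`${name} was truncated`))
})
```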
**Permissions**

🤷♂️

**Filesystem errors**

Could fall back to in-memory based blobs.

**Cleaning up files**

The process of cleaning up files can be managed using finalizers, as exemplified in the reference above; losing all references to the blob could delete the file.

**Decide which blobs use the file-based solution**

For determining the appropriate solution for blobs, particularly in the context of .formData(): well, at least you could examine the complete content-length of the request.

**Alternative solutions**

There could also be a completely different approach to this as well. It involves a separate handling mechanism that absolves the developer from grappling with these complexities. If NodeJS featured a "Blob Storage" concept, it could autonomously decide whether to page blob data to disk. So you would just create many blob chunks, and the "core" would take care of automatically deciding if it should offload some data to the disc or not, and it could happen without you even knowing that it has happened. Google provides relevant documentation on this approach here and here.

In this alternative scenario, the primary responsibility would be generating blob chunks, while the core system would manage disk paging automatically. An example implementation is shown below:

```js
async blob() {
const chunks = []
for await (const uint8_chunk of getStream(this)) {
// creating many small blobs "could" automatically start paging to Disk
const small_chunk = new Blob([ uint8_chunk ])
chunks.push(small_chunk)
}
// this final blob would just hold many references to where each chunk is on the disc.
// so this would not hold much data or any memory, this would be fine to have in memory without problems
return new Blob(chunks)
}
```

This approach presents advantages by globally handling the intricacies, simplifying memory and disk usage concerns for everyone by not having to think about them.
Whether the parser is synchronous or asynchronous in accordance with the specifications isn't the main concern. The paramount consideration is that the parser functions as anticipated by the specifications. If different approaches yield the same outcome, the manner in which it's achieved becomes irrelevant. |
First off, I'd like to mention that the solutions offered either do not work in this context or are more complicated than using one of the aforementioned libraries. Chrome also works in an entirely different environment than node, where there is no better solution than this overly complicated approach. Node has better alternatives, even if we were to implement everything here (which I am strongly against, due to how complex it is).
There is no guarantee that those finalizers are called. If the process crashes or is killed externally, is there any guarantee that they're called? If not, do we just ignore those temporary files and let them build up? We're still facing issues with WeakRefs and finalizers in other parts of fetch.
This unfortunately isn't even close to a viable alternative.
It is the main concern. The spec, for everything else, is the foremost and most important thing we regard. I see no reason why it should be different for .formData, but it is.
It doesn't. There are 4 WPTs that do not pass for .formData; this is an absurdly high number of failures considering the total number of tests (14), and compared to other mixin methods, which are spec-compliant. |
I know, but this is all that you've got to go on... it isn't much, but at least it's something. It could work for a simple check, such as not being able to upload more than 4 GiB...
I haven't tested every corner case yet. All I have so far is a simple:

```js
process.once('exit', () => {
tempDir && rmdirSync(tempDir, { recursive: true })
})
```

...to clean up temp files on exit. There could be an alternative solution to this.
So if the process suddenly exits or is killed, then the files will be deleted afterwards. Finalizers could still be used to close …
I'm curious what these 4 are, mind sharing any links? |
Our `extractBody` method is missing support for parsing `multipart/form-data` responses.