handle blobs as fields #53

Uzlopak · 2021-12-01T16:17:27Z

I think this handles the issue with blob uploads correctly:

It doesnt matter we have set a filename or not. We just check if the content is an octet stream. If so we treat it as a file. If not, we treat it as a field.

Also according to the Spec, a filename is not mandatory.
https://datatracker.ietf.org/doc/html/rfc7578#section-4.2

   For form data that represents the content of a file, a name for the
   file SHOULD be supplied as well, by using a "filename" parameter of
   the Content-Disposition header field.  The file name isn't mandatory
   for cases where the file name isn't available or is meaningless or
   private; this might result, for example, when selection or drag-and-
   drop is used or when the form data content is streamed directly from
   a device.

So the check if filename was not undefined was already non spec conform.

You could also create a Blob with mimetype application/octet-stream and upload it, and because we would maybe check for filename,e.g. 'blob' we would not handle the blob correctly. Blob only means, that we have the data in-memory of the browser. For the server it should be actually not be important if an upload was made with a blob or not. It should only look if it is octet-stream or not.

Checklist

run npm run test and npm run benchmark
tests and/or benchmarks are included
documentation is changed or added
commit message and code follows the Developer's Certification of Origin
and the Code of conduct

lib/types/multipart.js

climba03003 · 2021-12-01T17:28:53Z

It is the problem is multipart/form-data spec, as it didn't state clearly how to distinguish a field and file.
It is rare but it can

a file without filename
a field without content-type

I would say it is better to make those decision customizable by the consumer.
e.g. multiple flags, or function to determine what they means to be a file.
It is important because we do not know what they needs and the requirement of their server.

I am happy if both option exist.
the flags pass to the default function to determine the file and field.
If they want more, then pass an function to modify it.

Uzlopak · 2021-12-01T18:17:46Z

According to the spec:

https://datatracker.ietf.org/doc/html/rfc7578#section-4.4

   Each part MAY have an (optional) "Content-Type" header field, which
   defaults to "text/plain".  If the contents of a file are to be sent,
   the file data SHOULD be labeled with an appropriate media type, if
   known, or "application/octet-stream".

So it seams, that we must handle all content types which are not text/plain as files?!

cyantree · 2021-12-01T20:01:53Z

Thanks for your research. It's given me more understanding of the matter. If I understand it correctly the spec doesn't allow separating between files and fields - it's all just parts and everything else is up to the developer to decide.

So I think:

This package shouldn't make any assumptions regarding which part is a file or a field.
It's the developers responsibility and ability to choose.
Because for a given route it's clear which parts should be handled in which way, it should be possible to specify this.

Therefore I propose the following:

By default this package handles every part as a filestream
It's possible to specify per route how every part is handled

That's implemented like this:

Remove the config option onFile and introduce a route specific option like onPart() or a hook
Remove the separation between files and fields.
onPart() will be called for every part and lets the user transform the part according to his wishes
This package delivers various utility functions for quickly configuring this

A rough idea:

onPart(part) {
    if (part.name === 'config') {
        return secureJSON((await inlinePart(part, {maxSize: 1024 * 1024 * 1024}).value);
    }

    if (part.name === 'certificate') {
        return inlinePart(part, {maxSize: 10 * 1024 * 1024});
    }

    return dumpPart(part, {dir: 'uploads', maxSize: 1024 * 1024 * 1024});
}

That's a bit like my workaround from the original issue fastify/fastify-multipart#292 (comment)

Or it might be done even more high-level (kind of a webpack loader chain (but run from left to right)).

preParsing: configureParts({
    config: [inlinePart({maxSize: 1024 * 1024 * 1024}), secureJSON],
    certificate: [inlinePart({maxSize: 10 * 1024 * 1024})],
}, [dumpPart({dir: 'uploads', maxSize: 1024 * 1024 * 1024})]);

I think that this would be a huge paradigm shift for this package.

kibertoad · 2021-12-01T20:10:29Z

@cyantree See how busboy is currently used: https://github.com/fastify/fastify-multipart/blob/master/index.js

From my understanding, it is created once per request and then discarded. Multer does same thing: https://github.com/expressjs/multer/blob/master/lib/make-middleware.js

So I think that having a config option on Busboy instance, which would be responsible for deciding what is field and what is file is a valid approach. It should fall upon higher-level library (such as multer or fastify-multipart) to pass correct configuration per-route.

Uzlopak · 2021-12-01T20:17:27Z

Actually if no mimetype is set it should be text/plain.

https://datatracker.ietf.org/doc/html/rfc7578#section-4.6

How about a simple regex or function for content type?
Default is: everything is a file except text/plain and no content type means text/plain?

That is imho closes to the rfc

cyantree · 2021-12-01T20:18:16Z

@kibertoad You're absolutely right, sorry for mixing up these two packages. :)
Yes, busboy should only provide handling of parts and fastify-multipart should do the connection and configuration for specific routes.

kibertoad · 2021-12-01T20:19:46Z

@Uzlopak What if I send a text/plain file as a file? How should that be handled?

cyantree · 2021-12-01T20:25:42Z

@Uzlopak
I've created a simple HTML form with file upload, text input and textarea input.
The following data will be sent when uploading a .txt file:

------WebKitFormBoundaryhtgx6m6izaBBUPXY
Content-Disposition: form-data; name="x"; filename="test.txt"
Content-Type: text/plain

hello world
------WebKitFormBoundaryhtgx6m6izaBBUPXY
Content-Disposition: form-data; name="txt"

hello world
------WebKitFormBoundaryhtgx6m6izaBBUPXY
Content-Disposition: form-data; name="txt2"

hello world
------WebKitFormBoundaryhtgx6m6izaBBUPXY--

What I take from this:

File upload and text fields differ in filename and Content-Type.
As discussed earlier filename shouldn't be seen as safe marker to identifying file uploads.
Content-Type is also no safe marker as the original issue regards parsing JSON parsing of fields. Therefore in this case it's not possible to know whether a specific part is a file upload (e. g. for storing on the server) or a metadata object (e. g. for further validation)
In general Content-Type only tells what a content it but now what it's purpose is.

Uzlopak · 2021-12-01T20:26:04Z

Then it is actually non conform to the spec imho.

Ok, different approach:

We supply per config a function which gets the values of content type, fieldname and filename. Depending on this everybody can determine if it is a file or a field.

In higher levels like fastify you have already schemas. So you could reuse them to configure busboy accordingly

cyantree · 2021-12-01T20:31:57Z

We supply per config a function which gets the values of content type, fieldname and filename. Depending on this everybody can determine if it is a file or a field.

In higher levels like fastify you have already schemas. So you could reuse them to configure busboy accordingly

Yes, that would be fine for me. And when not supplying the function everything gets handled as a file.

Btw: Maybe let's check how other languages do it. I know that PHP also handles files seperately. So they must also use some aspect to mark something as a file, presumable Content-Type and/or filename.

Uzlopak · 2021-12-01T20:38:35Z

Why should everything be by default a file? I kind of prefer the other way round

cyantree · 2021-12-01T20:39:01Z

I've made some tests with PHP:
body.append('json', new Blob(['{"foo":"bar"}'], {type: 'application/json'}), '');
results in error 4 in PHP (UPLOAD_ERR_NO_FILE). So in this case the whole field gets dropped.

body.append('json', new Blob(['{"foo":"bar"}']));
body.append('json', new Blob(['{"foo":"bar"}'], {type: 'application/json'}));
body.append('json', new Blob(['{"foo":"bar"}'], {type: 'application/json'}), 'test.json');

all these three result in file uploads.

So I couldn't manage adding a blob as a field.

kibertoad · 2021-12-01T20:43:49Z

Why should everything be by default a file? I kind of prefer the other way round

I think this is how busboy used to work, and what users are going to expect. Experiment done by @cyantree seems to confirm that this is actually a norm.

cyantree · 2021-12-01T20:49:59Z

Why should everything be by default a file? I kind of prefer the other way round

Because without any configuration every field would be loaded into memory which could easily lead to memory and performance issues.

The whole thing is quite confusing and I'm asking myself if this gets out of hand.
Sorry for swaying back and forth, I'm only trying to get a grip on the whole topic.

So maybe let's take a step back and start over:

Finding a sensible default for separating fields from files should be managable and maybe the status quo (before our PRs), already is it. If other languages have a solid way in handling parts, why should this library need some customization options for this?

So maybe the current version of busboy is already good enough and the original issue should be addressed in fastify-multipart? Maybe there the solution is configuring which field (without Content-Type application/json) should be handled as json.

kibertoad · 2021-12-01T20:53:54Z

So maybe the current version of busboy is already good enough and the original issue should be addressed in fastify-multipart?

Does busboy currently expose enough configuration for that to be feasible, though?

Uzlopak · 2021-12-01T21:01:12Z

I pushed my proposal. You could supply a isPartAFile as option to busboy. I chose the old behaviour of busboy as fallback if no isPartAFile is supplied as option.

Now everybody can write his own function based on fieldName, contentType and fileName if it is a file or a field.

cyantree · 2021-12-01T21:05:01Z

So maybe the current version of busboy is already good enough and the original issue should be addressed in fastify-multipart?

Does busboy currently expose enough configuration for that to be feasible, though?

Yes, I think so. busboy emits the following events:

boy.emit('file', fieldname, file, filename, encoding, contype)
boy.emit('field', fieldname, buffer, false, truncated, encoding, contype)

They contain pretty much everything.

fastify-multipart already contains a onFile() config that will be called for every file part. So a onField() config might be used for parsing the json data safely when necessary.

cyantree · 2021-12-01T21:09:26Z

I pushed my proposal. You could supply a isPartAFile as option to busboy. I chose the old behaviour of busboy as fallback if no isPartAFile is supplied as option.

Now everybody can write his own function based on fieldName, contentType and fileName if it is a file or a field.

Thanks. Yes, that should do the trick, if exposed by fastify-multipart. Then one could write a filter function specifying which parts should be treated as field and therefore might be parsed as json and handled by the integrated validation.

Uzlopak · 2021-12-01T21:10:07Z

@cyantree
Currently busboy decides what a field is and what a file. So fastify-multipart can not hook into it and say, "hey busboy, you are wrong, this is actually a file/field". But with a isPartAFile you could tell busboy what actually a file is and what a field is and then consume the right event for each part.

@kibertoad
I think with isPartAFile, we would have a necessary minimal change, which gives the power to the userland to configure busboy accordingly and with the default fallback it would be non-breaking. So our busboy would be still a valid drop-in-replacement for legacy busboy.

If you agree with me, I would add the necessary changes to typings and documentation.

Maybe isPartAFile is not the best name? but it is quite descriptive :)

lib/types/multipart.js

kibertoad · 2021-12-01T21:15:35Z

@Uzlopak Yes, I think this is the exactly right solution.

Uzlopak · 2021-12-01T21:35:06Z

@kibertoad
@cyantree

added documentation. anything missing?

test/types-multipart.spec.js

kibertoad · 2021-12-01T21:45:42Z

LGTM.
@cyantree What do you think?

cyantree

My comments don't affect the overall functionality but only code quality and code coverage.

lib/main.d.ts

test/types-multipart.spec.js

Uzlopak · 2021-12-01T21:56:45Z

Done

cyantree · 2021-12-01T22:02:41Z

Thanks @Uzlopak and @kibertoad for your work and time. 👍

kibertoad · 2021-12-01T22:05:44Z

Thank you for for bearing with us through all of this :D.
and also for inspiring this whole commotion in the first place!

Uzlopak · 2021-12-01T22:15:39Z

It was quite a ride and it was good that we worked together to get the best solution possible :)

I really love this hands-on mentality

kibertoad · 2021-12-01T22:26:22Z

@Uzlopak I'll work on benchmarks now. Then we can hopefully land all outstanding PRs and publish 1.0.0 :)

handle blobs as fields

b3e0d8e

Uzlopak mentioned this pull request Dec 1, 2021

fix empty filenames are not considered a field (#26) #38

Closed

4 tasks

Uzlopak requested a review from kibertoad December 1, 2021 16:18

kibertoad reviewed Dec 1, 2021

View reviewed changes

lib/types/multipart.js Show resolved Hide resolved

Uzlopak added 2 commits December 1, 2021 21:58

proposal

5f97a76

linting

069e88e

kibertoad reviewed Dec 1, 2021

View reviewed changes

lib/types/multipart.js Show resolved Hide resolved

kibertoad reviewed Dec 1, 2021

View reviewed changes

lib/types/multipart.js Outdated Show resolved Hide resolved

revert change

10a8670

remove Migrating.md

47c6c8f

add documentation

c8bd5ba

kibertoad reviewed Dec 1, 2021

View reviewed changes

test/types-multipart.spec.js Show resolved Hide resolved

modify unit test

b2b67e2

cyantree approved these changes Dec 1, 2021

View reviewed changes

lib/main.d.ts Outdated Show resolved Hide resolved

test/types-multipart.spec.js Show resolved Hide resolved

Uzlopak added 2 commits December 1, 2021 22:47

fix typo

536a095

add test for the other way round

781897e

cyantree mentioned this pull request Dec 1, 2021

Handle empty filenames as fields instead of files #26

Closed

2 tasks

Uzlopak added 2 commits December 1, 2021 22:52

modify typings

f5f4c95

forgot to remove one undefined

f4e47e1

kibertoad approved these changes Dec 1, 2021

View reviewed changes

Uzlopak merged commit 82a9a50 into fastify:master Dec 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

handle blobs as fields #53

handle blobs as fields #53

Uzlopak commented Dec 1, 2021 •

edited

Loading

climba03003 commented Dec 1, 2021 •

edited

Loading

Uzlopak commented Dec 1, 2021

cyantree commented Dec 1, 2021

kibertoad commented Dec 1, 2021

Uzlopak commented Dec 1, 2021

cyantree commented Dec 1, 2021

kibertoad commented Dec 1, 2021

cyantree commented Dec 1, 2021 •

edited

Loading

Uzlopak commented Dec 1, 2021

cyantree commented Dec 1, 2021

Uzlopak commented Dec 1, 2021

cyantree commented Dec 1, 2021

kibertoad commented Dec 1, 2021

cyantree commented Dec 1, 2021

kibertoad commented Dec 1, 2021

Uzlopak commented Dec 1, 2021

cyantree commented Dec 1, 2021

cyantree commented Dec 1, 2021

Uzlopak commented Dec 1, 2021 •

edited

Loading

kibertoad commented Dec 1, 2021 •

edited

Loading

Uzlopak commented Dec 1, 2021

kibertoad commented Dec 1, 2021

cyantree left a comment

Uzlopak commented Dec 1, 2021

cyantree commented Dec 1, 2021

kibertoad commented Dec 1, 2021

Uzlopak commented Dec 1, 2021

kibertoad commented Dec 1, 2021

handle blobs as fields #53

handle blobs as fields #53

Conversation

Uzlopak commented Dec 1, 2021 • edited Loading

Checklist

climba03003 commented Dec 1, 2021 • edited Loading

Uzlopak commented Dec 1, 2021

cyantree commented Dec 1, 2021

kibertoad commented Dec 1, 2021

Uzlopak commented Dec 1, 2021

cyantree commented Dec 1, 2021

kibertoad commented Dec 1, 2021

cyantree commented Dec 1, 2021 • edited Loading

Uzlopak commented Dec 1, 2021

cyantree commented Dec 1, 2021

Uzlopak commented Dec 1, 2021

cyantree commented Dec 1, 2021

kibertoad commented Dec 1, 2021

cyantree commented Dec 1, 2021

kibertoad commented Dec 1, 2021

Uzlopak commented Dec 1, 2021

cyantree commented Dec 1, 2021

cyantree commented Dec 1, 2021

Uzlopak commented Dec 1, 2021 • edited Loading

kibertoad commented Dec 1, 2021 • edited Loading

Uzlopak commented Dec 1, 2021

kibertoad commented Dec 1, 2021

cyantree left a comment

Choose a reason for hiding this comment

Uzlopak commented Dec 1, 2021

cyantree commented Dec 1, 2021

kibertoad commented Dec 1, 2021

Uzlopak commented Dec 1, 2021

kibertoad commented Dec 1, 2021

Uzlopak commented Dec 1, 2021 •

edited

Loading

climba03003 commented Dec 1, 2021 •

edited

Loading

cyantree commented Dec 1, 2021 •

edited

Loading

Uzlopak commented Dec 1, 2021 •

edited

Loading

kibertoad commented Dec 1, 2021 •

edited

Loading