Mangled file content when multipart-POSTing a file with a "text/*" content type #403
Comments
I see. So what should we do in this case? Allow passing something like ...? Thanks for the very detailed bug report, by the way!
If you receive a binary Buffer for that part (or a file, which readFile then turns into a Buffer), wouldn't it make sense to just always send that as binary without re-encoding? In that case, I might just do something like this:
That seemed to work for the test case in this issue, at least: in both cases I get the content-type I want, a binary transfer-encoding, and the correct UTF-8 bytes.
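For illustration only, the idea of sending a Buffer through without re-encoding could be sketched as below. This is not the patch discussed above and not needle's code: `buildBinaryPart`, the boundary string, and the sample payload are all hypothetical names and values chosen for the example. The point is that only the ASCII part header is built from a string, while the payload bytes are concatenated as-is.

```javascript
// Hypothetical helper (an assumption for illustration, not needle's code):
// build one multipart/form-data part from a Buffer without ever converting
// the payload to a string.
function buildBinaryPart(boundary, name, filename, contentType, data) {
  const header =
    `--${boundary}\r\n` +
    `Content-Disposition: form-data; name="${name}"; filename="${filename}"\r\n` +
    `Content-Type: ${contentType}\r\n` +
    'Content-Transfer-Encoding: binary\r\n\r\n';
  // Only the ASCII header is encoded from a string; the payload bytes are
  // appended untouched, followed by the CRLF that ends the part.
  return Buffer.concat([
    Buffer.from(header, 'ascii'),
    data,
    Buffer.from('\r\n', 'ascii'),
  ]);
}

// Sample UTF-8 CSV payload containing an em dash (U+2014).
const payload = Buffer.from('a,b\r\n1,\u2014\r\n', 'utf8');
const part = buildBinaryPart('BOUNDARY', 'file', 'test.csv', 'text/csv', payload);

// The three UTF-8 bytes of the em dash survive in the assembled part.
console.log(part.includes(Buffer.from([0xe2, 0x80, 0x94])));
```

With this shape the part keeps the `text/csv` content type while the body stays byte-for-byte identical to the input Buffer.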
Reading a UTF-8 CSV and attempting to upload it with needle via a multipart POST can cause non-ASCII characters inside the CSV data to be replaced by other characters.
Steps to Reproduce:
# node --version v14.18.1
Expected
Observed
With application/octet-stream...
...the relevant section of the hex dump shows:
The em dash is encoded as 0xE2 0x80 0x94, the valid UTF-8 byte sequence for U+2014.
With text/csv...
...the relevant section of the hex dump shows:
The em dash is encoded as 0x14, an obscure ASCII control character (DC4) that happens to be the low byte of the code point U+2014, and which apparently chokes some CSV parsers.
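The two byte patterns can be reproduced in plain Node.js without needle. The sketch below does not claim to show needle's exact code path; it only demonstrates that writing a string with Node's `latin1` (alias `binary`) encoding keeps just the low byte of each UTF-16 code unit, which is one way an em dash can end up as 0x14.

```javascript
// Encode an em dash (U+2014) two ways and compare the resulting bytes.
const emDash = '\u2014';

// Proper UTF-8 encoding produces three bytes.
const utf8Bytes = Buffer.from(emDash, 'utf8');

// Node's 'latin1' (alias 'binary') encoding keeps only the low byte of each
// UTF-16 code unit, so U+2014 collapses to the single byte 0x14.
const latin1Bytes = Buffer.from(emDash, 'latin1');

console.log(utf8Bytes.toString('hex'));   // e28094
console.log(latin1Bytes.toString('hex')); // 14
```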
Uploading the CSV file as application/octet-stream may work as a workaround for some APIs but may not work in all cases, e.g. where a provider accepts multiple formats and uses the Content-Type header to actually differentiate which parser to use.
According to https://github.com/tomas/needle/blob/master/lib/multipart.js#L45, needle tries to heuristically determine if it should process and re-encode the payload data based on the content-type; there is apparently no way to instruct it to skip this re-encoding and send the data exactly as-is while still using a text content-type.