Reading a UTF-8 CSV and attempting to upload it with needle via a multipart POST can cause non-ASCII characters inside the CSV data to be replaced by other characters.
Steps to Reproduce:
# node --version
v14.18.1
'use strict';

const childproc = require('child_process');
const needle = require('needle');
const net = require('net');

async function postAndCapture(buffer, content_type) {
  const port = 49152 + Math.floor(Math.random() * 16384);
  const proxy = net.createServer();
  await new Promise((res, rej) => {
    proxy.on('error', rej);
    proxy.listen(port, res);
  });
  proxy.on('connection', sock => {
    const buffs = [];
    let pending;
    sock.on('data', x => {
      buffs.push(x);
      if (!pending) pending = setImmediate(() => sock.end('HTTP/1.1 200 OK\r\n\r\n'));
    });
    sock.on('close', () => {
      const proc = childproc.spawn('hexdump', ['-C'], { stdio: ['pipe', 'inherit', 'inherit'] });
      proc.stdin.end(Buffer.concat(buffs));
    });
  });
  await needle('post', `http://localhost:${port}/`, {
    file: {
      buffer,
      content_type,
      filename: new Date().getTime() + '.csv'
    }
  }, {
    multipart: true,
  });
  proxy.close();
}

(async () => {
  // File content as a buffer, as it would be read directly from a file.
  const csvFile = Buffer.from('77u/Ikh5cGhlbiIsIkVtIERhc2giDQoiLSIsIuKAlCINCg==', 'base64');

  // Send with CSV type, as some web API might require.
  // -> needle heuristically detects a "text" type, tries to re-encode the
  //    CSV payload, causing the UTF-8 em dash to be replaced by an ASCII
  //    control character, and corrupting the payload.
  await postAndCapture(csvFile, 'text/csv');

  // Sending with application/octet-stream works around the issue, but prevents
  // us from sending the correct Content-Type, which might not work for all use-cases.
  await postAndCapture(csvFile, 'application/octet-stream');
})();
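Expected
Changing the Content-Type of the uploaded file should not affect the file content.
Observed
With application/octet-stream, the hex dump shows the em dash encoded as 0xE2 0x80 0x94, its valid UTF-8 byte sequence.
With text/csv, the hex dump shows the em dash encoded as 0x14, an obscure ASCII control character, which apparently chokes some CSV parsers.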
Uploading the CSV file as application/octet-stream may work as a workaround for some APIs but may not work in all cases, e.g. where a provider accepts multiple formats and uses the Content-Type header to actually differentiate which parser to use.
According to https://github.com/tomas/needle/blob/master/lib/multipart.js#L45, needle tries to heuristically determine if it should process and re-encode the payload data based on the content-type; there is apparently no way to instruct it to skip this re-encoding and send the data exactly as-is while still using a text content-type.
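The 0x14 byte is consistent with a decode-then-narrow round trip: U+2014 keeps only its low byte when written out one byte per character. The following is a minimal sketch of that suspected mechanism, assuming (not verified here against needle's source) that text parts are decoded as UTF-8 and the assembled body is later serialized as single-byte characters. `reencodeAsText` and `passThroughBinary` are hypothetical names used for illustration, not needle functions.

```javascript
'use strict';

// Fixture from the report: a UTF-8 BOM followed by two CSV rows,
// the second of which contains a hyphen and an em dash (U+2014).
const fixture = Buffer.from(
  '77u/Ikh5cGhlbiIsIkVtIERhc2giDQoiLSIsIuKAlCINCg==', 'base64');

// Hypothetical stand-in for the suspected text path (an assumption):
// decode the payload as UTF-8, then write the string one byte per character.
function reencodeAsText(buf) {
  return Buffer.from(buf.toString('utf8'), 'latin1');
}

// Hypothetical stand-in for the binary path (an assumption):
// a latin1 round trip maps every byte to itself.
function passThroughBinary(buf) {
  return Buffer.from(buf.toString('latin1'), 'latin1');
}

const emDashUtf8 = Buffer.from([0xe2, 0x80, 0x94]);

// Does the em dash survive each path, and does 0x14 appear in the text path?
console.log('binary path keeps em dash:', passThroughBinary(fixture).includes(emDashUtf8));
console.log('text path keeps em dash:', reencodeAsText(fixture).includes(emDashUtf8));
console.log('text path contains 0x14:', reencodeAsText(fixture).includes(0x14));
```

If the assumption holds, the binary path leaves the payload byte-for-byte identical, while the text path drops the three-byte sequence and leaves 0x14 in its place, matching the behaviour reported above.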
I see. So what should we do in this case? Allow passing something like multipart: 'raw' to skip re-encoding, or include some other way of preventing these weird conversions from happening?
Thanks for the very detailed bug report, by the way!
multipart: 'raw' could be technically workable, but feels very unsatisfying.
If you receive a binary Buffer for that part (or a file, which readFile then turns into a Buffer), wouldn't it make sense to just always send that as binary without re-encoding? In that case, I might just do something like this:
That seemed to work for the test case in this issue, at least: I get the content-type I want, a binary transfer-encoding, and the correct UTF-8 bytes, in both cases.
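For illustration only, the idea of always sending a Buffer part as binary might look roughly like the sketch below. This is not the change described in the comment above and not needle's code: `buildFilePart`, the boundary string, and the file name are all hypothetical. The sketch assembles the part from Buffers, so the payload is never decoded into a string.

```javascript
'use strict';

// Hypothetical helper, not part of needle: builds one multipart file part
// as a Buffer so the payload bytes are never decoded or re-encoded.
function buildFilePart(boundary, name, file) {
  const head = [
    `--${boundary}`,
    `Content-Disposition: form-data; name="${name}"; filename="${encodeURIComponent(file.filename)}"`,
    'Content-Transfer-Encoding: binary',
    `Content-Type: ${file.content_type}`,
    '',
    ''
  ].join('\r\n');
  // Headers are plain ASCII here; the payload is appended untouched.
  return Buffer.concat([Buffer.from(head, 'utf8'), file.buffer, Buffer.from('\r\n')]);
}

// Same fixture as in the report; boundary and file name are made up.
const csv = Buffer.from('77u/Ikh5cGhlbiIsIkVtIERhc2giDQoiLSIsIuKAlCINCg==', 'base64');
const part = buildFilePart('example-boundary', 'file', {
  buffer: csv,
  content_type: 'text/csv',
  filename: 'upload.csv'
});

console.log('payload intact:', part.includes(csv));
console.log('text content type kept:', part.includes('Content-Type: text/csv'));
```

Working with Buffers end to end means the declared Content-Type no longer influences how the payload bytes are written, which is the property the report asks for.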
mciasuen added a commit to mciasuen/tomas-needle that referenced this issue on May 9, 2022