latin1 supplement separator not working #195
Comments
I ran into the same issue and wasted a few working days on this. I hope this helps someone in the future.
Root cause of problem
The character code of ¬ is 0xAC (172), but Buffer.from encodes strings as UTF-8 by default, so ¬ becomes the two bytes c2 ac:
> const b1 = Buffer.from('¬')
undefined
> b1
<Buffer c2 ac>
> const b2 = Buffer.from([172])
undefined
> b2
<Buffer ac>
> const b3 = Buffer.from([0xac])
undefined
> b3
<Buffer ac>
The parser only supports a one-byte character as separator, and it is effectively using c2 to split. So, it doesn't matter if you actually used
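The byte mismatch described above can be checked with a minimal sketch that uses only Node's built-in Buffer. The helper name `separatorBytes` is hypothetical, and taking the first byte mirrors the behaviour described in this comment rather than quoting the parser's source.

```javascript
const { Buffer } = require('node:buffer');

// Hypothetical helper: the bytes a separator string becomes in a given encoding.
function separatorBytes(separator, encoding = 'utf8') {
  return [...Buffer.from(separator, encoding)];
}

const NOT_SIGN = '\u00ac'; // ¬, written as an escape so the source file's own encoding does not matter

const asUtf8 = separatorBytes(NOT_SIGN);             // [0xc2, 0xac]: two bytes
const asLatin1 = separatorBytes(NOT_SIGN, 'latin1'); // [0xac]: one byte
const comma = separatorBytes(',');                   // [0x2c]: one byte

// Assumption: a parser that keeps only the first byte of the separator
// ends up splitting on 0xc2, which never occurs in a Latin-1 file that uses 0xac.
const firstByteUtf8 = asUtf8[0];

console.log({ asUtf8, asLatin1, comma, firstByteUtf8 });
```

Any separator whose UTF-8 form is longer than one byte would hit the same problem, which is why plain ASCII separators such as the comma are unaffected.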
Solution
const NOT_SIGN = [0xac];
// or const NOT_SIGN = [172];
// or const NOT_SIGN = new Uint8Array([0xac])
const parser = () =>
csv({
// @ts-ignore the source code is not typed correctly. We can pass a single byte in an array as well.
separator: NOT_SIGN,
});
Why does this work?
This works because the source code invokes
type Uint8ArrayLength1 = Uint8Array & { length: 1 };
function createUint8ArrayLength1(value: number): Uint8ArrayLength1 {
const array = new Uint8Array([value]);
if (array.length !== 1) {
throw new Error("Uint8Array must have a length of 1");
}
return array as Uint8ArrayLength1;
}
Debug from the source code
If you have no control over what encoding the CSV was saved in, or if you are not sure whether your team used ASCII, Latin-1, UTF-8, or UTF-16, just change the source code in your
Alternatively, just use iconv as suggested by the doc and pipe your file stream from its original encoding into UTF-8 and handle it from there. If you've read this far, you already have the bits and pieces to figure out the appropriate action for your use case.
Jest test
A Jest test can look something like the one below; just replace my section sign (§) with your not sign (¬).
import { Readable, Writable } from 'node:stream';
import parser from '../util/csvParser';
import { pipeline } from 'stream/promises';
const mockLogger = {
log: jest.fn(),
error: jest.fn(),
warn: jest.fn(),
debug: jest.fn(),
verbose: jest.fn(),
};
describe('csvParser using §', () => {
it('should parse CSV with default options given csv with row number', async () => {
expect.assertions(1);
const inputString =
' §header1§header2\n1§value1§value2\n2 §value3§value4';
const hexArray = [];
for (let i = 0; i < inputString.length; i++) {
const hexValue = inputString.charCodeAt(i);
hexArray.push(
`0x${hexValue.toString(16).toUpperCase().padStart(2, '0')}`,
);
}
const csvData = Buffer.from(hexArray);
const output = new Writable({
objectMode: true,
write(chunk, encoding, callback) {
results.push(chunk);
callback();
},
});
const input = new Readable({
read() {
this.push(csvData);
this.push(null);
},
});
const results = [];
await pipeline(input, parser(mockLogger), output);
expect(results).toEqual([
{ HEADER1: 'value1', HEADER2: 'value2' },
{ HEADER1: 'value3', HEADER2: 'value4' },
]);
});
it('should parse CSV with default options given csv without row number', async () => {
expect.assertions(1);
const inputString = 'header1§header2\nvalue1§value2\nvalue3§value4';
const hexArray = [];
for (let i = 0; i < inputString.length; i++) {
const hexValue = inputString.charCodeAt(i);
hexArray.push(
`0x${hexValue.toString(16).toUpperCase().padStart(2, '0')}`,
);
}
const csvData = Buffer.from(hexArray);
const output = new Writable({
objectMode: true,
write(chunk, encoding, callback) {
results.push(chunk);
callback();
},
});
const input = new Readable({
read() {
this.push(csvData);
this.push(null);
},
});
const results = [];
await pipeline(input, parser(mockLogger), output);
expect(results).toEqual([
{ HEADER1: 'value1', HEADER2: 'value2' },
{ HEADER1: 'value3', HEADER2: 'value4' },
]);
});
});
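The charCodeAt loop in the tests above builds the single-byte input by hand. Assuming every character in the input is in the Latin-1 range (code point 0xFF or below), Node's built-in 'latin1' encoding should produce the same bytes in one call. This is only a sketch of that alternative, not the code the tests use, and the helper name `toLatin1Bytes` is hypothetical.

```javascript
const { Buffer } = require('node:buffer');

// Hypothetical helper: encode a string as one byte per character (Latin-1).
// Only meaningful when every character has a code point <= 0xFF.
function toLatin1Bytes(text) {
  return Buffer.from(text, 'latin1');
}

const SECTION_SIGN = '\u00a7'; // §
const text = `header1${SECTION_SIGN}header2\nvalue1${SECTION_SIGN}value2`;

// One call instead of the manual loop.
const csvData = toLatin1Bytes(text);

// The manual equivalent: one byte per character, taken from its char code.
const byHand = Buffer.from([...text].map((ch) => ch.charCodeAt(0)));

console.log(csvData.equals(byHand));
```

Either way, the separator ends up as the single byte a7 in the buffer, with no c2 prefix, which is what a Latin-1 encoded file would contain.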
Opened a PR for this.
Expected Behavior
Using "¬" as a separator in a CSV file should work. Output should look like:
Actual Behavior
Parsing does not seem correct. It produces something like this:
How Do We Reproduce?
Test file:
Code: