Question: Is it possible to return raw record even for error rows in csv-parse? #292
Comments
I achieved this by making a few changes in the code.

    __error(msg){
      const {skip_lines_with_error} = this.options
      const err = typeof msg === 'string' ? new Error(msg) : msg
      if(skip_lines_with_error){
        this.state.recordHasError = true
        this.emit('skip', err)
        this.state.recordErrors.push(err) // push the error into state
        return undefined
      }else{
        return err
      }
    }

When control reaches the end of the corresponding row, I emit all the collected errors along with the raw record.

    __onRecord() {
      // ...
      if(this.state.recordHasError === true){
        this.__emitRecordErrors() // emit an event with aggregated row errors
        this.__resetRecord()
        this.state.recordHasError = false
        return
      }
      // ...
    }

    __emitRecordErrors() {
      const { raw, encoding } = this.options
      this.emit('aggregatedRowError', Object.assign(
        { errors: this.state.recordErrors },
        raw === true ? {raw: this.state.rawBuffer.toString(encoding)}: {}
      ))
      this.state.recordErrors = []
    }

Instead of disturbing the existing skip_lines_with_error functionality, I feel it's better to add a new option altogether so that nothing breaks for existing users. Please review this, and if you feel this is a valid use case, please consider it as an enhancement. By the way, I really appreciate such a wonderful library. Thanks!
Could you provide one or more test cases reproducing what you expect? Please keep them as simple as possible; that will help me guarantee I understand the case correctly.
Input File
Parser Options
When skip_lines_with_error is set, the library emits a skip event that tells us the row number of the faulty record. Our users asked that, instead of just returning row numbers, we give them a CSV file as output containing the row number, error reason, and corresponding raw record of every faulty row.
Output file with faulty rows
This would save them from manually going through the huge input file to find the faulty rows by row number. They could simply correct each raw record by referring to the error reason and re-upload the file. To create such a file we need access to the raw record of faulty rows, but the library does not currently provide it.
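As a rough illustration of the output file described above, the sketch below builds a three-column report (row number, error reason, raw record) from already collected entries. It is independent of csv-parse: the entry shape `{ row, reason, raw }`, the function names, the column headers, and the sample data are all assumptions made for this example.

```javascript
// Quote a field the way RFC 4180 describes: wrap it in double quotes when it
// contains a comma, a double quote, or a line break, and double any embedded quotes.
function escapeCsvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// `entries` uses a hypothetical shape: one { row, reason, raw } object per faulty record.
function buildErrorReport(entries) {
  const header = ['row', 'error', 'raw_record'];
  const lines = entries.map(({ row, reason, raw }) =>
    [row, reason, raw].map(escapeCsvField).join(',')
  );
  return [header.join(','), ...lines].join('\n') + '\n';
}

// Made-up sample data, only to show the resulting layout.
const report = buildErrorReport([
  { row: 3, reason: 'too many columns', raw: 'a,b,c,d' },
  { row: 5, reason: 'stray quote', raw: '1,x"y,3' },
]);
console.log(report);
```

Because the raw record itself contains delimiters, it has to be quoted as a single field; otherwise the report would not parse back into three columns.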
It will be shipped with the next major version, which is being prepared.
@wdavidw Thank you very much for releasing this feature.
If you run the above snippet, the output looks like
As you can see in the output, the raw record for row 3 is only partial. The idea I propose is to collect all the errors in a given row and emit a new event, let's say 'aggregatedRowError', at the end of the row. With this change, the complete raw buffer of the entire row would be available at the time the event is emitted.
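To make the difference between the two emission points concrete, here is a minimal toy sketch. It is not csv-parse and does not use its internals: `ToyRowScanner`, its fields, and the rule that every `!` counts as an error are invented for illustration. It only shows why an event emitted mid-row sees a partial raw buffer while an event emitted at the end of the row sees the whole row.

```javascript
const { EventEmitter } = require('events');

// Toy scanner, NOT csv-parse: consumes one character at a time, keeps the raw
// text of the current row, and flags every '!' as a (made-up) field error.
class ToyRowScanner extends EventEmitter {
  constructor() {
    super();
    this.rawBuffer = '';
    this.rowErrors = [];
    this.row = 1;
  }

  write(text) {
    for (const ch of text) {
      if (ch === '\n') {
        this.endRow();
        continue;
      }
      this.rawBuffer += ch;
      if (ch === '!') {
        const err = new Error(`unexpected "!" at column ${this.rawBuffer.length}`);
        // Eager event: the raw buffer only holds the row up to this point.
        this.emit('skip', err, this.rawBuffer);
        this.rowErrors.push(err);
      }
    }
  }

  endRow() {
    if (this.rowErrors.length > 0) {
      // Deferred event: the row is complete, so the raw text is complete too.
      this.emit('aggregatedRowError', {
        row: this.row,
        errors: this.rowErrors,
        raw: this.rawBuffer,
      });
    }
    this.rawBuffer = '';
    this.rowErrors = [];
    this.row += 1;
  }
}

const scanner = new ToyRowScanner();
const eager = [];
const deferred = [];
scanner.on('skip', (err, rawSoFar) => eager.push(rawSoFar));
scanner.on('aggregatedRowError', (info) => deferred.push(info));
scanner.write('a,b,c\n1,!,3,!\nx,y,z\n');

console.log(eager);           // raw text as seen at each eager 'skip' event
console.log(deferred[0].raw); // raw text as seen at the end of the faulty row
```

In this toy, the first 'skip' listener call sees only the beginning of row 2, whereas the single 'aggregatedRowError' event carries both errors and the full row text.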
Use case
We want to collect the raw record and error reason for every faulty row and build a new CSV file from them.
Given this CSV file, end users can refer to the error messages, correct the rows, and re-upload the file.
What's possible today?
'skip_lines_with_error: true' -- we are able to gather the row numbers of all the faulty records.
'raw: true' -- gives the raw record only for valid rows.
The current implementation emits a 'skip' event as soon as it finds an error.
Is it possible to wait for the iterator to reach the end of the row and emit the event along with the raw record?
@wdavidw Do you see any other way to achieve this use case with the current implementation?