CSV fails to parse, in my case on a long string that recurs #1434

Closed
smoothdeveloper opened this issue Apr 1, 2022 · 5 comments

@smoothdeveloper
Contributor

I have a .csv file that the parser fails on, even though it opens fine in Excel.

I haven't yet tried other CSV parsers on the file, but I wanted to report it in case it can be fixed in the library.
outagesshort.csv

type Csv = FSharp.Data.CsvProvider<  @"c:\tmp\outagesshort.csv" >

fsharp.data.csvparseerror.fsx(17,12): error FS3033: The type provider 'ProviderImplementation.CsvProvider' reported an error: Cannot read sample CSV from 'c:\tmp\outagesshort.csv': Couldn't parse row 218 according to schema: Expected 21 columns, got 11

In the CSV file, the value at column 11 is the same as on the previous line, which parses correctly, so it feels as if some parser state isn't being closed or reset. But maybe the CSV file itself is "malformed".

@nikoyak
Contributor

nikoyak commented Apr 26, 2022

type Csv = FSharp.Data.CsvProvider<  @"c:\tmp\outagesshort.csv", InferRows = 0 >
// or, e.g.
// type Csv = FSharp.Data.CsvProvider<  @"c:\tmp\outagesshort.csv", InferRows = 255 >

This file contains multiline rows, and there is a bug (or feature) in CsvProvider: InferRows counts lines, not rows, so the inference sample can end in the middle of a multiline row.
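To illustrate why counting lines instead of rows matters, here is a minimal F# sketch. It assumes, as described above, that the sample is cut after a fixed number of physical lines. `parseRows`, `sample` and `firstLines` are hypothetical names written for this illustration; `parseRows` is a simplified RFC 4180-style splitter, not FSharp.Data's parser, and the data is made up. A quoted field containing a line break makes one row span two physical lines, so keeping only the first N lines can leave the last row with fewer columns than the header, the same shape as the "Expected 21 columns, got 11" error.

```fsharp
open System
open System.Text

/// Hypothetical, simplified RFC 4180-style splitter (not FSharp.Data's parser):
/// commas separate fields; double quotes delimit fields that may contain commas,
/// line breaks or doubled quotes ("").
let parseRows (text: string) : string list list =
    let rows = ResizeArray<string list>()
    let fields = ResizeArray<string>()
    let field = StringBuilder()
    let mutable inQuotes = false
    let mutable i = 0
    let endField () =
        fields.Add(field.ToString())
        field.Clear() |> ignore
    let endRow () =
        endField ()
        rows.Add(List.ofSeq fields)
        fields.Clear()
    while i < text.Length do
        let c = text.[i]
        if inQuotes then
            if c = '"' then
                if i + 1 < text.Length && text.[i + 1] = '"' then
                    field.Append('"') |> ignore // escaped quote ("")
                    i <- i + 1
                else
                    inQuotes <- false // closing quote
            else
                field.Append(c) |> ignore // keeps '\n' that appears inside quotes
        else
            match c with
            | '"' -> inQuotes <- true
            | ',' -> endField ()
            | '\r' -> ()
            | '\n' -> endRow () // a line break ends the row only outside quotes
            | _ -> field.Append(c) |> ignore
        i <- i + 1
    // Flush the last row if the text does not end with a line break.
    if field.Length > 0 || fields.Count > 0 then endRow ()
    List.ofSeq rows

// Made-up sample: 3 columns, and row 1 has a quoted comment spanning two lines.
let sample =
    "id,comment,status\n" +
    "1,\"first line\nsecond line\",open\n" +
    "2,short,closed\n"

let physicalLines = sample.Split([| '\n' |], StringSplitOptions.RemoveEmptyEntries)
let fullRows = parseRows sample
printfn "%d lines, %d rows" physicalLines.Length fullRows.Length // 4 lines, 3 rows

// Keeping only the first 2 *lines* cuts row 1 inside its quoted field.
let firstLines n = physicalLines |> Array.truncate n |> String.concat "\n"
let cutRows = parseRows (firstLines 2)
printfn "header: %d columns, last row: %d columns" cutRows.[0].Length cutRows.[1].Length // 3 vs 2
```

This also suggests why `InferRows = 0` (use the whole file) avoids the problem: the sample is never cut mid-row.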

@smoothdeveloper
Contributor Author

@nikoyak thanks. Do you mean I should use something other than InferRows for my scenario?

Sorry for not digging further into the underlying issue. I am now seeing another parsing failure, likely caused by a mis-escaped character (the pound sign).

It is likely the CSV producer that has a conformance issue: the CsvHelper library also fails to parse beyond the same record that the FSharp.Data CSV parser reports.

IgnoreErrors works well enough that processing doesn't abort on the whole file, but do you know if it is possible to hook into events when a row fails to parse?

This would help me report the skipped rows in processing code that uses the FSharp.Data CSV parser.

If I can get more details about the whole thing, I'll come back to it.

@smoothdeveloper
Contributor Author

Closing this, as I have identified another underlying issue that occurs before this library's parser is even involved.

@smoothdeveloper
Contributor Author

smoothdeveloper commented May 24, 2022

@nikoyak do you mind giving me more pointers (code, issues, PRs, further technical details) to the multiline issues you are referring to?

@nikoyak
Contributor

nikoyak commented May 26, 2022

@smoothdeveloper

do you know if it is possible to hook into events when a row fails to parse?

if not hasCorrectNumberOfColumns then
  // Ignore rows with different number of columns when ignoreErrors is set to true
  if not ignoreErrors then
    let lineNumber = if hasHeaders then lineNumber else lineNumber + 1
    failwithf "Couldn't parse row %d according to schema: Expected %d columns, got %d" lineNumber numberOfColumns untypedRow.Length
else
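The excerpt suggests that when `ignoreErrors` is set, a row with the wrong column count is dropped without any callback. Assuming that, one workaround is to check column counts yourself and log the rows that would be skipped. The minimal F# sketch below does this; `RowCheck`, `checkRows` and `reportSkipped` are hypothetical names, not FSharp.Data APIs, the rows are assumed to be already split into fields by whatever parser you use, and row numbers are simply 1-based positions in the input (they are not guaranteed to match the provider's own numbering).

```fsharp
// Hypothetical helpers (not FSharp.Data APIs): tag each parsed row as valid or
// skipped by column count, so skipped rows can be reported instead of vanishing.
type RowCheck =
    | Valid of rowNumber: int * fields: string[]
    | Skipped of rowNumber: int * expected: int * actual: int

/// Tags each row with a 1-based position and whether its column count
/// matches the expected schema width.
let checkRows (expectedColumns: int) (rows: seq<string[]>) : RowCheck list =
    rows
    |> Seq.mapi (fun i fields ->
        let rowNumber = i + 1
        if fields.Length = expectedColumns then Valid(rowNumber, fields)
        else Skipped(rowNumber, expectedColumns, fields.Length))
    |> List.ofSeq

/// Writes one line to stderr per skipped row.
let reportSkipped (checks: RowCheck list) =
    for check in checks do
        match check with
        | Skipped(n, expected, actual) ->
            eprintfn "Skipping row %d: expected %d columns, got %d" n expected actual
        | Valid _ -> ()

// Made-up rows: the second one is missing a column.
let rows = [ [| "1"; "a"; "x" |]; [| "2"; "b" |]; [| "3"; "c"; "z" |] ]
let checks = checkRows 3 rows
reportSkipped checks // stderr: Skipping row 2: expected 3 columns, got 2
let valid = checks |> List.choose (function Valid(_, fields) -> Some fields | Skipped _ -> None)
```

Keeping the result as a union, rather than raising or filtering early, lets the caller decide whether to log, count, or fail on skipped rows.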

do you mind giving me more pointers (code, issues, PRs, further technical details) to the multiline issues you are referring to?

#1439
