Sep does not gracefully handle improperly-escaped double quotes #150
@bryanboettcher thank you for trying out Sep and for filing a detailed issue, not least a full repro.
Actually, Sep assumes the csv is valid and, as you observe, does not try to handle invalid csv in any way, such as incorrect quoting or custom escaping of quotes.
The absolute bonkers speed and anemic memory usage of Sep is 100% worth it to me to work around whatever you don't choose to natively support. It's amazing work.
I'm not *not* saying I hoped the library maintainer's statements could help me push back against the vendor's formatting ...
The previous implementation of this importer used CsvHelper, and ran at about 10% of the speed. When CsvHelper encountered this malformed line, it threw an exception. In the meantime, this was worked around via an intermediate Stream that would blindly replace any instance of the invalid `\"` sequence.
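The intermediate-Stream workaround mentioned above could be sketched roughly as follows. This is a minimal illustration, not the actual code from the importer, and it assumes the fix is rewriting the vendor's invalid `\"` sequence into the standard RFC 4180 escape `""` before the parser ever sees the bytes:

```python
import io


def sanitize_stream(raw: io.BufferedIOBase, chunk_size: int = 64 * 1024):
    """Yield chunks of `raw` with every invalid \\" rewritten as "" .

    Holds back the last byte of each chunk so a backslash-quote pair
    split across a chunk boundary is still caught on the next round.
    """
    carry = b""
    while True:
        chunk = raw.read(chunk_size)
        if not chunk:
            if carry:
                yield carry  # a lone trailing byte cannot be a pair
            return
        data = carry + chunk
        # Keep the final byte back in case it is the '\' of a split pair.
        carry, data = data[-1:], data[:-1]
        yield data.replace(b'\\"', b'""')


malformed = io.BytesIO(b'a,"XXX\\" YYYY",c\n')
fixed = b"".join(sanitize_stream(malformed))
# fixed == b'a,"XXX"" YYYY",c\n' -- now valid RFC 4180 quoting
```

Wrapping the vendor file in a filter like this costs one extra pass over the bytes, which matches the comment above that the workaround is only modestly less performant than parsing the raw stream directly.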
@bryanboettcher sorry I did not reply on this. I think your workaround is fine and should not be that much less performant. I guess some form of configurable callback (delegate) invoked when the internal buffer grows, with the option to manipulate the internal data buffer, would be one way to handle this. I probably don't like it, since it adds a rather complicated option to the usage... but it should be possible with minimal perf impact.
Haha, I hope the vendor folded and fixed their formatting then 😉
@nietras (sorry, I didn't have a more graceful way to tag you)
First off, kudos to Sep for the absolutely blistering speed you've accomplished with it. I understand that in many cases this speed was gained by intentionally omitting checks around the correctness of a CSV. At the same time, there are a few existing cases where Sep's correctness can be tuned (such as `DisableColumnCountCheck`).
A vendor has a process that occasionally generates a malformed CSV. We will get a row that is otherwise properly escaped, but they are leaving an improper backslash-escaped double quote in a string field, like so:

`,"XXX\" YYYY",`

Whenever Sep hits this data, we soon after get the 16MB buffer overrun exception. Our vendor is aware, but there's no ETA for fixing their process. Is there a mechanism, now or in the future, for Sep to either blindly remove this character or handle it more gracefully? Using `DisableQuotesParsing` is insufficient in this case, because many of our fields are (properly) quoted and escaped. I have a repro: https://github.com/bryanboettcher/SepCrash
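For context on why `\"` is malformed: standard CSV (RFC 4180) escapes an embedded double quote by doubling it, not by backslashing it, and that is what conforming writers emit. A quick generic illustration (not Sep-specific) using a stock CSV library:

```python
import csv
import io

# RFC 4180 escapes an embedded double quote by doubling it ("").
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(["", 'XXX" YYYY', ""])
print(buf.getvalue())  # "","XXX"" YYYY",""

# A conforming reader round-trips the doubled quote back to one quote.
row = next(csv.reader(io.StringIO(buf.getvalue())))
print(row)  # ['', 'XXX" YYYY', '']
```

So the vendor's row would be valid if it arrived as `,"XXX"" YYYY",`; the lone `\"` instead leaves the quote unbalanced, which is why the parser keeps scanning for a closing quote until its buffer overruns.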