CSV sniffing falsely detects space as a delimiter #88843
Consider the following CSV content: "a|b\nc| 'd\ne|' f". The real delimiter in this case is the '|' character, but ' ' is sniffed instead. A verbose example is attached. The problem lies in csv.py, in the following code:
This makes matches non-empty, so further processing happens with the delimiter falsely set to ' '.
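To make the false match easier to see, here is a minimal sketch that runs a single regular expression over the sample. The pattern is an assumption: it is modeled on the first quote-and-delimiter expression used by the sniffer and may not match the csv.py source of any particular Python version exactly. The name `PATTERN` is made up for this illustration.

```python
import re

# Assumption: modeled on the sniffer's first quote/delimiter pattern,
# i.e. <delim><optional space><quote>...<quote><delim>.
PATTERN = re.compile(
    r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)',
    re.DOTALL | re.MULTILINE,
)

sample = "a|b\nc| 'd\ne|' f"

# Each match is a (delim, space, quote) tuple, in group order.
matches = PATTERN.findall(sample)
print(matches)
```

With '|' as the candidate delimiter the closing quote would have to be followed by another '|', which the sample never provides. With ' ' as the candidate, the text between the two single quotes (which spans a newline because of `re.DOTALL`) is followed by a space, so the pattern matches and the space is recorded as the delimiter.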
Test sample:
Actual output:
Expected output:
I think changing
Changing the sniffer logic is risky because it could break existing code that relies on the current predictions. FWIW, in your example, the sniffer gets the desired result if given a delimiter hint:
>>> s = "a|b\nc| 'd\ne|' f"
>>> pprint.pp(dict(vars(Sniffer().sniff(s, '|'))))
{'__module__': 'csv',
 '_name': 'sniffed',
 'lineterminator': '\r\n',
 'quoting': 0,
 '__doc__': None,
 'doublequote': False,
 'delimiter': '|',
 'quotechar': "'",
 'skipinitialspace': False}
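The workaround above can be checked side by side with the unhinted call. This is only a sketch: the unhinted result depends on the Python version in use, so it is printed rather than assumed, and the variable names `unhinted` and `hinted` are illustrative.

```python
import csv

sample = "a|b\nc| 'd\ne|' f"

# Without a hint the sniffer is free to pick any candidate delimiter.
# The result is version-dependent, so guard against csv.Error as well.
try:
    unhinted = csv.Sniffer().sniff(sample).delimiter
except csv.Error as exc:
    unhinted = f"<sniff failed: {exc}>"

# Restricting the candidates to '|' rules out the space.
hinted = csv.Sniffer().sniff(sample, delimiters="|")

print("unhinted delimiter:", repr(unhinted))
print("hinted delimiter:  ", repr(hinted.delimiter))
```

Passing `delimiters` is only practical when the caller already knows the small set of plausible delimiters for the input.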
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.