Fixes to CSV encoding/line endings/dialect inference #432

Merged: 5 commits into master from feature/csv-encoding-inference, Apr 7, 2021

@mildbyte mildbyte commented Apr 7, 2021

  • Autodetect the encoding using chardet
  • Add more configuration to the CSV plugin for: encoding, dialect (e.g. "excel"), sample size for inference
  • Bump the sample size to 64KB to have a better chance of inferring the dialect for wider tables
  • Autogenerate column names for unnamed columns
  • Handle Mac-style and other newlines (universal newlines mode)
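
For readers who want a feel for how these pieces fit together, here is a minimal sketch and not the code from this PR. It assumes `chardet.detect` for the encoding, the standard library's `csv.Sniffer` for the dialect, and `newline=None` on the text wrapper for universal newlines. The names `read_csv`, `detect_encoding` and `name_columns`, the UTF-8 fallback, the candidate delimiter list and the 1-based `col_N` numbering are all hypothetical choices made for illustration.

```python
import csv
import io

try:
    import chardet  # third-party; this sketch falls back to UTF-8 without it
except ImportError:
    chardet = None

DEFAULT_SAMPLE_SIZE = 64 * 1024  # 64KB, the sample size mentioned in this PR


def detect_encoding(sample, default="utf-8"):
    """Guess the encoding of a byte sample, falling back to `default`."""
    if chardet is None:
        return default
    return chardet.detect(sample)["encoding"] or default


def name_columns(header):
    """Replace empty header cells with positional names such as col_2.

    The numbering scheme is an assumption, not taken from the PR.
    """
    return [name if name.strip() else "col_%d" % i
            for i, name in enumerate(header, start=1)]


def read_csv(data, encoding=None, dialect=None,
             sample_size=DEFAULT_SAMPLE_SIZE):
    """Return (column_names, rows) for raw CSV bytes."""
    encoding = encoding or detect_encoding(data[:sample_size])
    # newline=None enables universal newlines: \r and \r\n are read as \n.
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline=None)
    if dialect is None:
        # Sniff on decoded, newline-normalized text, then rewind.
        sample = text.read(sample_size)
        text.seek(0)
        try:
            dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
        except csv.Error:
            dialect = csv.excel  # inference failed; assume the default dialect
    reader = csv.reader(text, dialect)
    header = next(reader, [])
    return name_columns(header), list(reader)
```

With Mac-style line endings and an unnamed middle column, `read_csv(b"id,,name\r1,x,alice\r2,y,bob\r")` gives the columns `["id", "col_2", "name"]` and two data rows. Passing `encoding` or `dialect="excel"` explicitly skips the corresponding inference step, which mirrors the configuration options listed above.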

mildbyte added 5 commits April 7, 2021 12:59
…to a separate module. Get the CSV plugin to also infer the file's encoding and get it to handle Windows line endings properly. Also make the sample size for inference customizable.
….g. col_1) since PG doesn't like empty column names. Add an integration test for the end-to-end querying + import through FDW with an unnamed column.
@mildbyte mildbyte merged commit 09e0f56 into master Apr 7, 2021
@mildbyte mildbyte deleted the feature/csv-encoding-inference branch April 7, 2021 15:24
mildbyte added a commit that referenced this pull request Apr 7, 2021
  * Fixes to the Snowflake data source (#421)
  * Add automatic encoding, newline and dialect inference to the CSV data source (#432)