Schema: Fix bug with enum #163

Merged: 3 commits, Feb 10, 2022

@mhuang74 mhuang74 commented Feb 10, 2022

Only add null to the enum list when the list is not empty. The enum list is empty when a column's cardinality is larger than the enum threshold.

Integration test now covers this case.
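
The rule described above can be sketched as follows. This is a minimal Python illustration, not the qsv source: the helper name `enum_for_column` and its parameters are hypothetical, and it assumes the enum list is left empty whenever the number of distinct non-null values exceeds the threshold. The guard matters because a JSON Schema `enum` containing only `null` would reject every non-null value in the column.

```python
from typing import Optional


def enum_for_column(values: list[Optional[str]], enum_threshold: int) -> list[Optional[str]]:
    """Build the enum list for one column (hypothetical helper, not qsv code)."""
    # Distinct non-null values seen in the column.
    distinct = sorted({v for v in values if v is not None})
    has_null = any(v is None for v in values)

    # Assumption: a column whose cardinality exceeds the threshold gets no enum list.
    enum_list: list[Optional[str]] = list(distinct) if len(distinct) <= enum_threshold else []

    # The guard from this PR: add null only when the enum list is not empty.
    if has_null and enum_list:
        enum_list.append(None)
    return enum_list
```

With this sketch, a low-cardinality column that contains missing values yields its distinct values followed by `None`, while a column above the threshold yields an empty list, which a caller would presumably treat as "emit no enum constraint".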

$ head -50000 NYC_311_SR_2010-2020-sample-1M.csv > NYC-short.csv
$ qsvlite index NYC-short.csv
$ time qsvlite schema NYC-short.csv --value-constraints --enum-threshold=25
Schema written to NYC-short.csv.schema.json

real	1m10.350s
user	2m59.430s
sys	0m3.241s
$ time qsvlite validate NYC-short.csv NYC-short.csv.schema.json 
[00:03:45] [==================== 100% validated 49,999 records.] (222/sec)
0 out of 49,999 records invalid.

real	3m45.273s
user	3m44.656s
sys	0m0.360s

@mhuang74 mhuang74 mentioned this pull request Feb 10, 2022
@jqnatividad jqnatividad merged commit 6482a0e into dathere:master Feb 10, 2022