Strange Error On Some CSV Inputs #58
The problem described in this issue is that the provided data did not contain the fields the tool was configured to look for (specifically SSN). This breaks the tool's back end. The tool finds duplicates using both deterministic and probabilistic rules, and it cannot do that if none of the data satisfies any of the deterministic rules. To solve this, the user must change the deterministic rules so that they match at least 5 data points, which is the minimum for the rules to be accepted. This is done by editing
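The rule format and threshold check inside ecqm_dedupe are not shown in this issue. The following is a minimal Python sketch of the failure mode described above. It assumes that each deterministic rule is a list of column names that must match exactly, and that the "at least 5 data points" threshold counts matching record pairs. `MIN_MATCHES`, `matching_pairs`, and `usable_rules` are hypothetical names, not part of the tool. The sketch shows why a CSV with no SSN column leaves an SSN rule with zero matches, and it fails early with a readable message rather than an out-of-range error later on.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical threshold taken from the comment above (5 matches); the real
# tool may name or count this differently.
MIN_MATCHES = 5


def matching_pairs(rows, fields):
    """Return index pairs of rows that agree exactly on every field in `fields`.

    A row that lacks any rule field, or has it blank, cannot match. This is
    what happens to an SSN-based rule when the CSV has no SSN column.
    """
    buckets = defaultdict(list)
    for i, row in enumerate(rows):
        # `or ""` also covers None values, e.g. from csv.DictReader padding.
        key = tuple((row.get(f) or "").strip() for f in fields)
        if all(key):  # skip rows where any rule field is absent or blank
            buckets[key].append(i)
    pairs = []
    for idxs in buckets.values():
        pairs.extend(combinations(idxs, 2))
    return pairs


def usable_rules(rows, rules, min_matches=MIN_MATCHES):
    """Keep only deterministic rules that match at least `min_matches` pairs.

    Raises ValueError when no rule qualifies, instead of letting a later
    step fail on an empty set of matches.
    """
    kept = {}
    for name, fields in rules.items():
        if len(matching_pairs(rows, fields)) >= min_matches:
            kept[name] = fields
    if not kept:
        raise ValueError(
            f"no deterministic rule matched at least {min_matches} pairs; "
            "adjust the rules to use fields present in the input"
        )
    return kept
```

For example, with `rules = {"ssn": ["SSN"], "name_dob": ["first_name", "last_name", "dob"]}` and an input without an SSN column, only `name_dob` can survive, and only if enough records share the same name and date of birth. This mirrors the fix described above: point the deterministic rules at fields the data actually contains.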
Describe the bug
I ran into an out-of-range error when using the tool with this CSV input:
This input has no SSN or truth value, which may have something to do with the error.
However, when I run the script:
python3.11 cli/ecqm_dedupe.py dedupe-data --fmt CSV /tmp/x.csv /tmp/out.csv
it fails badly (please note that running with the data I initially sent works fine):
To Reproduce
Use the above input with this CLI call:
python3.11 cli/ecqm_dedupe.py dedupe-data --fmt CSV /tmp/x.csv /tmp/out.csv
Expected behavior
The CLI should output the deduplicated data normally.
Actual behavior
The tool throws an out-of-range error.