[7.x] [ML] Allow a certain number of ill-formatted rows when delimited format is specified (#55735) #55944

Merged
benwtrent merged 1 commit into elastic:7.x from backport/7.x/pr-55735
Apr 29, 2020

benwtrent
Member

Backports the following commits to 7.x:

…at is specified (elastic#55735)

While it is good to be strict when guessing the file format, it is frustrating for users who KNOW the file is CSV but find it rejected because of a few ill-formatted rows (from a data-entry error, for example).

This commit allows for up to 10% of sample rows to be considered "bad". These rows are effectively ignored while guessing the format.

This allowance for bad rows is only applied when the user has specified delimited formatting options, because the structure finder needs some guidance on what a "bad row" actually means.
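
To make the 10% rule concrete, here is a minimal Python sketch of the idea. It is not the Elasticsearch implementation (which is Java). The function name `looks_delimited`, the constant `MAX_BAD_ROW_PERCENT`, and the choice to take the expected column count from the first row are all assumptions made for illustration; the only parts taken from the description above are the 10% limit and the requirement that the caller supplies the delimiter.

```python
import csv
import io

# From the PR description: up to 10% of sample rows may be "bad".
MAX_BAD_ROW_PERCENT = 10


def looks_delimited(sample, delimiter=",", quote='"'):
    """Return (accepted, good_rows, bad_count) for a text sample.

    Hypothetical helper. The caller must supply the delimiter, which is
    what lets us define a "bad" row: one whose field count differs from
    the expected count. Here the expected count is assumed to come from
    the first row.
    """
    reader = csv.reader(io.StringIO(sample), delimiter=delimiter, quotechar=quote)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return False, [], 0

    expected_fields = len(rows[0])
    good_rows = [row for row in rows if len(row) == expected_fields]
    bad_count = len(rows) - len(good_rows)

    # Integer comparison avoids floating-point rounding at the boundary.
    accepted = bad_count * 100 <= MAX_BAD_ROW_PERCENT * len(rows)
    return accepted, good_rows, bad_count


if __name__ == "__main__":
    lines = ["time,host,message"] + ["1,a,ok"] * 17 + ["2,b", "3,c,x,y"]
    accepted, good, bad = looks_delimited("\n".join(lines))
    print(accepted, len(good), bad)
```

In this sketch, bad rows are dropped from `good_rows`, so any later inference would see only the well-formed rows, which mirrors "effectively ignored while guessing the format".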

related to elastic#38890
@benwtrent benwtrent added :ml Machine learning backport labels Apr 29, 2020
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@benwtrent benwtrent merged commit edd049f into elastic:7.x Apr 29, 2020
@benwtrent benwtrent deleted the backport/7.x/pr-55735 branch April 29, 2020 15:15
Labels
backport :ml Machine learning