
[ML] More tolerant delimited file parsing when structure is overridden #38890

Closed
droberts195 opened this issue Feb 14, 2019 · 2 comments

@droberts195
Contributor

droberts195 commented Feb 14, 2019

Inspired by elastic/kibana#31065.

At present the file structure finder will only detect a delimited file if all rows have the same number of columns. This is sensible when determining the structure from scratch, but when the structure has been explicitly specified as delimited using an override and the exact delimiter is also supplied it makes more sense to believe the user and try to create a structure using the specified format even if it means there are different numbers of columns per row.
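A rough sketch of the proposed behavior (in Python, with an invented `parse_delimited` helper; the real structure finder is Java code inside Elasticsearch): when the delimiter comes from a user override, rows whose column count differs from the header's could be skipped instead of failing detection outright.

```python
import csv
from io import StringIO

def parse_delimited(sample: str, delimiter: str, delimiter_overridden: bool = False):
    """Illustrative only: split a delimited sample into header and data rows.

    When the structure is being detected from scratch (delimiter_overridden=False),
    any row whose column count differs from the header's aborts detection.
    When the user explicitly supplied the delimiter via an override, ragged
    rows are skipped and parsing proceeds with the remaining rows.
    """
    rows = list(csv.reader(StringIO(sample), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    good = [r for r in data if len(r) == len(header)]
    bad = len(data) - len(good)
    if bad and not delimiter_overridden:
        raise ValueError(f"{bad} row(s) do not match the header width")
    return header, good

sample = "a,b,c\n1,2,3\n4,5\n6,7,8\n"
# parse_delimited(sample, ",")        # raises ValueError (detection from scratch)
# parse_delimited(sample, ",", True)  # succeeds, dropping the ragged row
```

This keeps the strict behavior as the default, so detection from scratch is unchanged; leniency is opt-in via the override path.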

Additionally, when doing timestamp format determination for delimited files it would be nice to have an option to detect a timestamp field even when a small percentage of rows do not match. We could still default to requiring 100% matches but offer the option to reduce this to, say, 95%.
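The suggested relaxation could look roughly like this (a Python sketch; the function name and single regex pattern are invented for illustration, whereas the real finder tries many timestamp formats): keep 100% as the default match requirement, but let callers lower the threshold.

```python
import re

# Simplified ISO-8601 matcher, standing in for the finder's many formats.
ISO_TS = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")

def is_timestamp_field(values, min_match_fraction=1.0):
    """Accept a candidate timestamp field if at least min_match_fraction
    of the sampled values match the pattern (default: all must match)."""
    if not values:
        return False
    matches = sum(1 for v in values if ISO_TS.search(v))
    return matches / len(values) >= min_match_fraction

values = ["2019-02-14T09:00:00"] * 19 + ["not a timestamp"]
# is_timestamp_field(values)        -> False (19/20 < 100%)
# is_timestamp_field(values, 0.95)  -> True  (19/20 >= 95%)
```

With one garbage row in twenty samples, the default rejects the field while a 95% threshold accepts it, which is exactly the trade-off proposed above.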

@droberts195 droberts195 added >enhancement :ml Machine learning labels Feb 14, 2019
@elasticmachine
Collaborator

Pinging @elastic/ml-core

@droberts195 droberts195 self-assigned this May 30, 2019
@droberts195 droberts195 assigned benwtrent and unassigned droberts195 Apr 27, 2020
benwtrent added a commit that referenced this issue Apr 29, 2020
…at is specified (#55735)

While it is good not to be lenient when attempting to guess the file format, it is frustrating to users when they KNOW it is CSV but there are a few ill-formatted rows in the file (due to data entry errors, etc.).

This commit allows for up to 10% of sample rows to be considered "bad". These rows are effectively ignored while guessing the format.

This "allowed bad rows" percentage is only applied when the user has specified delimited formatting options, as the structure finder needs some guidance on what a "bad row" actually means.

related to #38890
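The tolerance described in the commit reduces to a simple check (a hypothetical Python sketch; the names are invented, and the real implementation lives in the Java file structure finder):

```python
DEFAULT_MAX_BAD_FRACTION = 0.10  # the 10% allowance described above

def within_bad_row_allowance(total_rows: int, bad_rows: int,
                             max_bad_fraction: float = DEFAULT_MAX_BAD_FRACTION) -> bool:
    """True if the number of ill-formatted sample rows is small enough
    (at most max_bad_fraction of the sample) to keep guessing the format."""
    return bad_rows <= total_rows * max_bad_fraction
```

For a 100-row sample, 10 bad rows are tolerated while an 11th tips the sample over the allowance and format guessing gives up.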
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Apr 29, 2020
…at is specified (elastic#55735)

benwtrent added a commit that referenced this issue Apr 29, 2020
…at is specified (#55735) (#55944)

@droberts195
Contributor Author

Fixed by #55735
