
[ML] Data Visualizer to accept data without timestamp #60196

Closed
kiju98 opened this issue Mar 14, 2020 · 5 comments
Labels: Feature:File and Index Data Viz, :ml, v7.7.0

kiju98 commented Mar 14, 2020

Kibana version: 7.6.1

Describe the feature:
Currently, Data Visualizer produces the following error when we try to upload a data file without a timestamp:

File could not be read
[illegal_argument_exception] Could not find a timestamp in the sample provided

[screenshot: data_visualizer error message]

It would be more helpful if Data Visualizer accepted data without a timestamp.

Describe a specific use case for the feature:
Loading only data with a timestamp used to be enough, because Elastic Machine Learning (anomaly detection) only handled data with a timestamp. However, 7.6 introduced other features, such as classification, that do not require a timestamp, so I think it would be helpful if Data Visualizer accepted data without one.

kiju98 added the Feature:File and Index Data Viz and v7.6.1 labels on Mar 14, 2020
elasticmachine (Contributor) commented:

Pinging @elastic/ml-ui (:ml)

peteharverson changed the title from "Data Visualizer to accept data without timestamp" to "[ML] Data Visualizer to accept data without timestamp" on Mar 16, 2020
droberts195 (Contributor) commented:

It does accept data without a timestamp, provided the data is in a highly structured format like NDJSON, CSV, TSV, semicolon-separated values, etc.

It needs a timestamp for semi-structured log data because the rule for "what is the first line of each message" is "the line with the timestamp on it".

If you think your data was CSV or some other delimited format, then the real question here is: what made the file structure finder decide it was not possible to import as CSV? Sending the file directly to the backend find_file_structure endpoint with the ?explain option will give more insight into this.
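For example, such a request might look like the following (a minimal sketch; the localhost:9200 address and the movies.csv file name are assumptions, so adjust them for your cluster and file):

# Send the raw file to the file structure finder and ask it to explain its decisions.
curl -s -H "Content-Type: application/json" \
  -XPOST "http://localhost:9200/_ml/find_file_structure?pretty&explain=true" \
  -T "movies.csv"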

There are a number of things that could come out of this if the file was CSV:

  1. It would be useful if the UI error message included the explanation from the backend endpoint
  2. The backend endpoint should be more tolerant about detecting CSV when the format is explicitly overridden - see [ML] More tolerant delimited file parsing when structure is overridden elasticsearch#38890
  3. The UI should let you give hints by setting overrides if the initial import fails - see [ML] File Data Viz should allow retry with overrides when initial analysis fails #38868 - that's not exactly what the title of the issue says, but it would be covered by:

It would be nice if the user was able to enter overrides after an error on the initial analysis, to enable them to import a file in situations when giving the structure analysis hints would allow it to succeed.


kiju98 commented Mar 16, 2020

Thank you, @droberts195.
The data file was CSV.
I tried the find_file_structure endpoint and the result was:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Could not find a timestamp in the sample provided"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Could not find a timestamp in the sample provided",
    "suppressed" : [
      {
        "type" : "exception",
        "reason" : "Explanation so far:\n[Using character encoding [UTF-8], which matched the input with [100%] confidence]\n[Not NDJSON because there was a parsing exception: [Unrecognized token 'korean_title': was expecting ('true', 'false' or 'null') at [Source: \"korean_title,title,year,country,length,genre,like,director,company\"; line: 1, column: 13]]]\n[Not XML because there was a parsing exception: [ParseError at [row,col]:[1,1] Message: Content is not allowed in prolog.]]\n[Not CSV because row [82] has a different number of fields to the first row: [9] and [8]]\n[Not TSV because the first row has fewer than [2] fields: [1]]\n[Not semicolon delimited values because the first row has fewer than [4] fields: [1]]\n[Not vertical line delimited values because the first row has fewer than [5] fields: [1]]\n[Deciding sample is text]\n"
      }
    ]
  },
  "status" : 400
}

The data file is movies.zip.

I think the error was due to the missing values in the company field. I deleted the company field and successfully loaded the CSV file. The modified CSV file is movies2.zip.

I hope Data Visualizer will be more lenient about missing values, and I agree that it would be helpful if we could see the explanation from Kibana.
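For illustration only (the data row below is hypothetical, not taken from movies.zip), a row that drops the trailing company value together with its comma ends up with 8 fields against the 9-field header shown in the explain output, which is exactly the kind of mismatch reported above:

korean_title,title,year,country,length,genre,like,director,company
기생충,Parasite,2019,Korea,132,Drama,1,Bong Joon-ho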

droberts195 (Contributor) commented:

Yes, so "Not CSV because row [82] has a different number of fields to the first row: [9] and [8]" is the relevant part of the explanation.

It's hard for the file structure finder to ignore discrepancies in the number of CSV fields per row, because then a lot of semi-structured text log files could get misdetected as CSV.

However, if the format could be overridden even when the initial analysis fails, then elastic/elasticsearch#38890 would help: if you explicitly said your file was CSV, differences in the number of fields per line could be treated as some lines having blanks at the end.
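For reference, the kind of explicit override being discussed can be supplied as query parameters on the same backend endpoint (a sketch only; the delimiter value %2C is a URL-encoded comma, and the host and file name are assumptions). As noted above, this may still fail on the mismatched field counts until elastic/elasticsearch#38890 is in place:

# Explicitly tell the structure finder the file is comma-delimited with a header row.
curl -s -H "Content-Type: application/json" \
  -XPOST "http://localhost:9200/_ml/find_file_structure?pretty&explain=true&format=delimited&delimiter=%2C&has_header_row=true" \
  -T "movies.csv"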


kiju98 commented Mar 18, 2020

Sounds great! Let me close this in favor of elastic/elasticsearch#38890.
