[ML] Combination of influencers results in empty anomaly chart #18547

elasticmachine · 2018-01-04T14:27:50Z

Original comment by @walterra:

A field can be an array of terms like:

{
  "genre": [
      "Action",
      "Adventure",
      "Drama",
      "Horror",
      "Sci-Fi"
    ]
}

When this field is used as an over or by field in an analysis, the resulting entities can be combinations of these terms. However, when you select one of these combined entities in the Anomaly Explorer, the anomaly chart remains empty. (In the following analysis result, clicking on the "Drama" or "Comedy" swimlane brings up the correct anomaly charts; it only fails for swimlanes that contain multiple terms.)

To fix this I'd like to clarify:

Is it intentional that combinations of terms can show up as single rows in the swimlanes? This was unexpected to me.
If yes, should the combination of terms be shown in separate anomaly charts or in a single one?

elasticmachine · 2018-01-04T15:44:09Z

Original comment by @sophiec20:

Can you please confirm what's in the results document for influencer_field_value and over_field_value?

I believe we convert the array of terms into a comma-separated string. Once it is a string, we lose the knowledge that the field was ever an array of values, and we are stuck with the order we found them in.
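
To illustrate the point about ordering, here is a minimal sketch of that kind of flattening. `flatten_field` is a hypothetical helper written for this comment, not the actual back-end code; it only assumes the behaviour described above (array elements joined with commas).

```python
def flatten_field(value):
    """Join a list of terms into one comma-separated string.

    Hypothetical stand-in for the conversion described above;
    scalar values are passed through as strings.
    """
    if isinstance(value, list):
        return ",".join(str(v) for v in value)
    return str(value)


# The same set of terms in a different order yields a different entity.
a = flatten_field(["Jupiter", "Uranus"])
b = flatten_field(["Uranus", "Jupiter"])

# A single term that already contains a comma is indistinguishable
# from a two-element array once flattened.
c = flatten_field(["Jupiter,Uranus"])

print(a, b, a == b, a == c)
```

Under these assumptions the two orderings become two separate entities, and the original array structure cannot be recovered from the string alone.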

We should document that this isn't supported. (It was a known issue at one point). We should also re-raise a ticket for the back-end to consider if and how we act on this.

@droberts195 do you recall any more background?

elasticmachine · 2018-01-04T18:18:42Z

Original comment by @droberts195:

We definitely convert arrays in input JSON to comma-separated lists for analysis. This won't work at all with reverse search. It dates back to the days of the standalone Engine API, i.e. when we weren't running as part of the cluster containing the data. Data had to be posted to our API, and the differing semantics of how ES handles arrays were not important.

To make reverse search work we could expand the input document into one record to be sent to the analytics per array element. However, this may only make sense in certain cases. In other cases it may be correct that the complete contents of the array is what you want to partition on.
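
A rough sketch of the expansion idea, assuming input documents are plain dicts. `expand_on_field` is a hypothetical name used only for illustration; nothing like it is confirmed to exist in the back end.

```python
def expand_on_field(doc, field):
    """Return one record per element of doc[field] if it is a list.

    Documents where the field is missing or not a list are returned
    unchanged as a single record. An empty list yields no records.
    """
    value = doc.get(field)
    if not isinstance(value, list):
        return [dict(doc)]
    # Copy the document once per element, replacing the array
    # with that single element.
    return [{**doc, field: element} for element in value]


doc = {"title": "example", "genre": ["Action", "Drama"]}
records = expand_on_field(doc, "genre")
print(records)
```

Note that this changes the counts the analysis sees (one input document becomes several records), which is one reason it may only make sense in certain cases.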

This needs more thought.

elasticmachine · 2018-01-05T10:24:54Z

Original comment by @walterra:

I also noticed that running jobs with arrays of terms tends to bring down my local ES instance (the initial analysis doesn't finish, real-time jobs are not able to continue, and sometimes job deletion isn't even possible). I'll try to come up with a minimal dataset + job to reproduce.

elasticmachine · 2018-01-05T14:17:25Z

Original comment by @walterra:

@sophiec20 I can confirm that the array ends up as a comma-separated string inside the .ml-anomalies-shared index for the influencer_field_value and over_field_value fields.

elasticmachine · 2018-01-05T14:18:38Z

Original comment by @walterra:

Here's an example:

    {
        "_index": ".ml-anomalies-shared",
        "_type": "doc",
        "_id": "planets-array-pop-1508_record_1515159660000_60_0_-951675928_14",
        "_score": 9.896803,
        "_source": {
          "job_id": "planets-array-pop-1508",
          "result_type": "record",
          "probability": 0.024519020067470112,
          "record_score": 0.03533323,
          "initial_record_score": 5.803119136956459,
          "bucket_span": 60,
          "detector_index": 0,
          "is_interim": false,
          "timestamp": 1515159660000,
          "function": "count",
          "function_description": "count",
          "over_field_name": "planets",
          "over_field_value": "Jupiter,Uranus",
          "causes": [
            {
              "probability": 0.024519020067470112,
              "function": "count",
              "function_description": "count",
              "typical": [
                1.0490282039676364
              ],
              "actual": [
                3
              ],
              "over_field_name": "planets",
              "over_field_value": "Jupiter,Uranus"
            }
          ],
          "influencers": [
            {
              "influencer_field_name": "planets",
              "influencer_field_values": [
                "Jupiter,Uranus"
              ]
            }
          ],
          "planets": [
            "Jupiter,Uranus"
          ]
        }
      },
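
The record above stores over_field_value as "Jupiter,Uranus", whereas Elasticsearch indexes each element of the source array as a separate value, so (for a keyword field) a term filter on the joined string matches no source documents. The sketch below shows the two filter shapes side by side. `reverse_search_filter` is a hypothetical helper, not what the Anomaly Explorer actually does, and splitting on commas is itself an assumption that breaks if a term contains a comma.

```python
def reverse_search_filter(field_name, field_value, split_arrays=False):
    """Build an Elasticsearch bool filter for an entity value.

    With split_arrays=False the stored string is used as one term.
    With split_arrays=True the string is split on commas and every
    part must be present in the field (assumption: no term contains
    a comma).
    """
    terms = field_value.split(",") if split_arrays else [field_value]
    return {
        "bool": {
            "filter": [{"term": {field_name: term}} for term in terms]
        }
    }


# Looks for the literal value "Jupiter,Uranus".
as_stored = reverse_search_filter("planets", "Jupiter,Uranus")

# Looks for documents containing both "Jupiter" and "Uranus".
as_split = reverse_search_filter("planets", "Jupiter,Uranus", split_arrays=True)

print(as_stored)
print(as_split)
```

The split variant would also match documents whose array contains additional planets, so it is not an exact inverse of the flattening.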

elasticmachine added :ml Feature:ml-results legacy - do not use discuss labels Apr 25, 2018

walterra mentioned this issue Apr 25, 2018

[ML] Anomaly Explorer Enhancements #18553

Open

42 tasks

sophiec20 added Feature:Anomaly Detection ML anomaly detection and removed Feature:ml-results legacy - do not use labels Jun 19, 2019
