[ML] Combination of influencers results in empty anomaly chart #18547

Open
elasticmachine opened this issue Jan 4, 2018 · 5 comments

Comments

@elasticmachine
Contributor

Original comment by @walterra:

A field can be an array of terms like:

{
  "genre": [
      "Action",
      "Adventure",
      "Drama",
      "Horror",
      "Sci-Fi"
    ]
}

When this field is used for analysis as an over or by field, the resulting entities can be combinations of these terms. However, when you select one of these combinations in Anomaly Explorer, the anomaly chart remains empty. (In the following analysis result, clicking on the "Drama" or "Comedy" swimlane brings up correct anomaly charts; it just doesn't work for the ones with multiple terms in one swimlane.)

image

To fix this I'd like to clarify:

  • Is it intentional that combinations of terms can show up as single rows in the swimlanes? For me this was kind of unexpected.
  • If yes, should the combination of terms be shown in separate anomaly charts or a single one?
@elasticmachine
Contributor Author

Original comment by @sophiec20:

Can you please confirm what's in the results document for influencer_field_value and over_field_value.

I believe we convert the array of terms into a comma-separated string. Once it is a string, we lose the knowledge that the field was ever an array of values, and we are stuck with the order we found them in.
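To illustrate the information loss described above, here is a minimal Python sketch. The function `flatten_for_analysis` is a hypothetical stand-in for the conversion, not the actual back-end code; it only assumes that array elements are joined with commas in the order they appear.

```python
def flatten_for_analysis(value):
    """Hypothetical stand-in: join an array of terms into one comma-separated string."""
    if isinstance(value, list):
        return ",".join(value)
    return value


source_terms = ["Jupiter", "Uranus"]

flattened = flatten_for_analysis(source_terms)
reordered = flatten_for_analysis(["Uranus", "Jupiter"])
literal = flatten_for_analysis("Jupiter,Uranus")

# The same set of terms in a different order yields a different entity.
order_matters = flattened != reordered

# An array and a plain string containing a comma become indistinguishable.
ambiguous = flattened == literal

# The flattened value is not one of the terms in the source document,
# so an exact match on it against the original field finds nothing.
matches_source_term = flattened in source_terms
```

Under these assumptions, a lookup that uses the flattened value as a single term cannot match the original documents, which is consistent with the empty anomaly chart reported in this issue.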

We should document that this isn't supported. (It was a known issue at one point). We should also re-raise a ticket for the back-end to consider if and how we act on this.

@droberts195 do you recall any more background?

@elasticmachine
Contributor Author

Original comment by @droberts195:

We definitely convert arrays in input JSON to comma-separated lists for analysis. This won't work at all with reverse search. It dates back to the days of the standalone Engine API, i.e. when we weren't running as part of the cluster containing the data. Data had to be posted to our API, and the differing semantics of how ES handles arrays were not important.

To make reverse search work we could expand the input document into one record to be sent to the analytics per array element. However, this may only make sense in certain cases. In other cases it may be correct that the complete contents of the array is what you want to partition on.
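The expansion idea above could look roughly like the following Python sketch. The function name `expand_record` and the choice to pass through documents whose field is not an array (or is an empty array) are assumptions made for illustration; this is not how the analytics input is actually built.

```python
def expand_record(doc, field):
    """Return one copy of `doc` per element of the array stored in `field`.

    Documents whose field is missing, not a list, or an empty list are
    passed through unchanged as a single record (an assumption of this sketch).
    """
    value = doc.get(field)
    if not isinstance(value, list) or not value:
        return [dict(doc)]
    return [{**doc, field: element} for element in value]


doc = {"timestamp": 1515159660000, "planets": ["Jupiter", "Uranus"]}
records = expand_record(doc, "planets")
# `records` now holds two records, one with "planets": "Jupiter"
# and one with "planets": "Uranus"; `doc` itself is left unmodified.
```

Note that this changes the counts seen by the analysis: one input document contributes one record per term, which is the case where partitioning on the complete array contents might instead be the desired behavior.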

This needs more thought.

@elasticmachine
Contributor Author

Original comment by @walterra:

I also noted that running jobs with arrays of terms tends to bring down my local ES instance (the initial analysis doesn't finish, real-time jobs are not able to continue, and job deletion is sometimes not even possible). I'll try to come up with a minimal dataset + job to reproduce.

@elasticmachine
Contributor Author

Original comment by @walterra:

@sophiec20 I can confirm that the array ends up as a comma separated string inside the .ml-anomalies-shared index for the influencer_field_value and over_field_value fields.

@elasticmachine
Contributor Author

Original comment by @walterra:

Here's an example:

    {
        "_index": ".ml-anomalies-shared",
        "_type": "doc",
        "_id": "planets-array-pop-1508_record_1515159660000_60_0_-951675928_14",
        "_score": 9.896803,
        "_source": {
          "job_id": "planets-array-pop-1508",
          "result_type": "record",
          "probability": 0.024519020067470112,
          "record_score": 0.03533323,
          "initial_record_score": 5.803119136956459,
          "bucket_span": 60,
          "detector_index": 0,
          "is_interim": false,
          "timestamp": 1515159660000,
          "function": "count",
          "function_description": "count",
          "over_field_name": "planets",
          "over_field_value": "Jupiter,Uranus",
          "causes": [
            {
              "probability": 0.024519020067470112,
              "function": "count",
              "function_description": "count",
              "typical": [
                1.0490282039676364
              ],
              "actual": [
                3
              ],
              "over_field_name": "planets",
              "over_field_value": "Jupiter,Uranus"
            }
          ],
          "influencers": [
            {
              "influencer_field_name": "planets",
              "influencer_field_values": [
                "Jupiter,Uranus"
              ]
            }
          ],
          "planets": [
            "Jupiter,Uranus"
          ]
        }
    }

@sophiec20 sophiec20 added Feature:Anomaly Detection ML anomaly detection and removed Feature:ml-results legacy - do not use labels Jun 19, 2019