[ML] Combination of influencers results in empty anomaly chart #18547
Comments
Original comment by @sophiec20: Can you please confirm what's in the results document for …? I believe we convert the array of terms into a comma-separated string. Once it is a string, we lose the knowledge that the field was ever an array of values, and we are stuck with the order in which we found them. We should document that this isn't supported (it was a known issue at one point). We should also re-raise a ticket for the back-end to consider if and how we act on this. @droberts195, do you recall any more background?
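A minimal Python sketch of the information loss described above. It assumes the array is joined with a plain comma and no escaping, and `join_terms` is a hypothetical helper, not the actual back-end code. Two arrays that differ only in order become different strings, and an array whose single term contains a comma becomes indistinguishable from a two-element array.

```python
def join_terms(terms):
    """Hypothetical stand-in for flattening an array field into one string."""
    return ",".join(terms)

# Same set of terms, different order: two distinct entities after joining.
a = join_terms(["Jupiter", "Uranus"])
b = join_terms(["Uranus", "Jupiter"])
print(a, b, a == b)  # Jupiter,Uranus Uranus,Jupiter False

# A single term that contains a comma collides with a two-term array.
c = join_terms(["Jupiter,Uranus"])
print(a == c)  # True

# Splitting the string back cannot tell which original shape produced it.
print(a.split(","), c.split(","))  # ['Jupiter', 'Uranus'] ['Jupiter', 'Uranus']
```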
Original comment by @droberts195: We definitely convert arrays in input JSON to comma-separated lists for analysis. This won't work at all with reverse search. It dates back to the days of the standalone Engine API, i.e. when we weren't running as part of the cluster containing the data. Data had to be posted to our API, so the differing semantics of how ES handles arrays were not important. To make reverse search work, we could expand the input document into one record per array element to be sent to the analytics. However, this may only make sense in certain cases. In other cases, it may be correct to partition on the complete contents of the array. This needs more thought.
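The two options above can be sketched in Python as follows. This is an illustration only: it assumes a document is a flat dict, and `flatten_joined` and `expand_per_element` are hypothetical names, not functions in the ML code base. Joining keeps one record per document and partitions on the whole array; expanding emits one record per array element, so each record's partition value is a single term that a reverse search could find.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]

def flatten_joined(doc: Doc, field: str) -> List[Doc]:
    """Behaviour as described in the thread: the array becomes one comma-separated value."""
    value = doc[field]
    if isinstance(value, list):
        doc = {**doc, field: ",".join(str(v) for v in value)}
    return [doc]

def expand_per_element(doc: Doc, field: str) -> List[Doc]:
    """Hypothetical alternative: one record per array element."""
    value = doc[field]
    if not isinstance(value, list):
        return [doc]
    # An empty array yields no records at all.
    return [{**doc, field: element} for element in value]

doc = {"timestamp": 1515159660000, "planets": ["Jupiter", "Uranus"]}
print(flatten_joined(doc, "planets"))
print(expand_per_element(doc, "planets"))
```

Note that expansion changes what the detector sees: with a `count` function, this one document would be counted twice, once per planet. That is one concrete reason why expanding may only make sense in certain cases.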
Original comment by @walterra: I also noticed that running jobs with arrays of terms tends to bring down my local ES instance (the initial analysis doesn't finish, real-time jobs aren't able to continue, and sometimes job deletion isn't even possible). I'll try to come up with a minimal dataset and job to reproduce this.
Original comment by @walterra: @sophiec20 I can confirm that the array ends up as a comma-separated string inside the …
Original comment by @walterra: Here's an example:

```json
{
  "_index": ".ml-anomalies-shared",
  "_type": "doc",
  "_id": "planets-array-pop-1508_record_1515159660000_60_0_-951675928_14",
  "_score": 9.896803,
  "_source": {
    "job_id": "planets-array-pop-1508",
    "result_type": "record",
    "probability": 0.024519020067470112,
    "record_score": 0.03533323,
    "initial_record_score": 5.803119136956459,
    "bucket_span": 60,
    "detector_index": 0,
    "is_interim": false,
    "timestamp": 1515159660000,
    "function": "count",
    "function_description": "count",
    "over_field_name": "planets",
    "over_field_value": "Jupiter,Uranus",
    "causes": [
      {
        "probability": 0.024519020067470112,
        "function": "count",
        "function_description": "count",
        "typical": [
          1.0490282039676364
        ],
        "actual": [
          3
        ],
        "over_field_name": "planets",
        "over_field_value": "Jupiter,Uranus"
      }
    ],
    "influencers": [
      {
        "influencer_field_name": "planets",
        "influencer_field_values": [
          "Jupiter,Uranus"
        ]
      }
    ],
    "planets": [
      "Jupiter,Uranus"
    ]
  }
}
```
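To see why the chart comes up empty, here is a small Python simulation of how Elasticsearch matches a `term` query against an array field: the document matches if any one element equals the query value. The source documents below are hypothetical, and the entity value is the `over_field_value` from the record above. This is a simplified model, not the actual query the anomaly chart runs.

```python
from typing import Any, Dict, List

def term_matches(doc: Dict[str, Any], field: str, value: str) -> bool:
    """Simplified model of a `term` query on a keyword field that may hold an array."""
    field_value = doc.get(field)
    values = field_value if isinstance(field_value, list) else [field_value]
    return value in values

# Hypothetical source documents for the "planets" example.
source_docs: List[Dict[str, Any]] = [
    {"planets": ["Jupiter", "Uranus"]},
    {"planets": ["Uranus", "Jupiter"]},
    {"planets": "Jupiter"},
]

entity = "Jupiter,Uranus"  # over_field_value from the anomaly record
print([term_matches(d, "planets", entity) for d in source_docs])     # [False, False, False]
print([term_matches(d, "planets", "Jupiter") for d in source_docs])  # [True, True, True]
```

No source document holds the single term `Jupiter,Uranus`, so a lookup keyed on the joined entity value finds nothing, even though each individual term still matches.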
Original comment by @walterra:
A field can be an array of terms like:
When this field is used for analysis as an over or by field, the resulting entities can be combinations of these terms. However, when you select one of these entities in the Anomaly Explorer, the anomaly chart remains empty. (In the following analysis result, clicking on the "Drama" or "Comedy" swimlane brings up the correct anomaly charts; it just doesn't work for the ones with multiple terms in one swimlane.)
To fix this I'd like to clarify: