[ML] Improve sampling and normalization of population chart. #24402
Pinging @elastic/ml-ui
💚 Build Succeeded
```js
{
  random_score: {
    // static seed to get same randomized results on every request
    seed: 'ml'
  }
}
```
Looks like you now need to use a `field` parameter here as well as `seed` to get reproducible results; see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-random
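A sketch of the reviewer's suggestion, written in the same object-literal style as the snippet under review. Assumptions: `buildRandomScoreQuery` is a hypothetical helper, not the code that was pushed, and `_seq_no` is used as `field` only because the Elasticsearch reference linked above suggests it as a reasonable default. The `bool` clause in the usage lines is made up.

```javascript
// Hypothetical helper (not the code from the PR): wraps a base query in a
// `function_score` query whose `random_score` is reproducible across requests.
function buildRandomScoreQuery(baseQuery, seed = 'ml', field = '_seq_no') {
  return {
    function_score: {
      query: baseQuery,
      // Elasticsearch needs both `seed` and `field` for reproducible scores.
      // `field` should hold a unique value per document: documents in the same
      // shard with the same `field` value get the same score.
      random_score: { seed, field },
    },
  };
}

// Usage with a `bool` query that has `must` clauses only (hypothetical clause).
const query = buildRandomScoreQuery({
  bool: { must: [{ term: { 'event.dataset': 'example' } }] },
});
console.log(JSON.stringify(query, null, 2));
```

With a fixed `seed` and `field`, a document gets the same score on every request, which is what the "static seed" comment intends. One caveat of `_seq_no`: it changes when a document is updated, so that document's score changes too.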
Thanks for pointing that out; I was working off some outdated docs. Pushed an update in 43ced43.
This LGTM overall. 👍 I'm just waiting on responses to Pete's comments since I have less context on that bit.
LGTM
LGTM
💔 Build Failed
retest
💚 Build Succeeded
Summary
This PR optimizes how contextual data is fetched for the population analysis chart. The previous structure of nested aggregations to get the data was like this:
This is now changed to the following structure:
… `bool` query with `must` clauses only, all documents had the same score. … `count` and `sum`. In contrast to functions like `mean`, the results of the former could be heavily skewed because of the sampling. The normalization adjusts the values to take into account the total number of documents without sampling. The screenshots above show results before (left) and after (right) the optimization.
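The description does not give the normalization formula. A minimal sketch, assuming the normalization scales additive metrics by the ratio of total to sampled document counts: `normalizeSampledValue`, `totalDocCount` and `sampledDocCount` are hypothetical names, not Kibana APIs, and the numbers in the usage lines are made up.

```javascript
// Hypothetical sketch, not the Kibana implementation.
// Additive metrics (count, sum) grow with the number of documents, so a value
// computed on a sample underestimates the full value by the sampling ratio.
// A mean is a ratio of two additive quantities, so it needs no rescaling.
const ADDITIVE_FUNCTIONS = new Set(['count', 'sum']);

function normalizeSampledValue(fn, sampledValue, totalDocCount, sampledDocCount) {
  // Only rescale when a real sample (a strict, non-empty subset) was taken.
  const isSampled = sampledDocCount > 0 && sampledDocCount < totalDocCount;
  if (!ADDITIVE_FUNCTIONS.has(fn) || !isSampled) {
    return sampledValue;
  }
  return sampledValue * (totalDocCount / sampledDocCount);
}

// Usage: 2,000 of 10,000 matching documents ended up in the sample.
console.log(normalizeSampledValue('count', 40, 10000, 2000)); // 200
console.log(normalizeSampledValue('sum', 120.5, 10000, 2000)); // 602.5
console.log(normalizeSampledValue('mean', 3, 10000, 2000)); // 3
```

This mirrors the distinction drawn above: `count` and `sum` are rescaled, while `mean` is returned unchanged because sampling affects its numerator and denominator alike.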
… `50000`, the examples above are just to give you an impression of how the sampling and normalization work before and after.

Checklist
Checklist not applicable (existing tests pass, no DOM changes).
Part of #21163.