[ML] 'prediction_field_name' accepts values that clash with already existing fields #48808

Closed
dolaru opened this issue Nov 1, 2019 · 3 comments
Labels
>bug :ml Machine learning

Comments

@dolaru
Member

dolaru commented Nov 1, 2019

Found in 7.5.0

Currently, the prediction_field_name parameter for analytics jobs accepts any string as a field name.

This means the user can put in a field name that clashes with something we're already using under the ml key when writing results (e.g. is_training). If a user does so, the analytics job fails due to mapping conflicts at the result-writing stage.

This kind of situation can be avoided by failing during job creation if prediction_field_name matches any child key from the ml field mapping we use for the results index.

@dolaru dolaru added >bug :ml Machine learning labels Nov 1, 2019
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@przemekwitek przemekwitek self-assigned this Nov 20, 2019
@przemekwitek
Contributor

This is a good catch!

Currently, for classification there are three such fields: prediction_probability, is_training and top_classes.
For regression there is one: is_training.

Please note that currently (it may change in the future) we do not have a mapping for the results defined upfront. Therefore, if we want to fail the job creation, we'd have to hardcode the fields listed above in the job creation code (Java). This introduces duplication between C++ and Java, as every new field emitted by the C++ code would then have to be copied to the field name blacklist in Java.
@dimitris-athanasiou: Please LMK if you think such duplication is acceptable (at least as a short-term remedy for this bug).
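
To make the trade-off concrete, here is a minimal sketch of what such a hardcoded check could look like on the Java side. It is not the actual Elasticsearch code: the class name, method name, and the analysis-type strings used as keys are hypothetical, and only the reserved field names come from the list above.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Elasticsearch.
public class PredictionFieldNameValidator {

    // Reserved names copied from the comment above. Keeping this list in Java
    // is exactly the duplication with the C++ code that is being discussed.
    private static final Map<String, Set<String>> RESERVED_BY_ANALYSIS = Map.of(
        "classification", Set.of("prediction_probability", "is_training", "top_classes"),
        "regression", Set.of("is_training"));

    /**
     * Throws IllegalArgumentException if predictionFieldName clashes with a
     * results field reserved for the given analysis type. Unknown analysis
     * types have no reserved names in this sketch.
     */
    public static void validate(String analysisType, String predictionFieldName) {
        Set<String> reserved = RESERVED_BY_ANALYSIS.getOrDefault(analysisType, Set.of());
        if (reserved.contains(predictionFieldName)) {
            throw new IllegalArgumentException(
                "prediction_field_name [" + predictionFieldName + "] clashes with a reserved "
                    + analysisType + " results field; reserved names are " + reserved);
        }
    }
}
```

Calling `PredictionFieldNameValidator.validate("classification", "top_classes")` would throw at job creation time, while `validate("regression", "top_classes")` would pass, since that name is only reserved for classification in the list above.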

@przemekwitek
Contributor

przemekwitek commented Dec 11, 2019

As of elastic/ml-cpp#861, the check is done in C++.


3 participants