[Transform] scripted group_by fails for continuos transform #57332

hendrikmuhs · 2020-05-29T06:51:54Z

Affected versions: 7.7 - 7.9.1 (fixed >= 7.9.2)

Continuous transform minimize the amount of updates by querying only changed buckets, the logic uses a combination of query features for that. For terms it relies on terms query.

If a script is used in group_by instead of a field, this change detection logic causes the transform to fail (after it switches to continuous mode, not directly after start but after checkpoint 1) as it expects a field:

task encountered irrecoverable failure: field name cannot be null

Because scripts offer freedom to build a bucket key, we can't use them to detect changes on them. We could construct a script query, however this would be very expensive.

Mitigation: don't use scripts in group_by together with continuous transform. Scripts in queries are very expensive, so independent of change detection it's highly recommended to not use scripts in production, but only in the development/data exploration phase.

Possible solutions

Update: We finally decided to go with option B, keeping the other ideas for documentation.

~~## A Disallow scripted group_by in continuous mode~~

~~This would be easiest, however if you have only 1 scripted group_by but n non scripted group_by's this would limit functionality unnecessary.~~

B Disable change detection for scripted group_by

This would let continuous mode do a full rerun if all group_by's are using a script. On larger scale this leads to performance problems. If there are other non-scripted group_by's or the amount of data is small this might still be an acceptable solution.

~~B.2: For this solution we could also consider using _update instead of index.~~

~~## C Implement change detection based on scripted query~~

~~I am not 100% sure this is possible. This solution would use a script query instead of a terms query. This solution might not be better than solution B: disabling change detection.~~

General solution remarks

Because data might be small A would limit functionality, I therefore tend towards B. It would be possible to disallow certain combinations like only disallow if no group_by implements change detection, but again this sounds like to much of a restriction. Instead we should warn the user about potential problems. Solution C might be good in the long run, but takes significant more time to verify, implement and test, so B seems to be the best short term solution.

The user that reported the issue made a workaround for #48243, therefore supporting missing_bucket should be prioritized.

Update

If transform can not apply any change detection a warning (job message + log) is raised regarding performance.

The text was updated successfully, but these errors were encountered:

elasticmachine · 2020-05-29T06:51:56Z

Pinging @elastic/ml-core (:ml/Transform)

fixes elastic#57332

) disable optimizations when using scripts in group_by, when scripts using scripts we can not predict the outcome and we have no query counterpart. Other optimizations for other group_by's are not affected. fixes #57332

leftover from a previous solution of elastic#57332, which has been discarded relates elastic#60724 fixes elastic#60794

remove test, scripts are excluded in the change collector, the test is a leftover from a previous solution of #57332, which has been discarded relates #60724 fixes #60794

hendrikmuhs · 2020-09-02T12:58:36Z

FWIW: I tested the possibility to backport this change to 7.9, but unfortunately this turned into a lot of merge problems as the PR depends on other changes.

Without leaking any information about release dates, the next patch release 7.9.2 and the next minor 7.10 are not far away and there is not much time in-between them.

I therefore dropped the idea of a backport.

…62524) the outcome and we have no query counterpart. Other optimizations for other group_by's are not affected. relates #57332 backport #60724

hendrikmuhs · 2020-09-17T10:01:21Z

After revisiting the backport again, I manually backported the important bits to 7.9.2.

hendrikmuhs added >bug team-discuss :ml/Transform Transform labels May 29, 2020

hendrikmuhs self-assigned this Jul 29, 2020

hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Aug 5, 2020

disable optimizations when using scripts

bd53104

fixes elastic#57332

hendrikmuhs removed the team-discuss label Aug 5, 2020

hendrikmuhs mentioned this issue Aug 5, 2020

[Transform] disable optimizations when using scripts in group_by #60724

Merged

hendrikmuhs closed this as completed in #60724 Aug 5, 2020

hendrikmuhs mentioned this issue Aug 6, 2020

[Transform] remove wrong test #60807

Merged

hendrikmuhs pushed a commit that referenced this issue Aug 6, 2020

[Transform] remove wrong test (#60807)

596ea05

remove test, scripts are excluded in the change collector, the test is a leftover from a previous solution of #57332, which has been discarded relates #60724 fixes #60794

hendrikmuhs pushed a commit that referenced this issue Aug 6, 2020

[Transform] remove wrong test (#60807)

b210aaf

remove test, scripts are excluded in the change collector, the test is a leftover from a previous solution of #57332, which has been discarded relates #60724 fixes #60794

hendrikmuhs mentioned this issue Sep 17, 2020

[7.9][Transform]disable optimizations when using scripts in group_by #62524

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Transform] scripted group_by fails for continuos transform #57332

[Transform] scripted group_by fails for continuos transform #57332

hendrikmuhs commented May 29, 2020 •

edited

Loading

elasticmachine commented May 29, 2020

hendrikmuhs commented Sep 2, 2020

hendrikmuhs commented Sep 17, 2020

[Transform] scripted group_by fails for continuos transform #57332

[Transform] scripted group_by fails for continuos transform #57332

Comments

hendrikmuhs commented May 29, 2020 • edited Loading

Possible solutions

B Disable change detection for scripted group_by

General solution remarks

elasticmachine commented May 29, 2020

hendrikmuhs commented Sep 2, 2020

hendrikmuhs commented Sep 17, 2020

hendrikmuhs commented May 29, 2020 •

edited

Loading