In a multi-node environment, when running a continuous transform, the following warning shows up in the logs, at times repeatedly:
[instance-0000000009] [some_transform_id] data frame transform encountered an exception:
java.lang.RuntimeException: Failed to retrieve checkpoint due to Failed to create checkpoint
at org.elasticsearch.xpack.dataframe.transforms.DataFrameTransformTask$ClientDataFrameIndexer.lambda$createCheckpoint$17(DataFrameTransformTask.java:1084) [data-frame-7.4.0.jar:7.4.0]
at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:70) [elasticsearch-7.4.0.jar:7.4.0]
...
After @hendrikmuhs investigated this, we found that it is caused by a mismatch between the global checkpoints reported by different copies (primary and replicas) of the same shard. Such a mismatch is expected by design and harmless, but the transform's checkpointing code treats it as fatal and throws an exception. It should be safe to ignore the mismatch and, e.g., take the max of all global checkpoints reported for that shard.
As a result, this warning is unnecessary and should be removed.
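The difference between the old and the proposed behavior can be sketched as follows. This is a hypothetical illustration, not the actual Elasticsearch transform code: `CheckpointMerge`, `ShardCopyCheckpoint`, `strictMerge` and `maxMerge` are made-up names, and each `ShardCopyCheckpoint` is assumed to hold the global checkpoint reported by one copy of a shard. `strictMerge` fails as soon as two copies disagree (the behavior from the log above), while `maxMerge` keeps the highest value per shard, as suggested in the issue. It uses a record, so it assumes Java 16 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Elasticsearch code: merging per-copy global checkpoints into one value per shard.
public class CheckpointMerge {

    // Assumed input: the global checkpoint reported by one copy (primary or replica) of a shard.
    record ShardCopyCheckpoint(String index, int shardId, long globalCheckpoint) {}

    // Strict variant: throws when copies of the same shard report different global checkpoints.
    static Map<String, Long> strictMerge(List<ShardCopyCheckpoint> copies) {
        Map<String, Long> result = new HashMap<>();
        for (ShardCopyCheckpoint c : copies) {
            String key = c.index() + "/" + c.shardId();
            Long previous = result.putIfAbsent(key, c.globalCheckpoint());
            if (previous != null && previous.longValue() != c.globalCheckpoint()) {
                throw new IllegalStateException("global checkpoint mismatch for " + key);
            }
        }
        return result;
    }

    // Lenient variant: a mismatch is expected, so keep the maximum value reported for each shard.
    static Map<String, Long> maxMerge(List<ShardCopyCheckpoint> copies) {
        Map<String, Long> result = new HashMap<>();
        for (ShardCopyCheckpoint c : copies) {
            String key = c.index() + "/" + c.shardId();
            result.merge(key, c.globalCheckpoint(), Long::max);
        }
        return result;
    }

    public static void main(String[] args) {
        List<ShardCopyCheckpoint> copies = List.of(
                new ShardCopyCheckpoint("source", 0, 41L), // one copy of shard 0
                new ShardCopyCheckpoint("source", 0, 40L), // another copy of shard 0 reports a lower value
                new ShardCopyCheckpoint("source", 1, 17L));
        System.out.println(maxMerge(copies).get("source/0")); // prints 41
        try {
            strictMerge(copies);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "global checkpoint mismatch for source/0"
        }
    }
}
```

Taking the max per shard means a single lagging copy no longer aborts checkpoint creation; the strict variant is kept only to show where the exception in the log comes from.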
…mismatch (#48423)
Take the max if global checkpoints mismatch instead of throwing an exception. It turned out global
checkpoints can mismatch by design.
fixes #48379
Spotted in 7.4.0