-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Fix calendar and filter updates from non-master nodes #31804
Merged
dimitris-athanasiou
merged 2 commits into
elastic:master
from
dimitris-athanasiou:fix-calendar-and-filter-updates-on-process
Jul 5, 2018
Merged
[ML] Fix calendar and filter updates from non-master nodes #31804
dimitris-athanasiou
merged 2 commits into
elastic:master
from
dimitris-athanasiou:fix-calendar-and-filter-updates-on-process
Jul 5, 2018
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
dimitris-athanasiou
added
>bug
review
v7.0.0
:ml
Machine learning
v6.4.0
v6.5.0
v6.3.2
labels
Jul 4, 2018
Pinging @elastic/ml-core |
droberts195
approved these changes
Jul 5, 2018
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Job updates or changes to calendars or filters may result into updating the job process if it has been running. To preserve the order of updates, process updates are queued through the UpdateJobProcessNotifier which is only running on the master node. All actions performing such updates must run on the master node. However, the CRUD actions for calendars and filters are not master node actions. They have been submitting the updates to the UpdateJobProcessNotifier even though it might have not been running (given the action was run on a non-master node). When that happens, the update never reaches the process. This commit fixes this problem by ensuring the notifier runs on all nodes and by ensuring the process update action gets the resources again before updating the process (instead of having those resources passed in the request). This ensures that even if the order of the updates gets messed up, the latest update will read the latest state of those resource and the process will get back in sync. This leaves us with 2 types of updates: 1. updates to the job config should happen on the master node. This is because we cannot refetch the entire job and update it. We need to know the parts that have been changed. 2. updates to resources the job uses. Those can be handled on non-master nodes but they should be re-fetched by the update process action. Closes elastic#31803
dimitris-athanasiou
force-pushed
the
fix-calendar-and-filter-updates-on-process
branch
from
July 5, 2018 09:53
699b4f3
to
facc7a8
Compare
dimitris-athanasiou
changed the title
[ML] Fix calendar and filter updates on process
[ML] Fix calendar and filter updates from non-master nodes
Jul 5, 2018
dimitris-athanasiou
deleted the
fix-calendar-and-filter-updates-on-process
branch
July 5, 2018 12:14
dimitris-athanasiou
added a commit
that referenced
this pull request
Jul 5, 2018
Job updates or changes to calendars or filters may result into updating the job process if it has been running. To preserve the order of updates, process updates are queued through the UpdateJobProcessNotifier which is only running on the master node. All actions performing such updates must run on the master node. However, the CRUD actions for calendars and filters are not master node actions. They have been submitting the updates to the UpdateJobProcessNotifier even though it might have not been running (given the action was run on a non-master node). When that happens, the update never reaches the process. This commit fixes this problem by ensuring the notifier runs on all nodes and by ensuring the process update action gets the resources again before updating the process (instead of having those resources passed in the request). This ensures that even if the order of the updates gets messed up, the latest update will read the latest state of those resource and the process will get back in sync. This leaves us with 2 types of updates: 1. updates to the job config should happen on the master node. This is because we cannot refetch the entire job and update it. We need to know the parts that have been changed. 2. updates to resources the job uses. Those can be handled on non-master nodes but they should be re-fetched by the update process action. Closes #31803
dimitris-athanasiou
added a commit
that referenced
this pull request
Jul 5, 2018
Job updates or changes to calendars or filters may result into updating the job process if it has been running. To preserve the order of updates, process updates are queued through the UpdateJobProcessNotifier which is only running on the master node. All actions performing such updates must run on the master node. However, the CRUD actions for calendars and filters are not master node actions. They have been submitting the updates to the UpdateJobProcessNotifier even though it might have not been running (given the action was run on a non-master node). When that happens, the update never reaches the process. This commit fixes this problem by ensuring the notifier runs on all nodes and by ensuring the process update action gets the resources again before updating the process (instead of having those resources passed in the request). This ensures that even if the order of the updates gets messed up, the latest update will read the latest state of those resource and the process will get back in sync. This leaves us with 2 types of updates: 1. updates to the job config should happen on the master node. This is because we cannot refetch the entire job and update it. We need to know the parts that have been changed. 2. updates to resources the job uses. Those can be handled on non-master nodes but they should be re-fetched by the update process action. Closes #31803
dnhatn
added a commit
that referenced
this pull request
Jul 5, 2018
* 6.x: Test: Do not remove xpack templates when cleaning (#31642) SQL: Allow long literals (#31777) SQL: Fix incorrect message for aliases (#31792) Detach Transport from TransportService (#31727) 6.3.1 release notes (#31829) Add unreleased version 6.3.2 [ML][TEST] Use java 11 valid time format in DataDescriptionTests (#31817) [ML] Don't treat stale FAILED jobs as OPENING in job allocation (#31800) [ML] Fix calendar and filter updates from non-master nodes (#31804) Fix license header generation on Windows (#31790) mark XPackRestIT.test {p0=monitoring/bulk/10_basic/Bulk indexing of monitoring data} as AwaitsFix Add JDK11 support without enabling in CI (#31644) Watcher: Fix check for currently executed watches (#31137) [DOCS] Fixes 6.3.0 release notes (#31771) Watcher: Ensure correct method is used to read secure settings (#31753) [ML] Rate limit established model memory updates (#31768) SQL: Update CLI logo
dnhatn
added a commit
that referenced
this pull request
Jul 5, 2018
* master: REST high-level client: add get index API (#31703) SQL: Allow long literals (#31777) SQL: Fix incorrect message for aliases (#31792) Test: Do not remove xpack templates when cleaning (#31642) Reduce more raw types warnings (#31780) Add unreleased version 6.3.2 Scripting: Remove support for deprecated StoredScript contexts (#31394) [ML][TEST] Use java 11 valid time format in DataDescriptionTests (#31817) [ML] Don't treat stale FAILED jobs as OPENING in job allocation (#31800) [ML] Fix calendar and filter updates from non-master nodes (#31804) Fix license header generation on Windows (#31790) mark RollupIT.testTwoJobsStartStopDeleteOne as AwaitsFix mark SearchAsyncActionTests.testFanOutAndCollect as AwaitsFix Correct exclusion of test on JDK 11 Fix doclint jdk 11 Add JDK11 support and enable in CI (#31644) Watcher: Fix check for currently executed watches (#31137) Watcher: Ensure correct method is used to read secure settings (#31753) SQL: Update CLI logo
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Job updates or changes to calendars or filters may
result into updating the job process if it has been
running. To preserve the order of updates, process
updates are queued through the UpdateJobProcessNotifier
which is only running on the master node. All actions
performing such updates must run on the master node.
However, the CRUD actions for calendars and filters
are not master node actions. They have been submitting
the updates to the UpdateJobProcessNotifier even though
it might have not been running (given the action was
run on a non-master node). When that happens, the update
never reaches the process.
This commit fixes this problem by ensuring the notifier
runs on all nodes and by ensuring the process update action
gets the resources again before updating the process
(instead of having those resources passed in the request).
This ensures that even if the order of the updates
gets messed up, the latest update will read the latest
state of those resource and the process will get back
in sync.
This leaves us with 2 types of updates:
updates to the job config should happen on the master
node. This is because we cannot refetch the entire job
and update it. We need to know the parts that have been changed.
updates to resources the job uses. Those can be handled
on non-master nodes but they should be re-fetched by the
update process action.
Closes #31803