Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enabling service-discovery driven shutdown of matching engine #6198
Enabling service-discovery driven shutdown of matching engine #6198
Changes from 11 commits
58eb90c
a8240d0
6e9c59b
c974255
6acc273
4ff1e16
9c048eb
165b111
f3efac6
b1b77c3
bc10553
7d4e592
3236cc8
84098f3
c21d56a
f695c38
4d870b1
9adba98
8e22562
2eeaca5
40f716d
2f74aec
c36df6a
15ec5ca
25e649b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're not waiting for
subscribeToMembershipChanges
to complete, possibly goroutine leak?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true: I was too lazy and didn't think to add a full WG setup to engine. What do you think? It does mean that I didn't toggle it on for the leak-detector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for not leaving goroutines behind. let's wait here until
subscribeToMembershipChanges
returns (via waitgroup)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I don't love adding a bunch of complexity back to the shutdown process, but fair. Added a waitgroup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preemptively stopping tasklists based on service-discovery result might cause "freeze" on a tasklist whose new owner didn't claim it yet. Ideally the old host performs operation on it until the new owner host claims the lease. Is that something we can easily check from DB? (I guess not)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the duration of that wait I would expect to be the startup time of a shard + the latency of service-discovery propogation (interally for us, 2 seconds). I do think that it's a slight risk, but thats' not super far off the duration of a LWT, so I'm not sure doing a CAS operation to poll or something would be worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change should perform better than what we have right now but the ideal implementation should also consider that new owner is active. We can try that as a follow up. Do you have plans to make the simulation framework support ring change scenarios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have plans to so personally, I need to move on to a different project, but I do think that's a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which lock is being held here? If this comment belongs to a previous iteration then let's just stop the tasklists synchronously. Better to not return from
shutDownNonOwnedTasklists
while some background goroutines are still stopping task lists.shutDownNonOwnedTasklists
may be called again via ticker and might try to do the same/overlapping work and cause unexpected problems.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does refer to an earlier iteration where it was locking, so let me fix.
My concern with doing it serially is that the net time to shut down will be quite slow, the
tlmgr.Close()
method does a lot of IO. What about throwing in a waitgroup and doing it pararallel?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doing parallel should be fine I think. just avoid leaving them behind before returning