server: Auto fix gc_worker's service safepoint for upgraded clusters #3371
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
@lichunzhu @disksing @rleungx PTAL
Codecov Report
@@ Coverage Diff @@
## master #3371 +/- ##
==========================================
- Coverage 74.95% 74.93% -0.03%
==========================================
Files 243 243
Lines 23291 23304 +13
==========================================
+ Hits 17457 17462 +5
- Misses 4267 4276 +9
+ Partials 1567 1566 -1
Ping @lichunzhu @disksing @rleungx PTAL
tests/client/client_test.go
Outdated
c.Assert(err, IsNil)

// Force set invalid ttl to gc_worker
//err = s.srv.GetStorage().SaveServiceGCSafePoint(&core.ServiceSafePoint{
Why is this commented out?
Ah sorry. I forgot to remove it.
server/core/storage.go
Outdated
// It's a new cluster, or everything is lost so we have no way to recover it. Initialize gc_worker's service
// safepoint to zero.
_, err = s.initServiceGCSafePointForGCWorker(0)
return true, err
What if the initialization fails? Do we still return true?
Yes, but I thought err would be non-nil in that case, so it wouldn't matter 🤔 Maybe it doesn't look good. I'll change it.
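The concern above can be illustrated with a small sketch in which `modified` is reported as true only when the write succeeds. The `save` callback and the `initGCWorker` name are hypothetical stand-ins for the storage write behind `initServiceGCSafePointForGCWorker`; this is not the code that was merged.

```go
package main

import (
	"errors"
	"fmt"
)

// errSaveFailed is a hypothetical error used to simulate a failed write.
var errSaveFailed = errors.New("save failed")

// initGCWorker is a hypothetical stand-in for initializing gc_worker's
// service safepoint. It reports modified == true only if the write succeeded,
// so callers never see (true, non-nil error).
func initGCWorker(save func(safePoint uint64) error, safePoint uint64) (modified bool, err error) {
	if err := save(safePoint); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	// Successful write: modified is true and err is nil.
	modified, err := initGCWorker(func(uint64) error { return nil }, 0)
	fmt.Println(modified, err)

	// Failed write: modified stays false and the error is propagated.
	modified, err = initGCWorker(func(uint64) error { return errSaveFailed }, 0)
	fmt.Println(modified, err)
}
```

Returning `false` together with the error keeps the two results consistent, so a caller that checks only `modified` cannot mistake a failed write for a successful fix.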
server/core/storage.go
Outdated

// gc_worker is missing.
_, err = s.initServiceGCSafePointForGCWorker(min)
return true, err
ditto
I think we can do two things in one loop: 1. check whether the gc_worker key exists, and 2. compute the min safepoint. After the loop, if 1 is false, we can insert the gc_worker key.
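A minimal sketch of this single-pass idea, assuming a simplified `ServiceSafePoint` struct and an in-memory slice instead of PD's storage. The field names, the `infiniteExpiry` constant, and the helper names `scanSafePoints` and `ensureGCWorker` are illustrative assumptions. Whether expired entries should count toward the minimum is a separate question that this sketch does not settle.

```go
package main

import (
	"fmt"
	"math"
)

const (
	gcWorkerServiceSafePointID = "gc_worker"
	// infiniteExpiry is an assumed encoding of "never expires".
	infiniteExpiry int64 = math.MaxInt64
)

// ServiceSafePoint is a simplified, assumed shape of a service safepoint record.
type ServiceSafePoint struct {
	ServiceID string
	ExpiredAt int64
	SafePoint uint64
}

// scanSafePoints makes one pass over the list. It reports whether gc_worker
// is present and returns the smallest SafePoint seen (math.MaxUint64 if empty).
func scanSafePoints(ssps []*ServiceSafePoint) (hasGCWorker bool, minSP uint64) {
	minSP = math.MaxUint64
	for _, ssp := range ssps {
		if ssp.ServiceID == gcWorkerServiceSafePointID {
			hasGCWorker = true
		}
		if ssp.SafePoint < minSP {
			minSP = ssp.SafePoint
		}
	}
	return hasGCWorker, minSP
}

// ensureGCWorker appends a gc_worker entry if the scan did not find one.
// An empty list is treated like a new cluster and initialized to zero.
func ensureGCWorker(ssps []*ServiceSafePoint) []*ServiceSafePoint {
	hasGCWorker, minSP := scanSafePoints(ssps)
	if hasGCWorker {
		return ssps
	}
	if len(ssps) == 0 {
		minSP = 0
	}
	return append(ssps, &ServiceSafePoint{
		ServiceID: gcWorkerServiceSafePointID,
		ExpiredAt: infiniteExpiry,
		SafePoint: minSP,
	})
}

func main() {
	ssps := []*ServiceSafePoint{
		{ServiceID: "cdc", ExpiredAt: 2000, SafePoint: 42},
		{ServiceID: "br", ExpiredAt: 1000, SafePoint: 17},
	}
	for _, ssp := range ensureGCWorker(ssps) {
		fmt.Printf("%s safepoint=%d\n", ssp.ServiceID, ssp.SafePoint)
	}
}
```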
server/core/storage.go
Outdated
// in the older version may have invalid TTL for gc_worker's safepoint, and it also might be missing. gc_worker's
// safepoint may also be missing when the cluster is just bootstrapped. Detect these cases and fix gc_worker's safepoint
// if necessary.
func (s *Storage) fixGCWorkerServiceSafePpoint(allServiceSafePoints []*ServiceSafePoint) (modified bool, err error) {
- func (s *Storage) fixGCWorkerServiceSafePpoint(allServiceSafePoints []*ServiceSafePoint) (modified bool, err error) {
+ func (s *Storage) fixGCWorkerServiceSafePoint(allServiceSafePoints []*ServiceSafePoint) (modified bool, err error) {
server/core/storage.go
Outdated
if err := json.Unmarshal([]byte(values[i]), ssp); err != nil {
if modified {
// Reload the safepoints
keys, allServiceSafePoints, err = loadAll()
Is just modifying the gcWorkerServiceSafePointID key okay?
What's more, it seems that loadAll() will not affect the result of min, so I think we can remove this logic along with modified.
The user needs the minimum value among all non-expired safepoints, but I want to fix gc_worker with the minimum value including expired-but-not-deleted ones, so the final min value may decrease. It's possible to avoid reloading and do all of this in one loop, but to keep the code more readable I chose the less efficient way.
fixed
@lichunzhu @rleungx PTAL again, thx
Rest LGTM
server/core/storage.go
Outdated
if !hasGCWorker {
// If there exists some service safepoints but gc_worker is missing, init it with the min value among all
// safepoints (including expired ones)
return s.initServiceGCSafePointForGCWorker(min)
- return s.initServiceGCSafePointForGCWorker(min)
+ return s.initServiceGCSafePointForGCWorker(validMin.SafePoint)
server/core/storage.go
Outdated
}
}

if ssp.SafePoint < min {
If ssp is expired, we may get a wrong min here. I suggest deleting min and keeping only validMin.
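To make the suggestion concrete, here is a hedged sketch of tracking only the minimum among non-expired safepoints. The struct shape, the `minValidSafePoint` name, and the rule that an entry counts as expired when `ExpiredAt` is earlier than the current Unix time are assumptions for illustration, not a copy of the PR's code.

```go
package main

import "fmt"

// ServiceSafePoint is a simplified, assumed shape of a service safepoint record.
type ServiceSafePoint struct {
	ServiceID string
	ExpiredAt int64 // Unix seconds; assumed to mean "expired" once in the past
	SafePoint uint64
}

// minValidSafePoint returns the non-expired entry with the smallest SafePoint,
// or nil if every entry has expired or the list is empty.
func minValidSafePoint(ssps []*ServiceSafePoint, nowUnix int64) *ServiceSafePoint {
	var validMin *ServiceSafePoint
	for _, ssp := range ssps {
		if ssp.ExpiredAt < nowUnix {
			// Expired entries must not lower the minimum.
			continue
		}
		if validMin == nil || ssp.SafePoint < validMin.SafePoint {
			validMin = ssp
		}
	}
	return validMin
}

func main() {
	ssps := []*ServiceSafePoint{
		{ServiceID: "old-backup", ExpiredAt: 50, SafePoint: 10}, // expired at now=100
		{ServiceID: "cdc", ExpiredAt: 200, SafePoint: 30},
		{ServiceID: "br", ExpiredAt: 300, SafePoint: 20},
	}
	if m := minValidSafePoint(ssps, 100); m != nil {
		fmt.Println(m.ServiceID, m.SafePoint) // prints "br 20"
	}
}
```

In this example the expired `old-backup` entry has the smallest safepoint, but it is skipped, so the result comes from `br`. A caller still has to handle the `nil` result before reading `validMin.SafePoint`.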
@lichunzhu updated, PTAL again
LGTM
This PR needs to be cherry-picked to release 4.0. I don't have permission to edit the label.
/merge
@rleungx: It seems you want to merge this PR, so I will trigger all the tests for you: /run-all-tests
This pull request has been accepted and is ready to merge. Commit hash: 1903929
@MyonKeminta: Your PR was out of date, so I have automatically updated it for you. I will also trigger all tests for you: /run-all-tests
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
cherry pick to release-4.0 in PR #3391
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
cherry pick to release-5.0-rc in PR #3392
Signed-off-by: MyonKeminta MyonKeminta@users.noreply.github.com
What problem does this PR solve?
Fixes #3366
Although PR #3146 fixed part of this, a problem remains for clusters upgraded from an older version: gc_worker's service safepoint may be invalid or missing while other service safepoints exist in the cluster.
What is changed and how it works?
Every time safepoints are loaded, checks whether "gc_worker"'s service safepoint exists and has an infinite TTL, and tries to fix it if possible.
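As a rough illustration of that check, the sketch below classifies gc_worker's entry as present and valid, missing, or present with a finite TTL. It assumes that an infinite TTL is encoded as `ExpiredAt == math.MaxInt64`. The struct, the `gcWorkerState` type, and `checkGCWorker` are hypothetical names and not PD's actual API.

```go
package main

import (
	"fmt"
	"math"
)

const (
	gcWorkerServiceSafePointID = "gc_worker"
	// infiniteExpiry is the assumed encoding of an infinite TTL.
	infiniteExpiry int64 = math.MaxInt64
)

// ServiceSafePoint is a simplified, assumed shape of a service safepoint record.
type ServiceSafePoint struct {
	ServiceID string
	ExpiredAt int64
	SafePoint uint64
}

// gcWorkerState describes what the check found for gc_worker's entry.
type gcWorkerState int

const (
	gcWorkerOK         gcWorkerState = iota // present with an infinite TTL
	gcWorkerMissing                         // no gc_worker entry at all
	gcWorkerInvalidTTL                      // present, but with a finite TTL
)

// checkGCWorker inspects the loaded safepoints and reports whether gc_worker's
// entry needs to be created or repaired.
func checkGCWorker(ssps []*ServiceSafePoint) gcWorkerState {
	for _, ssp := range ssps {
		if ssp.ServiceID != gcWorkerServiceSafePointID {
			continue
		}
		if ssp.ExpiredAt != infiniteExpiry {
			return gcWorkerInvalidTTL
		}
		return gcWorkerOK
	}
	return gcWorkerMissing
}

func main() {
	upgraded := []*ServiceSafePoint{
		{ServiceID: "gc_worker", ExpiredAt: 1600000000, SafePoint: 5},
		{ServiceID: "cdc", ExpiredAt: 1700000000, SafePoint: 9},
	}
	if checkGCWorker(upgraded) == gcWorkerInvalidTTL {
		fmt.Println("gc_worker has a finite TTL and should be repaired")
	}
}
```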
Check List
Tests
Code changes
Side effects
Related changes
Release note