Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: add mirror health checking support #800

Merged
merged 3 commits into from
Nov 4, 2022

Conversation

sctb512
Copy link
Contributor

@sctb512 sctb512 commented Oct 18, 2022

Currently, the mirror is set to unavailable if the failed times reach failure_limit. We added mirror health checking, which will recover unavailable mirror server. The failure_limit indicates the failed time at which the mirror is set to unavailable, The health_check_interval indicates the time interval to recover the unavailable mirror and the ping_url is the endpoint for checking mirror server health.

image

This PR is addressing #765

Signed-off-by: Bin Tang tangbin.bin@bytedance.com

@anolis-bot
Copy link
Collaborator

@sctb512 , a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/27397

@anolis-bot
Copy link
Collaborator

@sctb512 , The CI test is completed, please check result:

Test CaseTest Result
merge-target-branch✅ SUCCESS
build-docker-image✅ SUCCESS
compile-nydus✅ SUCCESS
compile-ctr-remote✅ SUCCESS
compile-nydus-snapshotter✅ SUCCESS
start-nydus-snapshotter-config-containerd✅ SUCCESS
run-container-with-nydus-image✅ SUCCESS

Congratulations, your test job passed!

api/src/http.rs Outdated Show resolved Hide resolved
@anolis-bot
Copy link
Collaborator

@sctb512 , the code has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/27453

@anolis-bot
Copy link
Collaborator

@sctb512 , The CI test is completed, please check result:

Test CaseTest Result
merge-target-branch✅ SUCCESS
build-docker-image✅ SUCCESS
compile-nydus✅ SUCCESS
compile-ctr-remote✅ SUCCESS
compile-nydus-snapshotter✅ SUCCESS
start-nydus-snapshotter-config-containerd✅ SUCCESS
run-container-with-nydus-image✅ SUCCESS

Congratulations, your test job passed!

@anolis-bot
Copy link
Collaborator

@sctb512 , the code has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/29877

@sctb512
Copy link
Contributor Author

sctb512 commented Oct 29, 2022

Related to #821 as well.

@anolis-bot
Copy link
Collaborator

@sctb512 , The CI test is completed, please check result:

Test CaseTest Result
merge-target-branch✅ SUCCESS
build-docker-image✅ SUCCESS
compile-nydus✅ SUCCESS
compile-ctr-remote✅ SUCCESS
compile-nydus-snapshotter✅ SUCCESS
start-nydus-snapshotter-config-containerd✅ SUCCESS
run-container-with-nydus-image✅ SUCCESS

Congratulations, your test job passed!

@sctb512 sctb512 force-pushed the mirror-health-check branch 2 times, most recently from b7cedcd to bed0097 Compare October 29, 2022 15:46
@anolis-bot
Copy link
Collaborator

@sctb512 , the code has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/29880

@sctb512 sctb512 force-pushed the mirror-health-check branch 2 times, most recently from e6141e5 to f06200b Compare October 29, 2022 15:50
@anolis-bot
Copy link
Collaborator

@sctb512 , The CI test is completed, please check result:

Test CaseTest Result
merge-target-branch✅ SUCCESS
build-docker-image✅ SUCCESS
compile-nydus✅ SUCCESS
compile-ctr-remote✅ SUCCESS
compile-nydus-snapshotter✅ SUCCESS
start-nydus-snapshotter-config-containerd✅ SUCCESS
run-container-with-nydus-image✅ SUCCESS

Congratulations, your test job passed!

@changweige
Copy link
Contributor

We also need this feature in stable/v2.1 :-)

// TODO: check mirrors' health
// Check mirrors' health
for mirror in connection.mirrors.iter() {
let conn = connection.clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to extract this for loop into a new function as function new is already too long

Copy link
Contributor Author

@sctb512 sctb512 Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I add two new functions: start_proxy_health_thread and start_mirror_health_thread.

Currently, the mirror is set to unavailable if the failed times reach failure_limit.
We added mirror health checking, which will recover unavailable mirror server.
The failure_limit indicates the failed time at which the mirror is set to unavailable.
The health_check_interval indicates the time interval to recover the unavailable mirror.
The ping_url is the endpoint to check mirror server health.

Signed-off-by: Bin Tang <tangbin.bin@bytedance.com>
Forward 401 response to P2P/dragonfly will affect performance.
When there is a mirror that auth_through false, we refresh the token regularly
to avoid forwarding the 401 response to mirror.

Signed-off-by: Bin Tang <tangbin.bin@bytedance.com>
@anolis-bot
Copy link
Collaborator

@sctb512 , the code has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/30804

@anolis-bot
Copy link
Collaborator

@sctb512 , The CI test is completed, please check result:

Test CaseTest Result
merge-target-branch✅ SUCCESS
build-docker-image✅ SUCCESS
compile-nydus✅ SUCCESS
compile-ctr-remote✅ SUCCESS
compile-nydus-snapshotter✅ SUCCESS
start-nydus-snapshotter-config-containerd✅ SUCCESS
run-container-with-nydus-image✅ SUCCESS

Congratulations, your test job passed!

Signed-off-by: Bin Tang <tangbin.bin@bytedance.com>
@anolis-bot
Copy link
Collaborator

@sctb512 , the code has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/30812

@anolis-bot
Copy link
Collaborator

@sctb512 , the code has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/30813

@anolis-bot
Copy link
Collaborator

@sctb512 , The CI test is completed, please check result:

Test CaseTest Result
merge-target-branch✅ SUCCESS
build-docker-image✅ SUCCESS
compile-nydus✅ SUCCESS
compile-ctr-remote✅ SUCCESS
compile-nydus-snapshotter✅ SUCCESS
start-nydus-snapshotter-config-containerd✅ SUCCESS
run-container-with-nydus-image✅ SUCCESS

Congratulations, your test job passed!

@anolis-bot
Copy link
Collaborator

@sctb512 , The CI test is completed, please check result:

Test CaseTest Result
merge-target-branch✅ SUCCESS
build-docker-image✅ SUCCESS
compile-nydus✅ SUCCESS
compile-ctr-remote✅ SUCCESS
compile-nydus-snapshotter✅ SUCCESS
start-nydus-snapshotter-config-containerd✅ SUCCESS
run-container-with-nydus-image✅ SUCCESS

Congratulations, your test job passed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants