Reliability improvements for worker-host communication #8076

Open
9 of 12 tasks
liliankasem opened this issue Jan 19, 2022 · 0 comments

We have seen many CRIs where out-of-proc language workers fail to start, or fail to call back to the host, within the defined timeout period. We need to identify the root causes and fix them, but this is currently very difficult because we are dealing with a long tail of issues across different components and levels. To get there, we need to improve the reliability of host-worker communication and fix potential bugs.

This epic will keep track of issues that could help us improve worker-host reliability:

Protobuf Message Implementations

Logging

Surface Customer Issues

Investigation
