liliankasem changed the title from "[Epic] Reliability improvements for worker-host communication" to "Reliability improvements for worker-host communication" on Jan 27, 2022
We have seen many CRIs where out-of-proc language workers fail to start or call back to the host within the defined timeout period. We need to identify the root causes and fix each of them, but this is currently very difficult because we are dealing with a long tail of issues across different components and levels. To get there, we need to improve the reliability of host-worker communication and fix potential bugs.
This epic will keep track of issues that could help us improve worker-host reliability:
Protobuf Message Implementations
Logging
Surface Customer Issues
Investigation