feat: add startupProbe and replace readiness probe with liveness probe #5407

Merged on Dec 7, 2022 (52 commits)
Commits
32dcbc2
fix: run health servicer in thread to allow readinessProbe while req
JoanFM Nov 17, 2022
89d9645
test: add test with slow process
JoanFM Nov 17, 2022
56e1d53
Merge branch 'master' into health_servicer_thread
JoanFM Nov 18, 2022
3a6bc63
fix: implement async HealthCheck methods for WorkerRuntime
Nov 18, 2022
dbbea06
Merge branch 'health_servicer_thread' of github.com:jina-ai/jina into…
Nov 18, 2022
c974bad
fix: implement async HealthCheck methods for WorkerRuntime
Nov 18, 2022
04bb57d
fix: implement async HealthCheck methods for WorkerRuntime
Nov 18, 2022
bbc25ae
fix: use first port of gatway.port argument for target address
Nov 18, 2022
308a3fa
test: add test proving readinessProbe can pass while processing
JoanFM Nov 21, 2022
cfa50ad
feat: add livenessProbe
JoanFM Nov 21, 2022
fd10e2f
Merge branch 'master' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 21, 2022
9974167
fix: tcpProbe as livenessProbe
JoanFM Nov 21, 2022
a779cf9
feat: run blocking endpoint in thread proteced by lock
JoanFM Nov 21, 2022
0d7d75f
Merge branch 'master' into health_servicer_thread
JoanFM Nov 21, 2022
26d0142
fix: avoid error no eventloop outside MainThread
JoanFM Nov 21, 2022
ce699d6
fix: livenessProbe delayed
JoanFM Nov 21, 2022
ac9ea98
Merge branch 'master' into health_servicer_thread
JoanFM Nov 22, 2022
db45ba7
fix: fix add timeout to readiness
JoanFM Nov 22, 2022
988d03b
Merge branch 'master' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 22, 2022
2ef8881
test: remove unneeded test
JoanFM Nov 22, 2022
29b2213
fix: remove the timeout from checker now
JoanFM Nov 22, 2022
d12a659
test: change test k8s failures
JoanFM Nov 22, 2022
470ab20
fix: try to downgrade grpcio
JoanFM Nov 23, 2022
22fe668
Merge branch 'master' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 23, 2022
f014035
style: fix overload and cli autocomplete
jina-bot Nov 23, 2022
92ea0d5
fix: change readinessProbe for startupProbe
JoanFM Nov 23, 2022
3c7a4cc
fix: change the startupProbe values
JoanFM Nov 23, 2022
6122f04
test: try to see how many ids are sent and responded
JoanFM Nov 23, 2022
28bd253
Merge branch 'master' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 23, 2022
a53cc8b
ci: fix reqs
JoanFM Nov 23, 2022
02bc738
style: fix overload and cli autocomplete
jina-bot Nov 23, 2022
bd2b8b0
test: try to see what happens with `continue_on_error
JoanFM Nov 23, 2022
eb3c157
Merge branch 'master' into health_servicer_thread
JoanFM Nov 24, 2022
c72546b
Merge branch 'master' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 24, 2022
dd61349
ci: some changes in k8s tests
JoanFM Nov 24, 2022
e73f1f1
refactor: set SERVING after start
JoanFM Nov 24, 2022
b4b8bb6
Merge branch 'ci-k8s' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 25, 2022
42eef99
refactor: add prestop hook
JoanFM Nov 25, 2022
04f8fcc
Merge branch 'ci-k8s' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 25, 2022
f258607
Merge remote-tracking branch 'origin/master' into health_servicer_thread
Dec 1, 2022
1a38453
Merge remote-tracking branch 'origin/master' into health_servicer_thread
Dec 1, 2022
f03ac42
Merge branch 'master' into health_servicer_thread
girishc13 Dec 2, 2022
16cd69b
Merge branch 'master' into health_servicer_thread
girishc13 Dec 6, 2022
6d8f52d
feat: retry on grpc UNKNOWN and INTERNAL error codes
Dec 7, 2022
5356ed5
ci: remove excess debug logs
Dec 7, 2022
4d4a05a
ci: replace pod portforward with service portforward
Dec 7, 2022
0820501
test: add perf tools to docker image
Dec 7, 2022
cd51bf1
Revert "ci: replace pod portforward with service portforward"
Dec 7, 2022
60abe21
Merge remote-tracking branch 'origin/master' into health_servicer_thread
Dec 7, 2022
681b081
ci: use default sleep time between requests
Dec 7, 2022
0740761
ci: remove GRPC debug flags
Dec 7, 2022
b2437d8
Merge remote-tracking branch 'origin/master' into health_servicer_thread
Dec 7, 2022
6 changes: 3 additions & 3 deletions jina/checker.py
@@ -31,15 +31,15 @@ def __init__(self, args: 'argparse.Namespace'):
  ) as tc:
  if args.target == 'executor':
  hostname, port, protocol, _ = parse_host_scheme(args.host)
- r = WorkerRuntime.is_ready(f'{hostname}:{port}')
+ r = WorkerRuntime.is_ready(ctrl_address=f'{hostname}:{port}')
  elif args.target == 'gateway':
  hostname, port, protocol, _ = parse_host_scheme(args.host)
  r = GatewayRuntime.is_ready(
  f'{hostname}:{port}',
- protocol=GatewayProtocolType.from_string(protocol),
+ protocol=GatewayProtocolType.from_string(protocol)
  )
  elif args.target == 'flow':
- r = Client(host=args.host).is_flow_ready(timeout=args.timeout)
+ r = Client(host=args.host).is_flow_ready(timeout=args.timeout / 1000)
  if not r:
  default_logger.warning(
  'not responding, attempt (%d/%d) in 1s'
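The `/ 1000` in the `flow` branch reads like a unit conversion. Assuming the `jina ping` timeout is given in milliseconds and `is_flow_ready` expects seconds (neither unit is stated in this hunk), the conversion can be sketched as below. The helper name `ms_to_seconds` is hypothetical and not part of Jina.

```python
def ms_to_seconds(timeout_ms: float) -> float:
    """Convert a timeout given in milliseconds to seconds."""
    return timeout_ms / 1000


# Under the stated assumption, a 3000 ms CLI timeout becomes a
# 3-second timeout for the flow readiness check.
flow_timeout_s = ms_to_seconds(3000)
```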
4 changes: 3 additions & 1 deletion jina/clients/base/helper.py
@@ -155,7 +155,9 @@ async def send_dry_run(self, **kwargs):
  :param kwargs: keyword arguments to make sure compatible API with other clients
  :return: send get message
  """
- return await self.session.get(url=self.url).__aenter__()
+ return await self.session.get(
+ url=self.url, timeout=kwargs.get('timeout', None)
+ ).__aenter__()

  async def recv_message(self):
  """Receive message for HTTP (sleep)
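The change to `send_dry_run` forwards an optional `timeout` taken from `kwargs`, so a caller that passes nothing gets `None`. A minimal sketch of that pattern follows. It uses `asyncio.wait_for` in place of the HTTP session from the PR, and `fake_dry_run` is a hypothetical stand-in for the real GET request.

```python
import asyncio
from typing import Optional


async def fake_dry_run(delay: float) -> str:
    """Stand-in for the HTTP GET; sleeps instead of touching the network."""
    await asyncio.sleep(delay)
    return 'ok'


async def send_dry_run(delay: float, **kwargs) -> Optional[str]:
    # Same lookup as in the diff: a missing 'timeout' key yields None,
    # which asyncio.wait_for treats as "no time limit".
    timeout = kwargs.get('timeout', None)
    try:
        return await asyncio.wait_for(fake_dry_run(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return None


fast = asyncio.run(send_dry_run(0.01, timeout=1.0))   # finishes in time
slow = asyncio.run(send_dry_run(0.5, timeout=0.05))   # exceeds the limit
unbounded = asyncio.run(send_dry_run(0.01))           # no timeout passed
```

How the real HTTP client interprets `timeout=None` is not shown in the diff, so the "no limit" behaviour here belongs to `asyncio.wait_for` only.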
14 changes: 12 additions & 2 deletions jina/resources/k8s/template/deployment-executor.yml
@@ -50,15 +50,25 @@ spec:
  valueFrom:
  fieldRef:
  fieldPath: metadata.name
- readinessProbe:
+ startupProbe:
  exec:
  command:
  - jina
  - ping
  - executor
  - 127.0.0.1:{port}
  initialDelaySeconds: 5
- periodSeconds: 20
+ periodSeconds: 5
  timeoutSeconds: 10
+ livenessProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - executor
+ - 127.0.0.1:{port}
+ initialDelaySeconds: 30
+ periodSeconds: 5
+ timeoutSeconds: 10
  lifecycle:
  preStop:
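A rough reading of the new probe timings, sketched under the assumption that Kubernetes applies its default `failureThreshold` of 3 because the hunk shows no such field. The `Probe` class and the `failure_window_seconds` helper are illustrative names, not part of Jina or of any Kubernetes client, and the estimate ignores how long each `jina ping` invocation itself runs.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    initial_delay_seconds: int
    period_seconds: int
    timeout_seconds: int
    failure_threshold: int = 3  # assumed Kubernetes default when omitted


def failure_window_seconds(probe: Probe) -> int:
    """Approximate time from container start until the probe has failed
    `failure_threshold` times in a row (probe run time is ignored)."""
    return probe.initial_delay_seconds + probe.failure_threshold * probe.period_seconds


# Values taken from the startupProbe in the template above.
startup = Probe(initial_delay_seconds=5, period_seconds=5, timeout_seconds=10)
startup_window = failure_window_seconds(startup)
```

By this estimate the startup probe tolerates roughly 20 seconds of failed pings before the kubelet restarts the container, so an Executor with a slow load phase may need a larger `failureThreshold` or `periodSeconds`.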
14 changes: 12 additions & 2 deletions jina/resources/k8s/template/deployment-gateway.yml
@@ -49,15 +49,25 @@ spec:
  valueFrom:
  fieldRef:
  fieldPath: metadata.name
- readinessProbe:
+ startupProbe:
  exec:
  command:
  - jina
  - ping
  - gateway
  - {protocol}://127.0.0.1:{port}
  initialDelaySeconds: 5
- periodSeconds: 20
+ periodSeconds: 5
  timeoutSeconds: 10
+ livenessProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - gateway
+ - {protocol}://127.0.0.1:{port}
+ initialDelaySeconds: 30
+ periodSeconds: 5
+ timeoutSeconds: 10
  lifecycle:
  preStop:
39 changes: 31 additions & 8 deletions jina/resources/k8s/template/deployment-uses-after.yml
@@ -50,12 +50,26 @@ spec:
  valueFrom:
  fieldRef:
  fieldPath: metadata.name
-
- readinessProbe:
- tcpSocket:
- port: {port}
+ startupProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - executor
+ - 127.0.0.1:{port}
  initialDelaySeconds: 5
- periodSeconds: 10
+ periodSeconds: 5
+ timeoutSeconds: 10
+ livenessProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - executor
+ - 127.0.0.1:{port}
+ initialDelaySeconds: 30
+ periodSeconds: 5
+ timeoutSeconds: 10
  lifecycle:
  preStop:
  exec:
@@ -85,16 +99,25 @@ spec:
  valueFrom:
  fieldRef:
  fieldPath: metadata.name
-
- readinessProbe:
+ startupProbe:
  exec:
  command:
  - jina
  - ping
  - executor
  - 127.0.0.1:{port_uses_after}
  initialDelaySeconds: 5
- periodSeconds: 20
+ periodSeconds: 5
  timeoutSeconds: 10
+ livenessProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - executor
+ - 127.0.0.1:{port_uses_after}
+ initialDelaySeconds: 30
+ periodSeconds: 5
+ timeoutSeconds: 10
  lifecycle:
  preStop:
53 changes: 43 additions & 10 deletions jina/resources/k8s/template/deployment-uses-before-after.yml
@@ -50,12 +50,26 @@ spec:
  valueFrom:
  fieldRef:
  fieldPath: metadata.name
-
- readinessProbe:
- tcpSocket:
- port: {port}
+ startupProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - executor
+ - 127.0.0.1:{port_uses}
  initialDelaySeconds: 5
- periodSeconds: 10
+ periodSeconds: 5
+ timeoutSeconds: 10
+ livenessProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - executor
+ - 127.0.0.1:{port}
+ initialDelaySeconds: 30
+ periodSeconds: 5
+ timeoutSeconds: 10
  lifecycle:
  preStop:
  exec:
@@ -85,16 +99,25 @@ spec:
  valueFrom:
  fieldRef:
  fieldPath: metadata.name
-
- readinessProbe:
+ startupProbe:
  exec:
  command:
  - jina
  - ping
  - executor
  - 127.0.0.1:{port_uses_before}
  initialDelaySeconds: 5
- periodSeconds: 20
+ periodSeconds: 5
  timeoutSeconds: 10
+ livenessProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - executor
+ - 127.0.0.1:{port_uses_before}
+ initialDelaySeconds: 30
+ periodSeconds: 5
+ timeoutSeconds: 10
  lifecycle:
  preStop:
@@ -125,15 +148,25 @@ spec:
  valueFrom:
  fieldRef:
  fieldPath: metadata.name
- readinessProbe:
+ startupProbe:
  exec:
  command:
  - jina
  - ping
  - executor
  - 127.0.0.1:{port_uses_after}
  initialDelaySeconds: 5
- periodSeconds: 20
+ periodSeconds: 5
  timeoutSeconds: 10
+ livenessProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - executor
+ - 127.0.0.1:{port_uses_after}
+ initialDelaySeconds: 30
+ periodSeconds: 5
+ timeoutSeconds: 10
  lifecycle:
  preStop:
38 changes: 31 additions & 7 deletions jina/resources/k8s/template/deployment-uses-before.yml
@@ -50,11 +50,26 @@ spec:
  valueFrom:
  fieldRef:
  fieldPath: metadata.name
- readinessProbe:
- tcpSocket:
- port: {port}
+ startupProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - executor
+ - 127.0.0.1:{port}
  initialDelaySeconds: 5
- periodSeconds: 10
+ periodSeconds: 5
+ timeoutSeconds: 10
+ livenessProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - executor
+ - 127.0.0.1:{port}
+ initialDelaySeconds: 30
+ periodSeconds: 5
+ timeoutSeconds: 10
  lifecycle:
  preStop:
  exec:
@@ -84,16 +99,25 @@ spec:
  valueFrom:
  fieldRef:
  fieldPath: metadata.name
-
- readinessProbe:
+ startupProbe:
  exec:
  command:
  - jina
  - ping
  - executor
  - 127.0.0.1:{port_uses_before}
  initialDelaySeconds: 5
- periodSeconds: 20
+ periodSeconds: 5
  timeoutSeconds: 10
+ livenessProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - executor
+ - 127.0.0.1:{port_uses_before}
+ initialDelaySeconds: 30
+ periodSeconds: 5
+ timeoutSeconds: 10
  lifecycle:
  preStop:
14 changes: 12 additions & 2 deletions jina/resources/k8s/template/statefulset-executor.yml
@@ -50,15 +50,25 @@ spec:
  valueFrom:
  fieldRef:
  fieldPath: metadata.name
- readinessProbe:
+ startupProbe:
  exec:
  command:
  - jina
  - ping
  - executor
  - 127.0.0.1:{port}
  initialDelaySeconds: 5
- periodSeconds: 20
+ periodSeconds: 5
  timeoutSeconds: 10
+ livenessProbe:
+ exec:
+ command:
+ - jina
+ - ping
+ - executor
+ - 127.0.0.1:{port}
+ initialDelaySeconds: 30
+ periodSeconds: 5
+ timeoutSeconds: 10
  lifecycle:
  preStop:
2 changes: 2 additions & 0 deletions jina/serve/networking.py
@@ -975,6 +975,8 @@ async def _handle_aiorpcerror(
  error.code() != grpc.StatusCode.UNAVAILABLE
  and error.code() != grpc.StatusCode.CANCELLED
  and error.code() != grpc.StatusCode.DEADLINE_EXCEEDED
+ and error.code() != grpc.StatusCode.UNKNOWN
+ and error.code() != grpc.StatusCode.INTERNAL
  ):
  return error
  elif (
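The condition above returns the error immediately unless its status code is one of five values, so after this change `UNKNOWN` and `INTERNAL` fall through to the later branches together with the three codes already listed (the commit title describes this as retrying on those codes). The same check can be sketched as a set membership test. `StatusCode` below is a local stand-in so the example runs without `grpcio`, and `is_retryable` is a hypothetical helper, not a function from `jina/serve/networking.py`.

```python
from enum import Enum, auto


class StatusCode(Enum):
    """Local stand-in for a few members of grpc.StatusCode."""
    UNAVAILABLE = auto()
    CANCELLED = auto()
    DEADLINE_EXCEEDED = auto()
    UNKNOWN = auto()
    INTERNAL = auto()
    INVALID_ARGUMENT = auto()


# Codes that are NOT returned to the caller right away in the hunk above.
RETRYABLE_CODES = frozenset(
    {
        StatusCode.UNAVAILABLE,
        StatusCode.CANCELLED,
        StatusCode.DEADLINE_EXCEEDED,
        StatusCode.UNKNOWN,   # added by this PR
        StatusCode.INTERNAL,  # added by this PR
    }
)


def is_retryable(code: StatusCode) -> bool:
    """True when the error should go on to the retry handling."""
    return code in RETRYABLE_CODES
```

A set keeps the list of codes in one place, which may be easier to extend than a chain of `!=` comparisons.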
21 changes: 6 additions & 15 deletions jina/serve/runtimes/asyncio.py
@@ -161,34 +161,25 @@ async def async_run_forever(self):
  # Static methods used by the Pod to communicate with the `Runtime` in the separate process

  @staticmethod
- def activate(**kwargs):
- """
- Activate the runtime, does not apply to these runtimes
-
- :param kwargs: extra keyword arguments
- """
- # does not apply to this types of runtimes
- pass
-
- @staticmethod
- def is_ready(ctrl_address: str, **kwargs) -> bool:
+ def is_ready(ctrl_address: str, timeout: float = 1.0, **kwargs) -> bool:
  """
  Check if status is ready.

  :param ctrl_address: the address where the control request needs to be sent
+ :param timeout: timeout of the health check in seconds
  :param kwargs: extra keyword arguments

  :return: True if status is ready else False.
  """
+
  try:
  from grpc_health.v1 import health_pb2, health_pb2_grpc

  response = GrpcConnectionPool.send_health_check_sync(
- ctrl_address, timeout=1.0
+ ctrl_address, timeout=timeout
  )
- return (
- response.status == health_pb2.HealthCheckResponse.ServingStatus.SERVING
- )
+ # TODO: Get the proper value of the ServingStatus SERVING KEY
+ return response.status == 1
  except RpcError:
  return False
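The bare `1` in `response.status == 1` matches the value of `SERVING` in the gRPC health-checking protocol, where `ServingStatus` is defined as `UNKNOWN = 0`, `SERVING = 1`, `NOT_SERVING = 2`, `SERVICE_UNKNOWN = 3`. A small sketch of a named comparison follows. The `ServingStatus` enum here is a local mirror written for illustration, not the generated `health_pb2` class, and `is_serving` is a hypothetical helper.

```python
from enum import IntEnum


class ServingStatus(IntEnum):
    """Local mirror of ServingStatus from the gRPC health-checking proto."""
    UNKNOWN = 0
    SERVING = 1
    NOT_SERVING = 2
    SERVICE_UNKNOWN = 3


def is_serving(status: int) -> bool:
    """Named equivalent of `response.status == 1`."""
    return status == ServingStatus.SERVING
```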
