Long running tasks not working with REST Clients (CURL, Postman, etc) #742

Closed
panbalag opened this issue Aug 1, 2019 · 3 comments

panbalag commented Aug 1, 2019

REST calls to a model deployed with extended timeouts fail. The model itself takes 3 to 4 minutes to complete. With curl, the error "curl: (52) Empty reply from server" is reported 60 seconds after the request is issued. The curl command was run with --connect-timeout 10000 --keepalive-time 10000 --max-time 10000 to rule out any timeout on the curl side. No errors appear in the 'seldon-container-engine' or 'seldon-controller-manager' logs, so it is difficult to triage what is causing the error. We also tried Postman, a different REST client, and got the error "Could not get any response".

SETUP
Openshift 4.1
Seldon : seldonio/seldon-core-operator:0.3.2-SNAPSHOT

MODEL YAML
{
  "apiVersion": "machinelearning.seldon.io/v1alpha2",
  "kind": "SeldonDeployment",
  "metadata": {
    "labels": {
      "app": "seldon"
    },
    "name": "m-detectanomaly"
  },
  "spec": {
    "annotations": {
      "project_name": "p-detectanomaly",
      "deployment_version": "0.1",
      "seldon.io/rest-read-timeout": "1000000",
      "seldon.io/rest-connection-timeout": "1000000",
      "seldon.io/grpc-read-timeout": "1000000"
    },
    "name": "detectanomaly",
    "oauth_key": "detectanomaly_key",
    "oauth_secret": "detectanomaly_secret",
    "predictors": [
      {
        "componentSpecs": [{
          "spec": {
            "containers": [
              {
                "image": "docker.io/panbalag/anomaly_detection",
                "imagePullPolicy": "Always",
                "name": "c-detectanomaly",
                "resources": {
                  "requests": {
                    "memory": "10Mi"
                  }
                }
              }
            ],
            "terminationGracePeriodSeconds": 20
          }
        }],
        "graph": {
          "children": [],
          "name": "c-detectanomaly",
          "endpoint": {
            "type": "REST"
          },
          "type": "MODEL"
        },
        "name": "predictor",
        "replicas": 1,
        "annotations": {
          "predictor_version": "0.1"
        }
      }
    ]
  }
}
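
As a quick sanity check on the annotations above, the sketch below converts the three timeout values to seconds and compares them with the 60-second failure and the 3 to 4 minute model runtime reported in this issue. It assumes the values are in milliseconds: the engine log states this for the gRPC timeout, while for the two REST annotations it is an assumption. The helper name `timeouts_in_seconds` is illustrative only and is not part of Seldon.

```python
# Annotation values copied from the SeldonDeployment spec in this issue.
annotations = {
    "project_name": "p-detectanomaly",
    "seldon.io/rest-read-timeout": "1000000",
    "seldon.io/rest-connection-timeout": "1000000",
    "seldon.io/grpc-read-timeout": "1000000",
}

OBSERVED_FAILURE_S = 60   # curl reported the empty reply after 60 seconds
MODEL_RUNTIME_S = 4 * 60  # upper end of the reported 3 to 4 minute runtime


def timeouts_in_seconds(annotations):
    """Convert seldon.io/*-timeout annotations to seconds (assumes milliseconds)."""
    return {
        key: int(value) / 1000
        for key, value in annotations.items()
        if key.startswith("seldon.io/") and key.endswith("-timeout")
    }


for key, seconds in timeouts_in_seconds(annotations).items():
    covers_model = seconds >= MODEL_RUNTIME_S
    could_explain_failure = seconds <= OBSERVED_FAILURE_S
    print(f"{key}: {seconds:.0f}s "
          f"covers_model={covers_model} could_explain_60s_failure={could_explain_failure}")
```

Under the millisecond assumption, each configured timeout is far longer than both the model runtime and the 60-second failure, which suggests looking at other components on the request path.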

CURL OUTPUT

[panbalag@panbalag anomaly_detection]$ curl -v --connect-timeout 10000 --keepalive-time 10000 --max-time 10000 -v http://detect-anomaly-route-ai-library.apps.cluster-raleigh-ce9c.raleigh-ce9c.openshiftworkshop.com/api/v0.1/predictions -d '{"strData":"s3endpointUrl= <>, s3accessKey=<>, s3secretKey=<>, s3objectStoreLocation=DH-DEV-DATA, s3Path=aiops, s3Destination=aiops/, inputdata=meminfo.csv"}' -H "Content-Type: application/json"

* About to connect() to detect-anomaly-route-ai-library.apps.cluster-raleigh-ce9c.raleigh-ce9c.openshiftworkshop.com port 80 (#0)
*   Trying 34.231.135.39...
* Connected to detect-anomaly-route-ai-library.apps.cluster-raleigh-ce9c.raleigh-ce9c.openshiftworkshop.com (34.231.135.39) port 80 (#0)
> POST /api/v0.1/predictions HTTP/1.1
> User-Agent: curl/7.29.0
> Host: detect-anomaly-route-ai-library.apps.cluster-raleigh-ce9c.raleigh-ce9c.openshiftworkshop.com
> Accept: */*
> Content-Type: application/json
> Content-Length: 235
>
* upload completely sent off: 235 out of 235 bytes
* Empty reply from server
* Connection #0 to host detect-anomaly-route-ai-library.apps.cluster-raleigh-ce9c.raleigh-ce9c.openshiftworkshop.com left intact
curl: (52) Empty reply from server
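
"Empty reply from server" means the peer closed the connection without sending any HTTP response, which is different from a client-side timeout. The sketch below is one way to tell the two apart and to time the failure. It uses only the Python standard library against a local stand-in server, not the real route. The function names and the idle-timeout behaviour of the stand-in are hypothetical, chosen to mimic what the curl output shows.

```python
import http.client
import socket
import threading
import time


def serve_then_drop(listener, idle_s):
    """Accept one connection, then close it without replying once it has been idle for idle_s."""
    conn, _ = listener.accept()
    with conn:
        conn.settimeout(idle_s)
        try:
            while conn.recv(65536):  # keep reading the request until the client goes quiet
                pass
        except socket.timeout:
            pass  # idle limit reached: fall through and close without a response


def timed_post(host, port, path, body, client_timeout_s):
    """Send one POST and return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    client = http.client.HTTPConnection(host, port, timeout=client_timeout_s)
    try:
        client.request("POST", path, body=body,
                       headers={"Content-Type": "application/json"})
        outcome = f"HTTP {client.getresponse().status}"
    except socket.timeout:
        outcome = "client timeout"
    except ConnectionError:
        outcome = "peer closed connection without a response"
    finally:
        client.close()
    return outcome, time.monotonic() - start


def run_once(idle_s, client_timeout_s):
    """Start the local stand-in server and send a single request to it."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    threading.Thread(target=serve_then_drop, args=(listener, idle_s), daemon=True).start()
    try:
        return timed_post("127.0.0.1", port, "/api/v0.1/predictions",
                          b'{"strData": "x"}', client_timeout_s)
    finally:
        listener.close()


if __name__ == "__main__":
    outcome, elapsed = run_once(idle_s=0.3, client_timeout_s=5)
    print(f"{outcome} after {elapsed:.1f}s")
```

Pointing `timed_post` at each hop in turn (the external route, the service, the pod) and comparing the elapsed times is one way to narrow down which component drops the connection at 60 seconds.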

LOGS - SELDON-CONTROLLER-MANAGER

{"level":"info","ts":1564670635.5284324,"logger":"seldon-controller","msg":"Found identical Service","namespace":"ai-library","name":"seldon-90b32c6bab041bb14e8a4a4e3c216575","status":{"loadBalancer":{}}}
{"level":"info","ts":1564670635.528453,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}

LOGS - SELDON-CONTAINER-ENGINE

oc log -f detectanomaly-predictor-8025dc3-6964596b94-7bnl6 -c seldon-container-engine
W0801 11:44:42.721278 25150 cmd.go:358] log is DEPRECATED and will be removed in a future version. Use logs instead.

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v1.5.17.RELEASE)

2019-08-01 14:43:22.933 INFO 1 --- [ main] io.seldon.engine.App : Starting App v0.3.1 on detectanomaly-predictor-8025dc3-6964596b94-7bnl6 with PID 1 (/app.jar started by 1000490000 in /)
2019-08-01 14:43:22.939 INFO 1 --- [ main] io.seldon.engine.App : No active profile set, falling back to default profiles: default
2019-08-01 14:43:34.935 INFO 1 --- [ main] i.s.engine.config.CustomizationBean : Customizing EmbeddedServlet
2019-08-01 14:43:34.935 INFO 1 --- [ main] i.s.engine.config.CustomizationBean : FOUND env var [ENGINE_SERVER_PORT], will use for engine server port
2019-08-01 14:43:34.936 INFO 1 --- [ main] i.s.engine.config.CustomizationBean : setting serverPort[8000]
2019-08-01 14:43:45.841 INFO 1 --- [ main] i.s.engine.predictors.EnginePredictor : init
2019-08-01 14:43:45.841 INFO 1 --- [ main] i.s.engine.predictors.EnginePredictor : FOUND env var [ENGINE_PREDICTOR], will use for engine predictor
2019-08-01 14:43:47.361 INFO 1 --- [ main] i.s.engine.predictors.EnginePredictor : Setting deployment name to detectanomaly
2019-08-01 14:43:47.464 INFO 1 --- [ main] i.s.engine.predictors.EnginePredictor : Installed engine predictor: {"name":"predictor","graph":{"name":"c-detectanomaly","children":[],"type":"MODEL","implementation":"UNKNOWN_IMPLEMENTATION","methods":[],"endpoint":{"service_host":"localhost","service_port":9000,"type":"REST"},"parameters":[]},"componentSpecs":[{"metadata":{"name":"","generateName":"","namespace":"","selfLink":"","uid":"","resourceVersion":"","generation":0,"deletionGracePeriodSeconds":0,"labels":{},"annotations":{},"ownerReferences":[],"finalizers":[],"clusterName":""},"spec":{"volumes":[{"name":"podinfo"}],"containers":[{"name":"c-detectanomaly","image":"docker.io/panbalag/anomaly_detection","command":[],"args":[],"workingDir":"","ports":[{"name":"http","hostPort":0,"containerPort":9000,"protocol":"TCP","hostIP":""}],"env":[{"name":"PREDICTIVE_UNIT_SERVICE_PORT","value":"9000"},{"name":"PREDICTIVE_UNIT_ID","value":"c-detectanomaly"},{"name":"PREDICTOR_ID","value":"predictor"},{"name":"SELDON_DEPLOYMENT_ID","value":"m-detectanomaly"}],"resources":{"limits":{},"requests":{"memory":{"string":"10Mi"}}},"volumeMounts":[{"name":"podinfo","readOnly":false,"mountPath":"/etc/podinfo","subPath":""}],"livenessProbe":{"initialDelaySeconds":60,"timeoutSeconds":1,"periodSeconds":5,"successThreshold":1,"failureThreshold":3},"readinessProbe":{"initialDelaySeconds":20,"timeoutSeconds":1,"periodSeconds":5,"successThreshold":1,"failureThreshold":3},"lifecycle":{"preStop":{"exec":{"command":["/bin/sh","-c","/bin/sleep 
10"]}}},"terminationMessagePath":"/dev/termination-log","imagePullPolicy":"Always","stdin":false,"stdinOnce":false,"tty":false,"envFrom":[],"terminationMessagePolicy":"File"}],"restartPolicy":"Always","terminationGracePeriodSeconds":20,"activeDeadlineSeconds":0,"dnsPolicy":"ClusterFirst","nodeSelector":{},"serviceAccountName":"","serviceAccount":"","nodeName":"","hostNetwork":false,"hostPID":false,"hostIPC":false,"securityContext":{"runAsUser":0,"runAsNonRoot":false,"supplementalGroups":[],"fsGroup":0},"imagePullSecrets":[],"hostname":"","subdomain":"","schedulerName":"default-scheduler","initContainers":[],"automountServiceAccountToken":false,"tolerations":[],"hostAliases":[],"priorityClassName":"","priority":0}}],"replicas":1,"annotations":{"predictor_version":"0.1"},"engineResources":{"limits":{},"requests":{}},"labels":{"version":"predictor"},"svcOrchSpec":{"env":[]},"traffic":0}
2019-08-01 14:43:47.544 INFO 1 --- [ main] i.s.engine.config.AnnotationsConfig : Annotations {prometheus.io/path=prometheus, seldon.io/rest-read-timeout=1000000, openshift.io/scc=restricted, project_name=p-detectanomaly, deployment_version=0.1, k8s.v1.cni.cncf.io/networks-status=[{\n "name": "openshift-sdn",\n "interface": "eth0",\n "ips": [\n "10.131.0.36"\n ],\n "default": true,\n "dns": {}\n}], seldon.io/grpc-read-timeout=1000000, kubernetes.io/config.source=api, 0.1=, kubernetes.io/config.seen=2019-08-01T14:43:09.822288163Z, kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container c-detectanomaly; cpu, memory limit for container c-detectanomaly; memory request for container seldon-container-engine; cpu, memory limit for container seldon-container-engine, prometheus.io/port=8000, prometheus.io/scrape=true, seldon.io/rest-connection-timeout=1000000}
2019-08-01 14:43:47.642 INFO 1 --- [ main] i.seldon.engine.tracing.TracingProvider : Not activating tracing
2019-08-01 14:43:47.644 INFO 1 --- [ main] i.s.e.service.InternalPredictionService : Setting REST connection timeout from annotation seldon.io/rest-connection-timeout
2019-08-01 14:43:47.644 INFO 1 --- [ main] i.s.e.service.InternalPredictionService : REST Connection timeout set to 1000000
2019-08-01 14:43:47.644 INFO 1 --- [ main] i.s.e.service.InternalPredictionService : Setting REST read timeout from annotation seldon.io/rest-read-timeout
2019-08-01 14:43:47.644 INFO 1 --- [ main] i.s.e.service.InternalPredictionService : REST read timeout set to 1000000
2019-08-01 14:43:48.948 INFO 1 --- [ main] i.s.e.service.InternalPredictionService : gRPC max message size set to 4194304
2019-08-01 14:43:48.948 INFO 1 --- [ main] i.s.e.service.InternalPredictionService : Setting grpc read timeout to 1000000ms
2019-08-01 14:43:48.948 INFO 1 --- [ main] i.s.e.service.InternalPredictionService : gRPC read timeout set to 1000000
2019-08-01 14:43:48.948 INFO 1 --- [ main] i.s.e.service.InternalPredictionService : REST retries set to 3
2019-08-01 14:43:49.347 INFO 1 --- [ main] io.seldon.engine.grpc.SeldonGrpcServer : FOUND env var [ENGINE_SERVER_GRPC_PORT], will use engine server port 5001
2019-08-01 14:43:50.050 INFO 1 --- [cTaskExecutor-1] io.seldon.engine.grpc.SeldonGrpcServer : Starting grpc server
2019-08-01 14:43:51.637 INFO 1 --- [cTaskExecutor-1] io.seldon.engine.grpc.SeldonGrpcServer : Server started, listening on 5001
2019-08-01 14:43:53.836 INFO 1 --- [ main] io.seldon.engine.App : Started App in 33.294 seconds (JVM running for 36.201)
2019-08-01 15:14:23.879 WARN 1 --- [nio-8000-exec-8] .w.s.m.s.DefaultHandlerExceptionResolver : Resolved [org.springframework.web.HttpRequestMethodNotSupportedException: Request method 'GET' not supported]
2019-08-01 15:14:24.931 WARN 1 --- [io-8000-exec-10] .w.s.m.s.DefaultHandlerExceptionResolver : Resolved [org.springframework.web.HttpRequestMethodNotSupportedException: Request method 'GET' not supported]
2019-08-01 15:14:24.961 WARN 1 --- [nio-8000-exec-2] .w.s.m.s.DefaultHandlerExceptionResolver : Resolved [org.springframework.web.HttpRequestMethodNotSupportedException: Request method 'GET' not supported]
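
To confirm that the engine actually applied the annotated timeouts, the "set to" lines in the log above can be extracted mechanically. This is a minimal sketch: the regular expression and the `applied_settings` helper are illustrative, and the sample lines are copied from this log with the timestamp prefix trimmed.

```python
import re

# Lines copied from the seldon-container-engine log in this issue (timestamps trimmed).
log_lines = [
    "i.s.e.service.InternalPredictionService : REST Connection timeout set to 1000000",
    "i.s.e.service.InternalPredictionService : REST read timeout set to 1000000",
    "i.s.e.service.InternalPredictionService : gRPC read timeout set to 1000000",
    "i.s.e.service.InternalPredictionService : REST retries set to 3",
    "io.seldon.engine.App : Started App in 33.294 seconds (JVM running for 36.201)",
]

# Matches "<logger> : <name> set to <integer>" at the end of a line.
SETTING = re.compile(r":\s+(?P<name>.+?) set to (?P<value>\d+)$")


def applied_settings(lines):
    """Return {setting name (lowercased): integer value} for every 'set to' log line."""
    settings = {}
    for line in lines:
        match = SETTING.search(line)
        if match:
            settings[match.group("name").lower()] = int(match.group("value"))
    return settings


for name, value in applied_settings(log_lines).items():
    print(f"{name} = {value}")
```

In this log all three timeouts show the annotated value of 1000000, so the engine appears to have picked up the configuration.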

ukclivecox (Contributor) commented

@panbalag, can you check whether the solution for #753 fixes this?

ukclivecox (Contributor) commented

Can we close this?

ukclivecox added this to the 1.0.x milestone Aug 24, 2019

ukclivecox (Contributor) commented

Closing as assumed fixed. Please reopen if not.
