Java Engine and Go Executor Does not Terminate Graph upon Error #2480

Closed
mwm5945 opened this issue Sep 23, 2020 — with Board Genius Sync · 4 comments

mwm5945 commented Sep 23, 2020

Describe the bug

We have a graph with two models that are children of a combiner. The models are H2O models using version 0.2.0 of the Java wrapper, and the combiner uses Seldon 1.2.3. All communication is over gRPC, and the engine is the old Java engine. When one of the models encounters an error, execution of the graph is not terminated. Instead, the error response is sent to the combiner, which expects valid data, cannot find any in the message, and returns an error of its own.

H2O model Response (Via grpcurl):

{
  "status": {
    "code": 500,
    "reason": "internal model error: feature request at index 0 failed: Unexpected object type java.lang.Double for [REDACTED]",
    "status": "FAILURE"
  }
}

Combiner Logs/Response object:

 2020-09-23 16:50:13,425 - grpc._server:_call_behavior:445 - ERROR:  Exception calling application:
 Traceback (most recent call last):
   File "/usr/local/lib/python3.7/dist-packages/grpc/_server.py", line 435, in _call_behavior
     response_or_iterator = behavior(argument, context)
   File "/usr/local/lib/python3.7/dist-packages/seldon_core/wrapper.py", line 226, in Aggregate
     self.user_model, request_grpc, self.seldon_metrics
   File "/usr/local/lib/python3.7/dist-packages/seldon_core/seldon_methods.py", line 433, in aggregate
     (features, meta, datadef, data_type) = extract_request_parts(msg)
   File "/usr/local/lib/python3.7/dist-packages/seldon_core/utils.py", line 621, in extract_request_parts
     features = get_data_from_proto(request)
   File "/usr/local/lib/python3.7/dist-packages/seldon_core/utils.py", line 186, in get_data_from_proto
     raise SeldonMicroserviceException("Unknown data in SeldonMessage")
 seldon_core.flask_utils.SeldonMicroserviceException
rpc error: code = Unknown desc = Exception calling application: 
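
The traceback shows the combiner failing with a generic "Unknown data in SeldonMessage" because the child message carried only a status. As a possible stopgap on the combiner side, the inputs could be inspected before aggregation so that the original reason is surfaced. The sketch below is an assumption-laden illustration: it works on plain dicts shaped like the JSON response above, and `ChildFailure`, `failed_status` and `guard_inputs` are hypothetical names, not part of seldon_core.

```python
from typing import Any, Dict, List


class ChildFailure(Exception):
    """Raised when an upstream node reported a FAILURE status (hypothetical helper)."""


def failed_status(message: Dict[str, Any]) -> bool:
    """Return True if the message carries a status block marked FAILURE."""
    status = message.get("status") or {}
    return status.get("status") == "FAILURE"


def guard_inputs(messages: List[Dict[str, Any]]) -> None:
    """Raise an error naming the first failed child, keeping its code and reason."""
    for index, message in enumerate(messages):
        if failed_status(message):
            status = message["status"]
            raise ChildFailure(
                f"child {index} failed with code {status.get('code')}: "
                f"{status.get('reason', 'no reason given')}"
            )
```

Calling `guard_inputs` at the top of an aggregate step would turn the opaque exception into one that names the failing child, although it does not stop the graph from reaching the combiner in the first place.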

To reproduce

Build an SDEP with two H2O models (using 0.2.0 of the Java wrapper) and a Python combiner (using 1.2.3), with the Java engine. Have one (or both) models return an error.

Expected behaviour

Execution of the graph is terminated when an error occurs in any node of the graph.
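
To make the expected behaviour concrete, here is a minimal sketch of a graph walk that stops at the first failing node. It is not the engine's or executor's implementation: `Node`, `GraphError` and `execute` are hypothetical names, and messages are plain dicts shaped like the JSON response shown earlier.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


class GraphError(Exception):
    """Carries the name of the failing node and the status it returned."""

    def __init__(self, node_name: str, status: Message) -> None:
        super().__init__(f"{node_name}: {status.get('reason', 'no reason given')}")
        self.node_name = node_name
        self.status = status


@dataclass
class Node:
    name: str
    # Leaves receive [request]; inner nodes receive their children's outputs.
    call: Callable[[List[Message]], Message]
    children: List["Node"] = field(default_factory=list)


def execute(node: Node, request: Message) -> Message:
    """Evaluate children first, then the node; abort on any FAILURE status."""
    if node.children:
        inputs = [execute(child, request) for child in node.children]
    else:
        inputs = [request]
    output = node.call(inputs)
    status = output.get("status") or {}
    if status.get("status") == "FAILURE":
        # Raising here unwinds the recursion, so parent nodes are never called.
        raise GraphError(node.name, status)
    return output
```

With this shape, a failing model raises before the combiner is invoked, so the caller sees the model's own reason instead of a secondary error from the combiner.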

Environment

  • Cloud Provider: AWS
  • Kubernetes Cluster Version: 1.17
  • Seldon Version: 1.2.3

Model Details

  • Images of your model: (custom images)
  • Logs of your model: See Above
mwm5945 added the bug and triage (Needs to be triaged and prioritised accordingly) labels Sep 23, 2020
ukclivecox removed the triage (Needs to be triaged and prioritised accordingly) label Oct 1, 2020
mwm5945 changed the title from "Java Engine Does not Terminate Graph upon Error" to "Java Engine and Go Executor Does not Terminate Graph upon Error" Oct 2, 2020

mwm5945 commented Oct 2, 2020

I've updated the title of the issue as we're still experiencing the issue using the executor. I'll add more information/logs in another comment below.

mwm5945 commented Oct 2, 2020

Executor Logs:

$ k logs [pod] --container seldon-container-engine
{"level":"info","ts":1601646726.0333207,"logger":"entrypoint","msg":"Hostname unset will use localhost"}
{"level":"error","ts":1601646726.0357838,"logger":"entrypoint","msg":"Failed to embed variables on OpenAPI template","error":"open ./openapi/seldon.json: permission denied","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/app/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\nmain.main\n\t/app/go/src/seldon-core/executor/cmd/executor/main.go:280\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
{"level":"info","ts":1601646726.0358837,"logger":"entrypoint","msg":"Starting","worker":1}
{"level":"info","ts":1601646726.0359118,"logger":"entrypoint","msg":"Starting","worker":2}
{"level":"info","ts":1601646726.035919,"logger":"entrypoint","msg":"Starting","worker":3}
{"level":"info","ts":1601646726.035923,"logger":"entrypoint","msg":"Starting","worker":4}
{"level":"info","ts":1601646726.0359266,"logger":"entrypoint","msg":"Starting","worker":5}
{"level":"info","ts":1601646726.0365126,"logger":"entrypoint","msg":"Running http probes only server ","port":8000}
{"level":"info","ts":1601646726.0365255,"logger":"entrypoint","msg":"Creating non-TLS listener","port":8000}
{"level":"info","ts":1601646726.0366552,"logger":"entrypoint","msg":"Running grpc server ","port":5001}
{"level":"info","ts":1601646726.0366704,"logger":"entrypoint","msg":"Creating non-TLS listener","port":5001}
{"level":"info","ts":1601646726.0366883,"logger":"entrypoint","msg":"Setting max message size ","size":5100000}
{"level":"info","ts":1601646726.036823,"logger":"SeldonRestApi","msg":"Listening","Address":"0.0.0.0:8000"}
{"level":"error","ts":1601647248.4919252,"logger":"SeldonGrpcApi","msg":"Failed to call predict","error":"rpc error: code = Unknown desc = Exception calling application: ","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/app/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\ngithub.com/seldonio/seldon-core/executor/api/grpc/seldon.GrpcSeldonServer.Predict\n\t/app/go/src/seldon-core/executor/api/grpc/seldon/server.go:42\ngithub.com/seldonio/seldon-core/executor/api/grpc/seldon/proto._Seldon_Predict_Handler.func1\n\t/app/go/src/seldon-core/executor/api/grpc/seldon/proto/prediction.pb.go:1881\ngithub.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryServerInterceptor.func1\n\t/app/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.1/tracing/opentracing/server_interceptors.go:34\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n\t/app/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.1/chain.go:25\ngithub.com/seldonio/seldon-core/executor/api/metric.(*ServerMetrics).UnaryServerInterceptor.func1\n\t/app/go/src/seldon-core/executor/api/metric/server.go:53\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n\t/app/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.1/chain.go:25\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1\n\t/app/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.1/chain.go:34\ngithub.com/seldonio/seldon-core/executor/api/grpc/seldon/proto._Seldon_Predict_Handler\n\t/app/go/src/seldon-core/executor/api/grpc/seldon/proto/prediction.pb.go:1883\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/app/go/pkg/mod/google.golang.org/grpc@v1.32.0/server.go:1194\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/app/go/pkg/mod/google.golang.org/grpc@v1.32.0/server.go:1517\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/app/go/pkg/mod/google.golang.org/grpc@v1.32.0/server.go:859"}

Combiner has the same logs.

We have this in our H2O wrappers:

} catch (PredictException e) {
    // Log the failure and report it to the caller as a SeldonMessage with FAILURE status.
    logger.info("Error in prediction: {}", e.getMessage());

    SeldonMessage resp = SeldonMessage.newBuilder()
            .setStatus(Status.newBuilder()
                    .setStatus(Status.StatusFlag.FAILURE)
                    .setCode(500)
                    .setReason(String.format(
                            "internal model error: feature request at index %d failed: %s",
                            i, e.getMessage()))
                    .build())
            .build();

    sw.stop();
    logger.info("prediction failed in {}ms", sw.getTotalTimeMillis());
    // The error is returned as a normal response rather than thrown.
    return resp;
}

Does the executor check the returned SeldonMessage for a non-success status? That is, the call may look like a 200 response for REST, or return no error for gRPC, while the message itself is an error message.
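
The distinction being asked about can be sketched as a two-level check: first the transport result, then the status embedded in the payload. This is only an illustration of the idea under stated assumptions, not how the executor behaves: `response_error` is a hypothetical helper, and the body is assumed to be the JSON form of the message shown in the first comment.

```python
import json
from typing import Optional


def response_error(http_status: int, body: str) -> Optional[str]:
    """Return a description of the failure, or None if the response is usable."""
    # Level 1: the transport itself reported a problem.
    if http_status != 200:
        return f"transport error: HTTP {http_status}"

    # Level 2: the transport succeeded, so inspect the payload.
    try:
        message = json.loads(body)
    except json.JSONDecodeError as exc:
        return f"unparseable body: {exc}"
    if not isinstance(message, dict):
        return "unexpected body type"

    status = message.get("status") or {}
    if status.get("status") == "FAILURE":
        return f"payload error {status.get('code')}: {status.get('reason')}"
    return None
```

A check on the transport alone would return `None` for the H2O response above if it arrived with HTTP 200, which matches the behaviour described in this issue.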

@ukclivecox

For REST it would be:

if response.StatusCode != http.StatusOK {
	smc.Log.Info("httpPost failed", "response code", response.StatusCode)
	err = &httpStatusError{StatusCode: response.StatusCode, Url: url}
}

axsaucedo added this to the 1.5 milestone Oct 15, 2020
ukclivecox modified the milestones: 1.5, 1.6 Nov 30, 2020
axsaucedo changed the title from "Java Engine and Go Executor Does not Terminate Graph upon Error" to "OSS-129: Java Engine and Go Executor Does not Terminate Graph upon Error" Apr 26, 2021
axsaucedo changed the title from "OSS-129: Java Engine and Go Executor Does not Terminate Graph upon Error" to "Java Engine and Go Executor Does not Terminate Graph upon Error" Apr 28, 2021

mwm5945 commented Oct 27, 2021

Fixed by #3412 and #3473 (for the executor, not the Java engine).

mwm5945 closed this as completed Oct 27, 2021