Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PUB/SUB Error in Subscription Sidecar #692

Closed
a377965 opened this issue Feb 17, 2022 · 16 comments
Closed

PUB/SUB Error in Subscription Sidecar #692

a377965 opened this issue Feb 17, 2022 · 16 comments
Labels
kind/bug Something isn't working P1 size/XS triaged/resolved Items triaged and ready

Comments

@a377965
Copy link

a377965 commented Feb 17, 2022

I am running DAPR in Azure AKS using Azure Service Bus. I have a service which it using grpc and stores state in a Redis instance in AKS. This same client also publishes a topic on an ASB / Namespace topic. The message appears to be sent successfully

A client which is set up to process the message from the subscription on the topic is emitting the following error :

time="2022-02-17T21:24:46.361600968Z" level=debug msg="Waiting to receive message on topic ap555555.calculation.t" app_id=ap149500 instance=calculator-microservice-8c65fd8ff-vjjks scope=dapr.contrib type=log ver=1.6.0
time="2022-02-17T21:24:46.361672772Z" level=debug msg="Adding message a2894ee1-944f-4ee9-a5bd-d0c52fe9efa4 to active messages on topic ap555555.calculation.t" app_id=ap149500 instance=calculator-microservice-8c65fd8ff-vjjks scope=dapr.contrib type=log ver=1.6.0
time="2022-02-17T21:24:46.361732876Z" level=debug msg="Calling app's handler for message a2894ee1-944f-4ee9-a5bd-d0c52fe9efa4 on topic ap555555.calculation.t" app_id=ap999999 instance=calculator-microservice-8c65fd8ff-vjjks scope=dapr.contrib type=log ver=1.6.0
time="2022-02-17T21:24:46.362329914Z" level=error msg="non-retriable error returned from app while processing pub/sub event ccd3f051-0295-4cb7-9e8a-7d9bcd91cbef: {"errorCode":"ERR_DIRECT_INVOKE","message":"failed getting app id either from the URL path or the header dapr-app-id"}. status code returned: 404" app_id=ap999999 instance=calculator-microservice-8c65fd8ff-vjjks scope=dapr.runtime type=log ver=1.6.0
time="2022-02-17T21:24:46.362365016Z" level=debug msg="Completing message a2894ee1-944f-4ee9-a5bd-d0c52fe9efa4 on topic ap555555.calculation.t" app_id=ap999999 instance=calculator-microservice-8c65fd8ff-vjjks scope=dapr.contrib type=log ver=1.6.0
time="2022-02-17T21:24:46.41919448Z" level=debug msg="Removing message a2894ee1-944f-4ee9-a5bd-d0c52fe9efa4 from active messages on topic ap555555.calculation.t" app_id=ap999999 instance=calculator-microservice-8c65fd8ff-vjjks scope=dapr.contrib type=log ver=1.6.0
time="2022-02-17T21:25:02.055100447Z" level=debug msg="No active messages require lock renewal for topic ap555555.calculation.t" app_id=ap999999 instance=calculator-microservice-8c65fd8ff-vjjks scope=dapr.contrib type=log ver=1.6.0

Any insight into this error, especially based on the setup would be appreciated.

@a377965
Copy link
Author

a377965 commented Feb 17, 2022

I should have also mentioned that both services are using the Java SDK.

@a377965 a377965 closed this as completed Feb 17, 2022
@a377965 a377965 reopened this Feb 17, 2022
@yaron2
Copy link
Member

yaron2 commented Feb 18, 2022

Can you show the code on your event subscriber that is failing to call the invocation endpoint?

It looks like your app is failing to invoke another app, and thus returning an error to Dapr when it's receiving the event.

@a377965
Copy link
Author

a377965 commented Feb 21, 2022

the code is in a private repo, so full sharing is impossible....here are the pertinent code snippets:

k8s Subscription CRD

apiVersion: dapr.io/v1alpha1
kind: Subscription
metadata:
name: ap999999
namespace: pr100000-aks-ns-dev
spec:
topic: ap555555.calculation.t
route: /calculate
pubsubname: calculation-asb
scopes:

  • ap111111
  • ap999999

k8s Component CRD

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
name: calculation-asb
namespace: pr100000-aks-ns-dev
spec:
type: pubsub.azure.servicebus
version: v1
metadata:

  • name: connectionString
    value: "Endpoint=sb://■■■■■■■■■■■■■■■■■■■/;SharedAccessKeyName=■■■■■■■■■■■■■■■;SharedAccessKey=■■■■■■■■■■■■■■■■■■■"
    scopes:
  • ap111111
  • ap999999

JAVA CODE

@RestController
@slf4j
public class CalculatorMessageService {

@Topic(name = "ap555555.calculation.t", pubsubName = "calculation-asb")
@PostMapping(path = "/calculate")
public Mono<Void> calculate(@RequestBody(required = false) CloudEvent<String> cloudEvent) {
    log.info("getCheckout Invoked");
    return Mono.fromRunnable(() -> {
        try {
            log.info("Subscriber received: " + cloudEvent.getData());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    });
}

}

k8s Deployment

  annotations:
    dapr.io/enabled: "true"
    dapr.io/app-id: "ap999999"
    dapr.io/app-port: "3500"
    dapr.io/log-level: "debug"
    dapr.io/sidecar-readiness-probe-delay-seconds: "30"
    dapr.io/sidecar-liveness-probe-delay-seconds: "30"

@yaron2
Copy link
Member

yaron2 commented Feb 22, 2022

@a377965 is this Java code the exact same code producing the error? Because according to the error, the subscriber is trying to invoke Dapr or call to some Dapr endpoint.

@artursouza do you spot anything here?

@a377965
Copy link
Author

a377965 commented Feb 22, 2022

Hi Yaron, the log is from this code (the subscription to the topic) which makes it very odd, see the sidecar logs:

time="2022-02-17T21:24:46.361732876Z" level=debug msg="Calling app's handler for message a2894ee1-944f-4ee9-a5bd-d0c52fe9efa4 on topic ap555555.calculation.t" app_id=ap999999 instance=calculator-microservice-8c65fd8ff-vjjks scope=dapr.contrib type=log ver=1.6.0

followed by the very next (error ) message

time="2022-02-17T21:24:46.362329914Z" level=error msg="non-retriable error returned from app while processing pub/sub event ccd3f051-0295-4cb7-9e8a-7d9bcd91cbef: {"errorCode":"ERR_DIRECT_INVOKE","message":"failed getting app id either from the URL path or the header dapr-app-id"}. status code returned: 404" app_id=ap999999 instance=calculator-microservice-8c65fd8ff-vjjks scope=dapr.runtime type=log ver=1.6.0

I am actually thinking there is something wrong with the service that is PUBLISHING the topic, as the message indicates there is no APPID in URL or HEADER.

@yaron2
Copy link
Member

yaron2 commented Feb 22, 2022

Is the app ap999999 your subscriber or publisher? if it's the subscriber (based on what you said), I think it's very unlikely that it'll come from the publisher also. There's something very basic we're missing here :)

@a377965
Copy link
Author

a377965 commented Feb 22, 2022

@yaron2, this is essentially from your solution : https://docs.dapr.io/developing-applications/building-blocks/pubsub/howto-publish-subscribe/
using ASB over RabbitMQ.
Now, my publisher is ALSO writing to REDIS via a gRPC client and that same client is publishing the message to the topic, which is a difference from the example. The details of the client setup in the publisher from the example solution are not clear.

@yaron2
Copy link
Member

yaron2 commented Feb 22, 2022

@yaron2, this is essentially from your solution : https://docs.dapr.io/developing-applications/building-blocks/pubsub/howto-publish-subscribe/ using ASB over RabbitMQ. Now, my publisher is ALSO writing to REDIS via a gRPC client and that same client is publishing the message to the topic, which is a difference from the example. The details of the client setup in the publisher from the example solution are not clear.

Ok, can you remove the code that writes to Redis and see if you see a difference in the logs?

@a377965
Copy link
Author

a377965 commented Feb 22, 2022

after removing the code which writes to Redis, the problem continues to exist.
The only think I can't seem to get around in my publishing code is that I must set the following property or I cant spin up a new DaprClient :
Properties p = System.getProperties();
p.put("dapr.api.methodInvocation.protocol", "GRPC");
System.setProperties(p);
DaprClient client = new DaprClientBuilder().build();

Without this property being set, I get this message:

stack_trace":"java.lang.NoSuchFieldError: Companion
io.dapr.client.DaprHttp.(DaprHttp.java:125)
io.dapr.client.DaprHttpBuilder.buildDaprHttp(DaprHttpBuilder.java:63)

Could this possibly be leading to the issue?

@a377965
Copy link
Author

a377965 commented Feb 23, 2022

@yaron2 and @artursouza ,
After reading around a bit on other matters, I am believing that I was using a work around of this matter in my PUBLISH service:

and this entire problem stems back to the okhttp3 version ..... any way you guys can confirm or deny this assertion?

This is a significant problem for me in adopting the platform, as we are making significant use of springboot and the recommended workaround it to use a version of the library that is no longer available on Maven.

@artursouza
Copy link
Member

@yaron2 and @artursouza , After reading around a bit on other matters, I am believing that I was using a work around of this matter in my PUBLISH service:

and this entire problem stems back to the okhttp3 version ..... any way you guys can confirm or deny this assertion?

This is a significant problem for me in adopting the platform, as we are making significant use of springboot and the recommended workaround it to use a version of the library that is no longer available on Maven.

@a377965 You should not use gRPC for service invocation. HTTP is the solution to be used. In this case, you can override the okhttp3 version to a higher version to resolve the dependency conflict.

I am moving this issue to java-sdk repo. We can also upgrade to a most recent version of OKHTTP3 for now. The long term goal is to remove it as a dependency.

@artursouza artursouza added the P1 label Feb 23, 2022
@artursouza artursouza transferred this issue from dapr/dapr Feb 23, 2022
@artursouza artursouza added kind/bug Something isn't working size/XS triaged/resolved Items triaged and ready labels Feb 23, 2022
@a377965
Copy link
Author

a377965 commented Feb 23, 2022

@artursouza , I am not sure this is resolved as I am getting a stack overflow error in my application logs using the version 4.9.0 of OKHTTP3 which was using in this post: #687

Is there a recommended version I can use that you know of that is compatible? My managed version is 3.14.9 which does not work but doesn't throw a stack over flow message.

@a377965
Copy link
Author

a377965 commented Feb 24, 2022

@artursouza and @yaron2
I have found a working version of OKHTTP3 (4.8.1) and used it in both my PUBLISH and SUBCRIPTION services, and I am NO longer using the gRPC client in the PUBLISH (the SUBSCRIBER never was using gRPC).

But the original matters continues to occur and I am seeing this message in the SIDECAR logs for the SUBSCRIPTION service:
time="2022-02-24T16:29:22.268053262Z" level=error msg="non-retriable error returned from app while processing pub/sub event d50014ff-7fa4-4c05-ae77-fb98508ad788:
{"errorCode":"ERR_DIRECT_INVOKE","message":"failed getting app id either from the URL path or the header dapr-app-id"}.
status code returned: 404" app_id=ap999999 instance=calculator-microservice-5b7c5b5d8-vfnqj scope=dapr.runtime type=log ver=1.6.0

We have seemed to rule out the OKHTTP3 matter. Any additional thoughts on what may be going wrong would be appreciated!

@yaron2
Copy link
Member

yaron2 commented Feb 24, 2022

@artursouza and @yaron2 I have found a working version of OKHTTP3 (4.8.1) and used it in both my PUBLISH and SUBCRIPTION services, and I am NO longer using the gRPC client in the PUBLISH (the SUBSCRIBER never was using gRPC).

But the original matters continues to occur and I am seeing this message in the SIDECAR logs for the SUBSCRIPTION service: time="2022-02-24T16:29:22.268053262Z" level=error msg="non-retriable error returned from app while processing pub/sub event d50014ff-7fa4-4c05-ae77-fb98508ad788: {"errorCode":"ERR_DIRECT_INVOKE","message":"failed getting app id either from the URL path or the header dapr-app-id"}. status code returned: 404" app_id=ap999999 instance=calculator-microservice-5b7c5b5d8-vfnqj scope=dapr.runtime type=log ver=1.6.0

We have seemed to rule out the OKHTTP3 matter. Any additional thoughts on what may be going wrong would be appreciated!

I'm kind of bewildered by what's going on here :)

Can you confirm your subscriber endpoint does nothing more than return a response to Dapr?

@a377965
Copy link
Author

a377965 commented Feb 24, 2022

yaron2...its still the JAVA CODE above posted on February 21 2022.

Interestingly observation, I was adding the debug k8s annotation to my PUBLISHER to see if anything interesting was being logged in that services sidecar, and I got this:

@timestamp":"2022-02-24T12:35:21.216-05:00","@Version":"1","message":"Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception
[Handler dispatch failed; nested exception is java.lang.NoClassDefFoundError: Could not initialize class okhttp3.internal.concurrent.TaskRunner]
with root cause","logger_name":"org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/].[dispatcherServlet]",
"thread_name":"http-nio-8080-exec-5","level":"ERROR","level_value":40000,"stack_trace":"java.lang.NoClassDefFoundError: Could not initialize class okhttp3.internal.concurrent.TaskRunner\n\tat

if I remove the dapr debug annotation, all is well, .....very odd. So the only update was the k8s deployment annotation, no code change at all.

@a377965
Copy link
Author

a377965 commented Feb 24, 2022

Solved!!

Operator error, that is misaligned ports on the SUB service! Now the error makes sense as the sidecar could not get the APPID from the SUB service during the subscribe event. I was not clear in the log message where the error was being encountered.

Figure 7.3 in this Microsoft paged helped isolate the problem: https://docs.microsoft.com/en-us/dotnet/architecture/dapr-for-net-developers/publish-subscribe

Thank you both @yaron2 and @artursouza

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working P1 size/XS triaged/resolved Items triaged and ready
Projects
None yet
Development

No branches or pull requests

3 participants