This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Not working with dapr #1420

Closed
1 of 16 tasks
exosapp opened this issue Aug 6, 2020 · 9 comments
Labels: size/L 14 days (~2.5 weeks)


@exosapp

exosapp commented Aug 6, 2020

Bug description:

Affected area (please mark with X where applicable):

  • Install
  • SMI Traffic Access Policy
  • SMI Traffic Specs Policy
  • SMI Traffic Split Policy
  • Permissive Traffic Policy
  • Ingress
  • Egress
  • Envoy Control Plane
  • CLI Tool
  • Metrics
  • Certificate Management
  • Sidecar Injection
  • Logging
  • Debugging
  • Tests
  • CI System

Expected behavior:

I can successfully run Dapr applications in my cluster.

Steps to reproduce the bug (as precisely as possible):

Install Dapr with Helm, then install OSM.
Deploy any workload with the dapr.io/enabled: "true" annotation; the pods start crashing.

How was OSM installed?:

osm install.

Anything else we need to know?:

It looks like when Envoy is injected, the Dapr sidecar (daprd) fails its health checks.
If I remove OSM, everything works fine.

Liveness probe failed: Get http://10.244.0.57:3500/v1.0/healthz: read tcp 10.244.0.1:48964->10.244.0.57:3500: read: connection reset by peer
  Warning  Unhealthy  20s  Readiness probe failed: Get http://10.244.0.57:3500/v1.0/healthz: read tcp 10.244.0.1:48992->10.244.0.57:3500: read: connection reset by peer
  Warning  Unhealthy  17s  Liveness probe failed: Get http://10.244.0.57:3500/v1.0/healthz: read tcp 10.244.0.1:49012->10.244.0.57:3500: read: connection reset by peer
  Warning  Unhealthy  14s  Readiness probe failed: Get http://10.244.0.57:3500/v1.0/healthz: read tcp 10.244.0.1:49034->10.244.0.57:3500: read: connection reset by peer
  Normal   Killing    11s  Container daprd failed liveness probe, will be restarted
  Warning  Unhealthy  11s  Liveness probe failed: Get http://10.244.0.57:3500/v1.0/healthz: read tcp 10.244.0.1:49050->10.244.0.57:3500: read: connection reset by peer
  Warning  Unhealthy  8s   Readiness probe failed: Get http://10.244.0.57:3500/v1.0/healthz: read tcp 10.244.0.1:49070->10.244.0.57:3500: read: connection reset by peer
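
For context on these events: a Kubernetes httpGet probe counts any response status from 200 up to, but not including, 400 as success, and counts a connection-level error such as "connection reset by peer" as a failure. The Python sketch below approximates that check so the daprd endpoint can be probed by hand. http_probe is a hypothetical helper, not kubelet code, and the port and path (3500, /v1.0/healthz) are simply taken from the events above.

```python
import urllib.error
import urllib.request
from typing import Tuple


def http_probe(host: str, port: int, path: str, timeout: float = 1.0) -> Tuple[bool, str]:
    """Approximate a Kubernetes httpGet probe (hypothetical helper, not kubelet code).

    Success means a response status in [200, 400); connection errors such as
    "connection reset by peer", refusals, or timeouts count as failures.
    """
    url = f"http://{host}:{port}{path}"
    # An empty ProxyHandler ignores http_proxy environment settings,
    # since the kubelet connects to the pod IP directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses are raised as HTTPError, which still carries the status.
        status = err.code
    except OSError as err:
        # URLError, ConnectionResetError, and socket timeouts all derive from OSError.
        return False, f"Get {url}: {err}"
    if 200 <= status < 400:
        return True, f"{url} returned {status}"
    return False, f"{url} returned {status}"
```

For example, http_probe("10.244.0.57", 3500, "/v1.0/healthz") run from the node should report a failure with a connection error while the sidecar is resetting the connection, and success once the probe gets through to daprd.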

Environment:

  • OSM version (use osm version): v0.1.0
  • Kubernetes version (use kubectl version): 1.16.10
  • Size of cluster (number of worker nodes in the cluster): 3
  • Others:
@exosapp exosapp added the kind/bug Something isn't working label Aug 6, 2020
@draychev draychev self-assigned this Aug 6, 2020
@shashankram shashankram added feature request and removed kind/bug Something isn't working labels Aug 6, 2020
@yaron2

yaron2 commented Aug 7, 2020

Not sure this is a feature request, I expect a transparent proxy to honor the health checks of containers in the pod.

@draychev let me know if there's something I can do to help with this.

@draychev
Contributor

draychev commented Aug 8, 2020

@yaron2 I'll take you up on your offer to help. @shashankram and I chatted and decided to reclassify this as a feature request because we had planned to spend some focused time on testing OSM w/ Dapr and fine-tuning, but we have not had a chance to get to this yet. 🔜

@shashankram
Member

shashankram commented Aug 8, 2020

> Not sure this is a feature request, I expect a transparent proxy to honor the health checks of containers in the pod.

That's a fair point. Is Dapr a sidecar to the application pod? If the liveness probes are failing, it seems as though they are being blocked by the proxy. The proxy blocks all traffic by default; this can be changed by enabling permissive mode, which allows service-to-service traffic between namespaces participating in the service mesh (during install: osm install --enable-permissive-traffic-policy).

@exosapp @yaron2 would you mind trying the Dapr integration with permissive mode enabled during install?

@shashankram
Member

> @exosapp @yaron2 would you mind trying the Dapr integration with permissive mode enabled during install?

Ignore this comment, the result will be the same. We need a bit more work to allow enabling this.

@yaron2

yaron2 commented Aug 9, 2020

> Ignore this comment, the result will be the same. We need a bit more work to allow enabling this.

It seems like you found the root cause, can you explain what causes this?

@shashankram
Member

> It seems like you found the root cause, can you explain what causes this?

@yaron2 the problem seems to be that we only allow traffic destined for the application service on the pod, not for other services. We have more work to do before inbound traffic to non-application containers works as expected.
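
The mismatch described here can be illustrated with a deliberately simplified Python model. It is hypothetical, not OSM's data model or code, and assumes two things: the sidecar accepts inbound traffic only on ports belonging to the application's Service, and permissive mode (--enable-permissive-traffic-policy) only relaxes the per-source access check. Under those assumptions a kubelet probe to daprd on port 3500 is rejected in either mode, consistent with the earlier remark that permissive mode would give the same result.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class InboundPolicy:
    """Hypothetical model of a sidecar's inbound rules; not OSM's actual data model."""

    service_ports: FrozenSet[int]  # ports exposed by the application's Service
    permissive: bool = False  # stands in for --enable-permissive-traffic-policy
    allowed_sources: FrozenSet[str] = frozenset()  # sources granted by an access policy

    def allows(self, source: str, port: int) -> bool:
        # A port the proxy has no inbound configuration for is rejected in every
        # mode; in this model that is what happens to probes aimed at daprd.
        if port not in self.service_ports:
            return False
        # Permissive mode skips the per-source access check.
        if self.permissive:
            return True
        # Default: deny unless a policy explicitly allows this source.
        return source in self.allowed_sources
```

For instance, InboundPolicy(frozenset({80}), permissive=True).allows("kubelet", 3500) is False, while the same call for port 80 is True. In this model, fixing the issue means giving the proxy inbound configuration for the non-application container's port, not changing the policy mode.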

@yaron2

yaron2 commented Aug 10, 2020

> @yaron2 the problem seems to be that we only allow traffic destined to the application service pod and not other services. We have more work to do before inbound traffic to non application containers works as expected.

How do you know which container in a pod is the application container?

@shashankram
Member

> How do you know which container in a pod is the application container?

Refer to https://github.com/openservicemesh/osm/blob/main/pkg/catalog/xds_certificates.go#L19

There are some limitations that need to be addressed:

  1. Allowing inbound access to non-application containers
  2. Allowing non-service containers to be accessible (see Client pod without service cannot talk to other services #1180)
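
As background for the second limitation: in Kubernetes, a Service selects a pod when every key/value pair in the Service's selector also appears in the pod's labels, so a pod that no Service selects (as in #1180) has no Service through which it can be addressed. The Python sketch below models only that label-selector rule. It is not the logic in the linked xds_certificates.go, and the service names are made up for illustration.

```python
from typing import Dict, List


def services_for_pod(pod_labels: Dict[str, str], services: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the names of Services whose selector matches the pod's labels.

    Services with an empty selector are skipped, since Kubernetes does not
    select pods for selector-less Services.
    """
    matches = []
    for name, selector in services.items():
        # Every selector key must be present on the pod with the same value.
        if selector and all(pod_labels.get(k) == v for k, v in selector.items()):
            matches.append(name)
    return sorted(matches)
```

A client pod whose labels match no selector gets an empty list, which is the situation limitation 2 describes.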

@snehachhabria
Contributor

I was able to run Dapr successfully on my cluster, and the health probes did not fail. I installed Dapr via the Dapr CLI with mTLS disabled, and installed OSM via the OSM CLI (with --enable-permissive-traffic-policy=true).

We updated how health probes are handled in OSM (https://github.com/openservicemesh/osm/blob/main/docs/content/docs/application_health_probes.md), which may have fixed the issue.

FYI, I am working on #2875 as a follow-up and will write a doc demonstrating how OSM works with the Dapr hello-kubernetes example (https://github.com/dapr/quickstarts/tree/master/hello-kubernetes).

7 participants