
Opaqueness not applied to off-cluster destination with enable-external-profiles annotation #10354

Open
dkulchinsky opened this issue Feb 18, 2023 · 10 comments

@dkulchinsky

dkulchinsky commented Feb 18, 2023

What is the issue?

We're running Linkerd stable-2.12.2.

Linkerd is configured with:

proxy.opaquePorts: 25,587,3306,4444,5432,6379,26379,9300,11211

We set config.linkerd.io/enable-external-profiles: "true" annotation on application Pods that connect to a MySQL server off-cluster on port 3306 (following the instructions from https://linkerd.io/2.12/features/protocol-detection/#setting-the-enable-external-profiles-annotation)

However, the application is failing to connect to the MySQL server and we see the following errors in linkerd proxy logs:

[    12.990661s]  INFO ThreadId(01) outbound:proxy{addr=10.14.0.218:3306}: linkerd_detect: Continuing after timeout: linkerd_proxy_http::version::Version protocol detection timed out after 10s

The address 10.14.0.218 is outside the cluster network ranges (defined as clusterNetworks: 172.20.0.0/17,172.20.128.0/17).

Here's the manifest metadata of the running Pod:

kind: Pod
metadata:
  annotations:
    checksum/configmap-key-config.properties: ec936facad2bfc7bf8863ae2b8d3f90356bdfc94e2940ed31654f43abb2b0efb
    cni.projectcalico.org/containerID: 590f016aabbac75a6825ad52e018ea71e4e3d09d341d8b232d6a17cf200e7eca
    cni.projectcalico.org/podIP: 172.20.11.247/32
    cni.projectcalico.org/podIPs: 172.20.11.247/32
    config.linkerd.io/enable-external-profiles: "true"
    linkerd.io/created-by: linkerd/proxy-injector stable-2.12.2
    linkerd.io/inject: enabled
    linkerd.io/proxy-version: stable-2.12.2
    linkerd.io/trust-root-sha256: 1d57b9c015280710eafad0935ee3ec0bc4d7eb430908e89ae20c5ab7e5ec9f80
    vault.security.banzaicloud.io/vault-addr: https://vault.vault.svc:8200
    vault.security.banzaicloud.io/vault-env-daemon: "false"
    vault.security.banzaicloud.io/vault-role: k8s-eventbus-maxwell
    viz.linkerd.io/tap-enabled: "true"

I reviewed the related issue #8273, which seems to suggest that this was fixed by linkerd/linkerd2-proxy#1617. As far as I can tell, that fix should be included in stable-2.12.2; unfortunately, we are not able to get this to work as expected.

For now we're using config.linkerd.io/skip-outbound-ports: "3306" as a workaround, but we are hoping to not need this and use the external profiles method instead.

How can it be reproduced?

  1. Deploy Linkerd stable-2.12.2
  2. Run an application Pod with the config.linkerd.io/enable-external-profiles: "true" annotation that connects to a MySQL server on port 3306 running off-cluster (outside the clusterNetworks ranges)
  3. Observe that the application fails to connect and linkerd-proxy reports that protocol detection timed out after 10s

Logs, error output, etc

[    12.990661s]  INFO ThreadId(01) outbound:proxy{addr=10.14.0.218:3306}: linkerd_detect: Continuing after timeout: linkerd_proxy_http::version::Version protocol detection timed out after 10s

Output of linkerd check -o short:

Linkerd core checks
===================

linkerd-version
---------------
‼ cli is up-to-date
    is running version 2.12.2 but the latest stable version is 2.12.4
    see https://linkerd.io/2.12/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 2.12.2 but the latest stable version is 2.12.4
    see https://linkerd.io/2.12/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-5cc958f64c-jjbhq (stable-2.12.2)
	* linkerd-destination-5cc958f64c-lj8ss (stable-2.12.2)
	* linkerd-destination-5cc958f64c-rjmlq (stable-2.12.2)
	* linkerd-identity-84f9d7cf87-6jtxc (stable-2.12.2)
	* linkerd-identity-84f9d7cf87-g5ndc (stable-2.12.2)
	* linkerd-identity-84f9d7cf87-phbjm (stable-2.12.2)
	* linkerd-proxy-injector-5cd47b84fd-dxpkg (stable-2.12.2)
	* linkerd-proxy-injector-5cd47b84fd-phwcv (stable-2.12.2)
	* linkerd-proxy-injector-5cd47b84fd-zkbq2 (stable-2.12.2)
    see https://linkerd.io/2.12/checks/#l5d-cp-proxy-version for hints

Linkerd extensions checks
=========================

linkerd-viz
-----------
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
	* metrics-api-855d59f76c-68nz9 (stable-2.12.2)
	* prometheus-f7c9f5f74-88djq (stable-2.12.2)
	* tap-74db455fc9-p4gvh (stable-2.12.2)
	* tap-74db455fc9-qvqxt (stable-2.12.2)
	* tap-74db455fc9-v92b4 (stable-2.12.2)
	* tap-injector-5875b778dc-hfmcx (stable-2.12.2)
	* web-576647df96-mnvh6 (stable-2.12.2)
    see https://linkerd.io/2.12/checks/#l5d-viz-proxy-cp-version for hints

Status check results are √

Environment

  • Kubernetes version: 1.23.10
  • Environment: kops 1.25.3
  • Host OS: Ubuntu 20.04.5 LTS (Kernel 5.15.0-1021-aws)
  • Linkerd version: stable-2.12.2

Possible solution

As a workaround, we are currently using the config.linkerd.io/skip-outbound-ports annotation to skip port 3306 on Pods that need to connect to an off-cluster MySQL database.

Additional context

Opaqueness for port 3306 works just fine for MySQL database running in-cluster, so this is only affecting connections to MySQL servers running off-cluster.

Would you like to work on fixing this bug?

None

@jeremychase jeremychase added this to the stable-2.13.0 milestone Feb 23, 2023
@jeremychase jeremychase added the priority/P2 Nice-to-have for Release label Feb 23, 2023
@dkulchinsky
Author

Hey folks 👋🏼

I saw this was labelled for 2.13, but I wanted to ask: do you think this is an issue in stable-2.12, or possibly something we have misconfigured?

@jeremychase
Contributor

@dkulchinsky We suspect this is a problem with stable-2.12 but need to spend more time debugging before we know for certain.

@dkulchinsky
Author

Thanks @jeremychase 👍🏼 let me know if you need additional information from me.

@dkulchinsky
Author

Hey @jeremychase, @risingspiral 👋🏼

Just saw Linkerd 2.13.0 was released, congrats! 🥳

I wanted to check in: is this issue already covered/fixed in 2.13, or would that land in a future patch release?

@olix0r
Member

olix0r commented Apr 12, 2023

@dkulchinsky It will be in the future path. In 2.13 we've begun to change the discovery system away from ServiceProfiles. I think we're unlikely to invest more in "external service profiles", but we're still keenly interested in solving the underlying problem of being able to disable protocol detection for out-of-cluster traffic.

@dkulchinsky
Author


Thanks @olix0r, I think decoupling these concerns makes total sense.

We'll be watching this space for updates, as this is one of those issues our users constantly trip over 😓. I'm guessing there's no ETA you can share at this point?

@stale

stale bot commented Jul 11, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 11, 2023
@dkulchinsky
Author

Still an issue AFAIK. Is there any news on this, @olix0r?

@stale stale bot removed the wontfix label Jul 12, 2023
@wmorgan wmorgan added the pinned label Aug 1, 2023
@chris-ng-scmp

I have the same issue in the latest release, 2.14.0.

Protocol detection still runs for one of the opaquePorts.

I also tried setting skipSubnets (--subnets-to-ignore), but protocol detection still runs for the request...

 linkerd-proxy {"timestamp":"[   632.121291s]","level":"INFO","fields":{"message":"Continuing after timeout: linkerd_proxy_http::version::Version protocol detection timed out after 10s"},"target":"linkerd_detect","spans":[{"name":"outbound"},{"addr":"xxxxx:3306","name":"proxy"}],"threadId":"ThreadId(1)"}

Only config.linkerd.io/skip-outbound-ports works.

@kflynn
Member

kflynn commented Oct 26, 2023

For the record, we hear y'all on this one: being able to do egress traffic without protocol detection delays would be a good thing.

We want to separate the solution of that problem from the mechanism of ServiceProfiles, though, especially as we've been moving more toward Gateway API. Any thoughts on what kind of mechanisms would fit your use cases particularly well?

Projects
None yet
Development

No branches or pull requests

9 participants