
thanos+ingress-nginx+grpc: impossible setup due to missing host header #1507

Closed
danielmotaleite opened this issue Sep 10, 2019 · 61 comments

@danielmotaleite
Contributor

danielmotaleite commented Sep 10, 2019

Thanos, Prometheus and Golang version used
quay.io/thanos/thanos:v0.7.0

What happened
I set up two Kubernetes clusters: Thanos Query runs in one cluster (along with a local Prometheus + sidecar) and needs to query the remote cluster's Thanos sidecar. Everything runs in AWS (but not EKS).
I created an ingress-nginx Ingress with gRPC support using this config:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: prometheus-k8s-live-a.ops.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-k8s-live-a
          servicePort: 9090
  - host: prometheus-k8s-live-b.ops.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-k8s-live-b
          servicePort: 9090
  tls:
  - hosts:
    - prometheus-k8s-live-a.ops.example.com
    - prometheus-k8s-live-b.ops.example.com
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
  name: grpc-ingress
  namespace: monitoring
spec:
  rules:
  - host: sidecar-k8s-live-a.ops.example.com
    http:
      paths:
      - backend:
          serviceName: sidecar-k8s-live-a
          servicePort: 10911
  - host: sidecar-k8s-live-b.ops.example.com
    http:
      paths:
      - backend:
          serviceName: sidecar-k8s-live-b
          servicePort: 10911
  tls:
  - hosts:
      - sidecar-k8s-live-a.ops.example.com
      - sidecar-k8s-live-b.ops.example.com

thanos query is using

--store=sidecar-k8s-live-a.ops.example.com.:443
--store=sidecar-k8s-live-b.ops.example.com.:443

I can connect to the Prometheus URL, but the sidecar gRPC connection fails in Thanos Query.
Looking at the nginx logs I can see the query arriving over HTTP/2 but returning 400. With curl I get a 503, but probably just because it is not really gRPC. After changing the ingress-nginx log format to show the host header, I can see that curl sends the correct host header, but for Thanos Query the logs show only _, so it is sending either an empty header or a literal _.

What you expected to happen
I wanted to share the ingress between the HTTPS requests for Prometheus and the gRPC traffic, using the host header to route each request to the correct service. Sadly, Thanos Query fails to send the host header, so nginx can't do the virtual-host lookup and serves the request from the default site.

Full logs to relevant components

Logs

172.27.119.135 - [172.27.119.135] - - [10/Sep/2019:15:02:40 +0000] "PRI * HTTP/2.0" 400 163 "-" "-" 0 0.001 [] [] - - - - 477873c7a336618ccf06cf9c03fe8d97
172.27.119.135 - [172.27.119.135] - - [10/Sep/2019:15:02:40 +0000] "PRI * HTTP/2.0" 400 163 "-" "-" 0 0.003 [] [] - - - - c32e68975e91159a64326b55d4b72934
2019/09/10 15:02:40 [error] 1137#1137: *7155 upstream rejected request with error 2 while reading response header from upstream, client: 172.26.81.74, server: sidecar-k8s-live-a.ops.example.com, request: "PRI / HTTP/1.1", upstream: "grpc://100.96.136.200:10911", host: "sidecar-k8s-live-a.ops.example.com"
172.26.81.74 - [172.26.81.74] - - [10/Sep/2019:15:02:40 +0000] "PRI / HTTP/1.1" 502 163 "-" "curl/7.58.0" 189 0.002 [monitoring-sidecar-k8s-live-a-10911] [] 100.96.136.200:10911 0 0.004 502 4e08c4e8c6d8df148c5bc3a68d61ccf9

Here we can see that the Thanos Query requests do not trigger the virtual host, while the curl request, which carries a host header, is routed to the Thanos sidecar.

@danielmotaleite
Contributor Author

Just to make it clear:

Since there is no host header, nginx uses the default site, and the default site is a plain HTTP proxy, so the request never hits the gRPC proxy config.

If Thanos sent the host header, nginx would load the correct config and deliver the request to the correct backend with the correct protocol.
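
The behaviour described above can be sketched roughly like this (a hypothetical, simplified excerpt of what the controller generates; server and upstream names are illustrative, not the controller's literal output):

```nginx
# Requests without a matching Host/:authority fall into the default server,
# which proxies plain HTTP, so gRPC framing breaks there.
server {
    listen 443 ssl http2 default_server;
    server_name _;
    location / {
        proxy_pass http://upstream-default-backend;
    }
}

# Only requests that carry the right name ever reach the gRPC proxy config.
server {
    listen 443 ssl http2;
    server_name sidecar-k8s-live-a.ops.example.com;
    location / {
        grpc_pass grpc://monitoring-sidecar-k8s-live-a-10911;
    }
}
```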

@danielmotaleite
Contributor Author

I found a reference to this problem in a months-old issue (not directly related to this one):
#977 (comment)
It basically confirms the problem.

@bwplotka
Member

Thanks for the report. As answered on the mentioned issue: have you tried setting up a forward proxy (e.g. nginx or Envoy)? I think that might solve your issue, as it is more flexible in terms of what certs/credentials you use.

In the gRPC world there is no Host header, really; there is :authority. You can read about this here: grpc/grpc#1022

We indeed don't set the authority manually, as it should be properly derived from the TLS credentials. We might add support for proxying it through, but again, it might be best if you set up a forward proxy for that.

What does your nginx configuration look like, then, if you are willing to share? (:

@mheggeseth

A reasonable way to work around this with NGINX Ingress Controller is to use the tcp-services-configmap feature to expose ports that route directly to sidecar-k8s-live-a:10911 (e.g. 11911) and sidecar-k8s-live-b:10911 (e.g. 12911) respectively.

Then your thanos-query options would look something like:

--store=sidecar-k8s-live.ops.example.com:11911  # routes to sidecar-k8s-live-a:10911
--store=sidecar-k8s-live.ops.example.com:12911  # routes to sidecar-k8s-live-b:10911

You still have to set up TLS on your own in both thanos-query and thanos-sidecar, but it helps avoid all the HTTP routing that the ingress controller tries to do for you.
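
For reference, the mapping described above might look something like this (a sketch; the ConfigMap name and namespace depend on how your controller was deployed):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  # external port: "<namespace>/<service>:<port>"
  "11911": monitoring/sidecar-k8s-live-a:10911
  "12911": monitoring/sidecar-k8s-live-b:10911
```

You also need to expose ports 11911 and 12911 on the ingress controller's Service/LoadBalancer.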

@stale

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2020
@garenwen

I had the same problem

@stale stale bot removed the stale label Jan 17, 2020
@garenwen

Do you have a solution?

@cjf-fuller

Another workaround with the NGINX Ingress Controller is to use the --grpc-client-server-name flag on thanos-query. This uses Server Name Indication (SNI), which allows the ingress controller to route the request correctly.

I believe this limits each querier to one server name only. Therefore you will need multiple queriers if you have multiple clusters to communicate between.

Your thanos-query args would include:

--grpc-client-server-name=sidecar-k8s-live.ops.example.com
--grpc-client-tls-secure
--store=dns+sidecar-k8s-live.ops.example.com:443

And your ingress annotations would include:

nginx.ingress.kubernetes.io/backend-protocol: GRPC
nginx.ingress.kubernetes.io/ssl-redirect: "true"

@shane-a-orme

@cjf-fuller, this could work, but it is important to understand that:

The Prometheus StatefulSet is labeled thanos-store-api: "true" so that each pod is discovered by the headless service. This headless service is used by Thanos Query to query data across all the Prometheus instances. A replica might be up, but querying it will show a small time gap for the period during which it was down. This isn't fixed by having a second replica, because a replica could be down at any moment, for example during a rolling restart. These instances show how load balancing can fail. Be wary, as this can lead to overwriting of your initial query and loss of the host header, and with it nginx's ability to do the virtual-host search.

@martip07

martip07 commented Feb 6, 2020

I believe this limits each querier to one server name only. Therefore you will need multiple queriers if you have multiple clusters to communicate between.

Hi, are you sure that it will limit each querier to one server name only?

Regards,

@cjf-fuller

@Than0s-coder, great point, we have set up a “central” Querier to target a “leaf” Querier and not the sidecars directly. But it sounds like this risk of overwriting the initial query and loss of host_headers would still be present?

@martip07, I am still very much a beginner with Thanos, so I could be totally wrong here. But as far as I can tell, the --grpc-client-server-name argument is a string that sets ServerName in tls.Config. I am not sure how I would make this a list of server names.

I have seen that the TLS Extensions documentation mentions a ServerNameList struct, but I cannot find many examples of it being used. I have tested this with a simple comma-separated list (--grpc-client-server-name=test-1.myorg.com,test-2.myorg.com), which fails at the SSL handshake because the list is never enumerated: the wildcard certificate is valid for "*.myorg.com" and not for the literal name "test-1.myorg.com,test-2.myorg.com".
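
For what it's worth, the single-string behaviour matches Go's standard library: tls.Config.ServerName holds exactly one opaque hostname, so a comma-joined value is matched against certificate SANs as-is (a minimal stdlib sketch, not Thanos' actual code):

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// sniName mimics how a flag like --grpc-client-server-name would feed
// tls.Config.ServerName: the value is one opaque hostname used for SNI
// and certificate verification, so a comma-joined "list" is treated as
// a single (invalid) name and never matches a SAN such as *.myorg.com.
func sniName(flagValue string) string {
	cfg := &tls.Config{ServerName: flagValue}
	return cfg.ServerName
}

func main() {
	// Printed unchanged: the standard library does not split this value.
	fmt.Println(sniName("test-1.myorg.com,test-2.myorg.com"))
}
```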

@stale

stale bot commented Mar 12, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

@stale stale bot added the stale label Mar 12, 2020
@stale stale bot closed this as completed Mar 19, 2020
@popsikle

/reopen

@ageekymonk

/reopen.

@kakkoyun kakkoyun removed the stale label Apr 6, 2020
@kakkoyun
Member

kakkoyun commented Apr 6, 2020

Apparently this is still needed and valid.

@kakkoyun kakkoyun reopened this Apr 6, 2020
j3p0uk added a commit to j3p0uk/thanos that referenced this issue Apr 9, 2020
To avoid needing a query per remote cluster, get the name to add to
the dial options from the dns provider when making the grpc connection.
j3p0uk added a commit to j3p0uk/thanos that referenced this issue Apr 9, 2020
To avoid needing a query per remote cluster, get the name to add to
the dial options from the dns provider when making the grpc connection.
@j3p0uk

j3p0uk commented Apr 9, 2020

Possible fix pushed that uses a flag to change behaviour, based on the workaround detailed by @cjf-fuller in #1507 (comment).

If the "grpc-client-dns-server-name" flag is specified, the DNS provider is used to return the name that was originally looked up, and the relevant dial options are added when making the gRPC connection. This allows a different SNI per store, based on the originally provided (dns+<name>:<port>) name.

j3p0uk added a commit to j3p0uk/thanos that referenced this issue Apr 10, 2020
To avoid needing a query per remote cluster, get the name to add to
the dial options from the dns provider when making the grpc connection.
j3p0uk added a commit to j3p0uk/thanos that referenced this issue Apr 10, 2020
To avoid needing a query per remote cluster, get the name to add to
the dial options from the dns provider when making the grpc connection.

Signed-off-by: JP Sullivan <jonpsull@cisco.com>
@stale

stale bot commented May 9, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label May 9, 2020
@j3p0uk

j3p0uk commented May 11, 2020

Awaiting design review for fix as per #2407 (comment)

@squat @bwplotka

@j3p0uk

j3p0uk commented Jan 29, 2021

Sure. Do check the logs and see if they match in this case. Check that you can curl between the multiple clusters, etc. That looks like it could be a connectivity issue from your central cluster to query.my-local-domain.local:443 more than the issue detailed here, but that's a guess given there isn't much in the way of debug or logs to go on. Sorry I can't help more :)

@IbraheemAlSaady

IbraheemAlSaady commented Jan 29, 2021

@j3p0uk I have tried this with grpcurl from the central cluster grpcurl -insecure query.my-local-domain.local:443 list

I'm getting this response:

grpc.health.v1.Health
grpc.reflection.v1alpha.ServerReflection
thanos.Rules
thanos.Store

I did a describe as well grpcurl -insecure query.my-local-domain.local:443 describe and this is the output

grpc.health.v1.Health is a service:
service Health {
  rpc Check ( .grpc.health.v1.HealthCheckRequest ) returns ( .grpc.health.v1.HealthCheckResponse );
  rpc Watch ( .grpc.health.v1.HealthCheckRequest ) returns ( stream .grpc.health.v1.HealthCheckResponse );
}
grpc.reflection.v1alpha.ServerReflection is a service:
service ServerReflection {
  rpc ServerReflectionInfo ( stream .grpc.reflection.v1alpha.ServerReflectionRequest ) returns ( stream .grpc.reflection.v1alpha.ServerReflectionResponse );
}
Failed to resolve symbol "thanos.Rules": Symbol not found: thanos.Rules

Then grpcurl -insecure query.my-local-domain.local:443 thanos.Store/Info and this is the output

Error invoking method "thanos.Store/Info": target server does not expose service "thanos.Store"

@roysha1

roysha1 commented Jan 29, 2021

You might change the ingress backend protocol to GRPCS.
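
That is, if the upstream gRPC server itself serves TLS, the annotation would be (assuming the standard ingress-nginx annotation set):

```yaml
nginx.ingress.kubernetes.io/backend-protocol: "GRPCS"
```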

@IbraheemAlSaady

IbraheemAlSaady commented Jan 29, 2021

@roysha1 sadly, that didn't do it

@IbraheemAlSaady

I have updated my comment about my configuration to add the ingress controller logs.

@Placidina

Placidina commented Jan 30, 2021

I had the same problem. My solution was to use the Bitnami charts.

Depends: bitnami/charts#5345 bitnami/charts#5344

my bitnami/kube-prometheus custom values:

prometheus:
  disableCompaction: true
  thanos:
    create: true
    objectStorageConfig:
      secretName: thanos-objstore-config
      secretKey: objstore.yml
    ingress:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: nginx
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
        nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
        nginx.ingress.kubernetes.io/auth-tls-secret: monitoring/thanos-certs
        nginx.ingress.kubernetes.io/backend-protocol: GRPC

my bitnami/thanos custom values:

existingObjstoreSecret: thanos-objstore-config
query:
  hostAliases:
  - ip: "111.11.111.1"
    hostnames:
    - thanos.earth.cluster
  - ip: "111.11.112.1"
    hostnames:
    - thanos.mars.cluster
  stores:
  - thanos.earth.cluster:443
  - thanos.mars.cluster:443
  - thanos-storegateway.default.svc.cluster.local:10901
  dnsDiscovery:
    enabled: false
  grpcTLS:
    client:
      secure: true
      cert: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      key: |-
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----
      ca: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
compactor:
  enabled: true
storegateway:
  enabled: true
  grpc:
    tls:
      enabled: true
      cert: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      key: |-
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----
      ca: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----


@IbraheemAlSaady

IbraheemAlSaady commented Feb 3, 2021

After a while of pulling my hair out with this one, I managed to make it work. Just a note: my ingress is on the Query instance, not the sidecar; I would assume it works the same way for the sidecar (I didn't test that part).

My architecture is as follows:

Query (central cluster) -> Query (remote cluster :: ingress on this one) -> Sidecar (remote cluster) 
                        -> Sidecar (central cluster)

I'm deploying the stack with helm, here is my config

Remote & Central Cluster Prometheus Operator

prometheus:
  prometheusSpec:
    thanos:
      image: docker.io/bitnami/thanos
      tag: 0.17.2-scratch-r2
      objectStorageConfig:
        name: thanos
        key: objstore.yml

Remote Cluster Query Config

existingObjstoreSecret: objstorage
clusterDomain: cluster.local
query:
  dnsDiscovery:
    enabled: false
  stores:
    - kube-prometheus-prometheus-thanos.monitoring:10901 ## <-- thanos-sidecar

  ingress:
    enabled: false # disabled for http
    grpc:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: nginx-internal
        nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
        ingress.kubernetes.io/ssl-redirect: "true"

      hostname: thanos.query.domain.local
      extraTls:
        - hosts:
            - thanos.query.domain.local
          secretName: thanos-grpc-tls

Central Cluster Query Config

existingObjstoreSecret: objstorage
clusterDomain: cluster.local
query:
  dnsDiscovery:
    enabled: false
  stores:
    ## this setup requires the thanos-sidecar tls to be 
    ## enabled. If you don't want to enable thanos-sidecar tls, you can modify the central cluster config by
    ## 1. create two query instances in the central cluster
    ## 2. first query instance has tls enabled on the client and store urls should only be the remote clusters' 
    ## 3. second query instance will point to the first query by service name, and to the local thanos-sidecar 
    - kube-prometheus-prometheus-thanos.monitoring:10901 
    - thanos.query.domain.local:443
  grpcTLS:
    client:
      secure: true
      existingSecret:
        name: thanos-grpc-tls
        keyMapping:
          ca-cert: ca.crt
          tls-cert: tls.crt
          tls-key: tls.key

Notice the certificate used for query ingress and for client TLS is the same certificate. I hope this helps someone

@stale

stale bot commented Apr 7, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Apr 7, 2021
@GiedriusS GiedriusS removed the stale label May 4, 2021
@stale

stale bot commented Jul 8, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jul 8, 2021
@ssadok

ssadok commented Jul 9, 2021

Hello, for me it works when you add the extra flag --grpc-client-tls-secure, and on the observee cluster I have cert-manager activated.

@stale stale bot removed the stale label Jul 9, 2021
@stale

stale bot commented Sep 8, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Sep 8, 2021
@stale

stale bot commented Oct 12, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Oct 12, 2021
@countablecloud

countablecloud commented Jan 13, 2022

Hello, for me it works when you add the extra flag --grpc-client-tls-secure, and on the observee cluster I have cert-manager activated.

For anyone who's bashing their heads against this, this single line fixed it; we have ingress enabled in both observer and remote.

Using Bitnami kube-prometheus and Bitnami thanos on EKS 1.21.

Here are the values for thanos:

"bucketweb":
  "enabled": true
"compactor":
  "enabled": true
"minio":
  "auth":
    "rootPassword": "password"
    "rootUser": "user"
  "defaultBuckets": "thanos"
  "enabled": true
"objstoreConfig": |
  "config":
    "access_key": "user"
    "bucket": "thanos"
    "endpoint":minio.thanos-grafana.svc.cluster.local:9000
    "insecure": true
    "secret_key": "password"
  "type": "s3"
"query":
  "stores":
  - "thanos.cool-1.foobar.io:443"
  - "thanos.cool-2.foobar.io:443"
  "extraFlags":
  - "--grpc-client-tls-secure"
  # - "--grpc-client-server-name=kube-prometheus-prometheus-thanos"
  "dnsDiscovery":
    "enabled": false
  "ingress":
    "grpc": 
      "enabled": true
      "hostname": "thanos-querier.foobar.io"
      "tls": true"
      "annotations":
        "cert-manager.io/cluster-issuer": "letsencrypt-prod"
        "kubernetes.io/ingress.class": "nginx"
        "nginx.ingress.kubernetes.io/backend-protocol": "GRPC"
        "nginx.ingress.kubernetes.io/ssl-redirect": "true"
        "nginx.ingress.kubernetes.io/grpc-backend": "true"
"ruler":
  "alertmanagers":
  - "http://prometheus-operator-alertmanager.thanos-grafana.svc.cluster.local:9093"
  "config": |
    "groups":
    - "name": "metamonitoring"
      "rules":
      - "alert": "PrometheusDown"
        "expr": absent(up{prometheus="thanos-grafana/prometheus-operator"})
  "enabled": true
"storagegateway":
  "enabled": true

and for kube-prometheus:

"prometheus":
  "externalLabels":
    "cluster": "foobar"
  "thanos":
    "create": true
    "ingress":
      "annotations":
        "cert-manager.io/cluster-issuer": "letsencrypt-prod"
        "kubernetes.io/ingress.class": "nginx"
        "nginx.ingress.kubernetes.io/backend-protocol": "GRPC"
        "nginx.ingress.kubernetes.io/force-ssl-redirect": "true"
        "nginx.ingress.kubernetes.io/grpc-backend": "true"
        "nginx.ingress.kubernetes.io/protocol": "h2c"
        "nginx.ingress.kubernetes.io/proxy-read-timeout": "160"
      "enabled": true
      "hosts":
      - "name": "thanos.cool-1.foobar.io"
      "tls":
        - "hosts":
          - "thanos.cool-1.foobar.io"
          "secretName": "foobar-thanos-tls-secret"

@sagiv-zafrani

sagiv-zafrani commented Feb 2, 2022

It seems that I have the same problem.

The topology is roughly the same as @IbraheemAlSaady's implementation:
Query (central cluster) -> Query (central cluster :: grpc server without TLS) (connects to remote environment Ingress using mTLS) -> Query (remote cluster :: ingress on this one) -> Sidecar (remote cluster)
The setup was achieved using:

  • thanos Helm chart by Bitnami
  • kube-prometheus-stack Helm chart by Prometheus-community

It seems that Thanos Query on the observer cluster fails to query remote stores (an Ingress listening on ports 80/443 whose backend is thanos-query over gRPC).

My situation is a bit different: I'm using a self-signed certificate, and the issuer in this case is thanos-query-ca (self-signed certificates generated by Helm). When configuring the remote store (the TLS listener of the Ingress deployed on the remote cluster), Thanos fails when updating the new node.

level=warn ts=2022-02-02T12:26:15.917113393Z caller=endpointset.go:500 component=endpointset msg="update of node failed" err="getting metadata: fallback fetching info from <thanos_query_grpc_remote_environment_ingress_hostname>:443: rpc error: code = DeadlineExceeded desc = context deadline exceeded" address=<thanos_query_grpc_remote_environment_ingress_hostname>:443

When querying the Ingress using grpcurl, I receive the response below -
Failed to dial target host "<thanos_query_grpc_remote_environment_ingress_hostname>:443": x509: certificate signed by unknown authority

Is there a way to indicate to Thanos - Query to skip verifying the issuer of the certificate?

Thanks in advance

@danielmotaleite
Contributor Author

@sagiv-zafrani, it is probably better to open a new issue for your use case, with a link to this one. This one is closed, so that will limit who may see your question.

@NominalTrajectory

@sagiv-zafrani, hi, were you able to solve your issue? I'm facing the same problem.

@sagiv-zafrani

sagiv-zafrani commented Aug 22, 2022

@sagiv-zafrani, hi, were you able to solve your issue? I'm facing the same problem.

No, we used generated certificates signed by a CA instead.

@tal-ayalon

@sagiv-zafrani @countablecloud @IbraheemAlSaady
Do you have to use a valid CA, or is a self-generated one also OK?

@audunsolemdal

@sagiv-zafrani @countablecloud @IbraheemAlSaady Do you have to use a valid CA, or is a self-generated one also OK?

Self-generated works OK. If connecting to nginx ingresses, your self-signed certificate's SANs must match the hostname you use in nginx.

From what I understand, Sagiv describes this solution https://krisztianfekete.org/solving-per-store-tls-limitation-in-thanos-query/

I tried having a single querier in the observer cluster querying sidecars via ingresses in other clusters, which works fine. However, when I try to query the storage gateway services located in the observee cluster, I struggle to get it working, although they should be configured with the same certificate via --grpc-server-tls-cert.
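
If the certificates are minted with cert-manager, pinning the SANs to the ingress hostname can be sketched like this (the names, issuer, and hostname here are illustrative assumptions):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: thanos-grpc-tls
  namespace: monitoring
spec:
  secretName: thanos-grpc-tls
  dnsNames:
    - thanos.query.domain.local  # must match the host used in the nginx Ingress
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
```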

@junoriosity

I had the same problem. My solution was to use the Bitnami charts.

Depends: bitnami/charts#5345 bitnami/charts#5344

my bitnami/kube-prometheus custom values:

prometheus:
  disableCompaction: true
  thanos:
    create: true
    objectStorageConfig:
      secretName: thanos-objstore-config
      secretKey: objstore.yml
    ingress:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: nginx
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
        nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
        nginx.ingress.kubernetes.io/auth-tls-secret: monitoring/thanos-certs
        nginx.ingress.kubernetes.io/backend-protocol: GRPC

my bitnami/thanos custom values:

existingObjstoreSecret: thanos-objstore-config
query:
  hostAliases:
  - ip: "111.11.111.1"
    hostnames:
    - thanos.earth.cluster
  - ip: "111.11.112.1"
    hostnames:
    - thanos.mars.cluster
  stores:
  - thanos.earth.cluster:443
  - thanos.mars.cluster:443
  - thanos-storegateway.default.svc.cluster.local:10901
  dnsDiscovery:
    enabled: false
  grpcTLS:
    client:
      secure: true
      cert: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      key: |-
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----
      ca: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
compactor:
  enabled: true
storegateway:
  enabled: true
  grpc:
    tls:
      enabled: true
      cert: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      key: |-
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----
      ca: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----


Hi @Placidina, many thanks for your suggestion. Could you perhaps outline your solution a little further, particularly the whole certificate process? That would be very helpful. 🙂

@junoriosity

junoriosity commented Jan 15, 2023

After a while of pulling my hair with this one, I managed to make it work. Just a note here, my ingress is on the Query instance not the sidecar, I would assume it'd work the same way for sidecar (didn't test that part)

My architecture is as follows:

Query (central cluster) -> Query (remote cluster :: ingress on this one) -> Sidecar (remote cluster) 
                        -> Sidecar (central cluster)

I'm deploying the stack with helm, here is my config

Remote & Central Cluster Prometheus Operator

prometheus:
  prometheusSpec:
    thanos:
      image: docker.io/bitnami/thanos
      tag: 0.17.2-scratch-r2
      objectStorageConfig:
        name: thanos
        key: objstore.yml

Remote Cluster Query Config

existingObjstoreSecret: objstorage
clusterDomain: cluster.local
query:
  dnsDiscovery:
    enabled: false
  stores:
    - kube-prometheus-prometheus-thanos.monitoring:10901 ## <-- thanos-sidecar

  ingress:
    enabled: false # disabled for http
    grpc:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: nginx-internal
        nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
        ingress.kubernetes.io/ssl-redirect: "true"

      hostname: thanos.query.domain.local
      extraTls:
        - hosts:
            - thanos.query.domain.local
          secretName: thanos-grpc-tls

Central Cluster Query Config

existingObjstoreSecret: objstorage
clusterDomain: cluster.local
query:
  dnsDiscovery:
    enabled: false
  stores:
    ## this setup requires the thanos-sidecar tls to be 
    ## enabled. If you don't want to enable thanos-sidecar tls, you can modify the central cluster config by
    ## 1. create two query instances in the central cluster
    ## 2. first query instance has tls enabled on the client and store urls should only be the remote clusters' 
    ## 3. second query instance will point to the first query by service name, and to the local thanos-sidecar 
    - kube-prometheus-prometheus-thanos.monitoring:10901 
    - thanos.query.domain.local:443
  grpcTLS:
    client:
      secure: true
      existingSecret:
        name: thanos-grpc-tls
        keyMapping:
          ca-cert: ca.crt
          tls-cert: tls.crt
          tls-key: tls.key

Notice the certificate used for query ingress and for client TLS is the same certificate. I hope this helps someone

@IbraheemAlSaady I like your solution a lot. However, since I am using Cloudflare, I only get tls.crt and tls.key from them. Could you help me get the whole certificate setup done? That would be awesome. 🙂

@Nashluffy

Hello, for me it works when you add the extra flag --grpc-client-tls-secure, and on the observee cluster I have cert-manager activated.

This fixed it for us as well. We have ingress-nginx terminating TLS, so when querier hits it, it should expect to perform a TLS handshake.

We also needed the following annotations on our ingress resource

    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/ssl-redirect: "true"

We first made sure we could hit the endpoint from our local machines using grpcurl like

$ grpcurl <host>:<port> list
grpc.health.v1.Health
grpc.reflection.v1.ServerReflection
grpc.reflection.v1alpha.ServerReflection
thanos.Exemplars
thanos.Metadata
thanos.Rules
thanos.Store
thanos.Targets
thanos.info.Info

but thanos query was still failing with errors like

ts=2024-04-10T08:35:15.380752187Z caller=endpointset.go:394 level=warn component=endpointset msg="new endpoint creation failed" err="dialing connection: context deadline exceeded: connection error: desc = \"error reading server preface: EOF\"" address=<host>:<port>

Once we enabled --grpc-client-tls-secure, Query was able to connect through ingress-nginx successfully.

@ArieLevs

@Nashluffy, is it possible for you to share your Thanos/nginx configs? I'm hitting a similar issue (different error).
When executing grpcurl -insecure my.thanos.prometheus.sidecar.dns:443 list, the result is

grpc.health.v1.Health
grpc.reflection.v1.ServerReflection
grpc.reflection.v1alpha.ServerReflection
thanos.Exemplars
thanos.Metadata
thanos.Rules
thanos.Store
thanos.Targets
thanos.info.Info

but Thanos query pod hits an error with

level=warn component=endpointset msg="new endpoint creation failed" err="dialing connection: context deadline exceeded: connection error: desc = \"transport: authentication handshake failed: tls: first record does not look like a TLS handshake\""

My query chart values (from Bitnami):

stores:
  - my.thanos.prometheus.sidecar.dns:443
extraFlags:
  - "--grpc-client-tls-secure"
  - "--grpc-client-tls-skip-verify"

I didn't create any custom certificates on the Thanos Query side, except Let's Encrypt on the Thanos sidecar ingress.
Since the grpcurl command works as expected but calls from Thanos don't, I suspect the issue is somewhere on the Thanos side rather than the ingress controller side?

  • The -insecure and --grpc-client-tls-skip-verify values are here because I'm currently testing against a non-prod env and using (STAGING) Let's Encrypt certificates.

@ArieLevs

ArieLevs commented Jul 9, 2024

An update regarding the above error: it turns out all was OK; it's just that the storegateway expected the client to authenticate with it. Once I set clientAuthEnabled: false on the storegateway, the TLS handshake error above went away.

In addition, if you are using a network load balancer to access the Thanos sidecars and implementing TLS at the LB level (rather than the ingress), make sure to set the ALPN policy to at least HTTP2Optional; for a k8s Service the annotation is

service.beta.kubernetes.io/aws-load-balancer-alpn-policy: HTTP2Optional
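
In context, a hypothetical NLB Service for the sidecar's gRPC port might look like this (names and ports are illustrative assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: thanos-sidecar-grpc
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-alpn-policy: "HTTP2Optional"
spec:
  type: LoadBalancer
  ports:
    - name: grpc
      port: 443
      targetPort: 10901
```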
