thanos+ingress-nginx+grpc: impossible setup due missing host header #1507
Just to make it clear: as there is no host header, nginx uses the default site, and the default site is a plain HTTP proxy, so the request never hits the gRPC proxy config. If Thanos sent the host header, nginx would load the correct config and deliver the request to the correct backend with the correct protocol.
Found a reference to this problem in an issue from several months ago (not directly related to this one).
Thanks for the report. As answered on the mentioned issue: have you tried setting up a forward proxy (e.g. nginx or envoy)? I think that might solve your issues, as it is more flexible in terms of what certs/credentials you use. In the gRPC world there is no Host header really. We indeed don't set one. What does your nginx configuration look like, if you are willing to share? (:
A reasonable way to work around this with the NGINX Ingress Controller is to use port-based routing. Then your store flags look like:

--store=sidecar-k8s-live.ops.example.com:11911 # routes to sidecar-k8s-live-a:10911
--store=sidecar-k8s-live.ops.example.com:12911 # routes to sidecar-k8s-live-b:10911

You still have to set up TLS on your own on both ends.
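One way such port-based routing is commonly done with ingress-nginx is the `tcp-services` ConfigMap, which maps an external port straight to a `namespace/service:port` backend. The exact mechanism the commenter used is not preserved in this transcript, so treat the names below as assumptions:

```yaml
# Illustrative sketch: service names and namespaces are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  "11911": "monitoring/sidecar-k8s-live-a:10911"
  "12911": "monitoring/sidecar-k8s-live-b:10911"
```

Since this forwards raw TCP, nginx never needs the host header, which is why TLS then has to be handled by the endpoints themselves.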
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I had the same problem.

Do you have a solution?
Another workaround with the NGINX Ingress Controller is to set the gRPC client server name (SNI) on the querier. I believe this limits each querier to one server name only; therefore you will need multiple queriers if you have multiple clusters to communicate between. Your querier configuration would set that server name, and your ingress annotations would include the gRPC backend protocol annotation.
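For reference, the ingress-nginx annotations typically involved for a gRPC backend look like the following (an illustrative sketch; the commenter's exact annotations were lost from this transcript):

```yaml
nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
```

With `backend-protocol: "GRPC"`, nginx uses `grpc_pass` toward the upstream instead of a plain HTTP proxy.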
Cjf-fuller, this could work, but it is important to understand the following. The Prometheus StatefulSet is labeled thanos-store-api: "true" so that each pod gets discovered by the headless service, and this headless service is used by Thanos Query to query data across all the Prometheus instances. A replica might be up, but querying it will show a small time gap for the period during which it was down. This isn't fixed by having a second replica, because that one could be down at any moment too, for example during a rolling restart. These cases show how load balancing can fail. Be wary, as this can lead to overwriting of your initial query, loss of the host header, and loss of nginx's ability to do its virtual-host search.
Hi, are you sure that it will limit each querier to one server name only? Regards,
@Than0s-coder, great point; we have set up a "central" Querier to target a "leaf" Querier and not the sidecars directly. But it sounds like this risk of overwriting the initial query and losing host headers would still be present? @martip07, I am still very much a beginner with Thanos, so I could be totally wrong here, but as far as I can tell the setting applies per querier. I have seen that the TLS Extensions documentation talks of a server-name extension (SNI).
This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status, otherwise the issue will be closed in a week. Thank you for your contributions.
/reopen

/reopen

Apparently this is still needed and valid.
To avoid needing a query per remote cluster, get the name to add to the dial options from the dns provider when making the grpc connection.
Possible fix pushed that uses a flag to change behaviour, based around the workaround detailed by @cjf-fuller in #1507 (comment). If the "grpc-client-dns-server-name" flag is specified, the DNS provider returns the name that was originally looked up, and the relevant dial options are added to the gRPC connection at dial time. This allows a different SNI per store, based on the originally provided name.
Hello 👋 Looks like there was no activity on this issue for the last 30 days.
Awaiting design review for the fix, as per #2407 (comment).
Sure. Do check the logs and see if they match in this case. Check that you can curl between the clusters, etc. That looks like it could be a connectivity issue from your central cluster to query.my-local-domain.local:443, more than the issue detailed here, but that's a guess given there isn't much in the way of debug output or logs to go on. Sorry I can't help more :)
@j3p0uk I have tried this, and I'm getting this response:

I did a describe as well.

Then:
You might change the ingress backend protocol to GRPCS.
@roysha1 sadly, that didn't do it.
I have updated my comment regarding the configuration to add the logs of the ingress controller.
I had the same problem. Depends: bitnami/charts#5345 bitnami/charts#5344

My prometheus values:

```yaml
disableCompaction: true
thanos:
  create: true
  objectStorageConfig:
    secretName: thanos-objstore-config
    secretKey: objstore.yml
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
      nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
      nginx.ingress.kubernetes.io/auth-tls-secret: monitoring/thanos-certs
      nginx.ingress.kubernetes.io/backend-protocol: GRPC
```

My Thanos chart values:

```yaml
existingObjstoreSecret: thanos-objstore-config
query:
  hostAliases:
    - ip: "111.11.111.1"
      hostnames:
        - thanos.earth.cluster
    - ip: "111.11.112.1"
      hostnames:
        - thanos.mars.cluster
  stores:
    - thanos.earth.cluster:443
    - thanos.mars.cluster:443
    - thanos-storegateway.default.svc.cluster.local:10901
  dnsDiscovery:
    enabled: false
  grpcTLS:
    client:
      secure: true
      cert: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      key: |-
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----
      ca: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
compactor:
  enabled: true
storegateway:
  enabled: true
  grpc:
    tls:
      enabled: true
      cert: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      key: |-
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----
      ca: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
```
After a while of pulling my hair out over this one, I managed to make it work. Just a note here: my ingress is on the Query instance, not the sidecar; I would assume it works the same way for the sidecar (didn't test that part). My architecture is as follows:

I'm deploying the stack with helm; here is my config.

Remote & Central Cluster Prometheus Operator:

```yaml
prometheus:
  prometheusSpec:
    thanos:
      image: docker.io/bitnami/thanos
      tag: 0.17.2-scratch-r2
      objectStorageConfig:
        name: thanos
        key: objstore.yml
```

Remote Cluster Query Config:

```yaml
existingObjstoreSecret: objstorage
clusterDomain: cluster.local
query:
  dnsDiscovery:
    enabled: false
  stores:
    - kube-prometheus-prometheus-thanos.monitoring:10901 ## <-- thanos-sidecar
  ingress:
    enabled: false # disabled for http
    grpc:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: nginx-internal
        nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
        ingress.kubernetes.io/ssl-redirect: "true"
      hostname: thanos.query.domain.local
      extraTls:
        - hosts:
            - thanos.query.domain.local
          secretName: thanos-grpc-tls
```

Central Cluster Query Config:

```yaml
existingObjstoreSecret: objstorage
clusterDomain: cluster.local
query:
  dnsDiscovery:
    enabled: false
  stores:
    ## this setup requires the thanos-sidecar tls to be
    ## enabled. If you don't want to enable thanos-sidecar tls, you can modify the central cluster config by
    ## 1. create two query instances in the central cluster
    ## 2. first query instance has tls enabled on the client and store urls should only be the remote clusters'
    ## 3. second query instance will point to the first query by service name, and to the local thanos-sidecar
    - kube-prometheus-prometheus-thanos.monitoring:10901
    - thanos.query.domain.local:443
  grpcTLS:
    client:
      secure: true
      existingSecret:
        name: thanos-grpc-tls
        keyMapping:
          ca-cert: ca.crt
          tls-cert: tls.crt
          tls-key: tls.key
```

Notice the certificate used for the query ingress and for client TLS is the same certificate. I hope this helps someone.
Hello 👋 Looks like there was no activity on this issue for the last two months.
Hello, for me it works when you add the extra flag.
Closing for now as promised, let us know if you need this to be reopened! 🤗
For anyone who's bashing their heads against this: this single line fixed it. We have ingress enabled in both observer and remote, using bitnami kube-prometheus and bitnami thanos on EKS 1.21. Here are the values:

and for kube-prometheus:
It seems that I have the same problem. The topology is roughly the same as @IbraheemAlSaady's implementation.

It seems that Thanos Query on the observer cluster fails to query the remote stores (an Ingress listening on ports 80/443 with thanos-query:GRPC as the backend). My situation is a bit different: I'm using a self-signed certificate, and the issuer in this case is thanos-query-ca (self-signed certificates generated by Helm).

When querying the Ingress using grpcurl, I receive the response below. Is there a way to tell Thanos Query to skip verifying the issuer of the certificate? Thanks in advance.
@sagiv-zafrani maybe it is better to open a new issue for your use case, with a link to this one... this one is closed, so that will limit who may see your question.
@sagiv-zafrani, hi, were you able to solve your issue? I'm facing the same problem.

No, we used generated certificates signed by a CA instead.
@sagiv-zafrani @countablecloud @IbraheemAlSaady
Self-generated certificates work OK. If connecting to nginx ingresses, your self-signed certificate's SANs must match the hostname you use in nginx. From what I understand, Sagiv describes this solution: https://krisztianfekete.org/solving-per-store-tls-limitation-in-thanos-query/ I tried having a single querier in the observer cluster querying sidecars via ingresses in other clusters, which works fine. However, when I try to query the storage gateway services located in the observee cluster, I struggle to get it working, although they should be configured with the same certificate.
Hi @Placidina, many thanks for your suggestion. Could you perhaps outline your solution a little further, particularly regarding the whole certificate process? That would be very helpful. 🙂

@IbraheemAlSaady I like your solution a lot. However, since I am using Cloudflare, I only get tls.crt and tls.key from them. Could you help me get the whole certificate stuff done? That would be awesome. 🙂
This fixed it for us as well. We have ingress-nginx terminating TLS, so when the querier hits it, it should expect to perform a TLS handshake. We also needed the following annotations on our ingress resource:

We first made sure we could hit the endpoint from our local machines using grpcurl, like:

but thanos query was still failing with errors like:

Once we enabled that, it worked.
@Nashluffy is it possible for you to share your Thanos/nginx configs? I'm hitting a similar issue (different error):

but the Thanos query pod hits an error with:

My query chart values (from bitnami):

```yaml
stores:
  - my.thanos.prometheus.sidecar.dns:443
extraFlags:
  - "--grpc-client-tls-secure"
  - "--grpc-client-tls-skip-verify"
```

I didn't create any custom certificates on the Thanos query side, except Let's Encrypt on the Thanos sidecar ingress.
Update regarding the above error: it turns out all was OK, just that the storegateway expected the client to authenticate with it; once I set that in addition, it worked. If you are using a network load balancer to access Thanos sidecars and implementing TLS at the LB level (rather than the ingress), make sure to update the ALPN policy to one that at least allows HTTP/2, since gRPC requires it.
Thanos, Prometheus and Golang version used

quay.io/thanos/thanos:v0.7.0

What happened

I set up 2 Kubernetes clusters. Thanos Query is in one cluster (along with a local Prometheus + sidecar) and needs to query the remote cluster's Thanos sidecar; everything runs in AWS (but not on EKS).

I created one ingress-nginx with support for gRPC with this config:

Thanos Query is using:

I can connect to the Prometheus URL, but the sidecar gRPC fails in Thanos Query.

Looking at the nginx logs I can see the query arriving as HTTP/2, but returning 400. Doing a curl I get a 503, but probably just because it is not really gRPC. Changing the ingress-nginx log format to show the host header, I can see that curl is sending the correct host header, but for Thanos Query the logs show only `_`; it is either sending an empty one or a literal `_`.

What you expected to happen

I wanted to share the ingress to receive the HTTPS requests for Prometheus and the gRPC requests, using the host to redirect each request to the correct service. Sadly, Thanos Query fails to send the host header, so nginx can't apply the virtual_host search and serves the request from the default site.

Full logs to relevant components

Here we can see that the Thanos Query requests do not trigger the virtual_host, but the curl one, with a host, is redirected to the Thanos sidecar.
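To illustrate why the missing host/authority lands on the default site: nginx selects a server block by matching the request's Host/`:authority` (or TLS SNI) against `server_name`, and falls back to the `default_server` when nothing matches. A minimal sketch of that routing (all hostnames and upstream names below are invented for this example, not taken from the reporter's config):

```nginx
# Illustrative only: names and addresses are made up for this sketch.
upstream thanos_sidecar { server 10.0.0.10:10901; }
upstream default_http   { server 10.0.0.20:9090;  }

server {
    listen 443 ssl http2;
    server_name thanos-sidecar.example.com;   # selected via Host/:authority or SNI

    location / {
        grpc_pass grpc://thanos_sidecar;      # gRPC-aware proxying
    }
}

server {
    listen 443 ssl http2 default_server;      # catches requests with no matching host
    server_name _;

    location / {
        proxy_pass http://default_http;       # plain HTTP proxy: gRPC calls fail here
    }
}
```

A client that sends no usable authority (as Thanos Query did in v0.7.0) always falls through to the second block, which proxies plain HTTP and so returns 400/503 to gRPC callers.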