SRV records that point to names that don't resolve to the service's IP #832

Closed
treed opened this issue Mar 31, 2015 · 12 comments
Labels: thinking (More time is needed to research by the Consul Contributors), type/bug (Feature does not function as expected)
Milestone: 0.7.1

Comments

@treed

treed commented Mar 31, 2015

I'm having an issue where:

  • Software uses an SRV record via Consul to find a service. The answer gives the service's port and, as the target, the FQDN of the relevant agent's node; the additional section includes an A record for that name mapping to the service's IP.
  • The software disregards the additional section and does its own lookup of the FQDN, which resolves to a completely different IP, not the one the service is listening on.

It seems reasonable to me that SRV records could point to an FQDN that resolves to the service itself, although I'm not sure what that would look like.

I'm also unsure whether the behavior of this software (srv-router in this case) is typical, but it would not surprise me if this came up elsewhere.

@treed treed changed the title SRV records that point to names that always resolve to the service's IP SRV records that point to names that don't resolve to the service's IP Mar 31, 2015
@armon
Member

armon commented Apr 1, 2015

I guess I'm a bit confused here. We are already returning the A record of the node along with the SRV record. Is the issue that the service address is different than the node address?

@treed
Author

treed commented Apr 1, 2015

The issue is related to the fact that the service address is different from the node address.

The SRV response includes the service address under the node's name, but in this instance I have software (srv-router) that ignores that record and does a separate lookup of the node name returned in the SRV answer. That lookup returns the node address, which is not correct for the service.

It's possible that this is a bug in how srv-router handles the lookup, but it would not surprise me if other software did the same, so it's probably better to handle this more systematically.

One idea: have the SRV response point to <service-id>.<node>.node.consul and have that name map directly to the service IP. But maybe there's something else I could be doing, or you have better ideas for how to handle this.
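A minimal Go sketch of the proposed naming scheme, with hypothetical helper names (this is not an existing Consul API). It also surfaces one caveat: node names can contain dots (the node consul-server.economist.local appears elsewhere in this thread), so splitting <service-id>.<node>.node.consul only works if service IDs are assumed to be dot-free, and even then a plain lookup for a dotted node name parses exactly like a service name.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical suffix for the proposed <service-id>.<node>.node.consul names.
const nodeSuffix = ".node.consul."

// serviceHostname builds the proposed per-service name.
func serviceHostname(serviceID, node string) string {
	return serviceID + "." + node + nodeSuffix
}

// parseServiceHostname splits a name back into service ID and node.
// Assumption: service IDs contain no dots, so it splits at the first dot.
func parseServiceHostname(name string) (serviceID, node string, err error) {
	name = strings.ToLower(name) // DNS names are case-insensitive
	if !strings.HasSuffix(name, nodeSuffix) {
		return "", "", errors.New("not a .node.consul. name")
	}
	rest := strings.TrimSuffix(name, nodeSuffix)
	i := strings.IndexByte(rest, '.')
	if i <= 0 || i == len(rest)-1 {
		return "", "", errors.New("missing service ID or node")
	}
	return rest[:i], rest[i+1:], nil
}

func main() {
	name := serviceHostname("kibana-http", "core-01")
	fmt.Println(name) // kibana-http.core-01.node.consul.
	id, node, err := parseServiceHostname(name)
	fmt.Println(id, node, err) // kibana-http core-01 <nil>
}
```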

@treed
Author

treed commented Apr 1, 2015

Here's some output that illustrates what I'm talking about.

% dig kibana-http.service.consul SRV

; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> kibana-http.service.consul SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33446
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; QUESTION SECTION:
;kibana-http.service.consul.    IN      SRV

;; ANSWER SECTION:
kibana-http.service.consul. 0   IN      SRV     1 1 5601 core-01.node.dc1.consul.

;; ADDITIONAL SECTION:
core-01.node.dc1.consul. 0      IN      A       10.200.0.26

;; Query time: 1 msec
;; SERVER: 10.200.0.1#53(10.200.0.1)
;; WHEN: Mon Mar 30 17:05:03 PDT 2015
;; MSG SIZE  rcvd: 152

% dig core-01.node.dc1.consul

; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> core-01.node.dc1.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58845
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;core-01.node.dc1.consul.       IN      A

;; ANSWER SECTION:
core-01.node.dc1.consul. 0      IN      A       172.17.8.101

;; Query time: 2 msec
;; SERVER: 10.200.0.1#53(10.200.0.1)
;; WHEN: Mon Mar 30 17:05:51 PDT 2015
;; MSG SIZE  rcvd: 80

When I use srv-router to reach the kibana-http service, it connects to 172.17.8.101:5601 rather than 10.200.0.26:5601.

It seems like the answer shouldn't change depending on how the user asks.

@gbelur

gbelur commented Apr 15, 2015

The node name Consul returns in the answer section represents the Consul agent, not the domain name of the host running the service. Isn't that how it should be?

@treed
Author

treed commented Apr 15, 2015

IMO, it should return a name representing the service itself, which might (and in this case does) have a different IP for various reasons.

In the original answer, the additional section gives this IP for the node's hostname, which is... kinda right:

core-01.node.dc1.consul. 0 IN A 10.200.0.26

But then if anything else asks for that hostname in other contexts, it gets the node's actual IP:

core-01.node.dc1.consul. 0 IN A 172.17.8.101

I think it would be better to have services resolve to a hostname representing the service, which would always resolve to the service's IP.

@armon
Member

armon commented May 7, 2015

@treed Sorry about the delay. I agree. For SRV lookups we should probably use a special FQDN that is distinct from the node lookup to avoid the ambiguity. As you said, the issue is that the IP address depends on the lookup context (node -> agent IP, service -> service IP).

I'll mark this as a thinking bug.
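The rule described above can be sketched as a small Go function, assuming the server knows both the node address and the service address when it builds the SRV answer. srvTarget and the distinctName hook are hypothetical; the naming scheme for the distinct FQDN is left open, as in the discussion.

```go
package main

import "fmt"

// srvTarget keeps the node's FQDN as the SRV target when the service has
// no address of its own (or shares the node's), and switches to a distinct
// name when the addresses differ, so the target never resolves differently
// depending on the lookup context. distinctName is a hypothetical hook.
func srvTarget(node, dc, nodeAddr, svcAddr string, distinctName func(svcAddr string) string) string {
	if svcAddr == "" || svcAddr == nodeAddr {
		return fmt.Sprintf("%s.node.%s.consul.", node, dc)
	}
	return distinctName(svcAddr)
}

func main() {
	// Placeholder scheme, for illustration only.
	placeholder := func(addr string) string { return "svc-" + addr + ".example." }
	fmt.Println(srvTarget("core-01", "dc1", "172.17.8.101", "", placeholder))
	fmt.Println(srvTarget("core-01", "dc1", "172.17.8.101", "10.200.0.26", placeholder))
}
```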

@armon armon added type/bug Feature does not function as expected thinking More time is needed to research by the Consul Contributors labels May 7, 2015
@prisamuel

I'm seeing exactly the same behaviour with Consul, also using the above-mentioned srv-lb for DNS lookups.

bash-4.3# dig mt-content-blogs.service.consul. SRV

;; QUESTION SECTION:
;mt-content-blogs.service.consul. IN    SRV

;; ANSWER SECTION:
mt-content-blogs.service.consul. 5 IN   SRV 1 1 9494 consul-server.economist.local.node.dc1.consul.

;; ADDITIONAL SECTION:
consul-server.economist.local.node.dc1.consul. 5 IN A 172.18.0.4

but the subsequent A record lookup returns

bash-4.3# dig consul-server.economist.local.node.dc1.consul. A

;; QUESTION SECTION:
;consul-server.economist.local.node.dc1.consul. IN A

;; ANSWER SECTION:
consul-server.economist.local.node.dc1.consul. 0 IN A 172.18.0.5

which is different from the A record returned in the 'ADDITIONAL SECTION' in the first SRV lookup.

@foxel
Contributor

foxel commented Jul 14, 2016

Any update on this? Really disappointing bug.

Why not return the service address in the SRV answer section?

@weiwei04
Contributor

Any update on this?

@piotrkowalczuk

This bug makes Consul unusable with Go's net.LookupSRV.
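For context, a Go sketch of the usual stdlib pattern. net.LookupSRV (here via net.Resolver.LookupSRV) returns only target names and ports, never the additional-section records, so the caller has to resolve srv.Target again and gets the node address. The Consul DNS address 127.0.0.1:8600 (Consul's default DNS port) and the helper names are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strconv"
	"time"
)

// newConsulResolver returns a pure-Go resolver that sends every query to
// the given Consul DNS endpoint instead of the system resolver.
func newConsulResolver(addr string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, addr)
		},
	}
}

// lookupService resolves an SRV name the way most stdlib-based clients do:
// SRV query first, then a separate host lookup per target. The second
// lookup is where the node address replaces the service address.
func lookupService(ctx context.Context, r *net.Resolver, name string) ([]string, error) {
	// Empty service and proto make LookupSRV query name directly.
	_, srvs, err := r.LookupSRV(ctx, "", "", name)
	if err != nil {
		return nil, err
	}
	var eps []string
	for _, srv := range srvs {
		addrs, err := r.LookupHost(ctx, srv.Target)
		if err != nil {
			return nil, err
		}
		for _, a := range addrs {
			eps = append(eps, net.JoinHostPort(a, strconv.Itoa(int(srv.Port))))
		}
	}
	return eps, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	r := newConsulResolver("127.0.0.1:8600")
	eps, err := lookupService(ctx, r, "kibana-http.service.consul")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(eps)
}
```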

@perplexes

perplexes commented Oct 6, 2016

This also makes Consul somewhat unusable with Node's native dns.resolveSrv.

@slackpad slackpad added the dns label Oct 26, 2016
@slackpad slackpad added this to the 0.7.1 milestone Oct 26, 2016
@slackpad
Contributor

A simple fix would be a new Consul DNS handler for names like 10.200.0.26.ipv4.addr.consul. that returns the corresponding A record (or AAAA record for .ipv6.addr.consul. names). This would be simple and would not need to tie back to the original service in any way.

duckhan pushed a commit to duckhan/consul that referenced this issue Oct 24, 2021
Increase the memory for the client daemonset's tls-init
container. We saw an out of memory error when running on OpenShift.

9 participants