v1/catalog/service/:name?tag=xxx incorrectly filters results with <=1.2.3 clients and >=1.3.0 server #4922

Closed
agy opened this issue Nov 7, 2018 · 6 comments
Labels: type/bug (Feature does not function as expected)
Milestone: 1.4.0

Comments

@agy (Contributor) commented Nov 7, 2018

Overview of the Issue

Following the suggested guidelines for upgrading a Consul cluster, I first upgraded my servers to 1.3.0 before starting on the clients.

After upgrading my servers to Consul 1.3.0, I noticed that client queries to /v1/catalog/service/:name?tag=xxx started returning all nodes with the configured service, not only the nodes whose service has the specified tag.

All the client nodes are running Consul 1.2.3 and have the service configured using a local service definition file.

Querying the equivalent health endpoint (/v1/health/service/:name?tag=xxx) filters the results as expected.

Once the clients are upgraded to 1.3.0, this issue goes away on those clients. Clients remaining on 1.2.3 still have the issue.

What I see:

user@client:~$ curl 'localhost:8500/v1/catalog/service/foo' | \
    jq '.[].ID' | wc -l
2
user@client:~$ curl 'localhost:8500/v1/catalog/service/foo?tag=notexist' | \
    jq '.[].ID' | wc -l
2

What I expect:

user@client:~$ curl 'localhost:8500/v1/catalog/service/foo' | \
    jq '.[].ID' | wc -l
2
user@client:~$ curl 'localhost:8500/v1/catalog/service/foo?tag=notexist' | \
    jq '.[].ID' | wc -l
0

This issue goes away after upgrading the clients to >= 1.3.0
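Until all clients are upgraded, one possible workaround (a sketch, not something proposed in this thread) is to fetch the unfiltered list and filter on each entry's `ServiceTags` field on the caller's side. The Go sketch below assumes only the `ID`, `Node` and `ServiceTags` fields of the catalog response; the sample JSON mirrors the two-node test setup with tags `bar` and `baz` and is made up, not captured output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// catalogEntry holds the subset of fields used here from one element of a
// /v1/catalog/service/:name response.
type catalogEntry struct {
	ID          string   `json:"ID"`
	Node        string   `json:"Node"`
	ServiceTags []string `json:"ServiceTags"`
}

// filterByTag keeps only the entries whose ServiceTags contain tag,
// i.e. what ?tag=... is expected to do on the agent side.
func filterByTag(entries []catalogEntry, tag string) []catalogEntry {
	var out []catalogEntry
	for _, e := range entries {
		for _, t := range e.ServiceTags {
			if t == tag {
				out = append(out, e)
				break // one match is enough; move to the next entry
			}
		}
	}
	return out
}

func main() {
	// Illustrative response: two nodes registered with tags "bar" and "baz".
	raw := `[
		{"ID": "node-1-id", "Node": "client-1", "ServiceTags": ["bar", "baz"]},
		{"ID": "node-2-id", "Node": "client-2", "ServiceTags": ["bar", "baz"]}
	]`
	var entries []catalogEntry
	if err := json.Unmarshal([]byte(raw), &entries); err != nil {
		panic(err)
	}
	fmt.Println(len(filterByTag(entries, "bar")))      // 2
	fmt.Println(len(filterByTag(entries, "notexist"))) // 0
}
```

The health endpoint mentioned above is the other option on affected clients, since it already filters correctly.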

Reproduction Steps

Steps to reproduce this issue:

  1. Create a cluster with 2 client nodes running 1.2.3 and 1 server node running 1.3.0
  2. Register a common service (e.g. foo) on the clients using a service definition file
  3. Run curl 'localhost:8500/v1/catalog/service/foo?tag=notexist' | jq '.[].ID' | wc -l
  4. Observe that the returned node count is 2, not 0
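Step 3 can also be scripted. The Go sketch below (not part of the original report) builds the same request URL and counts the array entries in the response, which is what jq '.[].ID' | wc -l does when every entry has an ID. It assumes a local agent on the default HTTP address localhost:8500; `catalogURL` and `countEntries` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// catalogURL builds the step-3 request URL. The tag is query-encoded, so
// special characters in tag names are escaped.
func catalogURL(base, service, tag string) string {
	u := base + "/v1/catalog/service/" + url.PathEscape(service)
	if tag != "" {
		u += "?" + url.Values{"tag": {tag}}.Encode()
	}
	return u
}

// countEntries returns the number of elements in a JSON array body.
func countEntries(body []byte) (int, error) {
	var entries []json.RawMessage
	if err := json.Unmarshal(body, &entries); err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	// Assumes a local agent on the default HTTP address, as in the issue.
	resp, err := http.Get(catalogURL("http://localhost:8500", "foo", "notexist"))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	n, err := countEntries(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// A correctly filtering agent prints 0; in the two-node setup above,
	// an affected 1.2.3 client prints 2.
	fmt.Println("entries:", n)
}
```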

Example service definition file:

{
  "service": {
    "name": "foo",
    "tags": [
      "bar",
      "baz"
    ]
  }
}

Consul info for both Client and Server

Note: This is from a test setup built to reproduce the issue

Client info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 1
build:
	prerelease =
	revision = 48d287ef
	version = 1.2.3
consul:
	known_servers = 1
	server = false
runtime:
	arch = amd64
	cpu_count = 1
	goroutines = 45
	max_procs = 1
	os = linux
	version = go1.10.1
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 9
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 32
	members = 2
	query_queue = 0
	query_time = 1
Server info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 1
build:
	prerelease = rc1
	revision = 1757fbc0
	version = 1.4.0
consul:
	bootstrap = true
	known_datacenters = 1
	leader = true
	leader_addr = 172.29.232.143:8300
	server = true
raft:
	applied_index = 640
	commit_index = 640
	fsm_pending = 0
	last_contact = 0
	last_log_index = 640
	last_log_term = 5
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:7efa4e4d-0e7a-a48b-ce83-da04d5eb521c Address:172.29.232.143:8300}]
	latest_configuration_index = 1
	num_peers = 0
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 5
runtime:
	arch = amd64
	cpu_count = 1
	goroutines = 88
	max_procs = 1
	os = linux
	version = go1.11.1
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 9
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 32
	members = 2
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1

Operating system and Environment details

OS: Ubuntu 18.04 (and 16.04)
Architecture: amd64
Environment: AWS EC2

Log Fragments

Logs are included for completeness, but there isn't anything useful in them:

Client logs (at TRACE level):

    2018/11/07 20:34:42 [DEBUG] http: Request GET /v1/catalog/service/datadog (1.31979ms) from=127.0.0.1:35312
    2018/11/07 20:34:44 [DEBUG] http: Request GET /v1/catalog/service/datadog?tag=notexist (1.287669ms) from=127.0.0.1:35314
@ShimmerGlass (Contributor)

@agy I've reproduced the issue and made a fix in #4944

@pierresouchay (Contributor)

Reproduced on our side: this blocks any migration from versions prior to 1.3.0 to 1.3.0 or later.

@banks @mkeeler @pearkes Can you have a look at the fix provided by @Aestek in #4944? This bug blocks us from migrating to new Consul versions.

@banks banks added the type/bug Feature does not function as expected label Nov 12, 2018
@banks banks added this to the 1.4.0 milestone Nov 12, 2018
@banks (Member) commented Nov 12, 2018

@agy thanks for the report! This does seem to be a bug in the changes to tags: we missed the migration case. Thanks to @Aestek's PR, we can hopefully get a fix into the 1.4.0 final release (due very soon), and we'll also consider backporting the patch to 1.3.x.
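To make "the migration case" concrete, here is a Go sketch of one way a backward-compatibility gap like this can arise. All type and field names (oldRequest, newRequest, ServiceTags, TagFilter) are hypothetical; this is not Consul's code, and the actual fix is the one in #4944. The model assumes a newer server that decides whether to filter based on a field older clients never set, so an older client's tag is silently treated as "no filter", plus a shim that promotes the legacy field.

```go
package main

import "fmt"

// oldRequest models a request from an older client that only knows a
// single-tag field. Hypothetical: not Consul's actual struct.
type oldRequest struct {
	ServiceName string
	ServiceTag  string
}

// newRequest models the same request type after a newer server added
// multi-tag support. Hypothetical: field names are illustrative only.
type newRequest struct {
	ServiceName string
	ServiceTag  string   // legacy single-tag field, still decoded
	ServiceTags []string // new multi-tag field
	TagFilter   bool     // set by new clients when any tag was requested
}

// decode simulates a newer server receiving an older client's request:
// fields the old client never sent keep their zero values.
func decode(o oldRequest) newRequest {
	return newRequest{ServiceName: o.ServiceName, ServiceTag: o.ServiceTag}
}

// normalize is a backward-compatibility shim: if only the legacy field is
// set, promote it into the new fields so filtering still happens.
func normalize(r newRequest) newRequest {
	if !r.TagFilter && r.ServiceTag != "" {
		r.ServiceTags = []string{r.ServiceTag}
		r.TagFilter = true
	}
	return r
}

func main() {
	req := decode(oldRequest{ServiceName: "foo", ServiceTag: "notexist"})
	// Without the shim, a server that only checks TagFilter skips filtering.
	fmt.Println(req.TagFilter) // false
	req = normalize(req)
	fmt.Println(req.TagFilter, req.ServiceTags) // true [notexist]
}
```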

@pvandervelde (Contributor)

@banks Did this make it into the 1.4.0 release? I couldn't find anything in the release notes. I'm having (potentially) the same issue, but in my case I have a server on 1.1.0 and a client on 1.4.0. When I query

http://localhost:8500/v1/catalog/service/myservice?tag=mytag

on the client I get no services back. If I do the same query on a client running 1.2.2 I get the expected response of 1 service.

@pierresouchay (Contributor)

@pvandervelde Version upgrades should always be performed by upgrading the servers first; otherwise backward compatibility cannot be achieved.
Otherwise, yes: this patch is included in 1.3.1 and in 1.4.x; see https://github.com/hashicorp/consul/commits/v1.4.0
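The servers-first rule can be checked mechanically before upgrading a client. The Go sketch below is a hypothetical pre-upgrade check (not a Consul tool): it parses major.minor.patch versions, ignores pre-release suffixes such as -rc1, and allows a client upgrade only when every server is already at or above the target version. It encodes only the rule stated in this comment; any version-specific steps from Consul's upgrade documentation are out of scope.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "1.2.3" (optionally "1.4.0-rc1") into [major, minor, patch].
// Any pre-release suffix after "-" is ignored for ordering.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	core := strings.SplitN(v, "-", 2)[0]
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// compareVersions returns -1 if a < b, 0 if equal, 1 if a > b.
func compareVersions(a, b [3]int) int {
	for i := 0; i < 3; i++ {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// clientUpgradeAllowed reports whether every server is already at or above
// the client's target version (servers-first ordering).
func clientUpgradeAllowed(servers []string, target string) (bool, error) {
	t, err := parseVersion(target)
	if err != nil {
		return false, err
	}
	for _, s := range servers {
		sv, err := parseVersion(s)
		if err != nil {
			return false, err
		}
		if compareVersions(sv, t) < 0 {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Server on 1.1.0, client going to 1.4.0: servers must go first.
	ok, _ := clientUpgradeAllowed([]string{"1.1.0"}, "1.4.0")
	fmt.Println(ok) // false
	// Servers already on 1.3.0, client going to 1.3.0: allowed.
	ok, _ = clientUpgradeAllowed([]string{"1.3.0"}, "1.3.0")
	fmt.Println(ok) // true
}
```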

@pvandervelde (Contributor)

@pierresouchay Oh crap, I didn't know that. Thanks for the information. I'll make sure to actually read the documentation next time. I have downgraded the clients, and I will upgrade the servers first.


5 participants