
Nomad renders incorrect service address and ports in template. #18203

Closed
blmhemu opened this issue Aug 15, 2023 · 4 comments
blmhemu commented Aug 15, 2023

Nomad version

1.6.1

Operating system and Environment details

Fedora 38

Issue

Here is the template in the Nomad job:

reverse_proxy {{ range nomadService "svc" }}{{ .Address }}:{{ .Port }} {{ end }}{
  fail_duration 30s
}

Here is what is rendered:

reverse_proxy 10.1.1.3:29154 10.1.1.3:30574 10.1.1.3:30306 {
  fail_duration 30s
}

The problem is that only one of the rendered ip:port pairs is valid. The service runs a single instance (count == 1), and Nomad shows only one allocation, so my reverse proxy fails to reach a working upstream about 66% of the time.

This happens for multiple services, not just this one. I had to manually delete (clean up) the stale registrations from Nomad to fix it.
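For context, a minimal sketch of how the template above would sit inside a Nomad job file. The job, group, and task names, the Docker driver, and the Caddyfile destination are placeholders, not taken from the report; only the template body is from the issue:

```hcl
# Hypothetical job layout -- "proxy", "caddy", and the destination path
# are illustrative assumptions; the template data is from the report.
job "proxy" {
  group "caddy" {
    task "caddy" {
      driver = "docker"

      template {
        destination = "local/Caddyfile"
        change_mode = "restart"
        data        = <<EOF
reverse_proxy {{ range nomadService "svc" }}{{ .Address }}:{{ .Port }} {{ end }}{
  fail_duration 30s
}
EOF
      }
    }
  }
}
```

With this layout, every registration Nomad still holds for "svc" is rendered into the upstream list, which is why stale registrations surface directly as dead upstreams.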


jrasell (Member) commented Aug 15, 2023

Hi @blmhemu, what does the CLI output of nomad service info some-service show? The template process and the CLI both use the same API call, so I would expect them to match.

blmhemu (Author) commented Aug 15, 2023

Hi @jrasell! Yup, it does match. The service info shows three IDs:

ID           = _nomad-task-68a39c56-2cc8-c4b4-4098-2311430a3bc0-group-svc
Service Name = svc
Namespace    = default
Job ID       = job-id
Alloc ID     = 68a39c56-2cc8-c4b4-4098-2311430a3bc0
Node ID      = 6148e6dd-15e0-f4a3-e247-f2cddb64c04f
Datacenter   = dc1
Address      = 10.1.1.3:29154
Tags         = []

ID           = _nomad-task-8ad05f3c-0eab-9519-d1b5-acdfb881e90c-group-svc
Service Name = svc
Namespace    = default
Job ID       = job-id
Alloc ID     = 8ad05f3c-0eab-9519-d1b5-acdfb881e90c
Node ID      = 6148e6dd-15e0-f4a3-e247-f2cddb64c04f
Datacenter   = dc1
Address      = 10.1.1.3:30574
Tags         = []

ID           = _nomad-task-a7f70d3c-b807-f56d-305e-01a45cea9839-group-svc
Service Name = svc
Namespace    = default
Job ID       = job-id
Alloc ID     = a7f70d3c-b807-f56d-305e-01a45cea9839
Node ID      = 6148e6dd-15e0-f4a3-e247-f2cddb64c04f
Datacenter   = dc1
Address      = 10.1.1.3:30306
Tags         = []

blmhemu (Author) commented Aug 16, 2023

FWIW, I tried nomad system gc && nomad system reconcile summaries, but it did not help.
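For reference, a sketch of the manual per-service cleanup the reporter described (as opposed to the system-wide gc tried above). The service name "svc" and the registration ID are placeholders copied from the shape of the output earlier in the thread; the nomad service subcommands require Nomad 1.4+:

```shell
# List all current registrations for the service (one block per instance):
nomad service info svc

# Delete a stale registration by its ID, e.g. one whose allocation no
# longer exists. The ID below is a placeholder, not a real value:
nomad service delete svc _nomad-task-<alloc-id>-group-svc
```

This removes only the named registration, so live instances of the same service are left untouched.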

lgfa29 (Contributor) commented Aug 16, 2023

#16616 is a longer-running issue about services failing to be removed, which smells a lot like the problem here. Others in that thread have also reported it happening with Nomad service discovery, so I will update that issue's title.

@blmhemu I will close this one as a duplicate to focus the discussion in a single place. Feel free to 👍 and add any additional comments there.

Thanks for the report!
