Large number of warning logs with nomad service provider #16238
Comments
@shoenig:
Welp, I guess there was more than one bug then 😬 I'll try to reproduce again with a job structured exactly like yours, @chenjpu.
This PR fixes a bug where issuing a restart to a terminal allocation would cause the allocation to run its hooks anyway. This was particularly apparent with group_service_hook, which would register services but never deregister them, leaving the allocation in a "zombie" state where it is prepped to run tasks but never will. Fixes #17079 Fixes #16238 Fixes #14618
I am experiencing a tsunami of these logs with Nomad 1.6.1. It is also generating a large number of request error logs in Consul. Example Consul log:
This seems to have started with a task that failed to start due to a missing Vault secret. It's now generating hundreds of thousands of logs between Nomad and Consul. The job itself has two groups, each with different Vault policies. Each group contains one service and one task. Only the second group has a service check, and it is this check that is causing all the errors. The issue causing the task to fail has been resolved; however, the log tsunami persists. In the short term, is there a workaround to get the logs to stop, short of restoring to an older snapshot?
For others running into this who need a quick way to stop the influx of logs: restart the Nomad client. In our case, we ran an instance refresh across the affected autoscaling group.
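The restart workaround above can be sketched as a small shell helper. This is a minimal sketch, assuming a systemd-managed Nomad client (the `journalctl -fu nomad` command later in this issue suggests systemd is in use); the function name is illustrative, not part of Nomad.

```shell
# Hypothetical helper: restart the local Nomad client so it rebuilds
# allocation state and stops re-registering stale service checks.
restart_nomad_client() {
  # Assumes Nomad runs as the "nomad" systemd unit.
  sudo systemctl restart nomad
  # Confirm the client came back up and re-registered with the servers.
  nomad node status -self
}
```

In a fleet behind an autoscaling group, replacing the instances (an instance refresh) achieves the same effect as restarting each client in place.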
We're also running into this. |
Nomad version
1.4.4
Issue
Logs (`journalctl -fu nomad`)
job.hcl