Metric serf.coordinate.adjustment-ms negative #487

Closed
chemicL opened this issue Oct 16, 2017 · 9 comments

@chemicL

chemicL commented Oct 16, 2017

The serf.coordinate.adjustment-ms metric sometimes gets stuck at the negative value -9223372013568.000000, which floods our statsd logs with warnings (the lines are labelled DEBUG, but they are emitted at warning level).

Oct 13 06:25:05 statsd1 nodejs[1049]: 13 Oct 06:25:05 - DEBUG: Bad line: -9223372013568.000000,ms in msg "our.stats.nodeX.serf.coordinate.adjustment-ms:-9223372013568.000000|ms"
Oct 13 06:25:05 statsd1 nodejs[1049]: 13 Oct 06:25:05 - DEBUG: Bad line: -9223372013568.000000,ms in msg "our.stats.nodeY.serf.coordinate.adjustment-ms:-9223372013568.000000|ms"

Negative adjustments may well be valid, but it is surprising that different machines report exactly the same negative value in every log line.

We run Consul versions 0.7.5 through 0.9.3 in our environment, and that is where we observed the issue. The metric is present in the serf releases used by those versions.

@slackpad slackpad added the bug label Oct 16, 2017
@slackpad slackpad self-assigned this Oct 16, 2017
@slackpad
Contributor

Hey @chemicL thanks for the report - I don't recognize this as any of the coordinate-related constants so will need to look through the algorithm a bit. When you say "stay" does it get stuck like that essentially forever?

@slackpad
Contributor

Linking to hashicorp/consul#3023, which might be related.

@chemicL
Author

chemicL commented Oct 16, 2017

I confirm it does "get stuck like that essentially forever", producing gigabytes of statsd warning logs for that single metric per day. All with the same value.

The metric is reported here: https://github.com/hashicorp/serf/blob/master/serf/ping_delegate.go#L78

@slackpad
Contributor

@chemicL if you have any nodes in that state can you do a quick check of the v1/agent/self output and see if coordinate_resets is non-zero? It would be interesting to see if that's going up as well.
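
A minimal sketch of that check, assuming the /v1/agent/self body has already been fetched as JSON. Because the exact nesting of coordinate_resets may differ between versions, the hypothetical helper below searches the whole document for the key instead of hard-coding a path. The sample payload in main is invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// collect walks decoded JSON and records every value stored under key,
// indexed by its dotted path, so that no particular nesting is assumed.
func collect(v interface{}, key, path string, out map[string]string) {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			p := k
			if path != "" {
				p = path + "." + k
			}
			if k == key {
				out[p] = fmt.Sprint(child)
				continue
			}
			collect(child, key, p, out)
		}
	case []interface{}:
		for i, child := range t {
			collect(child, key, fmt.Sprintf("%s[%d]", path, i), out)
		}
	}
}

// coordinateResets returns every coordinate_resets entry found in body.
// An empty map means the agent does not report the counter at all.
func coordinateResets(body []byte) (map[string]string, error) {
	var doc interface{}
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}
	out := map[string]string{}
	collect(doc, "coordinate_resets", "", out)
	return out, nil
}

func main() {
	// Invented sample; a real body would come from GET /v1/agent/self.
	sample := []byte(`{"Stats": {"serf_lan": {"coordinate_resets": "0"}}}`)
	resets, err := coordinateResets(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for path, value := range resets {
		fmt.Printf("%s = %s\n", path, value)
	}
}
```

Running this periodically against a node in the bad state would show whether the counter is climbing.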

@chemicL
Author

chemicL commented Oct 17, 2017

This happens also in version 0.7.5, where I don't see coordinate_resets in the v1/agent/self output.
In an agent at 0.9.3 which reports the corrupt metric: "coordinate_resets": "0"

@slackpad
Contributor

One final thing I just thought of: on your 0.9.3 agent in the bad state, can you snag the state of Coord inside /v1/agent/self? That might shed some light on what's happening. We don't have plans to release a 0.9.4, but if you don't mind building locally, you could delete the line that emits the metric from the ping delegate as a simple workaround.

@chemicL
Author

chemicL commented Oct 18, 2017

Here's the output from one of the problematic agents:

  "Coord": {
    "Vec": [
      39782439707751.25,
      74910554230826.5,
      -3197422081087.8438,
      40265976210804.75,
      26862373250963.875,
      49530119931223.25,
      -21264100820449.5,
      -29394499383369.5
    ],
    "Error": 1.5,
    "Adjustment": -1546716855094442.5,
    "Height": 1e-05
  }
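
To put those numbers in perspective, here is a rough sketch that compares the size of this vector with the largest span a Go time.Duration can hold. It assumes the Vec components are expressed in seconds, which is my reading and not something stated in this thread, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// maxDurationSeconds is the largest number of seconds that fits in a
// time.Duration (int64 nanoseconds), roughly 292 years.
var maxDurationSeconds = time.Duration(math.MaxInt64).Seconds()

// magnitude returns the Euclidean length of a coordinate vector.
func magnitude(vec []float64) float64 {
	sum := 0.0
	for _, x := range vec {
		sum += x * x
	}
	return math.Sqrt(sum)
}

// exceedsDurationRange reports whether a span given in seconds is too
// large (or not a number) to be stored in a time.Duration.
func exceedsDurationRange(seconds float64) bool {
	return math.IsNaN(seconds) || math.Abs(seconds) > maxDurationSeconds
}

func main() {
	// Vec values copied from the agent output above.
	vec := []float64{
		39782439707751.25,
		74910554230826.5,
		-3197422081087.8438,
		40265976210804.75,
		26862373250963.875,
		49530119931223.25,
		-21264100820449.5,
		-29394499383369.5,
	}
	fmt.Println(exceedsDurationRange(magnitude(vec))) // prints true
	fmt.Println(exceedsDurationRange(0.05))           // prints false
}
```

Under that assumption, any distance computed from this coordinate to a healthy node would not fit into a duration, which would be consistent with every affected node reporting the same out-of-range value.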

As for the workaround, we'd prefer not to fork another project, but let's see what's possible.

I mentioned on Gitter that the filtering is not applied in consul 0.9.3. Filtering out this metric would allow us to keep on going ;-)

@slackpad
Contributor

> I mentioned on Gitter that the filtering is not applied in consul 0.9.3. Filtering out this metric would allow us to keep on going ;-)

Indeed, sorry about that. This is fixed in 1.0!

@slackpad
Contributor

Thanks for the data - I'll try to see how things got into this state.
