Most metrics have disappeared after version v0.22.0 #3236

Closed
albertorm95 opened this issue Mar 16, 2023 · 7 comments
Labels: bug (Something isn't working), regression (Bug introduced in a new version)

Comments

@albertorm95
Contributor

albertorm95 commented Mar 16, 2023

Hello!

When we used v0.21.0 we had these metrics:
atlantis_cmd_comment_plan_project_execution_success
atlantis_cmd_comment_plan_project_execution_failure

Then we updated to v0.22.3 and stopped seeing them. Now we only have:
atlantis_cmd_comment_plan_execution_time

Basically, only the atlantis_cmd_comment_%s_execution_time metrics are left.

The diff between v0.21.0 and v0.22.3 is large, which makes it hard to track down. Have you noticed this?

Do you have any context on what could have happened?

NOTE: we also tested with v0.23.2 and saw the same behaviour.

albertorm95 added the bug (Something isn't working) label on Mar 16, 2023
@nitrocode
Member

Thanks for reporting. Could you step through the versions incrementally, up or down, to find the release where the metrics stopped appearing?
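To bisect, one approach is to save a scrape of the Prometheus endpoint (the one set with metrics.prometheus.endpoint) from each version and compare which metric families each declares. A minimal sketch, assuming both scrapes are in the Prometheus text format; metricFamilies and missing are hypothetical helpers, not part of Atlantis.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// metricFamilies returns the metric family names declared by "# TYPE" lines
// in a Prometheus text-format scrape.
func metricFamilies(scrape string) map[string]bool {
	names := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(scrape))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected shape: "# TYPE <name> <type>"
		if len(fields) >= 4 && fields[0] == "#" && fields[1] == "TYPE" {
			names[fields[2]] = true
		}
	}
	return names
}

// missing returns the families declared in before but absent from after, sorted.
func missing(before, after string) []string {
	present := metricFamilies(after)
	var out []string
	for name := range metricFamilies(before) {
		if !present[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical scrapes, trimmed to their TYPE lines.
	v0210 := `# TYPE atlantis_cmd_comment_plan_execution_time summary
# TYPE atlantis_cmd_comment_plan_project_execution_success counter
# TYPE atlantis_cmd_comment_plan_project_execution_failure counter
`
	v0223 := `# TYPE atlantis_cmd_comment_plan_execution_time summary
`
	fmt.Println(missing(v0210, v0223))
}
```

Running it against each adjacent pair of versions narrows the regression to the first release whose list is non-empty.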

@albertorm95
Contributor Author

albertorm95 commented Mar 17, 2023

Server-side config:

repos:
  - id: '/.*/'
    apply_requirements:
      - mergeable
      - approved
    allowed_overrides:
      - apply_requirements
      - workflow
    allow_custom_workflows: false
metrics:
  prometheus:
    endpoint: /metrics

Working in:


Failed on:

This error was fixed here:

{
    "level": "error",
    "ts": "2023-03-17T11:39:25.075+0100",
    "caller": "events/command_runner.go:427",
    "msg": "PANIC: potential tally.Scope() vs Prometheus usage contract mismatch: if this occurs after using Scope.Tagged(), different metric names must be used than were registered with the parent scope: a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"test_project_execution_success\", help: \"test_project_execution_success counter\", constLabels: {}, variableLabels: [workspace base_repo pr_number project project_path terraform_version]} has different label names or a different help string\n/home/runner/go/pkg/mod/github.com/uber-go/tally@v3.5.0+incompatible/prometheus/reporter.go:306 (0x10559579b)\n/home/runner/go/pkg/mod/github.com/uber-go/tally@v3.5.0+incompatible/prometheus/reporter.go:366 (0x10559623f)\n/home/runner/go/pkg/mod/github.com/uber-go/tally@v3.5.0+incompatible/scope.go:280 (0x105492d2f)\n/home/runner/work/atlantis/atlantis/server/events/instrumented_project_command_runner.go:58 (0x105938273)\n/home/runner/work/atlantis/atlantis/server/events/instrumented_project_command_runner.go:35 (0x105937dbb)\n/home/runner/work/atlantis/atlantis/server/events/project_command_pool_executor.go:48 (0x105948727)\n/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:216 (0x10593d767)\n/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:251 (0x10593de6b)\n/home/runner/work/atlantis/atlantis/server/events/command_runner.go:296 (0x10592a423)\n/opt/hostedtoolcache/go/1.19.4/x64/src/runtime/asm_arm64.s:1172 (0x105020303)\n",
    "json": {
        "repo": "albertorm95/atlantis-test",
        "pull": "7"
    },
    "stacktrace": "github.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).logPanics\n\t/home/runner/work/atlantis/atlantis/server/events/command_runner.go:427\nruntime.gopanic\n\t/opt/hostedtoolcache/go/1.19.4/x64/src/runtime/panic.go:890\ngithub.com/uber-go/tally/prometheus.NewReporter.func1\n\t/home/runner/go/pkg/mod/github.com/uber-go/tally@v3.5.0+incompatible/prometheus/reporter.go:306\ngithub.com/uber-go/tally/prometheus.(*reporter).AllocateCounter\n\t/home/runner/go/pkg/mod/github.com/uber-go/tally@v3.5.0+incompatible/prometheus/reporter.go:366\ngithub.com/uber-go/tally.(*scope).Counter\n\t/home/runner/go/pkg/mod/github.com/uber-go/tally@v3.5.0+incompatible/scope.go:280\ngithub.com/runatlantis/atlantis/server/events.RunAndEmitStats\n\t/home/runner/work/atlantis/atlantis/server/events/instrumented_project_command_runner.go:58\ngithub.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandRunner).Plan\n\t/home/runner/work/atlantis/atlantis/server/events/instrumented_project_command_runner.go:35\ngithub.com/runatlantis/atlantis/server/events.runProjectCmds\n\t/home/runner/work/atlantis/atlantis/server/events/project_command_pool_executor.go:48\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).run\n\t/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:216\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\n\t/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:251\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\t/home/runner/work/atlantis/atlantis/server/events/command_runner.go:296"
}
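The panic comes from the Prometheus registration contract that tally's reporter enforces: a fully-qualified name such as test_project_execution_success may only be registered with one set of label names (and one help string). The sketch below is a simplified, hypothetical model of that rule, not the real client_golang registry, and it only compares label names. The log only says the earlier registration used different labels; for illustration it assumes that registration had none. It also shows why giving each command its own metric name avoids the clash.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// registry is a hypothetical, simplified model of the Prometheus rule behind
// the panic: one fully-qualified name maps to exactly one set of label names.
type registry struct {
	labels map[string]string // fqName -> sorted, comma-joined label names
}

func newRegistry() *registry { return &registry{labels: map[string]string{}} }

// register records fqName with labelNames, or returns an error if the name was
// already registered with a different label set (label order does not matter).
func (r *registry) register(fqName string, labelNames ...string) error {
	sorted := append([]string(nil), labelNames...)
	sort.Strings(sorted)
	key := strings.Join(sorted, ",")
	if prev, ok := r.labels[fqName]; ok && prev != key {
		return fmt.Errorf("%s already registered with labels [%s], got [%s]", fqName, prev, key)
	}
	r.labels[fqName] = key
	return nil
}

func main() {
	projectTags := []string{"workspace", "base_repo", "pr_number", "project", "project_path", "terraform_version"}

	// One shared name used with two different label sets: the second call fails.
	shared := newRegistry()
	fmt.Println(shared.register("test_project_execution_success"))
	fmt.Println(shared.register("test_project_execution_success", projectTags...))

	// Hypothetical per-command names: each label set has its own name, no conflict.
	perCmd := newRegistry()
	fmt.Println(perCmd.register("test_project_init_execution_success"))
	fmt.Println(perCmd.register("test_project_plan_execution_success", projectTags...))
}
```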

Failed on:

Only this metric is emitted:
test_cmd_comment_plan_execution_time

So the most likely cause is the addition of the init metrics:

cc: @Fabianoshz

nitrocode added the regression (Bug introduced in a new version) label on Mar 17, 2023
nitrocode changed the title from "Some metrics have disappeared between version v0.21.0 and v0.22.3" to "Most metrics have disappeared after version v0.22.0" on Mar 17, 2023
@nitrocode
Member

Do you think this is something @albertorm95 or @Fabianoshz can fix, or should we revert PRs #2767 and #2847?

@albertorm95
Contributor Author

What do you think, @nitrocode? From my perspective, having the existing metrics work is more useful than having the new init metrics. I would really like to fix this, but I don't have time to look into it right now.

cc: @Fabianoshz @tchelovilar

@ronaldour

Also, not sure if it's related, but for me every summary-type metric reports NaN for all of its quantiles:

# HELP atlantis_cmd_comment_plan_execution_time atlantis_cmd_comment_plan_execution_time summary
# TYPE atlantis_cmd_comment_plan_execution_time summary
atlantis_cmd_comment_plan_execution_time{quantile="0.5"} NaN
atlantis_cmd_comment_plan_execution_time{quantile="0.75"} NaN
atlantis_cmd_comment_plan_execution_time{quantile="0.95"} NaN
atlantis_cmd_comment_plan_execution_time{quantile="0.99"} NaN
atlantis_cmd_comment_plan_execution_time{quantile="0.999"} NaN
atlantis_cmd_comment_plan_execution_time_sum 666.772606905
atlantis_cmd_comment_plan_execution_time_count 12

I'm running the latest version 0.23.5

@albertorm95
Contributor Author

albertorm95 commented May 17, 2023

I'll try to fix this.

edit 1:
This is the commit that makes it fail:

a96a88e

edit 2:
wip

edit 3:
I'm not a Go expert, but this is what I managed to do. The previous metrics are renamed:
atlantis_cmd_comment_%COMMAND_project_execution_success -> atlantis_project_%COMMAND_execution_success

# HELP atlantis_cmd_comment_plan_execution_time atlantis_cmd_comment_plan_execution_time summary
# TYPE atlantis_cmd_comment_plan_execution_time summary
atlantis_cmd_comment_plan_execution_time{quantile="0.5"} 4.06519725
atlantis_cmd_comment_plan_execution_time{quantile="0.75"} 4.06519725
atlantis_cmd_comment_plan_execution_time{quantile="0.95"} 4.06519725
atlantis_cmd_comment_plan_execution_time{quantile="0.99"} 4.06519725
atlantis_cmd_comment_plan_execution_time{quantile="0.999"} 4.06519725
atlantis_cmd_comment_plan_execution_time_sum 4.06519725
atlantis_cmd_comment_plan_execution_time_count 1

# HELP atlantis_project_apply_execution_error atlantis_project_apply_execution_error counter
# TYPE atlantis_project_apply_execution_error counter
atlantis_project_apply_execution_error{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default"} 0

# HELP atlantis_project_apply_execution_failure atlantis_project_apply_execution_failure counter
# TYPE atlantis_project_apply_execution_failure counter
atlantis_project_apply_execution_failure{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default"} 0

# HELP atlantis_project_apply_execution_success atlantis_project_apply_execution_success counter
# TYPE atlantis_project_apply_execution_success counter
atlantis_project_apply_execution_success{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default"} 1

# HELP atlantis_project_apply_execution_time atlantis_project_apply_execution_time summary
# TYPE atlantis_project_apply_execution_time summary
atlantis_project_apply_execution_time{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default",quantile="0.5"} 0.936589625
atlantis_project_apply_execution_time{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default",quantile="0.75"} 0.936589625
atlantis_project_apply_execution_time{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default",quantile="0.95"} 0.936589625
atlantis_project_apply_execution_time{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default",quantile="0.99"} 0.936589625
atlantis_project_apply_execution_time{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default",quantile="0.999"} 0.936589625
atlantis_project_apply_execution_time_sum{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default"} 0.936589625
atlantis_project_apply_execution_time_count{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default"} 1

# HELP atlantis_project_plan_execution_error atlantis_project_plan_execution_error counter
# TYPE atlantis_project_plan_execution_error counter
atlantis_project_plan_execution_error{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default"} 0

# HELP atlantis_project_plan_execution_failure atlantis_project_plan_execution_failure counter
# TYPE atlantis_project_plan_execution_failure counter
atlantis_project_plan_execution_failure{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default"} 0

# HELP atlantis_project_plan_execution_success atlantis_project_plan_execution_success counter
# TYPE atlantis_project_plan_execution_success counter
atlantis_project_plan_execution_success{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default"} 1

# HELP atlantis_project_plan_execution_time atlantis_project_plan_execution_time summary
# TYPE atlantis_project_plan_execution_time summary
atlantis_project_plan_execution_time{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default",quantile="0.5"} 1.26189025
atlantis_project_plan_execution_time{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default",quantile="0.75"} 1.26189025
atlantis_project_plan_execution_time{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default",quantile="0.95"} 1.26189025
atlantis_project_plan_execution_time{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default",quantile="0.99"} 1.26189025
atlantis_project_plan_execution_time{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default",quantile="0.999"} 1.26189025
atlantis_project_plan_execution_time_sum{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default"} 1.26189025
atlantis_project_plan_execution_time_count{base_repo="foouser/foo-repo",pr_number="7",project="atlantis-example-live",project_path="example",terraform_version="1.3.4",workspace="default"} 1
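Dashboards and alerts built on the old names would need updating. Assuming the mapping in edit 3 (atlantis_cmd_comment_%COMMAND_project_execution_success -> atlantis_project_%COMMAND_execution_success) is what gets merged, which is not certain, the rename could be scripted roughly like this. The regular expression and renameMetric are hypothetical helpers for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// oldProjectMetric matches the old per-project names, e.g.
// atlantis_cmd_comment_plan_project_execution_success.
// Group 1 is the command, group 2 the execution_* suffix.
var oldProjectMetric = regexp.MustCompile(`^atlantis_cmd_comment_([a-z_]+?)_project_(execution_[a-z]+)$`)

// renameMetric maps an old per-project metric name to the proposed
// atlantis_project_<command>_<suffix> form; other names are returned unchanged.
func renameMetric(name string) string {
	// ${1} and ${2} need braces: "$1_" would be read as a group named "1_".
	return oldProjectMetric.ReplaceAllString(name, "atlantis_project_${1}_${2}")
}

func main() {
	for _, n := range []string{
		"atlantis_cmd_comment_plan_project_execution_success",
		"atlantis_cmd_comment_apply_project_execution_failure",
		"atlantis_cmd_comment_plan_execution_time", // not per-project: unchanged
	} {
		fmt.Printf("%s -> %s\n", n, renameMetric(n))
	}
}
```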
