Seeing Not Implemented error with Canary and MetricTemplate #1669

Open
joedborg opened this issue Jun 27, 2024 · 1 comment

Comments

@joedborg

Describe the bug

I'm getting this error as a new image is being rolled out:

{"level":"error","ts":"2024-06-27T15:12:22.835Z","caller":"controller/events.go:39","msg":"Metric query failed for consumer-lag: error response: {\"code\":5,\"message\":\"Not Implemented (category=INVALID_REQUEST_ERROR code=NOT_FOUND)\",\"details\":[{\"type_url\":\"type.googleapis.com/apierrors.Error\",\"value\":\"CAIQoNQYGg9Ob3QgSW1wbGVtZW50ZWQ=\"}]}","canary":"my-canary.my-ns","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).recordEventErrorf\n\t/workspace/pkg/controller/events.go:39\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).runMetricChecks\n\t/workspace/pkg/controller/scheduler_metrics.go:285\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).runAnalysis\n\t/workspace/pkg/controller/scheduler.go:753\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).advanceCanary\n\t/workspace/pkg/controller/scheduler.go:442\ngithub.com/fluxcd/flagger/pkg/controller.CanaryJob.Start.func1\n\t/workspace/pkg/controller/job.go:39"}
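The `value` under `details` in that response looks like base64. Below is a minimal Python sketch for decoding it, assuming the payload is a protobuf-encoded message (the `type_url` suggests so, but I have not seen the schema). Only field numbers are recovered, not field names, and the helper names `read_varint` and `decode_fields` are made up for this example.

```python
import base64

# The details[0].value string copied from the log line above.
PAYLOAD = "CAIQoNQYGg9Ob3QgSW1wbGVtZW50ZWQ="


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read one protobuf varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_fields(buf: bytes) -> dict[int, object]:
    """Decode top-level varint (wire type 0) and length-delimited (wire type 2) fields."""
    fields = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            fields[field_number], pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            fields[field_number] = buf[pos:pos + length]
            pos += length
        else:
            # Other wire types do not appear in this payload.
            raise ValueError(f"unsupported wire type {wire_type}")
    return fields


decoded = decode_fields(base64.b64decode(PAYLOAD))
print(decoded)  # {1: 2, 2: 404000, 3: b'Not Implemented'}
```

Field 3 is the same "Not Implemented" text as `message`, so the payload does not seem to carry extra detail. Field 2 decodes to 404000, which at least looks consistent with the `NOT_FOUND` code, though that reading is a guess without the schema.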

With the following Canary and MetricTemplate:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-canary
spec:
  provider: kubernetes
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  progressDeadlineSeconds: 60
  service:
    port: 8080
  analysis:
    interval: 30s
    iterations: 10
    threshold: 2
    metrics:
    - name: consumer-lag
      templateRef:
        name: my-deployment-lag
      thresholdRange:
        max: 1500
      interval: 30m
---
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: my-deployment-lag
spec:
  provider:
    type: prometheus
    address: https://myorg.chronosphere.io:443
    secretRef:
      name: chronosphere
  query: |
    sum by (
      kafka_id, topic, consumer_group_id
    ) (
      confluent_kafka_server_consumer_lag_offsets{
        job="my-job",
        cluster="my-cluster",
        consumer_group_id="my-consumer-group"
      }
    )

Which results in

NAME        STATUS   WEIGHT   LASTTRANSITIONTIME
my-canary   Failed   0        2024-06-27T15:13:22Z

My first guess would be that Chronosphere's API isn't exactly the same as Prometheus', but I'm not sure.
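One way to check that guess outside of Flagger is to call the standard Prometheus instant-query endpoint (`/api/v1/query`) against the same address. The sketch below is only a diagnostic under assumptions: bearer-token auth, the `prefix` parameter (for backends that serve the Prometheus-compatible API under a sub-path), and the function names are all hypothetical and not taken from Flagger or from Chronosphere's documentation.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(address: str, query: str, prefix: str = "") -> str:
    """Build <address>[/<prefix>]/api/v1/query?query=<url-encoded query>."""
    base = address.rstrip("/") + "/" + prefix.strip("/")
    path = base.rstrip("/") + "/api/v1/query"
    return path + "?" + urllib.parse.urlencode({"query": query})


def run_query(url: str, token: str) -> dict:
    """Send the query with a bearer token (assumed auth scheme) and return the JSON body.

    urllib raises urllib.error.HTTPError on 4xx/5xx responses, which is the
    interesting case here.
    """
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Address taken from the MetricTemplate; vector(1) is a trivial PromQL query.
url = build_query_url("https://myorg.chronosphere.io:443", "vector(1)")
print(url)
# run_query(url, token="<api-token>")  # hypothetical token; needs network access
```

If the bare address returns the same "Not Implemented" response and a prefixed path does not, that would point at the address in the MetricTemplate rather than at the query itself.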

To Reproduce

Use manifests above and attempt a rollout.

Expected behavior

I expect the metric query to succeed without this error and the canary to be promoted.

Additional context

  • Flagger version: 1.37.0
  • Kubernetes version: 1.27.13
  • Service Mesh provider: Istio v1.17.1
  • Ingress provider: Istio v1.17.1
@joedborg
Author

Tried to address this in #1670
