Progressive Canary with Istio uses default URL to Prometheus #1671

Open
joedborg opened this issue Jun 28, 2024 · 1 comment


joedborg commented Jun 28, 2024

Describe the bug

When defining a Canary with Istio, Flagger appears to attempt to use a default Prometheus address.

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-canary
  namespace: api
spec:
  analysis:
    canaryReadyThreshold: 100
    interval: 10m
    maxWeight: 100
    metrics:
    - interval: 1m
      name: request-success-rate
      thresholdRange:
        min: 100
    - interval: 1m
      name: request-duration
      thresholdRange:
        max: 500
    - interval: 5m
      name: kafka-tx
      templateRef:
        name: kafka-tx-bytes
      thresholdRange:
        min: 100
    - interval: 5m
      name: kafka-rx
      templateRef:
        name: kafka-rx-bytes
      thresholdRange:
        min: 100
    primaryReadyThreshold: 100
    stepWeight: 10
    threshold: 5
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: api-hpa
  progressDeadlineSeconds: 900
  service:
    gateways:
    - mesh-ingress-gateway.istio-system.svc.cluster.local
    hosts:
    - myaddress.io
    port: 8080
    portDiscovery: true
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: gateway-error,connect-failure,refused-stream
    targetPort: 8080
    trafficPolicy:
      tls:
        mode: DISABLE
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api

The custom MetricTemplates can define their own endpoint, but the built-in Istio metric lookups seem to query only the default Prometheus address (http://prometheus:9090), which leads to errors like this:

{"level":"error","ts":"2024-06-27T22:45:48.929Z","caller":"controller/events.go:39","msg":"Prometheus query failed: running query failed: request failed: Get \"http://prometheus:9090/api/v1/query?query=+sum%28+rate%28+istio_requests_total%7B+reporter%3D%22destination%22%2C+destination_workload_namespace%3D%22api%22%2C+destination_workload%3D~%22api%22%2C+response_code%21~%225.%2A%22+%7D%5B1m%5D+%29+%29+%2F+sum%28+rate%28+istio_requests_total%7B+reporter%3D%22destination%22%2C+destination_workload_namespace%3D%22api%22%2C+destination_workload%3D~%22api%22+%7D%5B1m%5D+%29+%29+%2A+100\": dial tcp: lookup prometheus on 10.0.0.10:53: no such host","canary":"api-canary.api","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).recordEventErrorf\n\t/workspace/pkg/controller/events.go:39\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).runBuiltinMetricChecks\n\t/workspace/pkg/controller/scheduler_metrics.go:145\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).runAnalysis\n\t/workspace/pkg/controller/scheduler.go:748\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).advanceCanary\n\t/workspace/pkg/controller/scheduler.go:442\ngithub.com/fluxcd/flagger/pkg/controller.CanaryJob.Start.func1\n\t/workspace/pkg/controller/job.go:39"}

Is it possible to set a custom endpoint for the built-in Istio metrics, or will I have to define all of them myself? It doesn't seem that I can add a provider block to the Canary spec.
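For reference, defining the success-rate check by hand might look roughly like the sketch below. The query is the URL-decoded form of the one in the error log above, with the literal namespace, workload, and window swapped for Flagger's template variables. The template name, the address, and the Secret name are placeholders that do not come from this issue. The sketch also assumes that `secretRef` points at a Secret carrying basic-auth credentials.

```yaml
# Sketch only: metadata.name, address, and secretRef.name are placeholders.
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: istio-success-rate-external      # hypothetical name
  namespace: api
spec:
  provider:
    type: prometheus
    address: https://prometheus.example.com   # placeholder for the external endpoint
    secretRef:
      name: prometheus-credentials       # hypothetical Secret with the credentials
  # Same expression the built-in check sent in the failing request,
  # parameterised with Flagger's template variables.
  query: |
    sum(
      rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace="{{ namespace }}",
          destination_workload=~"{{ target }}",
          response_code!~"5.*"
        }[{{ interval }}]
      )
    )
    /
    sum(
      rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace="{{ namespace }}",
          destination_workload=~"{{ target }}"
        }[{{ interval }}]
      )
    )
    * 100
```

The Canary would reference this through `templateRef`, the same way `kafka-tx` references `kafka-tx-bytes` above. Giving the metric entry a name other than `request-success-rate` is probably the safer choice, so it is not confused with the built-in check.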

To Reproduce

Deploy an Istio-backed Canary with an external Prometheus endpoint.

Expected behavior

A field on the CRD to define a custom Prometheus endpoint for the built-in Istio metrics.

Additional context

  • Flagger version: 1.37.0
  • Kubernetes version: 1.27.13
  • Service Mesh provider: Istio v1.17.1
  • Ingress provider: Istio v1.17.1

joedborg commented Jun 28, 2024

Digging into the source, it looks like I might be able to specify this here:

https://github.com/fluxcd/flagger/blob/main/cmd/flagger/main.go#L96

...by setting the argument on the Flagger Deployment, but it seems that I cannot pass a secret ref:

https://github.com/fluxcd/flagger/blob/main/pkg/metrics/observers/factory.go#L34

This means I cannot reach out to an external provider.
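Setting that argument might look like the fragment below. This is a sketch under assumptions: it presumes the flag at the linked line is `-metrics-server`, that the container is named `flagger`, and that the mesh provider is already set with `-mesh-provider=istio`. The address is a placeholder. It only changes the URL and does nothing about credentials.

```yaml
# Fragment of the Flagger Deployment's pod template (assumed layout).
# Keep whatever other args the Deployment already has; only the
# -metrics-server line is the change being illustrated.
spec:
  template:
    spec:
      containers:
      - name: flagger                      # assumed container name
        args:
        - -mesh-provider=istio
        - -metrics-server=http://prometheus.monitoring:9090   # placeholder address
```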
