Panic, slice index out of bounds #20

Open
xocasdashdash opened this issue Jun 10, 2021 · 9 comments

@xocasdashdash

I just got this panic:

panic: runtime error: slice bounds out of range [:243] with capacity 242

goroutine 1 [running]:
github.com/cloudflare/pint/internal/reporter.ConsoleReporter.Submit(0xbb2440, 0xc00011e010, 0xc0012e8000, 0xd3, 0x143, 0xbbddd0, 0xc00026cab0, 0x2, 0x2)
	/home/joaquin/projects/personal/github/pint/internal/reporter/console.go:71 +0x1073
main.actionLint(0xc0001a7740, 0x2, 0x2)
	/home/joaquin/projects/personal/github/pint/cmd/pint/lint.go:47 +0x56a
github.com/urfave/cli/v2.(*Command).Run(0xc00016d440, 0xc0001a7600, 0x0, 0x0)
	/home/joaquin/.asdf/installs/golang/1.16.3/packages/pkg/mod/github.com/urfave/cli/v2@v2.3.0/command.go:163 +0x4dd
github.com/urfave/cli/v2.(*App).RunContext(0xc0002321a0, 0xbbd5f0, 0xc00011a010, 0xc000124000, 0x5, 0x5, 0x0, 0x0)
	/home/joaquin/.asdf/installs/golang/1.16.3/packages/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:313 +0x810
github.com/urfave/cli/v2.(*App).Run(...)
	/home/joaquin/.asdf/installs/golang/1.16.3/packages/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:224
main.main()
	/home/joaquin/projects/personal/github/pint/cmd/pint/main.go:72 +0x106

on this line:

for _, c := range strings.Split(content, "\n")[firstLine-1 : lastLine] {

Seems like the check can fail sometimes
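For reference, a minimal standalone sketch (with made-up firstLine/lastLine values, not pint's real ones) of how that expression panics once lastLine goes past the number of lines in content:

package main

import "strings"

func main() {
	content := "a\nb\nc"        // splits into 3 lines
	firstLine, lastLine := 3, 4 // lastLine points one past the last available line

	// panics with: slice bounds out of range [:4] with capacity 3
	_ = strings.Split(content, "\n")[firstLine-1 : lastLine]
}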

@prymitive
Collaborator

Would you mind sharing a rule file it fails on?

prymitive self-assigned this on Jun 10, 2021
@xocasdashdash
Author

I had to bisect all the rules (it's hard to know which one it's reporting on), but this one makes it fail consistently. Please note there's an empty line at the end:

groups:
  - name: "haproxy.anomaly_detection"
    rules:
      - record: haproxy:healthcheck_failure:rate5m
        expr: |
          sum without(instance, namespace,job, service, endpoint)
          (rate(haproxy_server_check_failures_total[5m]))
      - record: haproxy:healthcheck_failure:rate5m:avg_over_time_1w
        expr: avg_over_time(haproxy:healthcheck_failure:rate5m[1w])
      - record: haproxy:healthcheck_failure:rate5m:stddev_over_time_1w
        expr: stddev_over_time(haproxy:healthcheck_failure:rate5m[1w])
  - name: "haproxy.api_server.rules"
    rules:
      - alert: HaproxyHealtCheckAnomaly
        expr: |
          abs((
          haproxy:healthcheck_failure:rate5m-
          haproxy:healthcheck_failure:rate5m:avg_over_time_1w
          ) / haproxy:healthcheck_failure:rate5m:stddev_over_time_1w) > 3
        for: 10m
        labels: 
          severity: debug
          kind: "K8sApi"
        annotations: 
          summary: "HAproxy is detecting more failures than usual on its health checks"
          description: |
            This value represents the absolute z-score. Here https://about.gitlab.com/blog/2019/07/23/anomaly-detection-using-prometheus/ you
            can read more about how we're using it
          runbook_url: "Check that HAProxy is communicating with the k8s server nodes"
      - alert: HaproxyApiMasterDown
        expr: haproxy_up{server=~".*master.*"} == 0
        for: 15m
        labels:
          severity: 10x5
          node: "{{ $labels.instance }}"
          kind: K8sApiMaster
        annotations:
          summary: "HAProxy master is down (instance {{ $labels.instance }})"
          description: "HAProxy down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
      - alert: HaproxyApiMasterDown
        expr: haproxy_up{server=~".*master.*"} == 0
        for: 24h
        labels:
          severity: 10x5
          node: "{{ $labels.instance }}"
          kind: K8sApiMaster
        annotations:
          summary: "HAProxy master is down (instance {{ $labels.instance }})"
          description: "HAProxy down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
      - alert: HaproxyApiMasterDown
        expr: count(haproxy_server_up{server=~".*master.*"}==0) by(instance,backend) > 1
        for: 5m
        labels:
          severity: 24x7
          node: "{{ $labels.instance }}"
          kind: K8sApiMaster
        annotations:
          summary: "Multiple K8s master nodes are down (instance {{ $labels.instance }})"
          description: "HAProxy down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
      - alert: HaproxyApiMasterDown
        expr: count(haproxy_server_up{server=~".*master.*"}==0) by(instance,backend) > 2
        for: 2m
        labels:
          severity: 24x7
          node: "{{ $labels.instance }}"
          kind: K8sApiMaster
          inhibits: K8sApiInfra
        annotations:
          summary: "All K8s master nodes are down (instance {{ $labels.instance }})"
          description: "HAProxy down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
      - alert: HaproxyApiInfraDown
        expr: haproxy_up{server=~".*infra.*"} == 0
        for: 15m
        labels:
          severity: 10x5
          node: "{{ $labels.instance }}"
          kind: K8sApiInfra
        annotations:
          summary: "HAProxy infra is down (server {{ $labels.server }})"
          description: "HAProxy down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyHighHttp4xxErrorRateBackend
        expr: |
          sum by (backend) (rate(haproxy_server_http_responses_total{code="4xx"}[1m])) / sum by (backend) (rate(haproxy_server_http_responses_total{}[1m])) * 100 > 1
        for: 15m
        labels:
          severity: debug
        annotations:
          summary: "HAProxy high HTTP 4xx error rate backend (instance {{ $labels.instance }})"
          description: "Too many HTTP requests with status 4xx (> 1%) on backend {{ $labels.backend }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyHighHttp4xxErrorRateBackend
        expr: |
          sum by (backend) (rate(haproxy_server_http_responses_total{code="4xx"}[1m])) / sum by (backend) (rate(haproxy_server_http_responses_total{}[1m])) * 100 > 3
        for: 15m
        labels:
          severity: 10x5
        annotations:
          summary: "HAProxy high HTTP 4xx error rate backend (instance {{ $labels.instance }})"
          description: "Too many HTTP requests with status 4xx (> 3%) on backend {{ $labels.backend }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
      - alert: HaproxyHighHttp4xxErrorRateBackend
        expr: |
          sum by (backend) (rate(haproxy_server_http_responses_total{code="4xx"}[1m])) / sum by (backend) (rate(haproxy_server_http_responses_total{}[1m])) * 100 > 5
        for: 15m
        labels:
          severity: 24x7
        annotations:
          summary: "HAProxy high HTTP 4xx error rate backend (instance {{ $labels.instance }})"
          description: "Too many HTTP requests with status 4xx (> 5%) on backend {{ $labels.backend }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyHighHttp5xxErrorRateBackend
        expr: sum by (backend) (rate(haproxy_server_http_responses_total{code="5xx"}[1m])) / sum by (backend) (rate(haproxy_server_http_responses_total{}[1m])) *100 > 1
        for: 15m
        labels:
          severity: debug
        annotations:
          summary: "HAProxy high HTTP 5xx error rate backend (instance {{ $labels.instance }})"
          description: "Too many HTTP requests with status 5xx (> 5%) on backend {{ $labels.backend }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
      - alert: HaproxyHighHttp5xxErrorRateBackend
        expr: sum by (backend) (rate(haproxy_server_http_responses_total{code="5xx"}[1m])) / sum by (backend) (rate(haproxy_server_http_responses_total{}[1m])) *100 > 3
        for: 15m
        labels:
          severity: 10x5
        annotations:
          summary: "HAProxy high HTTP 5xx error rate backend (instance {{ $labels.instance }})"
          description: "Too many HTTP requests with status 5xx (> 5%) on backend {{ $labels.backend }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
      - alert: HaproxyHighHttp5xxErrorRateBackend
        expr: sum by (backend) (rate(haproxy_server_http_responses_total{code="5xx"}[1m])) / sum by (backend) (rate(haproxy_server_http_responses_total{}[1m])) *100 > 5
        for: 15m
        labels:
          severity: 24x7
        annotations:
          summary: "HAProxy high HTTP 5xx error rate backend (instance {{ $labels.instance }})"
          description: "Too many HTTP requests with status 5xx (> 5%) on backend {{ $labels.backend }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyHighHttp4xxErrorRateServer
        expr: sum by (server) (rate(haproxy_server_http_responses_total{code="4xx"}[1m])) / sum by (backend) (rate(haproxy_server_http_responses_total{}[1m])) * 100 > 5
        for: 5m
        labels:
          severity: debug
        annotations:
          summary: "HAProxy high HTTP 4xx error rate server (instance {{ $labels.instance }})"
          description: "Too many HTTP requests with status 4xx (> 5%) on server {{ $labels.server }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyHighHttp5xxErrorRateServer
        expr: sum by (server) (rate(haproxy_server_http_responses_total{code="5xx"}[1m])) / sum by (backend) (rate(haproxy_server_http_responses_total{}[1m])) * 100 > 5
        for: 5m
        labels:
          severity: debug
        annotations:
          summary: "HAProxy high HTTP 5xx error rate server (instance {{ $labels.instance }})"
          description: "Too many HTTP requests with status 5xx (> 5%) on server {{ $labels.server }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyBackendConnectionErrors
        expr: sum by (backend)(rate(haproxy_backend_connection_errors_total[1m])) * 100 > 5
        for: 5m
        labels:
          severity: 10x5
        annotations:
          summary: "HAProxy backend connection errors (instance {{ $labels.instance }})"
          description: "Too many connection errors to {{ $labels.fqdn }}/{{ $labels.backend }} backend (> 5%). Request throughput may be to high.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyBackendConnectionErrors
        expr: sum by (backend) (rate(haproxy_backend_connection_errors_total[1m])) * 100 > 35
        for: 5m
        labels:
          severity: 24x7
        annotations:
          summary: "HAProxy backend connection errors (instance {{ $labels.instance }})"
          description: "Too many connection errors to {{ $labels.fqdn }}/{{ $labels.backend }} backend (> 5%). Request throughput may be to high.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyServerResponseErrors
        expr: sum by (server)(rate(haproxy_server_response_errors_total[1m])) * 100 > 5
        for: 5m
        labels:
          severity: debug
        annotations:
          summary: "HAProxy server response errors (instance {{ $labels.instance }})"
          description: "Too many response errors to {{ $labels.server }} server (> 5%).\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyServerConnectionErrors
        expr: sum by (server)(rate(haproxy_server_connection_errors_total[1m])) * 100 > 5
        for: 5m
        labels:
          severity: debug
        annotations:
          summary: "HAProxy server connection errors (instance {{ $labels.instance }})"
          description: "Too many connection errors to {{ $labels.server }} server (> 5%). Request throughput may be to high.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyPendingRequests
        expr: sum by (backend) (haproxy_backend_current_queue) > 0
        for: 5m
        labels:
          severity: 10x5
        annotations:
          summary: "HAProxy pending requests (instance {{ $labels.instance }})"
          description: "Some HAProxy requests are pending on {{ $labels.fqdn }}/{{ $labels.backend }} backend\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"      

      - alert: HaproxyRetryHigh
        expr: sum by (backend)(rate(haproxy_backend_retry_warnings_total[1m])) > 10
        for: 5m
        labels:
          severity: debug
        annotations:
          summary: "HAProxy retry high (instance {{ $labels.instance }})"
          description: "High rate of retry on {{ $labels.fqdn }}/{{ $labels.backend }} backend\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyBackendDown
        expr: haproxy_backend_up == 0
        for: 5m
        labels:
          severity: 10x5
        annotations:
          summary: "HAProxy backend down (instance {{ $labels.instance }})"
          description: "HAProxy backend is down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyFrontendSecurityBlockedRequests
        expr: sum by (frontend)(rate(haproxy_frontend_requests_denied_total[5m])) > 10
        for: 5m
        labels:
          severity: debug
        annotations:
          summary: "HAProxy frontend security blocked requests (instance {{ $labels.instance }})"
          description: "HAProxy is blocking requests for security reason\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HaproxyServerHealthcheckFailure
        expr: increase(haproxy_server_check_failures_total[15m]) > 10
        for: 5m
        labels:
          severity: 10x5
        annotations:
          summary: "HAProxy server healthcheck failure (instance {{ $labels.instance }})"
          description: "Some server healthcheck are failing on {{ $labels.server }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
      - alert: HaproxyServerHealthcheckFailure
        expr: increase(haproxy_server_check_failures_total[15m]) > 100
        for: 5m
        labels:
          severity: 24x7
        annotations:
          summary: "HAProxy server healthcheck failure (instance {{ $labels.instance }})"
          description: "Some server healthcheck are failing on {{ $labels.server }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

@xocasdashdash
Author

xocasdashdash commented Jun 10, 2021

Just changing these 4 lines works:

lines := strings.Split(content, "\n")
// Clamp lastLine so the slice below can't run past the end of the file.
if lastLine >= len(lines) {
	lastLine = len(lines) - 1
}
for _, c := range lines[firstLine-1 : lastLine] {

But it should probably be fixed in the line detection code. You can reduce the test case to a single alert that's also the last one in the file, like this:

groups:
  - name: "haproxy.api_server.rules"
    rules:
      - alert: HaproxyServerHealthcheckFailure
        expr: increase(haproxy_server_check_failures_total[15m]) > 100
        for: 5m
        labels:
          severity: 24x7
        annotations:
          summary: "HAProxy server healthcheck failure (instance {{ $labels.instance }})"
          description: "Some server healthcheck are failing on {{ $labels.server }}\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

@prymitive
Collaborator

Thanks! Can you share your pint config too?

@xocasdashdash
Author

Sure!

prometheus "01-live" {
  uri     = "a valid prometheus url"
  timeout = "60s"
}
prometheus "01-work" {
  uri     = "a valid prometheus url"
  timeout = "60s"
}
rule {
  match {
    kind = "alerting"
  }
  # Each alert must have a 'severity' annotation that's either '24x7','10x5' or 'debug'.
  label "severity" {
    severity = "bug"
    value    = "(24x7|10x5|debug)"
    required = true
  }
  annotation "runbook_url" {
    severity = "warning"
    required = true
  }
}

rule {
  # Disallow spaces in label/annotation keys, they're only allowed in values.
  reject ".* +.*" {
    label_keys      = true
    annotation_keys = true
  }

  # Disallow URLs in labels, they should go to annotations.
  reject "https?://.+" {
    label_keys   = true
    label_values = true
  }
  # Check how many times each alert would fire in the last 1d.
  alerts {
    range   = "1d"
    step    = "1m"
    resolve = "5m"
  }
  # Check if '{{ $value }}'/'{{ .Value }}' is used in labels
  # https://www.robustperception.io/dont-put-the-value-in-alert-labels
  value {}
}

It's basically a copy of the one available as an example.

@prymitive
Collaborator

It looks like the escaped newlines in description: "Some server healthcheck are failing on {{ $labels.server }}\n VALUE = {{ $value }}\n LABELS: {{ $labels }}" are turned into real newlines when the YAML is parsed. So when the description value is accessed it's 3 lines rather than 1, and that's how we end up with the wrong line range for this field.
This adds to the other problems with trying to use go-yaml to parse files while retaining file positions and being able to use them to accurately point at problems.
I'll try to work around this (there are already some hacks around go-yaml position handling); if that's not possible, we'll emit a big warning when printing to the console that the reported position might be wrong.
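A minimal sketch of the mismatch, using gopkg.in/yaml.v3 directly (not pint's parsing code, and assuming the v3 API): the double-quoted scalar occupies a single line in the file but decodes into three real lines, so counting lines of the decoded value runs past the end of the file:

package main

import (
	"fmt"
	"strings"

	"gopkg.in/yaml.v3"
)

func main() {
	// One physical line of YAML with escaped newlines in a double-quoted scalar.
	src := `description: "first\nsecond\nthird"`

	var doc yaml.Node
	if err := yaml.Unmarshal([]byte(src), &doc); err != nil {
		panic(err)
	}

	// Document node -> root mapping -> value node of the "description" key.
	val := doc.Content[0].Content[1]

	fmt.Println(val.Line)                            // 1: where the scalar starts in the file
	fmt.Println(len(strings.Split(val.Value, "\n"))) // 3: the decoded value contains real newlines
	fmt.Println(len(strings.Split(src, "\n")))       // 1: the file itself has only one line
}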

@xocasdashdash
Author

Interesting, I think the warning is a good option. When do you think you'd emit this warning? Whenever you see a newline in the text, or when there's a mismatch with the line count?

@prymitive
Collaborator

When we try to read more lines than the source file has.

@prymitive
Collaborator

Added a workaround for now; the root issue still needs to be addressed.
