
Non-functional deployment with helm chart #562

Closed
maxime-sourdin opened this issue Sep 26, 2022 · 5 comments · Fixed by #2045

Comments

@maxime-sourdin

Hello,
I tried to deploy OnCall with the Helm chart (on a managed Kubernetes cluster, via ArgoCD), using the built-in MySQL.

I am running into problems because the database migration jobs are failing:

amqp.exceptions.AccessRefused: (0, 0): (403) ACCESS_REFUSED - Login was refused using authentication mechanism PLAIN. For details see the broker logfile.

When I tested with the provided docker-compose setup, I had the same problem.

Here is an extract from the values file:

base_url: example.com
image:
  repository: grafana/oncall
  tag: "v1.0.37"
  pullPolicy: IfNotPresent
service:
  enabled: false
  type: LoadBalancer
  port: 8080
  annotations: {}
engine:
  replicaCount: 1
  resources: {}
celery:
  replicaCount: 1
  resources: {}
oncall:
  slack:
    enabled: false
    command: ~
    clientId: ~
    clientSecret: ~
    apiToken: ~
    apiTokenCommon: ~
  telegram:
    enabled: false
    token: ~
    webhookUrl: ~
migrate:
  enabled: true
env: []
ingress:
  enabled: false
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/issuer: "letsencrypt-prod"
  tls: 
    - hosts:
        - "{{ .Values.base_url }}"
      secretName: certificate-tls
  extraPaths: []
ingress-nginx:
  enabled: false
cert-manager:
  enabled: false
  installCRDs: false
  webhook:
    timeoutSeconds: 30
    securePort: 10260
  podDnsPolicy: None
  podDnsConfig:
    nameservers:
      - 8.8.8.8
      - 1.1.1.1
mariadb:
  enabled: true
  persistence:
    enabled: true
    storageClass: "csi-ssd-disk-topology"  
  auth:
    database: oncall
  primary:
    persistence:
      enabled: true
      storageClass: "csi-ssd-disk-topology"  
    extraEnvVars:
    - name: MARIADB_COLLATE
      value: utf8mb4_unicode_ci
    - name: MARIADB_CHARACTER_SET
      value: utf8mb4
  secondary:
    persistence:
      enabled: true
      storageClass: "csi-ssd-disk-topology"  
    extraEnvVars:
    - name: MARIADB_COLLATE
      value: utf8mb4_unicode_ci
    - name: MARIADB_CHARACTER_SET
      value: utf8mb4
externalMysql:
  host:
  port:
  db_name:
  user:
  password:
rabbitmq:
  enabled: true
  persistence:
    enabled: true
    storageClass: "csi-ssd-disk-topology"  
externalRabbitmq:
  host:
  port:
  user:
  password:
  protocol:
  vhost:
redis:
  enabled: true
  architecture: standalone
  replica:
    count: 1
    persistence:
      enabled: true
      storageClass: "csi-ssd-disk-topology"
  master:
    count: 1
    persistence:
      enabled: true
      storageClass: "csi-ssd-disk-topology"  
externalRedis:
  host:
  password:
grafana:
  enabled: false
  grafana.ini:
    server:
      domain: example.com
      root_url: "%(protocol)s://%(domain)s/grafana"
      serve_from_sub_path: true
  persistence:
    enabled: false
  plugins:
    - grafana-oncall-app
nameOverride: ""
fullnameOverride: ""
serviceAccount:
  create: true
  annotations: {}
  name: ""
podAnnotations: {}
podSecurityContext: {}
  # fsGroup: 2000
securityContext: {}
init:
  securityContext: {}

What could be the cause of this problem?

@Matvey-Kuk
Contributor

Hi! AMQP makes me think it's about network connectivity between RabbitMQ and the container performing the migration.
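That said, ACCESS_REFUSED with the PLAIN mechanism is an authentication failure, so it can also be worth comparing the password in the generated RabbitMQ Secret with the credentials the migration container actually receives. A minimal sketch, assuming a release named "oncall" and an engine deployment named "oncall-engine" (both names are assumptions, adjust to your install):

# Password the Bitnami RabbitMQ subchart stored in its Secret
kubectl get secret oncall-rabbitmq -o jsonpath='{.data.rabbitmq-password}' | base64 -d; echo

# RabbitMQ-related settings the engine pods were actually started with
kubectl exec deploy/oncall-engine -- env | grep -i rabbit

If the RabbitMQ pod reuses a PersistentVolume from an earlier install, the password on disk can differ from the one in the current Secret, which produces the same ACCESS_REFUSED error.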

@maxime-sourdin
Author

> Hi! AMQP makes me think it's about network connectivity between RabbitMQ and the container performing the migration.

Hello,
I redeployed OnCall with new volumes; RabbitMQ is now OK.

2022-09-27 15:00:00.098389+00:00 [info] <0.4429.0> accepting AMQP connection <0.4429.0> (172.16.0.177:42506 -> 172.16.0.47:5672)
2022-09-27 15:00:00.101205+00:00 [info] <0.4429.0> connection <0.4429.0> (172.16.0.177:42506 -> 172.16.0.47:5672): user 'user' authenticated and granted access to vhost '/'

But now I'm getting these messages:

Operations to perform:
  Apply all migrations: admin, alerts, auth, auth_token, base, contenttypes, heartbeat, migration_tool, oss_installation, push_notifications, schedules, sessions, silk, slack, social_django, telegram, twilioapp, user_management
Running migrations:
  No migrations to apply.
  Your models in app(s): 'push_notifications', 'silk', 'social_django' have changes that are not yet reflected in a migration, and so won't be applied.
  Run 'manage.py makemigrations' to make new migrations, and then re-run 'manage.py migrate' to apply them.

The last time I tried to make these migrations, it got stuck.

@iskhakov
Contributor

iskhakov commented Oct 6, 2022

A message like this means that all the migrations have already been applied.
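If you want to double-check that from inside the cluster, Django's showmigrations command lists each migration with an [X] when it is applied. A minimal sketch, assuming the engine deployment is named "oncall-engine" and manage.py is on the container's working path:

kubectl exec deploy/oncall-engine -- python manage.py showmigrations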

@maxime-sourdin
Author

Hello @iskhakov,
so the problem is not the migration; I'll change the title of the issue then.

I don't see any error other than the migration message (which, as you said, isn't actually a problem), but the oncall-engine pod keeps restarting.

I also just noticed this error:
2022-10-06 07:54:44 lock engine: pthread robust mutexes
2022-10-06 07:54:44 thunder lock: disabled (you can enable it with --thunder-lock)
2022-10-06 07:54:44 Listen queue size is greater than the system max net.core.somaxconn (128).

@maxime-sourdin changed the title from "Non-functional database migrations during deployment with helm chart" to "Non-functional deployment with helm chart" on Oct 6, 2022
@MadEngineX

MadEngineX commented Nov 15, 2022

> Hello @iskhakov, so the problem is not the migration; I'll change the title of the issue then.
>
> I don't see any error other than the migration message (which, as you said, isn't actually a problem), but the oncall-engine pod keeps restarting.
>
> I also just noticed this error:
> 2022-10-06 07:54:44 lock engine: pthread robust mutexes
> 2022-10-06 07:54:44 thunder lock: disabled (you can enable it with --thunder-lock)
> 2022-10-06 07:54:44 Listen queue size is greater than the system max net.core.somaxconn (128).

I faced the same issue. I found that somebody had removed the net.core.somaxconn property for docker-compose in #84. I removed the UWSGI_LISTEN env var from the k8s Deployment, and the engine starts.
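If patching the Deployment by hand is awkward (for example because ArgoCD keeps re-syncing it), a workaround in the same spirit is to cap the listen queue through the chart's env: list instead of removing the variable. A minimal sketch, assuming the chart passes env: entries through to the engine container:

env:
  - name: UWSGI_LISTEN
    value: "128"  # keep at or below net.core.somaxconn (128 by default)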
