[Bug] Unable to deploy Temporal using ArgoCD #521

Closed
washeeeq opened this issue Jul 3, 2024 · 4 comments

washeeeq commented Jul 3, 2024

What are you really trying to do?

Describe the bug

We are using GCP Postgres as both the default store and the visibility store.
We deploy the Temporal chart as a subchart:

apiVersion: v2
name: temporal
description: A Helm chart with temporal as subchart
type: application
version: 0.1.0
appVersion: "0.1.0"
dependencies:
  - name: temporal
    version: v0.43.0
    repository: https://temporalio.github.io/helm-charts
    alias: temporal

Then we add values to modify the deployment:

temporal:
  server:
    config:
      persistence:
        default:
          driver: "sql"
          sql:
            driver: "postgres12"
            host: "10.100.100.3"
            port: 5432
            database: temporal
            user: temporal_app
            existingSecret: temporal-default-store
            maxConns: 20
            maxConnLifetime: "1h"
            tls:
              enabled: false
        visibility:
          driver: "sql"
          sql:
            driver: "postgres12"
            host: "10.100.100.3"
            port: 5432
            database: temporal_visibility
            user: temporal_visibility_app
            existingSecret: temporal-visibility-store
            maxConns: 20
            maxConnLifetime: "1h"
            tls:
              enabled: false
    dynamicConfig:
      frontend.globalNamespaceRPS: # Total per-Namespace RPC rate limit applied across the Cluster.
        - value: 5000
    frontend:
      additionalEnv:
        - name: TEMPORAL_ADDRESS
          value: "temporal.super-genius.org"
      podAnnotations:
        helm.sh/hook: pre-install,pre-upgrade
        helm.sh/hook-weight: "-1"
      command: ["/bin/sh"]
      args: ["-c", "while true; do sleep 600; done"]
      # additionalVolumes: 
      #   - name: dynamic-config
      #     configMap:
      #       name: "temporal-dynamic-config"
      #       items:
      #       - key: dynamic_config.yaml
      #         path: dynamic_config.yaml
      additionalVolumeMounts:
        - mountPath: /etc/temporal/dynamic_config/dynamic_config.yaml
          name: dynamic-config
          subPath: dynamic_config.yaml
    history:
      additionalEnv:
        - name: TEMPORAL_ADDRESS
          value: "temporal.super-genius.org"
    matching:
      additionalEnv:
        - name: TEMPORAL_ADDRESS
          value: "temporal.super-genius.org"
    worker:
      additionalEnv:
        - name: TEMPORAL_ADDRESS
          value: "temporal.super-genius.org"
  cassandra:
    enabled: false
  prometheus:
    enabled: false
  grafana:
    enabled: false
  # we are not deploying Postgres
  postgresql:
    enabled: false
  elasticsearch:
    enabled: true
    replicas: 2
    minimumMasterNodes: 1
    host: "elasticsearch-master"
    # this really causes an issue
    # external: true
    resources:
      requests:
        cpu: "500m"
        memory: "2Gi"
      limits:
        cpu: "500m"
        memory: "4Gi"
  schema:
    createDatabase:
      enabled: false
    setup:
      enabled: true
    update:
      enabled: false

With this config, Temporal hangs while bringing up four pods:
frontend, history, worker and one more.

After further investigation I found that the batch job that creates the index (es-index-setup) is not starting.
Probably the wrong hook weight is used:
"helm.sh/hook-weight": "0"

If I add
external: true
the script is triggered, but this then hinders the initial deployment.

After Elasticsearch is initialized, the frontend pod starts but very quickly exits with:

2024/07/01 09:25:14 Loading config; env=docker,zone=,configDir=config
2024/07/01 09:25:14 Loading config files=[config/docker.yaml]
Unable to load configuration: config file corrupted: yaml: line 30: found unknown escape character.
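
For context on the last line of that log: an "unknown escape character" error typically means that a backslash inside a YAML double-quoted scalar is followed by a character that is not a recognized escape. The sketch below is a minimal, hypothetical checker (the function name and the sample values are made up for illustration). It uses the escape list from the YAML 1.2 specification; the parser Temporal actually uses may accept a slightly different set.

```python
# Characters that may follow a backslash in a YAML 1.2 double-quoted scalar.
# "\t" and "\n" cover the escaped-tab and escaped-line-break forms.
VALID_YAML_ESCAPES = set('0abtnvfre "/\\N_LPxuU') | {"\t", "\n"}


def find_unknown_escapes(scalar_body: str) -> list[str]:
    """Return every backslash sequence in scalar_body that is not a YAML escape."""
    bad = []
    i = 0
    while i < len(scalar_body):
        if scalar_body[i] == "\\":
            # Character after the backslash, or "" for a trailing backslash.
            nxt = scalar_body[i + 1] if i + 1 < len(scalar_body) else ""
            if nxt not in VALID_YAML_ESCAPES:
                bad.append("\\" + nxt)
            i += 2  # skip the escaped character so "\\\\" is counted once
        else:
            i += 1
    return bad


# Hypothetical values, only to show the difference.
print(find_unknown_escapes(r"pa$$\word"))         # one unknown escape: \w
print(find_unknown_escapes(r"line\nbreak \\ ok"))  # no unknown escapes
```

A value that trips this check would be rejected when it sits between double quotes in the config, which is consistent with the message above.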

Minimal Reproduction

  • Install ArgoCD on the Kubernetes cluster
  • Create a chart with Temporal as a subchart
  • Create an Application manifest for ArgoCD that triggers the deployment:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: temporal
spec:
  destination:
    name: ''
    namespace: temporal
    server: 'https://kubernetes.default.svc'
  source:
    path: cloud/helm/infra/temporal
    repoURL: 'https://github.com/your_repo'
    targetRevision: feature/SRE-133--add_temporal
    helm:
      valueFiles:
        - overlays/dev/values.yaml
  sources: []
  project: google-cloud-dev
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Environment/Versions

  • OS and processor: AMD64, Linux
  • Temporal Version: latest
  • Are you using Docker or Kubernetes or building Temporal from source: Kubernetes

washeeeq added the bug (Something isn't working) label Jul 3, 2024

@robholland (Contributor)

The hooks have now been removed on main (replaced with a single job), which should improve things for you once that is released. The config issue is unrelated; could you please check the configmap and see what line 30 of the config we generated there looks like?
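
Once the generated config text is in hand (for example copied out of the configmap), a small helper like the one below can show a given line together with its neighbours. This is only a sketch: show_line and the sample text are hypothetical and are not part of the chart or of Temporal.

```python
def show_line(text: str, lineno: int, context: int = 1) -> str:
    """Return the 1-indexed line `lineno` of `text` plus `context` lines on each side."""
    lines = text.splitlines()
    start = max(lineno - 1 - context, 0)
    end = min(lineno + context, len(lines))
    out = []
    for idx in range(start, end):
        # Mark the requested line so it stands out from the context lines.
        marker = ">" if idx == lineno - 1 else " "
        out.append(f"{marker} {idx + 1:4d}: {lines[idx]}")
    return "\n".join(out)


# Hypothetical four-line config, just to exercise the helper.
sample = "a: 1\nb: 2\nc: 3\nd: 4"
print(show_line(sample, 3))
```

Calling it with lineno=30 on the real config text would show the line the parser complained about.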

robholland self-assigned this Jul 4, 2024

washeeeq commented Jul 6, 2024

Yes, but that is a bit unclear to me. I tried using a busybox image to connect to the container, but I see no config/docker.yaml there, so you probably take config_template.yaml and somehow transform it into docker.yaml. Here is the configmap from the cluster:

apiVersion: v1
data:
  config_template.yaml: |-
    log:
      stdout: true
      level: "debug,info"

    persistence:
      defaultStore: default
      visibilityStore: es-visibility
      numHistoryShards: 512
      datastores:
        default:
          sql:
            pluginName: "postgres12"
            driverName: "postgres12"
            databaseName: "temporal"
            connectAddr: "10.100.100.3:5432"
            connectProtocol: "tcp"
            user: temporal_app
            password: "{{ .Env.TEMPORAL_STORE_PASSWORD }}"
            maxConnLifetime: 1h
            maxConns: 20
            secretName: ""
        visibility:
          sql:
            pluginName: "postgres12"
            driverName: "postgres12"
            databaseName: "temporal_visibility"
            connectAddr: "10.100.100.3:5432"
            connectProtocol: "tcp"
            user: "temporal_visibility_app"
            password: "{{ .Env.TEMPORAL_VISIBILITY_STORE_PASSWORD }}"
            maxConnLifetime: 1h
            maxConns: 20
            secretName: ""
        es-visibility:
            elasticsearch:
                version: "v7"
                url:
                    scheme: "http"
                    host: "elasticsearch-master:9200"
                username: ""
                password: ""
                logLevel: "error"
                indices:
                    visibility: "temporal_visibility_v1_dev"

    global:
      membership:
        name: temporal
        maxJoinDuration: 30s
        broadcastAddress: {{ default .Env.POD_IP "0.0.0.0" }}

      pprof:
        port: 7936

      metrics:
        tags:
          type: frontend
        prometheus:
          timerType: histogram
          listenAddress: "0.0.0.0:9090"

    services:
      frontend:
        rpc:
          grpcPort: 7233
          membershipPort: 6933
          bindOnIP: "0.0.0.0"

      history:
        rpc:
          grpcPort: 7234
          membershipPort: 6934
          bindOnIP: "0.0.0.0"

      matching:
        rpc:
          grpcPort: 7235
          membershipPort: 6935
          bindOnIP: "0.0.0.0"

      worker:
        rpc:
          grpcPort: 7239
          membershipPort: 6939
          bindOnIP: "0.0.0.0"
    clusterMetadata:
      enableGlobalDomain: false
      failoverVersionIncrement: 10
      masterClusterName: "active"
      currentClusterName: "active"
      clusterInformation:
        active:
          enabled: true
          initialFailoverVersion: 1
          rpcName: "temporal-frontend"
          rpcAddress: "127.0.0.1:7233"
    dcRedirectionPolicy:
      policy: "noop"
      toDC: ""
    archival:
      status: "disabled"

    publicClient:
      hostPort: "temporal-frontend:7233"

    dynamicConfigClient:
      filepath: "/etc/temporal/dynamic_config/dynamic_config.yaml"
      pollInterval: "10s"
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"v1","data":{"config_template.yaml":"log:\n  stdout: true\n 
      level: \"debug,info\"\n\npersistence:\n  defaultStore: default\n 
      visibilityStore: es-visibility\n  numHistoryShards: 512\n 
      datastores:\n    default:\n      sql:\n        pluginName:
      \"postgres12\"\n        driverName: \"postgres12\"\n        databaseName:
      \"temporal\"\n        connectAddr: \"10.100.100.3:5432\"\n       
      connectProtocol: \"tcp\"\n        user: temporal_app\n        password:
      \"{{ .Env.TEMPORAL_STORE_PASSWORD }}\"\n        maxConnLifetime:
      1h\n        maxConns: 20\n        secretName: \"\"\n       
      tls:\n          enabled: false\n    visibility:\n      sql:\n       
      pluginName: \"postgres12\"\n        driverName: \"postgres12\"\n       
      databaseName: \"temporal_visibility\"\n        connectAddr:
      \"10.100.100.3:5432\"\n        connectProtocol: \"tcp\"\n        user:
      \"temporal_visibility_app\"\n        password: \"{{
      .Env.TEMPORAL_VISIBILITY_STORE_PASSWORD }}\"\n        maxConnLifetime:
      1h\n        maxConns: 20\n        secretName: \"\"\n       
      tls:\n          enabled: false\n    es-visibility:\n       
      elasticsearch:\n            version: \"v7\"\n           
      url:\n                scheme: \"http\"\n                host:
      \"elasticsearch-master:9200\"\n            username: \"\"\n           
      password: \"\"\n            logLevel: \"error\"\n           
      indices:\n                visibility:
      \"temporal_visibility_v1_dev\"\n\nglobal:\n  membership:\n    name:
      temporal\n    maxJoinDuration: 30s\n    broadcastAddress: {{ default
      .Env.POD_IP \"0.0.0.0\" }}\n\n  pprof:\n    port: 7936\n\n  metrics:\n   
      tags:\n      type: frontend\n    prometheus:\n      timerType:
      histogram\n      listenAddress: \"0.0.0.0:9090\"\n\nservices:\n 
      frontend:\n    rpc:\n      grpcPort: 7233\n      membershipPort:
      6933\n      bindOnIP: \"0.0.0.0\"\n\n  history:\n    rpc:\n      grpcPort:
      7234\n      membershipPort: 6934\n      bindOnIP: \"0.0.0.0\"\n\n 
      matching:\n    rpc:\n      grpcPort: 7235\n      membershipPort:
      6935\n      bindOnIP: \"0.0.0.0\"\n\n  worker:\n    rpc:\n      grpcPort:
      7239\n      membershipPort: 6939\n      bindOnIP:
      \"0.0.0.0\"\nclusterMetadata:\n  enableGlobalDomain: false\n 
      failoverVersionIncrement: 10\n  masterClusterName: \"active\"\n 
      currentClusterName: \"active\"\n  clusterInformation:\n    active:\n     
      enabled: true\n      initialFailoverVersion: 1\n      rpcName:
      \"temporal-frontend\"\n      rpcAddress:
      \"127.0.0.1:7233\"\ndcRedirectionPolicy:\n  policy: \"noop\"\n  toDC:
      \"\"\narchival:\n  status: \"disabled\"\n\npublicClient:\n  hostPort:
      \"temporal-frontend:7233\"\n\ndynamicConfigClient:\n  filepath:
      \"/etc/temporal/dynamic_config/dynamic_config.yaml\"\n  pollInterval:
      \"10s\""},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"temporal","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"temporal","app.kubernetes.io/part-of":"temporal","app.kubernetes.io/version":"1.24.2","argocd.argoproj.io/instance":"temporal","helm.sh/chart":"temporal-0.43.0"},"name":"temporal-frontend-config","namespace":"temporal"}}
  creationTimestamp: '2024-07-06T16:45:27Z'
  labels:
    app.kubernetes.io/instance: temporal
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: temporal
    app.kubernetes.io/part-of: temporal
    app.kubernetes.io/version: 1.24.2
    argocd.argoproj.io/instance: temporal
    helm.sh/chart: temporal-0.43.0
  name: temporal-frontend-config
  namespace: temporal
  resourceVersion: '68519726'
  uid: ba3537f0-b0fe-4492-8588-3498b2a8a0f8

I think the problem is with this line:
password: "{{ .Env.TEMPORAL_VISIBILITY_STORE_PASSWORD }}"

Claude suggested doing it this way:
password: "{{ {{ .Env.TEMPORAL_VISIBILITY_STORE_PASSWORD }} }}"

It seems our password contains special characters.
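
Counting from log: as line 1 in the config_template.yaml pasted above, this password line is line 30, which matches the line number in the error. Below is a minimal sketch of the failure mode, assuming the placeholder is replaced with the raw environment value before the YAML is parsed. The sample password and both function names are hypothetical, and the escaped variant shows one generic way to keep the value parseable, not necessarily what the chart release does.

```python
import json

# Template line and placeholder copied from the configmap in this thread.
PLACEHOLDER = "{{ .Env.TEMPORAL_VISIBILITY_STORE_PASSWORD }}"
TEMPLATE_LINE = 'password: "' + PLACEHOLDER + '"'


def render_naive(password: str) -> str:
    """Paste the raw value between the template's double quotes."""
    return TEMPLATE_LINE.replace(PLACEHOLDER, password)


def render_escaped(password: str) -> str:
    """Emit the value as a JSON string, escaping backslashes and quotes."""
    return "password: " + json.dumps(password)


password = r"pa$$\word"  # hypothetical password containing a backslash

print(render_naive(password))    # password: "pa$$\word"   (\w is not a YAML escape)
print(render_escaped(password))  # password: "pa$$\\word"  (backslash is escaped)

# The escaped form decodes back to the original value.
assert json.loads(render_escaped(password).split(": ", 1)[1]) == password
```

JSON string syntax is used here because, for ordinary ASCII values, a JSON string is also a valid YAML double-quoted scalar.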

robholland commented Jul 6, 2024

OK, this is a known issue; it is fixed in the latest Helm chart release.

@robholland (Contributor)

Closing now, but please open a new issue if you are still having configmap issues with the latest release.
