DAG disappears in airflow 3.1.3 #58717

@narenjngr

Apache Airflow version

3.1.3

If "Other Airflow 2/3 version" selected, which one?

3.1.3

What happened?

After upgrading to Airflow 3.1.3, DAGs started disappearing at random.
The dag processor is configured with these environment variables:


    - name: AIRFLOW__DAG_PROCESSOR__BUNDLE_REFRESH_CHECK_INTERVAL
      value: '120'
    - name: AIRFLOW__DAG_PROCESSOR__REFRESH_INTERVAL
      value: '120'
    - name: AIRFLOW__CORE__STORE_SERIALIZED_DAGS
      value: 'True'
    - name: AIRFLOW__CORE__MIN_SERIALIZED_DAG_UPDATE_INTERVAL
      value: '60'
    - name: AIRFLOW__DAG_PROCESSOR__MIN_FILE_PROCESS_INTERVAL
      value: '120'
    - name: AIRFLOW__DAG_PROCESSOR__STALE_DAG_THRESHOLD
      value: '86400'

The number of files found by airflow dag -folder also decreases over time.
Here is the dagProcessor section of my values.yaml file:


dagProcessor:
  enabled: true
  replicas: 1
  revisionHistoryLimit: ~
  command: ~
  args: ["bash", "-c", "exec airflow dag-processor"]


  strategy:
    rollingUpdate:
      maxSurge: "100%"
      maxUnavailable: "50%"

  livenessProbe:
    initialDelaySeconds: 120
    timeoutSeconds: 60
    failureThreshold: 10
    periodSeconds: 60
    command: ~

  serviceAccount:
    automountServiceAccountToken: true
    create: false
    name: "airflow"

    annotations: {}

  securityContext: {}

  securityContexts:
    pod: {}
    container: {}

  containerLifecycleHooks: {}

  resources:
    limits:
      cpu: 1
      memory: 2Gi
    requests:
      cpu: 500m
      memory: 500Mi

  terminationGracePeriodSeconds: 60

  safeToEvict: true

  extraContainers: []
  extraInitContainers: []
  extraVolumes: []
  extraVolumeMounts: []

  # Select certain nodes for airflow dag processor pods.
  nodeSelector: {}
  affinity: {}
  tolerations: []
  topologySpreadConstraints: []

  priorityClassName: ~

  annotations: {}

  podAnnotations: {}

  logGroomerSidecar:
    enabled: true
    command: ~
    args: ["bash", "/clean-logs"]
    retentionDays: 15
    frequencyMinutes: 15
    resources: {}
    securityContexts:
      container: {}

    env: []

  waitForMigrations:
    enabled: true
    env: []
    securityContexts:
      container: {}

  env:
    - name: AIRFLOW__DAG_PROCESSOR__BUNDLE_REFRESH_CHECK_INTERVAL
      value: '120'
    - name: AIRFLOW__DAG_PROCESSOR__REFRESH_INTERVAL
      value: '120'
    - name: AIRFLOW__CORE__STORE_SERIALIZED_DAGS
      value: 'True'
    - name: AIRFLOW__CORE__MIN_SERIALIZED_DAG_UPDATE_INTERVAL
      value: '60'
    - name: AIRFLOW__DAG_PROCESSOR__MIN_FILE_PROCESS_INTERVAL
      value: '120'
    - name: AIRFLOW__DAG_PROCESSOR__STALE_DAG_THRESHOLD
      value: '86400'

One important observation: for any DAG that has disappeared from the UI, running airflow dags details DAG_ID on any of the Airflow pods shows IsStale = true.
Why are DAGs being marked as stale even though AIRFLOW__DAG_PROCESSOR__STALE_DAG_THRESHOLD is set to a high value of 86400 (24 hours)?
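
To make the stale check above repeatable across many DAGs, here is a minimal sketch that wraps the same airflow dags details command. It assumes the command is run with -o json, that the JSON is a single object or a one-element list, and that the flag appears under the key is_stale; the function names parse_stale_flag and find_stale_dags are hypothetical helpers, not part of Airflow.

```python
import json
import subprocess


def parse_stale_flag(details_json: str) -> bool:
    """Extract the stale flag from `airflow dags details <dag_id> -o json` output.

    Assumption: the payload is one object (or a one-element list of objects)
    that exposes the flag under the key "is_stale".
    """
    data = json.loads(details_json)
    record = data[0] if isinstance(data, list) else data
    value = record["is_stale"]
    # Booleans may be rendered as strings, so normalise both forms.
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def find_stale_dags(dag_ids, run=subprocess.run):
    """Return the DAG IDs from dag_ids that are currently marked stale.

    `run` is injectable so the logic can be exercised without an Airflow install.
    Note: if the CLI prints log lines before the JSON, parsing will fail and
    the output would need to be trimmed first.
    """
    stale = []
    for dag_id in dag_ids:
        result = run(
            ["airflow", "dags", "details", dag_id, "-o", "json"],
            capture_output=True,
            text=True,
            check=True,
        )
        if parse_stale_flag(result.stdout):
            stale.append(dag_id)
    return stale
```

Calling find_stale_dags with the list of DAG IDs that are expected to exist, from inside any Airflow pod, would return the ones that have been marked stale at that moment.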

What you think should happen instead?

DAGs shouldn't disappear unless I delete these DAGs from the mounted volume.

How to reproduce

Use the same dag processor config and observe for a few days. The issue is intermittent.
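
Since the issue only shows up over days, it may help to record how many DAG files are actually present on the mounted volume over time and compare that with what the dag processor reports. The following is a rough sketch under assumptions: it counts only .py files (it ignores .airflowignore rules and zipped DAGs), and count_dag_files and snapshot are hypothetical helper names.

```python
import time
from pathlib import Path


def count_dag_files(dags_folder: str) -> int:
    """Count .py files under dags_folder, recursively.

    This is only an approximation of what the dag processor would discover.
    """
    return sum(1 for p in Path(dags_folder).rglob("*.py") if p.is_file())


def snapshot(dags_folder: str, now=time.time) -> str:
    """Return one log line with a UTC timestamp and the current file count."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now()))
    return f"{stamp} files={count_dag_files(dags_folder)}"
```

Printing snapshot(path_to_dags_folder) on a schedule (for example from a sidecar or cron job) gives a timeline that can be lined up against the moments when DAGs turn stale. If the count on disk stays constant while DAGs disappear, the volume itself is probably not losing files.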

Operating System

Linux

Versions of Apache Airflow Providers

eval_type_backport==0.2.2
apache-airflow-providers-databricks==6.4.0
apache-airflow-providers-mongo==4.2.1
apache-airflow-providers-git==0.0.2
apache-airflow-providers-standard==1.2.0
soda-core-spark-df==3.5.5
soda-core-spark[databricks]==3.5.5
soda-core-scientific==3.5.5
pymongo~=4.0.0
typing_extensions==4.13.2
paramiko<4
PyMuPDF~=1.26.5

Deployment

Official Apache Airflow Helm Chart

Deployment details

Official Apache Airflow Helm Chart

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Labels

affected_version:3.1, area:DAG-processing, area:core, kind:bug, needs-triage, priority:high
