OpenLineage AirflowRunFacet.json wrong type for tags #43638

@MaartenHubrechts

Apache Airflow Provider(s)

openlineage

Versions of Apache Airflow Providers

apache-airflow-providers-openlineage 1.12.2

Apache Airflow version

2.9.3

Operating System

Linux

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

AirflowRunFacet.json currently cannot be used to validate OpenLineage logs that contain the AirflowRunFacet:
https://github.com/apache/airflow/blob/main/providers/src/airflow/providers/openlineage/facets/AirflowRunFacet.json#L175
The cause is the type declared for the tags element (lines 175-177):

"tags": {
  "type": "string"
}

The logs produced by the openlineage provider contain tags as a list of strings. A DAG with a single tag still produces a list containing one string; a DAG with multiple tags produces a list containing all of them.

What you think should happen instead

The tags field in AirflowRunFacet.json should be changed as follows:

"tags": {
  "type": "array",
  "items": {
    "type": "string"
  }
}

This validates that tags is an array of strings.

How to reproduce

Validate any OpenLineage log that contains the Airflow run facet against the current AirflowRunFacet.json; the validation fails.
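
The mismatch can be illustrated with a minimal, standard-library-only sketch. The `conforms` helper below is hypothetical: it handles only the "type" and "items" keywords and is not a full JSON Schema validator, and the tag values are made-up examples rather than output copied from a real log. It only shows why a list of strings is rejected by a "string" type and accepted by an array-of-strings type.

```python
def conforms(schema, value):
    """Check a value against a tiny subset of JSON Schema ("type", "items")."""
    expected = schema.get("type")
    if expected == "string":
        return isinstance(value, str)
    if expected == "array":
        if not isinstance(value, list):
            return False
        item_schema = schema.get("items", {})
        return all(conforms(item_schema, item) for item in value)
    # Keywords outside this subset are not checked in this sketch.
    return True


# The tags definition as it currently appears in AirflowRunFacet.json.
current_tags_schema = {"type": "string"}

# The tags definition proposed in this issue.
proposed_tags_schema = {"type": "array", "items": {"type": "string"}}

# Hypothetical tag values: a DAG with one tag and a DAG with two tags.
single_tag = ["example"]
multiple_tags = ["example", "etl"]

print(conforms(current_tags_schema, single_tag))      # False
print(conforms(current_tags_schema, multiple_tags))   # False
print(conforms(proposed_tags_schema, single_tag))     # True
print(conforms(proposed_tags_schema, multiple_tags))  # True
```

In practice a complete validator would be run against the whole facet schema; the sketch isolates the tags element so that the failing and passing cases are easy to compare.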

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
