I've tried making an InputDataset using the Python client like the following:
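    from openlineage.client.facet import (
        ColumnMetric,
        DataQualityMetricsInputDatasetFacet,
        SchemaDatasetFacet,
        SchemaField,
    )
    from openlineage.client.run import InputDataset

    # Schema facet describing the table's columns
    pypi_schema = SchemaDatasetFacet(
        fields=[
            SchemaField(name="package_name", type="VARCHAR"),
            SchemaField(name="downloads", type="BIGINT"),
            SchemaField(name="year", type="INT"),
            SchemaField(name="month", type="INT"),
        ]
    )

    # Data quality metrics for the same table
    pypi_metrics = DataQualityMetricsInputDatasetFacet(
        rowCount=20,
        bytes=1024,
        columnMetrics={
            "package_name": ColumnMetric(nullCount=0, distinctCount=20),
            "downloads": ColumnMetric(nullCount=0, distinctCount=10),
        },
    )

    # Positional arguments: namespace, name, facets, inputFacets
    pypi_downloads_i = InputDataset(
        "bigquery",
        "pypi_data.downloads",
        {"schema": pypi_schema},
        {"dataQuality": pypi_metrics},
    )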
If I then call GET /namespaces/bigquery/datasets/pypi_data.downloads, the data returned includes the schema facet but not the data quality data.
GET response shown below:
    {
      "id": { "namespace": "bigquery", "name": "pypi_data.downloads" },
      "type": "DB_TABLE",
      "name": "pypi_data.downloads",
      "physicalName": "pypi_data.downloads",
      "createdAt": "2022-12-12T17:05:57.203892Z",
      "updatedAt": "2022-12-13T18:27:18.976182Z",
      "namespace": "bigquery",
      "sourceName": "default",
      "fields": [
        { "name": "package_name", "type": "VARCHAR", "tags": [], "description": null },
        { "name": "downloads", "type": "BIGINT", "tags": [], "description": null },
        { "name": "year", "type": "INT", "tags": [], "description": null },
        { "name": "month", "type": "INT", "tags": [], "description": null }
      ],
      "tags": [],
      "lastModifiedAt": null,
      "lastLifecycleState": "",
      "description": null,
      "currentVersion": "3f462106-f5f7-4260-86d8-9b163ac8e217",
      "columnLineage": null,
      "facets": {
        "schema": {
          "fields": [
            { "name": "package_name", "type": "VARCHAR" },
            { "name": "downloads", "type": "BIGINT" },
            { "name": "year", "type": "INT" },
            { "name": "month", "type": "INT" }
          ],
          "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.18.0/integration/airflow",
          "_schemaURL": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/SchemaDatasetFacet"
        }
      },
      "deleted": false
    }
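To make the gap in that response easy to spot, here is a minimal sketch that lists which expected facet keys are absent from a dataset response. The helper name `missing_facets` is hypothetical, the response is abbreviated from the one above, and the sketch assumes the metrics would appear under the top-level `facets` object next to `schema`, which nothing in this thread confirms.

```python
import json


def missing_facets(dataset, expected):
    """Return the expected facet keys that are absent from a dataset response.

    Hypothetical helper: assumes facets live under the top-level "facets" key.
    """
    facets = dataset.get("facets") or {}
    return [key for key in expected if key not in facets]


# Abbreviated copy of the GET response from the issue (most fields omitted).
response_text = """
{
  "name": "pypi_data.downloads",
  "namespace": "bigquery",
  "facets": {
    "schema": {
      "fields": [
        {"name": "package_name", "type": "VARCHAR"},
        {"name": "downloads", "type": "BIGINT"}
      ]
    }
  }
}
"""

dataset = json.loads(response_text)
absent = missing_facets(dataset, ["schema", "dataQualityMetrics"])
print("Missing facets:", absent)
```

Run against the full response, the same check would report the data quality facet as missing while `schema` is present.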
I haven't tested it, but this part of the code looks fishy:
pypi_downloads_i = InputDataset("bigquery", "pypi_data.downloads", {"schema": pypi_schema}, {"dataQuality": pypi_metrics})
Try:
pypi_downloads_i = InputDataset("bigquery", "pypi_data.downloads", {"schema": pypi_schema}, {"dataQualityMetrics": pypi_metrics})
and see if it works. According to https://openlineage.io/apidocs/openapi/#tag/OpenLineage/operation/postRunEvent, the name should be dataQualityMetrics, not dataQuality.
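For reference, the suggestion above corresponds roughly to the following shape for one entry of an event's inputs. This is only a sketch using plain dictionaries rather than the Python client: `build_input_dataset` is a hypothetical helper, and the `_producer` and `_schemaURL` fields that real facets carry are left out for brevity.

```python
import json


def build_input_dataset(namespace, name, schema_fields, metrics):
    """Build a plain-dict input dataset (hypothetical helper, not client API).

    Dataset facets such as "schema" go under "facets"; input-specific facets
    go under "inputFacets", keyed as "dataQualityMetrics".
    """
    return {
        "namespace": namespace,
        "name": name,
        "facets": {"schema": {"fields": schema_fields}},
        "inputFacets": {"dataQualityMetrics": metrics},
    }


input_dataset = build_input_dataset(
    "bigquery",
    "pypi_data.downloads",
    [
        {"name": "package_name", "type": "VARCHAR"},
        {"name": "downloads", "type": "BIGINT"},
    ],
    {
        "rowCount": 20,
        "bytes": 1024,
        "columnMetrics": {
            "package_name": {"nullCount": 0, "distinctCount": 20},
            "downloads": {"nullCount": 0, "distinctCount": 10},
        },
    },
)

print(json.dumps(input_dataset, indent=2))
```

The point of the sketch is the split between the two containers: the schema sits under facets, while the metrics sit under inputFacets with the dataQualityMetrics key.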
Using facet key name dataQualityMetrics does not fix it for me.
I was able to reproduce the issue. It seems that Marquez does not understand inputFacets or outputFacets.
pawel-big-lebowski