Marquez should allow a field with no datatype #2261

Closed
pawel-big-lebowski opened this issue Nov 21, 2022 · 2 comments · Fixed by #2272

@pawel-big-lebowski
Collaborator

The OpenLineage spec allows a field with no datatype:

                "type" : {
                  "description" : "The type of the field.",
                  "type" : "string",
                  "example" : "VARCHAR|INT|..."
                },

Marquez validation, however, requires a non-null type for each field.
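The behavior being asked for can be sketched as a small check. This is an illustration only, not Marquez code: `validate_schema_field` is a hypothetical helper, and it assumes that `name` is the only mandatory key of a schema field and that a `None` type should be treated the same as a missing one.

```python
from typing import Any, Mapping


def validate_schema_field(field: Mapping[str, Any]) -> None:
    """Raise ValueError unless `field` has a string name and, at most, a string type."""
    # Assumption: 'name' is the only key a schema field must carry.
    if not isinstance(field.get("name"), str):
        raise ValueError("schema field requires a string 'name'")

    # 'type' may be absent or None; when it is present it must be a string.
    field_type = field.get("type")
    if field_type is not None and not isinstance(field_type, str):
        raise ValueError("'type', when present, must be a string")


# Both of these pass: one has an empty type, the other has no type at all.
validate_schema_field({"name": "a", "type": "", "description": ""})
validate_schema_field({"name": "b", "description": ""})
```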

@pawel-big-lebowski pawel-big-lebowski self-assigned this Nov 21, 2022
@wslulciuc wslulciuc added the good first issue Good for newcomers label Nov 22, 2022
@wslulciuc wslulciuc added this to the Roadmap milestone Nov 22, 2022
@wslulciuc
Member

wslulciuc commented Nov 22, 2022

@pawel-big-lebowski, if we don't know the datatype, we have a few choices to use as defaults:

  • null
  • An empty string ("")
  • UNKNOWN

I think null would be the most appropriate.
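To make the three options concrete, here is a minimal sketch of how a default could be applied when a field arrives without a type. `field_type_or_default` is a hypothetical helper, not part of Marquez, and the choice to leave an explicitly empty string untouched is an assumption.

```python
from typing import Any, Mapping, Optional


def field_type_or_default(
    field: Mapping[str, Any], default: Optional[str] = None
) -> Optional[str]:
    """Return the field's type, or `default` when the type is missing or None."""
    value = field.get("type")
    return default if value is None else value


untyped = {"name": "b", "description": ""}

field_type_or_default(untyped)             # null default: returns None
field_type_or_default(untyped, "")         # empty-string default
field_type_or_default(untyped, "UNKNOWN")  # sentinel default
```

With `None` as the default, consumers can tell "type not reported" apart from any real type name, which an `UNKNOWN` sentinel cannot guarantee.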

@pawel-big-lebowski
Collaborator Author

I am not able to reproduce the behavior.

  • I checked the code: the classes DatasetFieldMapper, DatasetFieldRowMapper and marquez.service.models.LineageEvent.SchemaField do allow null values.
  • I created a test that uses marquez.db.LineageTestUtils#createLineageRow with a dataset whose fields have either an empty data type or no data type at all. The test passes.
  • Then I manually ran Jupyter and Marquez in Docker and sent an event:
event = {
    'eventType': 'COMPLETE',
    'eventTime': '2022-10-25T11:35:31.341Z',
    'job': {
        'namespace': 'anothar-job-namespace',
        'name': 'another-job',
        'facets': {'documentation': None, 'sourceCodeLocation': None, 'sql': None}
    },
    'run': {
        'runId': 'ae8f3ab7-254b-4d81-9c0a-c152f6sdac90'
    },
    'inputs': [
        {
            'namespace': 'hive://metastore', # read a dataset by its logical name
            'name': 'default.some_table',
        }
    ],
    'outputs': [
        {
            'namespace': 'another-namespace', # write to some other dataset
            'name': 'another-table',
            'facets': {
                'schema': {
                    '_producer': 'https://github.com/OpenLineage/OpenLineage/tree/0.15.1/integration/spark',
                    '_schemaURL': 'https://openlineage.io/spec/1-0-3/OpenLineage.json#/$defs/RunEvent',
                    'fields': [
                        { 'name': 'a', 'type': '', 'description': '' },
                        { 'name': 'b', 'description': '' }
                    ]
                }
            }
        }
    ],
    'producer': 'https://github.com/OpenLineage/OpenLineage/tree/0.15.1/integration/spark',
    'schemaURL': 'https://openlineage.io/spec/1-0-3/OpenLineage.json#/$defs/RunEvent'
}

The event was also written successfully.
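For anyone who wants to repeat the manual check, the sending step can be sketched as follows. The thread does not show how the event was posted, so this is an assumption: it targets a local Marquez instance at `http://localhost:5000` and its `/api/v1/lineage` endpoint, and `build_lineage_request` is a hypothetical helper built on the standard library only.

```python
import json
import urllib.request
from typing import Any, Mapping

# Assumption: Marquez is reachable locally on port 5000.
MARQUEZ_URL = "http://localhost:5000"


def build_lineage_request(
    event: Mapping[str, Any], base_url: str = MARQUEZ_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the event as JSON."""
    # json.dumps turns Python None into JSON null, so the None-valued
    # facets in the event are serialized as nulls.
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/lineage",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of passing the result to `urllib.request.urlopen(build_lineage_request(event))`, which raises `urllib.error.HTTPError` if the server rejects the event.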
