Support OpenLineage in spark-3.x-bigquery connectors #1212

codelixir
Contributor

  1. Add OpenLineage properties to the Spark31BigQueryTable class.
  2. Add BigQueryRelationProvider as an abstract class in the v2 module, to be extended by BaseBigQuerySource (the parent class of all the Spark BigQuery Table Provider classes).

@vishalkarve15
Contributor

/gcbrun

pom.xml (outdated)
Member

@davidrabinowitz left a comment

Please add an integration test that verifies the lineage events are created.

@vishalkarve15
Contributor

/gcbrun

@codelixir
Contributor Author

codelixir commented Apr 22, 2024

I have moved the logic to the common module, as discussed, so that both dsv1 and dsv2 connectors call the same method internally.
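
For readers following along, the arrangement described above might look roughly like the sketch below: one shared helper that both a DataSource v1 relation and a DataSource v2 table delegate to. This is only an illustration under stated assumptions. The class names `LineageSketch`, `V1Relation`, and `V2Table`, the method `lineageProperties`, and the property keys are all hypothetical and are not taken from this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LineageSketch {

  // Hypothetical shared helper, standing in for logic that lives in a common module.
  // The property keys are illustrative assumptions, not confirmed by this thread.
  static Map<String, String> lineageProperties(String project, String dataset, String table) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("openlineage.dataset.namespace", "bigquery");
    props.put("openlineage.dataset.name", project + "." + dataset + "." + table);
    return props;
  }

  // Stand-in for a DataSource v1 relation (hypothetical).
  static class V1Relation {
    private final String project;
    private final String dataset;
    private final String table;

    V1Relation(String project, String dataset, String table) {
      this.project = project;
      this.dataset = dataset;
      this.table = table;
    }

    // Delegates to the shared helper instead of building the map itself.
    Map<String, String> lineage() {
      return lineageProperties(project, dataset, table);
    }
  }

  // Stand-in for a DataSource v2 table (hypothetical).
  static class V2Table {
    private final String project;
    private final String dataset;
    private final String table;

    V2Table(String project, String dataset, String table) {
      this.project = project;
      this.dataset = dataset;
      this.table = table;
    }

    // Same delegation, so both connector flavors report identical metadata.
    Map<String, String> properties() {
      return lineageProperties(project, dataset, table);
    }
  }

  public static void main(String[] args) {
    Map<String, String> v1 = new V1Relation("my-project", "my_dataset", "my_table").lineage();
    Map<String, String> v2 = new V2Table("my-project", "my_dataset", "my_table").properties();
    System.out.println("v1 and v2 agree: " + v1.equals(v2));
    System.out.println(v2);
  }
}
```

The point of the single helper is that the two connector paths cannot drift apart in what they report.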

@vishalkarve15
Contributor

/gcbrun

@vishalkarve15
Contributor

/gcbrun

@ddebowczyk92
Contributor

Hey @codelixir, thank you for your contribution! We appreciate your effort. Have you thought about leveraging the spark-interfaces-scala package for generating metadata for OpenLineage events? This package is designed to facilitate the transition of lineage extraction ownership to the Spark extension owners. You can find more information about it here. Thanks once again for your contribution!

@davidrabinowitz
Member

Hi @ddebowczyk92, thanks for the input! We try to keep the DataSource v2 connectors Scala agnostic in order to simplify usage for customers, given the incompatibility between Scala 2.12 and 2.13. Once this PR is done, we can think about how to incorporate the interface into the connector.

@davidrabinowitz changed the title from "Add BigQueryRelationProvider class for OpenLineage" to "Support OpenLineage in spark-3.x-bigquery connectors" on Apr 25, 2024
Signed-off-by: Pahulpreet Singh <pahulpreets@google.com>
@vishalkarve15
Contributor

/gcbrun

codelixir and others added 2 commits April 30, 2024 05:14
@davidrabinowitz
Member

/gcbrun

@davidrabinowitz
Member

/gcbrun

@davidrabinowitz merged commit 558f18f into GoogleCloudDataproc:master on May 1, 2024
7 of 9 checks passed
isha97 pushed a commit that referenced this pull request May 29, 2024
Signed-off-by: Pahulpreet Singh <pahulpreets@google.com>
isha97 pushed a commit that referenced this pull request May 29, 2024
Signed-off-by: Pahulpreet Singh <pahulpreets@google.com>
isha97 pushed a commit that referenced this pull request May 30, 2024
Signed-off-by: Pahulpreet Singh <pahulpreets@google.com>