feat(ingest): sql parsing aggregator #9786

Merged
hsheth2 merged 53 commits into datahub-project:master from sqlparsing-aggregator on Feb 9, 2024

Conversation

hsheth2
Collaborator

@hsheth2 hsheth2 commented Feb 6, 2024

  • Adds the SqlParsingAggregator class, which supports temp tables and query IDs. It also has built-in support for reporting and for dumping queries to a file for debugging.
  • Adds query fingerprinting
  • Moves sqlglot_lineage-related code into the datahub.sql_parsing subpackage and refactors the massive file into multiple smaller ones.
  • Adds a SchemaResolverInterface type and a with_temp_tables method
  • Adds an ordered set utility class
  • Fixes a bug in sqlglot_lineage where INSERT INTO table (a, b) AS SELECT would register the inserted table as an input instead of an output.
  • Fixes some bugs in the FileBackedDict implementation and adds a for_mutation method
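
For readers unfamiliar with two of the utilities listed above, here is a minimal sketch of what an ordered set and a query fingerprint can look like. This is not the code from this PR: `OrderedSet` below is a simplified stand-in, and `naive_fingerprint` is a hypothetical regex-based helper that only illustrates the idea of hashing a normalized query so that queries differing only in literals or whitespace share an ID.

```python
import hashlib
import re
from typing import Dict, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


class OrderedSet(Generic[T]):
    """Illustrative set that remembers insertion order (backed by a dict)."""

    def __init__(self, items: Iterable[T] = ()) -> None:
        # dict.fromkeys drops duplicates and keeps first-seen order.
        self._items: Dict[T, None] = dict.fromkeys(items)

    def add(self, item: T) -> None:
        self._items[item] = None

    def __contains__(self, item: object) -> bool:
        return item in self._items

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)


def naive_fingerprint(query: str) -> str:
    """Hypothetical fingerprint: hash the query after crude normalization."""
    # Replace single-quoted string literals, then standalone numeric literals.
    normalized = re.sub(r"'(?:[^']|'')*'", "?", query)
    normalized = re.sub(r"\b\d+(?:\.\d+)?\b", "?", normalized)
    # Collapse whitespace and ignore case.
    normalized = re.sub(r"\s+", " ", normalized).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

A regex pass like this is only an approximation; a parser-based normalization would handle comments, quoted identifiers, and dialect differences more reliably.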

I probably should've split this into multiple PRs, but it's too annoying to do now.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

) or _is_temp_table(target, dialect=dialect):
query_type_props["temporary"] = True

if kind and "TABLE" in kind:
Contributor


I think this would be nicer as a const or enum
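
One way the suggestion could look, as a rough sketch only: the names `CreateKind` and `is_table_kind` are hypothetical and do not come from this PR, and the set of kinds is an assumption. The point is just to replace the bare "TABLE" string with a named member.

```python
from enum import Enum
from typing import Optional


class CreateKind(str, Enum):
    """Hypothetical enum for the object kind of a CREATE statement."""

    TABLE = "TABLE"
    VIEW = "VIEW"


def is_table_kind(kind: Optional[str]) -> bool:
    # Mirrors `kind and "TABLE" in kind`, but without the magic string.
    return bool(kind) and CreateKind.TABLE.value in kind
```

Because the original check is a substring test, a kind such as "TEMPORARY TABLE" still matches here; an exact comparison against the enum would change behavior.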

@hsheth2 hsheth2 merged commit 0d780e5 into datahub-project:master Feb 9, 2024
54 checks passed
@hsheth2 hsheth2 deleted the sqlparsing-aggregator branch February 9, 2024 21:27
Labels
ingestion PR or Issue related to the ingestion of metadata