fix(ingest/dremio): Dremio software jobs retrieval SQL query fix query error #11817

Merged

acrylJonny
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata community-contribution PR or Issue raised by member(s) of DataHub Community labels Nov 7, 2024
Collaborator

@mayurinehate mayurinehate left a comment

Adding a comment about the inconsistent type of queries_datasets across Cloud and Software would help future readers.

@acrylJonny
Contributor Author

Adding a comment about the inconsistent type of queries_datasets across Cloud and Software would help future readers.

Added a comment in dremio_sql_queries.py pointing to the incorrect documentation.

@mayurinehate mayurinehate merged commit ab0b0a2 into datahub-project:master Nov 14, 2024
74 checks passed
@BernardToure

@acrylJonny , @mayurinehate ,

I'm using Dremio Cloud and I'm getting sqlglot parsing errors because of:
1 - the way Dremio deals with numbers and special characters in folder and table names
2 - the branch name that is added when using Arctic to version views

For point 1, Dremio uses double quotes instead of backquotes around folder names in the FROM part of the queries.
For point 2, Dremio refers to the source branch by adding AT BRANCH main after the table name.
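
To make the two points concrete, here is a minimal Python sketch that picks both constructs out of a query string. The sample query, the catalog and folder names, and the helper `describe_dremio_constructs` are all hypothetical and only illustrate the shapes described above; they are not taken from the DataHub source.

```python
import re

# Hypothetical patterns for the two constructs described above.
# Point 1: identifiers wrapped in double quotes (folder/table names).
QUOTED_IDENTIFIER = re.compile(r'"[^"]+"')
# Point 2: an "AT BRANCH <name>" clause following a table reference.
AT_BRANCH = re.compile(r"\bAT\s+BRANCH\s+\S+", re.IGNORECASE)


def describe_dremio_constructs(sql: str) -> dict:
    """Return the double-quoted identifiers and branch clauses found in sql."""
    return {
        "quoted_identifiers": QUOTED_IDENTIFIER.findall(sql),
        "branch_clauses": AT_BRANCH.findall(sql),
    }


# Made-up view definition with a numeric folder name, a special character,
# and a branch reference.
sample_view = 'SELECT * FROM catalog."2024-reports"."daily#1" AT BRANCH main'
found = describe_dremio_constructs(sample_view)
print(found)
```

A check like this could be used to see how many view definitions or job queries are affected before deciding whether to rewrite them.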

I was able to stop the sqlglot errors by modifying the VIEW_DEFINITION and query columns in the metadata-ingestion/src/datahub/ingestion/source/dremio/dremio_sql_queries.py file.

Before:
SELECT * FROM
(
SELECT
RESOURCE_ID,
V.TABLE_NAME,
OWNER,
PATH AS TABLE_SCHEMA,
CONCAT(REPLACE(REPLACE(
REPLACE(V.PATH, ', ', '.'),
'[', ''), ']', ''
)) AS FULL_TABLE_PATH,
OWNER_TYPE,
LOCATION_ID,
VIEW_DEFINITION,
FORMAT_TYPE,
COLUMN_NAME,
ORDINAL_POSITION, ...

After:
SELECT * FROM
(
SELECT
RESOURCE_ID,
V.TABLE_NAME,
OWNER,
PATH AS TABLE_SCHEMA,
CONCAT(REPLACE(REPLACE(
REPLACE(V.PATH, ', ', '.'),
'[', ''), ']', ''
)) AS FULL_TABLE_PATH,
OWNER_TYPE,
LOCATION_ID,
REGEXP_REPLACE(REGEXP_REPLACE(VIEW_DEFINITION,'(?i)AT BRANCH .?MAIN.?',''),'"','`') as VIEW_DEFINITION,
FORMAT_TYPE,
COLUMN_NAME,
ORDINAL_POSITION, ...

The same applies to the sys.project.history.jobs query, replacing:
SELECT
job_id,
user_name,
submitted_ts,
query,

with
SELECT
job_id,
user_name,
submitted_ts,
REGEXP_REPLACE(REGEXP_REPLACE(query,'(?i)AT BRANCH .?MAIN.?',''),'"','`') as query,

These changes replace all double quotes with backquotes and remove AT BRANCH MAIN (case-insensitively).
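
For anyone who wants to try the effect outside Dremio, the nested REGEXP_REPLACE can be mirrored roughly in Python as below. This is only a sketch: the function name `normalize_dremio_sql` and the sample queries are hypothetical, and Python's `re` module is assumed to behave like Dremio's regex engine for this simple pattern.

```python
import re

# Same pattern as in the SQL above: "AT BRANCH", an optional character
# (e.g. a quote), "MAIN", and one more optional character; case-insensitive.
AT_BRANCH_MAIN = re.compile(r"AT BRANCH .?MAIN.?", re.IGNORECASE)


def normalize_dremio_sql(sql: str) -> str:
    """Drop the AT BRANCH MAIN clause, then turn double quotes into backquotes."""
    without_branch = AT_BRANCH_MAIN.sub("", sql)
    return without_branch.replace('"', "`")


# Hypothetical query with quoted folder names and a branch reference.
sample = 'SELECT * FROM arctic."2024 sales"."orders-v1" AT BRANCH main WHERE id = 1'
normalized = normalize_dremio_sql(sample)
print(normalized)

# Caveat: the replacement is blanket, so double quotes inside a string
# literal are rewritten as well.
literal = normalize_dremio_sql("SELECT 'say \"hi\"' AS note")
print(literal)
```

Two limitations are worth keeping in mind: only a branch named main is removed, and double quotes that are part of string literals are also turned into backquotes.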

Could these changes be implemented? I'd be more than happy to test them if you tell me how.

Cheers,
Berni

sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Dec 17, 2024
…y error (datahub-project#11817)

Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>