fix(ingest): refactor + fix recursion in lookml file loading logic #2913

Merged
merged 17 commits into from
Jul 22, 2021


hsheth2
Collaborator

@hsheth2 hsheth2 commented Jul 20, 2021

Also improves the error messages and logging throughout the source, and adds an additional test.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@hsheth2 hsheth2 marked this pull request as ready for review July 20, 2021 00:24
@hsheth2
Collaborator Author

hsheth2 commented Jul 20, 2021

@grantatspothero and @frsann - given that this is a pretty large refactoring, it would be great if you could take a look at the changes or try it out and verify that it works as intended.

@shirshanka
Contributor

This LGTM, also adding @remisalmon and @zack3241 to take a look.

@jameslamb
Contributor

> @grantatspothero and @frsann - given that this is a pretty large refactoring, it would be great if you could take a look at the changes or try it out and verify that it works as intended.

Hey @hsheth2, I recently joined @grantatspothero's team and have been looking at LookML ingestion for DataHub. I can take a look at these changes and test this branch today against our Looker instance and let you know if I see any issues 😀

@frsann
Contributor

frsann commented Jul 20, 2021

I'm on vacation for the next three weeks, so please don't wait on my input on this.

@jameslamb
Contributor

@hsheth2 I see you're still pushing commits...will you @ me whenever you're ready for me to pull and test this?

@hsheth2
Collaborator Author

hsheth2 commented Jul 20, 2021

@jameslamb sorry about thrashing this branch - it should be good to test now!

@shirshanka
Contributor

@jameslamb : this looks good on our end. Let us know when you get a chance to test on yours.

@jameslamb
Contributor

jameslamb commented Jul 21, 2021

Thanks! Ok, I tested this on SpotHero's Looker instance today, and ran into an issue.

How I tested this

Ran acryl-datahub in a container using the python-slim:3.7 image, with the following command.

pip uninstall acryl-datahub && \
    pip install 'git+https://github.com/hsheth2/datahub.git@lookml-view-resolution#egg=acryl_datahub[datahub-kafka,lookml]&subdirectory=metadata-ingestion'

Ran the following

datahub ingest -c "/recipes/lookml-recipe.yaml"

where the recipe looks like this:

source:
  type: lookml
  config:
    base_folder: /model-files
    connection_to_platform_map:
      redshift_test: redshift
    platform_name: looker
    env: "PROD"
    parse_table_names_from_sql: False

sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: "${KAFKA_BOOTSTRAP_SERVER}"
      producer_config:
        security.protocol: SSL
        ssl.ca.location: /etc/service/keys/ca
        ssl.certificate.location: /etc/service/keys/cert
        ssl.key.location: /etc/service/keys/key
      schema_registry_url: "${KAFKA_SCHEMA_REGISTRY_URL}"
      schema_registry_config:
        basic.auth.user.info: "${KAFKA_SCHEMA_REGISTRY_USER}:${KAFKA_SCHEMA_REGISTRY_PW}"

Ingestion was stopped by an error like the one below (I added the REDACTED).

  File "/usr/local/lib/python3.7/site-packages/datahub/entrypoints.py", line 98, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/datahub/entrypoints.py", line 85, in ingest
    pipeline.run()
  File "/usr/local/lib/python3.7/site-packages/datahub/ingestion/run/pipeline.py", line 108, in run
    for wu in self.source.get_workunits():
  File "/usr/local/lib/python3.7/site-packages/datahub/ingestion/source/lookml.py", line 585, in get_workunits
    self.source_config.parse_table_names_from_sql,
  File "/usr/local/lib/python3.7/site-packages/datahub/ingestion/source/lookml.py", line 266, in from_looker_dict
    "sql_table_name",
  File "/usr/local/lib/python3.7/site-packages/datahub/ingestion/source/lookml.py", line 381, in get_including_extends
    f"failed to resolve extends view {extend} in view {view_name} of file {looker_viewfile.absolute_file_path}"

NameError: failed to resolve extends view REDACTED in view REDACTED of file /model-files/REDACTED/REDACTED.view.lkml

Using acryl-datahub[datahub-kafka,lookml]==0.8.6.0, that same problem resulted in a warning, but not an error. Like this:

WARNING {datahub.ingestion.source.lookml:339} - Skipping malformed view with view_name: REDACTED. View should have a single view in a view inheritance chain with a sql_table_name

I think it's desirable for a failure of this check to continue to trigger a warning, not a fatal error that stops metadata ingestion. Otherwise, I think it's possible for one "poison pill" .view.lkml file to break all of your LookML metadata ingestion.

@hsheth2
Collaborator Author

hsheth2 commented Jul 22, 2021

@jameslamb I've updated the PR to handle those as warnings rather than crashing

@jameslamb
Contributor

> @jameslamb I've updated the PR to handle those as warnings rather than crashing

perfect, thanks! I'll test again right now.

@remisalmon
Contributor

@hsheth2 just tested this patch on our lookml repo and it misses a lot of files during ingestion (roughly 2/3 of the lookml views are missing vs what gets ingested with acryl-datahub[lookml]==0.8.6.1).

Those missing views are all defined at the root level of the repo.
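
For anyone wanting to check a similar discrepancy, a rough sketch like the one below lists the `.view.lkml` files under a base folder and separates root-level files from nested ones, so the result can be compared with what was ingested. The helper name `find_view_files` is hypothetical and this is independent of how the lookml source itself discovers files.

```python
from pathlib import Path
from typing import Dict, List


def find_view_files(base_folder: str) -> Dict[str, List[str]]:
    """Group *.view.lkml files by whether they sit at the repo root or deeper."""
    base = Path(base_folder)
    found: Dict[str, List[str]] = {"root": [], "nested": []}
    # rglob also matches files directly inside base, not only subfolders.
    for path in sorted(base.rglob("*.view.lkml")):
        rel = path.relative_to(base)
        key = "root" if len(rel.parts) == 1 else "nested"
        found[key].append(rel.as_posix())
    return found
```

Running `find_view_files("/model-files")` against the recipe's `base_folder` would give a quick count of root-level versus nested views to compare across versions.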

@remisalmon
Contributor

Re-tested after the last commits and all is working as expected for me.

Contributor

@shirshanka shirshanka left a comment

LGTM

@shirshanka shirshanka merged commit 90e05df into datahub-project:master Jul 22, 2021
@hsheth2 hsheth2 deleted the lookml-view-resolution branch November 10, 2022 02:27

5 participants