Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ingest): ensure that LookML files are always parsed in the same order #2966

Merged
merged 2 commits into from
Jul 29, 2021
Merged

fix(ingest): ensure that LookML files are always parsed in the same order #2966

merged 2 commits into from
Jul 29, 2021

Conversation

jameslamb
Copy link
Contributor

Metadata ingestion for LookML requires a few operations that list files using glob.glob(). This PR proposes sorting those lists of files by name, to ensure that given the same set of .lkml files, datahub ingest will process them in the same order every time.

The ordering of results from glob.glob() is not guaranteed to be the same on different filesystems. See https://stackoverflow.com/a/6773661 for more details.

How this improves datahub ingest

I faced a really difficult bug today. I created a docker image with an entrypoint that ran something like datahub ingest -c /recipes/recipe.yaml.

I found that I got different behavior in these two methods of running containers using that image:

  • in a container on my laptop
    •  git clone https://lookml-files.git
       docker run -v $(pwd)/lookml-files:/model-files my-ingestion-image:${TAG}
  • starting a pod in k8s using git-sync to clone from lookml-files.git and mount it into a container at /model-files

Eventually, I discovered a separate bug where the order of parsing mattered (proposed fix in #2965 ). A strong guarantee that datahub ingest processes files in the same order every time would have sped up my debugging of that issue, and would have given me higher confidence that ingestion running in k8s would match the behavior I saw in my development environment.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

Thanks for your time and consideration!

Copy link
Collaborator

@hsheth2 hsheth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @jameslamb :)

@shirshanka shirshanka merged commit acffd8f into datahub-project:master Jul 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants