Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1986] [Feature] add packages to manifest.json #6818

Closed
3 tasks done
noel opened this issue Jan 31, 2023 · 5 comments
Closed
3 tasks done

[CT-1986] [Feature] add packages to manifest.json #6818

noel opened this issue Jan 31, 2023 · 5 comments
Labels
enhancement New feature or request state Stateful selection (state:modified, defer)

Comments

@noel
Copy link

noel commented Jan 31, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

When comparing manifests for slim CI, it may be that the user happens to have a different version of a package installed than the one used when the manifest was originally generated.

The only way to see that this is the case is to see the code for the offending macro for the given package. If the packages and their versions were stored in the manifest then they could be compared across different manifests.

Describe alternatives you've considered

Comparing the contents of two manifests.

Who will this benefit?

Anyone troubleshooting problems with state:modified.

Are you interested in contributing this feature?

No response

Anything else?

No response

@noel noel added enhancement New feature or request triage labels Jan 31, 2023
@github-actions github-actions bot changed the title [Feature] add packages to manifest.json [CT-1986] [Feature] add packages to manifest.json Jan 31, 2023
@dbeatty10
Copy link
Contributor

@noel thanks for opening this !

I'm guessing that both this issue and the following arose in the context of the same troubleshooting session?

Reaction to your proposal

As-is, I don't think the we'd actually have the necessary version information when the manifest is constructed.

dbt knows the content within your packages-install-path (which is dbt_packages by default), but it actually doesn't know where it came from. Any of the commands within the dbt build family (run, test, etc) are naive to the contents of packages.yml (and the specific versions installed by dbt deps). There's a proverbial wall in between packages.yml and dbt_packages, and dbt deps is the party doing the tossing over that wall.

But... there's some recent discussion that could facilitate your proposal:

Specifically, one of the things discussed is the dbt deps process producing an artifact that contains a listing of the versions that were installed. It would be named something like packages.json. When it exists, such an artifact could be picked up and re-written into the manifest (possibly as a top-level packages key).

What's your gut reaction to eventually writing package versions into manifest.json @jtcohen6 ? You got that appetite? Or anything about it cause some indigestion?

@dbeatty10 dbeatty10 added Refinement Maintainer input needed and removed triage labels Feb 1, 2023
@jtcohen6 jtcohen6 added the state Stateful selection (state:modified, defer) label Feb 1, 2023
@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 1, 2023

@dbeatty10 Spot on, on all counts.

To be clear, the full content of packages is stored in manifest.json already, and we do calculate their diffs (including macros) during state:modified comparison. What I hear this issue asking for is: A user-friendlier message saying, "Hey, there might be a lot of diffs, because we're looking at two different versions of this package!"

I know we recently experienced this with our own internal analytics project, while upgrading to dbt-utils v1.0. The set of state:modified+ was... well, just about every model! (cc @epapineau)

Related: a very cool thread in #dbt-core-development, from @z3z1ma, proposing a similar UX improvement to the explainability of why certain nodes are detected as "modified." I think that would also go a long way toward helping folks troubleshooting, by making it possible to isolate & surface root cause of the modification.

As far as the specific ask:

What's your gut reaction to eventually writing package versions into manifest.json @jtcohen6 ? You got that appetite? Or anything about it cause some indigestion?

I don't have a concern here, we just don't have a natural place to stick this info today. (The metadata dict is the closest I can think of.) Maybe a new packages key?

@jtcohen6 jtcohen6 removed the Refinement Maintainer input needed label Feb 1, 2023
@noel
Copy link
Author

noel commented Feb 1, 2023

I have since gotten more information and I don't think my feature request would solve this problem.
I have a project that is using dbt_snow_mask version 0.2.0

My initial theory was that someone modified the version in their packages.yml and uploaded a manifest.json which behaved differently for them than for others.

Since most people were detecting changes I had the people who did not detect changes delete their packages and rerun dbt deps Once they did this, they too saw the differences

Upon further investigation I found this
https://github.com/entechlog/dbt-snow-mask/commits/0.2.0

So, my guess is that the maintainer made a change June 1, 2022 and did not change the version of the package. So, if someone had installed the package prior to that date and the prd manifest was uploaded from their machine, then anyone who installs the package after that date would get the new commit and thus see all the state:modified changes.

In the commit we see a new variable added to their dbt_project.yml and we do know that indeed it changed when the user reran dbt deps because they sent me a screenshot.

So, merely storing the version of the package would not suffice if the version is kept the same.

Not sure much can be done in this scenario.

Before
image

After
image

@noel
Copy link
Author

noel commented Feb 1, 2023

i think #6643 is the answer to my request. We can close this one. I left a comment there.

@dbeatty10
Copy link
Contributor

I'm not advocating for the following, just exploring the art of the possible.

Currently, the manifest does contain version metadata for dbt_schema_version and dbt_version. However, I didn't see the actual versions used for:

  1. the adapter (snowflake), or
  2. the package (entechlog/dbt_snow_mask)

Maybe we do want one or both of these as they would affect the contents within the manifest?

Diff for a manifest containing more version numbers

Let's pretend for a moment that #6735 is merged as-is. When running dbt deps it would create this file:

dbt_packages/installed_packages.json

{
  "packages": {
    "entechlog/dbt_snow_mask": {
      "source": "hub",
      "version": "0.1.9"
    }
  }
}

And then imagine that this content is dropped as-is into manifest.json during dbt build, etc:

{
  "metadata": {
    "dbt_schema_version": "https://schemas.getdbt.com/dbt/manifest/v8.json",
    "dbt_version": "1.4.1",
    ...
    "adapter_type": "snowflake"
  },
  "packages": {
    "entechlog/dbt_snow_mask": {
      "source": "hub",
      "version": "0.1.9"
    }
  },

Then after upgrading dbt_snow_mask to 0.2.3, re-running deps and build, and then comparing the before/after manifests, you could* get a diff that begins like this:

*** /dev/fd/12	2023-02-01 11:59:27.000000000 -0700
--- /dev/fd/13	2023-02-01 11:59:27.000000000 -0700
***************
*** 11,17 ****
      "packages": {
          "entechlog/dbt_snow_mask": {
              "source": "hub",
!             "version": "0.1.9"
          }
      },
      "nodes": {
--- 11,17 ----
      "packages": {
          "entechlog/dbt_snow_mask": {
              "source": "hub",
!             "version": "0.2.3"
          }
      },
      "nodes": {
* Click for the exact methods used to diff

Added this parameterized alias to the end of ~/.zshrc:

mdiff() {
  diff -c \
  <(grep -v -e created_at -e generated_at -e invocation_id <(python -m json.tool $1)) \
  <(grep -v -e created_at -e generated_at -e invocation_id <(python -m json.tool $2))
}

Re-sourced to load that new mdiff alias to do a "manifest diff":

source ~/.zshrc`

Then did the actual diff like:

mdiff manifest-1.json manifest-2.json

Next steps

Per request, I'll close this request. We can always re-open it again if it becomes relevant.

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request state Stateful selection (state:modified, defer)
Projects
None yet
Development

No branches or pull requests

3 participants