Stop trying to parse deleted schema files #9722

Merged
QMalcolm merged 4 commits into main from qmalcolm--8860-key-error-sl-partial-parsing
Mar 7, 2024

Conversation

QMalcolm
Contributor

@QMalcolm QMalcolm commented Mar 1, 2024

resolves #8860

Problem

Way back (in 7563b99), deleted schema files started being separated out from deleted non-schema files. Ever since, though, when scheduling files for re-parsing we have only skipped the deleted non-schema files; deleted schema files were still scheduled. We even missed this when we refactored the scheduling code in b37e5b5. As a result, when schema files were deleted and the project then went through partial parsing, you were likely to see an error like the following:

05:53:11 Traceback (most recent call last):
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/cli/requires.py", line 87, in wrapper
    result, success = func(*args, **kwargs)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/cli/requires.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/cli/requires.py", line 143, in wrapper
    return func(*args, **kwargs)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/cli/requires.py", line 172, in wrapper
    return func(*args, **kwargs)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/cli/requires.py", line 219, in wrapper
    return func(*args, **kwargs)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/cli/requires.py", line 246, in wrapper
    manifest = ManifestLoader.get_full_manifest(
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/parser/manifest.py", line 316, in get_full_manifest
    manifest = loader.load()
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/parser/manifest.py", line 492, in load
    self.parse_project(
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/parser/manifest.py", line 659, in parse_project
    block = FileBlock(self.manifest.files[file_id])
KeyError: 'salesforce_opportunity://models/marts/core/metrics.yml'

Over the years we've added more things you can define in schema files, which in turn means schema files change (and get deleted) more frequently. As a result, we've been seeing this error crop up more and more.
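
The failure mode can be sketched outside dbt. This is a minimal illustration, not dbt's code: `manifest_files`, `scheduled_file_ids`, `load_scheduled`, and the `my_project://` IDs are hypothetical stand-ins. The only detail taken from the traceback is that a scheduled file ID is looked up in a mapping of known files, and the sketch assumes the deleted schema file is no longer in that mapping.

```python
# Hypothetical stand-in for the manifest's file mapping once the deleted
# schema file has been dropped from it.
manifest_files = {
    "my_project://models/staging/orders.sql": "<source file>",
}

# Hypothetical parse schedule that still lists the deleted schema file.
scheduled_file_ids = [
    "my_project://models/staging/orders.sql",
    "my_project://models/marts/core/metrics.yml",  # deleted on disk
]


def load_scheduled(files, file_ids):
    """Look up every scheduled file ID; raises KeyError on a missing ID."""
    return [files[file_id] for file_id in file_ids]


try:
    load_scheduled(manifest_files, scheduled_file_ids)
except KeyError as exc:
    # The missing key is the ID of the deleted schema file.
    missing_id = exc.args[0]
    print(f"KeyError: {missing_id!r}")
```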

Solution

Stop scheduling deleted schema files for parsing, the same way we have always skipped deleted non-schema files.
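
The idea can be sketched as a filter over the files about to be scheduled. This is an illustrative sketch under stated assumptions, not the actual change: the function `files_to_schedule`, the dictionary shape of `file_diff`, and the `"deleted"` key are hypothetical; only the name `deleted_schema_files` comes from this PR.

```python
from typing import Dict, List, Set


def files_to_schedule(
    file_diff: Dict[str, List[str]],
    candidates: List[str],
) -> List[str]:
    """Return the candidate file IDs minus every deleted file.

    `file_diff` is a hypothetical shape: lists of deleted file IDs keyed
    "deleted" (non-schema files) and "deleted_schema_files".
    """
    skipped: Set[str] = set(file_diff.get("deleted", []))
    # The step this sketch highlights: deleted schema files are skipped too.
    skipped |= set(file_diff.get("deleted_schema_files", []))
    return [file_id for file_id in candidates if file_id not in skipped]


diff = {
    "deleted": ["my_project://models/old_model.sql"],
    "deleted_schema_files": ["my_project://models/marts/core/metrics.yml"],
}
candidates = [
    "my_project://models/staging/orders.sql",
    "my_project://models/old_model.sql",
    "my_project://models/marts/core/metrics.yml",
]
remaining = files_to_schedule(diff, candidates)
print(remaining)
```

With both kinds of deleted file filtered out, only files that still exist reach the lookup step, so the `KeyError` above cannot be triggered by a deleted schema file.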

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

…etrics

It was raised in #8860 that an error is being raised during partial parsing
when files containing metrics/semantic models are deleted. In further testing,
it looks like this error specifically happens when a file containing both
semantic models and metrics is deleted. If the deleted file contains just
semantic models or just metrics, there seems to be no issue. The next commit
should contain the fix.

Waaaay back (in 7563b99) deleted schema files started being separated out
from deleted non-schema files. However, ever since, when scheduling files for
reparsing, we've only skipped deleted non-schema files. We even missed this
when we refactored the scheduling code in b37e5b5. This change updates
`_schedule_for_parsing`, which is used by `schedule_nodes_for_parsing`, to
begin skipping deleted schema files in addition to deleted non-schema files.

As noted in the previous commit, we started separating out deleted
schema files from deleted non-schema files a looong time ago. However,
this whole time we've been adding `deleted_schema_files` to the list
of files to be parsed. This change corrects for that.
@QMalcolm QMalcolm requested a review from a team as a code owner March 1, 2024 21:56
@cla-bot cla-bot bot added the cla:yes label Mar 1, 2024

codecov bot commented Mar 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.02%. Comparing base (fc43101) to head (9f22326).
Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9722      +/-   ##
==========================================
- Coverage   88.08%   88.02%   -0.07%     
==========================================
  Files         178      178              
  Lines       22419    22439      +20     
==========================================
+ Hits        19747    19751       +4     
- Misses       2672     2688      +16     
Flag Coverage Δ
integration 85.48% <100.00%> (-0.14%) ⬇️
unit 62.24% <0.00%> (-0.02%) ⬇️


@QMalcolm QMalcolm assigned QMalcolm and unassigned QMalcolm Mar 2, 2024
@QMalcolm QMalcolm merged commit deedeeb into main Mar 7, 2024
63 checks passed
@QMalcolm QMalcolm deleted the qmalcolm--8860-key-error-sl-partial-parsing branch March 7, 2024 14:30
github-actions bot pushed a commit that referenced this pull request Mar 7, 2024
* Add test around deleting a YAML file containing semantic models and metrics

It was raised in #8860 that an
error is being raised during partial parsing when files containing
metrics/semantic models are deleted. In further testing it looks like this
error specifically happens when a file containing both semantic models and
metrics is deleted. If the deleted file contains just semantic models or
metrics there seems to be no issue. The next commit should contain the fix.

* Skip deleted schema files when scheduling files during partial parsing

Waaaay back (in 7563b99) deleted schema files started being separated out
from deleted non-schema files. However, ever since, when scheduling files for
reparsing, we've only skipped deleted non-schema files. We even missed this
when we refactored the scheduling code in b37e5b5. This change updates
`_schedule_for_parsing`, which is used by `schedule_nodes_for_parsing`, to
begin skipping deleted schema files in addition to deleted non-schema files.

* Update `add_to_pp_files` to ignore `deleted_schema_files`

As noted in the previous commit, we started separating out deleted
schema files from deleted non-schema files a looong time ago. However,
this whole time we've been adding `deleted_schema_files` to the list
of files to be parsed. This change corrects for that.

* Add changie doc for partial parsing KeyError fix

(cherry picked from commit deedeeb)
QMalcolm added a commit that referenced this pull request Mar 7, 2024
* Stop trying to parse deleted schema files (#9722)

(cherry picked from commit deedeeb)

* Empty commit to trigger github actions

---------

Co-authored-by: Quigley Malcolm <QMalcolm@users.noreply.github.com>
Co-authored-by: Quigley Malcolm <quigley.malcolm@dbtlabs.com>