[CT-3227] [Bug] KeyError: 'my_project://models/metrics.yml' after deleting YAML files related to semantic layer #8860

Closed
2 tasks done
Tracked by #9116
dbeatty10 opened this issue Oct 15, 2023 · 8 comments · Fixed by #9722
Labels: backport 1.7.latest, bug, High Severity, Impact: SL, partial_parsing

Comments

@dbeatty10
Contributor

dbeatty10 commented Oct 15, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Failure with long stacktrace after deleting YAML files related to semantic models and/or metrics.

Workaround

The easiest workaround is to run dbt clean:

dbt clean
More detailed workaround

Here's a workaround that may work in the meantime:

Use --no-partial-parse to disable partial parsing:

dbt --no-partial-parse compile

Alternatively, delete the partial_parse.msgpack file from your target directory (for example, via dbt clean).
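
Deleting the cache file can also be scripted. The following is a minimal sketch, not part of dbt: `remove_partial_parse_cache` is a hypothetical helper, and it assumes the target directory is named `target` (a project may configure a different path).

```python
from pathlib import Path


def remove_partial_parse_cache(project_dir: str = ".", target_dir: str = "target") -> bool:
    """Delete partial_parse.msgpack if it exists.

    Returns True if a file was removed, False if there was nothing to remove.
    The `target` directory name is an assumption; pass target_dir to override it.
    """
    cache = Path(project_dir) / target_dir / "partial_parse.msgpack"
    if cache.is_file():
        cache.unlink()
        return True
    return False
```

Calling `remove_partial_parse_cache()` from the project root before `dbt parse` should force a full parse on the next run, much like `--no-partial-parse` does for a single invocation.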

Expected Behavior

Either no failure or at least no stacktrace.

Steps To Reproduce

  1. create a bunch of semantic models / metrics that reference each other
  2. parse project: dbt parse
  3. delete a bunch of them
  4. re-parse project: dbt parse
  5. get a KeyError
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/parser/manifest.py", line 659, in parse_project
    block = FileBlock(self.manifest.files[file_id])
KeyError: 'salesforce_opportunity://models/marts/core/metrics.yml'

Relevant log output

05:53:09 Running with dbt=1.6.6
05:53:11 Registered adapter: snowflake=1.6.4
05:53:11 Encountered an error:
'salesforce_opportunity://models/marts/core/metrics.yml'
05:53:11 Traceback (most recent call last):
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/cli/requires.py", line 87, in wrapper
    result, success = func(*args, **kwargs)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/cli/requires.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/cli/requires.py", line 143, in wrapper
    return func(*args, **kwargs)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/cli/requires.py", line 172, in wrapper
    return func(*args, **kwargs)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/cli/requires.py", line 219, in wrapper
    return func(*args, **kwargs)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/cli/requires.py", line 246, in wrapper
    manifest = ManifestLoader.get_full_manifest(
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/parser/manifest.py", line 316, in get_full_manifest
    manifest = loader.load()
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/parser/manifest.py", line 492, in load
    self.parse_project(
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/parser/manifest.py", line 659, in parse_project
    block = FileBlock(self.manifest.files[file_id])
KeyError: 'salesforce_opportunity://models/marts/core/metrics.yml'
05:53:09 'salesforce_opportunity://models/marts/core/metrics.yml'
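
The log shows only the quoted path after "Encountered an error:" because that is how Python renders a `KeyError`: its string form is the `repr` of the missing key. A minimal sketch, using a plain dict as a stand-in for `manifest.files` (the ids and values here are illustrative, not taken from dbt-core):

```python
# Stand-in for manifest.files: file ids mapped to file records.
files = {"my_project://models/semantic_models.yml": object()}

# A file id that is still scheduled for parsing but is no longer in the mapping.
stale_file_id = "my_project://models/metrics.yml"

try:
    files[stale_file_id]
except KeyError as exc:
    # str() of a KeyError is the repr of the missing key, quotes included.
    message = str(exc)

print(message)  # 'my_project://models/metrics.yml'
```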

Environment

- OS:
- Python:
- dbt: 1.6.6

Which database adapter are you using with dbt?

snowflake

Additional Context

May be related to (or a duplicate of):

@dbeatty10 dbeatty10 added bug Something isn't working triage labels Oct 15, 2023
@github-actions github-actions bot changed the title [Bug] KeyError: 'my_project://models/metrics.yml' after deleting YAML files related to semantic layer [CT-3227] [Bug] KeyError: 'my_project://models/metrics.yml' after deleting YAML files related to semantic layer Oct 15, 2023
@dbeatty10
Contributor Author

dbeatty10 commented Oct 15, 2023

Workaround

The easiest workaround is to run dbt clean:

dbt clean
More detailed workaround

Here's a workaround that may work in the meantime:

Use --no-partial-parse to disable partial parsing:

dbt --no-partial-parse compile

Alternatively, delete the partial_parse.msgpack file from your target directory (for example, via dbt clean).

@dbeatty10
Contributor Author

The key to triggering this error was defining a ratio metric, parsing the project, and then deleting the file that contains it:

metrics:
  - name: average_value_per_opportunity
    description: The average value per opportunity
    type: ratio
    label: Average Value per Opportunity
    type_params:
      numerator: total_value
      denominator: total_opportunities

Reprex

👉 This reproducible example assumes dbt-snowflake>=1.7.0-rc1. To make it work for dbt ~=1.6.0, you'll need to adjust the metricflow_time_spine model to use dbt_utils.

seeds/orders.csv

"order_id","location_id","customer_id","order_total","tax_paid","ordered_at","count_food_items","count_drink_items","count_items","subtotal_drink_items","subtotal_food_items","subtotal","order_cost","location_name","is_food_order","is_drink_order"
"3d12eb16-1bbb-47ad-a732-bf156c322fa6","7f790ed7-0fc4-4de2-a1b0-cce72e657fc4","7cd5e7f3-39c8-48ce-950a-1b70449e7829",6.3600000000000000,0.36000000000000000000,2016-09-01 19:58:00.000,0,1,1,6.0000000000000000,0,6.0000000000000000,0.82000000000000000000,Philadelphia,false,true
"2c54134f-2322-4704-9f82-e2e9b77bcb19","7f790ed7-0fc4-4de2-a1b0-cce72e657fc4","96c8937f-d99c-4515-acec-db24d07768f9",50.8800000000000000,2.8800000000000000,2016-09-01 09:35:00.000,3,2,5,11.0000000000000000,37.0000000000000000,48.0000000000000000,10.32000000000000000000,Philadelphia,true,true

Create models/metricflow_time_spine.sql as described here:

{{ config(materialized='table') }}

with days as (

    {{
        dbt.date_spine(
            'day',
            "to_date('01/01/2000','mm/dd/yyyy')",
            "to_date('01/02/2027','mm/dd/yyyy')"
        )
    }}

),

final as (
    select cast(date_day as date) as date_day
    from days
)

select * from final

models/semantic_models.yml

semantic_models:
  - name: orders_2
    model: ref('orders')
    description: |
      Order fact table. This table is at the order grain with one row per order.
    defaults:
      agg_time_dimension: ordered_at
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: ordered_at
        expr: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: customers_with_orders_2
        description: Distinct count of customers placing orders
        agg: count_distinct
        expr: customer_id
      - name: total_value
        agg: sum
        expr: order_total
      - name: total_opportunities
        agg: count_distinct
        expr: customer_id

models/metrics.yml

metrics:
  - name: average_value_per_opportunity
    description: The average value per opportunity
    type: ratio
    label: Average Value per Opportunity
    type_params:
      numerator: total_value
      denominator: total_opportunities

  - name: total_value
    description: The total opportunity value for the business
    type: simple
    label: Total Value
    type_params:
      measure: total_value

  - name: total_opportunities
    description: The total opportunities for the business
    type: simple
    label: Total Opportunities
    type_params:
      measure: total_opportunities

At the end, your project folder will look like this:

├── dbt_project.yml
├── models
│   ├── metricflow_time_spine.sql
│   ├── metrics.yml
│   └── semantic_models.yml
└── seeds
    └── orders.csv

Run this set of commands to see the error:

dbt clean
dbt parse
mv models/metrics.yml models/metrics.yml.x
dbt parse

Here's the error I got:

15:27:46  Encountered an error:
'my_project://models/metrics.yml'
...
  File "/venv/snowflake_1.7/lib/python3.10/site-packages/dbt/parser/manifest.py", line 663, in parse_project
    block = FileBlock(self.manifest.files[file_id])
KeyError: 'my_project://models/metrics.yml'

Then run this to make the problem "go away":

dbt --no-partial-parse parse

@graciegoheen
Contributor

@dbeatty10 Is this only affecting 1.6 and 1.7?

@dbeatty10
Contributor Author

> @dbeatty10 Is this only affecting 1.6 and 1.7?

I didn't check 1.5 since the reprex was using semantic_models that first appeared in 1.6.

@graciegoheen Do you think we should try out a similar non-MetricFlow ratio metric with 1.5 also?

@graciegoheen
Contributor

Got it - let's contain this issue to semantic_models. I'll go ahead and add the 1.6 and 1.7 backport labels

@graciegoheen graciegoheen added backport 1.6.latest backport 1.7.latest High Severity bug with significant impact that should be resolved in a reasonable timeframe labels Oct 23, 2023
@martynydbt martynydbt added this to the v1.8 milestone Feb 8, 2024
@QMalcolm
Contributor

We had hoped this was resolved by some related work we did. However, I just tried the reproduction with 1.7.latest and the error is still occurring.

@QMalcolm
Contributor

QMalcolm commented Mar 1, 2024

The error specifically happens when a YAML file that contains definitions for both semantic models and metrics is deleted.

@QMalcolm
Contributor

QMalcolm commented Mar 1, 2024

But also it only does it sometimes...

QMalcolm added a commit that referenced this issue Mar 1, 2024
…etrics

It was raised in #8860 that an
error is being raised during partial parsing when files containing
metrics/semantic models are deleted. In further testing it looks like this
error specifically happens when a file containing both semantic models and
metrics is deleted. If the deleted file contains just semantic models or
metrics there seems to be no issue. The next commit should contain the fix.
QMalcolm added a commit that referenced this issue Mar 7, 2024
* Add test around deleting a YAML file containing semantic models and metrics

It was raised in #8860 that an
error is being raised during partial parsing when files containing
metrics/semantic models are deleted. In further testing it looks like this
error specifically happens when a file containing both semantic models and
metrics is deleted. If the deleted file contains just semantic models or
metrics there seems to be no issue. The next commit should contain the fix.

* Skip deleted schema files when scheduling files during partial parsing

Waaaay back (in 7563b99) deleted schema files started being separated out
from deleted non-schema files. However ever since, when it came to scheduling
files for reparsing, we've only done so for deleted non-schema files. We even
missed this when we refactored the scheduling code in b37e5b5. This change
updates `_schedule_for_parsing` which is used by `schedule_nodes_for_parsing`
to begin skipping deleted schema files in addition to deleted non schema files.

* Update `add_to_pp_files` to ignore `deleted_schema_files`

As noted in the previous commit, we started separating out deleted
schema files from deleted non-schema files a looong time ago. However,
this whole time we've been adding `deleted_schema_files` to the list
of files to be parsed. This change corrects for that.

* Add changie doc for partial parsing KeyError fix
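
As a rough illustration of the two changes described in this commit message (not the actual dbt-core code), the idea is that a file recorded as deleted, whether schema or non-schema, should never be handed to the parser. The function `files_to_schedule` and its argument shapes are hypothetical; only the name `deleted_schema_files` comes from the commit message.

```python
from typing import Iterable, List, Set


def files_to_schedule(
    candidates: Iterable[str],
    deleted_files: Set[str],
    deleted_schema_files: Set[str],
) -> List[str]:
    """Return the candidate file ids that may still be reparsed.

    Per the commit message, only deleted non-schema files used to be skipped,
    so a deleted YAML file id could still be scheduled. This sketch skips both.
    """
    skipped = deleted_files | deleted_schema_files
    return [file_id for file_id in candidates if file_id not in skipped]


scheduled = files_to_schedule(
    candidates=[
        "my_project://models/metrics.yml",
        "my_project://models/semantic_models.yml",
    ],
    deleted_files=set(),
    deleted_schema_files={"my_project://models/metrics.yml"},
)
print(scheduled)  # ['my_project://models/semantic_models.yml']
```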
github-actions bot pushed a commit that referenced this issue Mar 7, 2024
(cherry picked from commit deedeeb)
QMalcolm added a commit that referenced this issue Mar 7, 2024
* Stop trying to parse deleted schema files (#9722)

(cherry picked from commit deedeeb)

* Empty commit to trigger github actions

---------

Co-authored-by: Quigley Malcolm <QMalcolm@users.noreply.github.com>
Co-authored-by: Quigley Malcolm <quigley.malcolm@dbtlabs.com>