[CT-2407] [Bug] dbt can't find snapshots using their filenames if mixed cased but only if inside a subfolder #7346

Closed
2 tasks done
jeremyyeo opened this issue Apr 13, 2023 · 3 comments
Labels: bug (Something isn't working), wontfix (Not a bug or out of scope for dbt-core)

jeremyyeo commented Apr 13, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

This one is really strange: for some reason, dbt can't find a snapshot by its mixed-case filename when the file is inside a subfolder.

Note for readers of this issue - you are encouraged to follow snake_case when naming stuff (for example: dbt-labs/docs.getdbt.com#1659) which would avoid a scenario like this.

Expected Behavior

dbt should find snappy_2 / snaPPy_2.sql just like it did for model bAr.sql.

Steps To Reproduce

Project setup:

# dbt_project.yml
name: my_dbt_project
profile: snowflake
config-version: 2
version: 1.0

models:
  my_dbt_project:
    +materialized: table

Nodes (pay close attention to the filenames):

-- models/in_sub/bAr.sql
select 1 as id

-- snapshots/snaPPy_1.sql
{% snapshot snappy_1 %}
{{ config(target_schema='snapshots', unique_key='id', strategy='check', check_cols='all') }}
select 1 as id
{% endsnapshot %}

-- snapshots/in_subfolder/snaPPy_2.sql
{% snapshot snappy_2 %}
{{ config(target_schema='snapshots', unique_key='id', strategy='check', check_cols='all') }}
select 1 as id
{% endsnapshot %}

-- snapshots/in_subfolder/snappy_3.sql
{% snapshot snappy_3 %}
{{ config(target_schema='snapshots', unique_key='id', strategy='check', check_cols='all') }}
select 1 as id
{% endsnapshot %}

$ dbt ls -s bAr
my_dbt_project.in_sub.bAr

$ dbt ls -s snaPPy_1
my_dbt_project.snaPPy_1.snappy_1

$ dbt ls -s snaPPy_2
01:43:49  The selection criterion 'snaPPy_2' does not match any nodes
01:43:49  No nodes selected!

$ dbt ls -s snappy_3
my_dbt_project.in_subfolder.snappy_3.snappy_3

Relevant log output

No response

Environment

- OS: macOS
- Python: 
Python 3.10.10
- dbt:
Core:
  - installed: 1.4.5
  - latest:    1.4.5 - Up to date!

Plugins:
  - snowflake: 1.4.2 - Up to date!

Which database adapter are you using with dbt?

snowflake

Additional Context

For more info on why this was brought up - basically the Cloud IDE uses the syntax:

2+<active_file_name>+2

This draws the lineage for the active file/node. Because of this issue, the snapshot snaPPy_2.sql / snappy_2 isn't rendered in the Cloud IDE when it is the active file. It only renders if you change the selection from its default when you open the file ("2+snaPPy_2+2") to the string "snappy_2".

Internal JIRA: https://dbtlabs.atlassian.net/browse/XP-2094

@jeremyyeo jeremyyeo added bug Something isn't working triage labels Apr 13, 2023
@github-actions github-actions bot changed the title [Bug] dbt can't find snapshots using their filenames if mixed cased but only if inside a subfolder [CT-2407] [Bug] dbt can't find snapshots using their filenames if mixed cased but only if inside a subfolder Apr 13, 2023
@dbeatty10 dbeatty10 self-assigned this Apr 13, 2023
@dbeatty10
Contributor

I was able to reproduce what you reported @jeremyyeo -- as always, thank you for such a helpful reprex 🤩

tl;dr

Check out wildcard selection syntax for Unix-style globbing!

$ dbt ls -s '*.snaPPy_2.*'
my_dbt_project.in_subfolder.snaPPy_2.snappy_2

More detail

Reading your description initially, it looked like case sensitivity was somehow coming into play. But after renaming each of the snapshot files, the cause turns out to be something else (see reprex below).

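Specifically, node names are the same as filenames for models, but this is not the case for snapshots: a snapshot's node name comes from its {% snapshot %} block, and the filename is only one component of its fully-qualified name (FQN). It's somewhat accidental that dbt ls -s snaPPy_1 worked in your example. That file sits at the top level of snapshots/, so its filename happens to be the first FQN component after the project name.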
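
To make the difference concrete, here is a minimal sketch of how the FQNs in this thread appear to be composed. It is inferred from the dbt ls output shown in this issue and is not dbt-core's actual code; the function names model_fqn and snapshot_fqn are hypothetical, and paths are assumed to be relative to the models/ or snapshots/ directory.

```python
from pathlib import PurePosixPath
from typing import List


def model_fqn(project: str, relative_path: str) -> List[str]:
    """Project name, then subfolders, then the file stem (which is the model name)."""
    path = PurePosixPath(relative_path)
    return [project, *path.parent.parts, path.stem]


def snapshot_fqn(project: str, relative_path: str, snapshot_name: str) -> List[str]:
    """Same as a model, plus the name from the {% snapshot %} block as the last part."""
    path = PurePosixPath(relative_path)
    return [project, *path.parent.parts, path.stem, snapshot_name]


# Paths are relative to models/ and snapshots/ respectively.
print(".".join(model_fqn("my_dbt_project", "in_sub/bAr.sql")))
# my_dbt_project.in_sub.bAr
print(".".join(snapshot_fqn("my_dbt_project", "snaPPy_1.sql", "snappy_1")))
# my_dbt_project.snaPPy_1.snappy_1
print(".".join(snapshot_fqn("my_dbt_project", "in_subfolder/snaPPy_2.sql", "snappy_2")))
# my_dbt_project.in_subfolder.snaPPy_2.snappy_2
```

For a model the filename is the last FQN component, so it doubles as the node name. For a snapshot the filename sits one position before the end, behind any subfolders.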

Reprex

First, I updated most of the file names to be both lowercase and distinct from the snapshot name:

mv snapshots/snaPPy_1.sql snapshots/snappy_one.sql
mv snapshots/in_subfolder/snaPPy_2.sql snapshots/in_subfolder/snappy_two.sql
mkdir -p snapshots/in_subfolder/in_subsubfolder/
mv snapshots/in_subfolder/snappy_three.sql snapshots/in_subfolder/in_subsubfolder/snappy_three.sql

Then took a look at a variety of selection syntax options:

$ dbt ls -s in_sub      
my_dbt_project.in_sub.bAr

$ dbt ls -s in_subfolder
my_dbt_project.in_subfolder.snappy_two.snappy_2
my_dbt_project.in_subfolder.snappy_three.snappy_3

$ dbt ls -s bAr
my_dbt_project.in_sub.bAr

$ dbt ls -s snappy_1    
my_dbt_project.snappy_one.snappy_1

$ dbt ls -s snappy_2
my_dbt_project.in_subfolder.snappy_two.snappy_2

$ dbt ls -s snappy_one  
my_dbt_project.snappy_one.snappy_1

$ dbt ls -s snappy_two
16:10:30  The selection criterion 'snappy_two' does not match any nodes
16:10:30  No nodes selected!

$ dbt ls -s in_subsubfolder
16:24:35  The selection criterion 'in_subsubfolder' does not match any nodes
16:24:35  No nodes selected!

$ dbt ls -s in_subfolder.in_subsubfolder
my_dbt_project.in_subfolder.in_subsubfolder.snappy_three.snappy_3

# wildcard selection syntax
$ dbt ls -s '*.snappy_two.*'
my_dbt_project.in_subfolder.snappy_two.snappy_2

# file selection syntax
$ dbt ls -s snapshots/in_subfolder/snappy_two.sql 
my_dbt_project.in_subfolder.snappy_two.snappy_2
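
The results above are consistent with a simple matching rule, sketched below. This is a simplified approximation written for illustration, not dbt-core's implementation: the names is_selected and _matches are hypothetical, and it assumes a selector matches when it equals the last FQN part, or when its dot-separated parts match the FQN from the start (with or without the project name), with any wildcard part handled by glob matching on the remainder.

```python
from fnmatch import fnmatchcase
from typing import List

WILDCARD_CHARS = ("*", "?", "[", "]")


def _matches(fqn: List[str], selector: str) -> bool:
    # An exact match on the last FQN part (the node name) always selects the node.
    if fqn and fqn[-1] == selector:
        return True
    parts = selector.split(".")
    if len(parts) > len(fqn):
        return False
    for i, part in enumerate(parts):
        # From the first wildcard onward, glob-match the rest of the FQN.
        if any(ch in part for ch in WILDCARD_CHARS):
            return fnmatchcase(".".join(fqn[i:]), ".".join(parts[i:]))
        if fqn[i] != part:
            return False
    return True


def is_selected(fqn: List[str], selector: str) -> bool:
    # Try the full FQN first, then the FQN without the leading project name.
    return _matches(fqn, selector) or _matches(fqn[1:], selector)


snappy_one = ["my_dbt_project", "snappy_one", "snappy_1"]
snappy_two = ["my_dbt_project", "in_subfolder", "snappy_two", "snappy_2"]

print(is_selected(snappy_one, "snappy_one"))      # True: first part after the project
print(is_selected(snappy_two, "snappy_two"))      # False: hidden behind in_subfolder
print(is_selected(snappy_two, "snappy_2"))        # True: last FQN part
print(is_selected(snappy_two, "*.snappy_two.*"))  # True: wildcard selection
```

Under this rule a bare filename only selects a snapshot that lives at the top level of snapshots/, which is the same pattern seen with snaPPy_1 and snaPPy_2 in the original report.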

I think the selection for snapshots is behaving as expected in this case, and I'm inclined to close this in favor of using the wildcard selection syntax or file selection syntax instead. Before I close, are there any key considerations I might be overlooking?

@dbeatty10 dbeatty10 removed their assignment Apr 13, 2023
@markproctor1

@dbeatty10 I didn't understand your last comment. When you said "before I close", did you mean close the bug report?

We noticed this behavior because the snapshots are showing up in the DAG sometimes and not others. We suspected it had something to do with subfolders but couldn't nail down what (we're very new to dbt).

Are you saying that if they're snake case it works, so it's not a bug?

@dbeatty10
Contributor

@markproctor1 We're going to make an update to fix this for you. We're just going to make it in our Cloud software (which we track in Jira tickets) rather than in Core (which we track in GitHub issues like this one).

The root cause is a filename that differs from the name of the snapshot. In the example below, if you change the filename to be the same as the snapshot name, it should work:

-- snapshots/in_subfolder/oops_this_name_is_different.sql
{% snapshot this_is_the_snapshot_name %}
{{ config(target_schema='snapshots', unique_key='id', strategy='check', check_cols='all') }}
select 1 as id
{% endsnapshot %}
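
If you want to find such files in an existing project, a small script along these lines could help. It is only a sketch: the function name mismatched_snapshots and the regular expression are my own assumptions, and it simply compares each {% snapshot ... %} block name with the stem of the file that contains it.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches "{% snapshot some_name %}" (optionally with "-" whitespace control),
# but not "{% endsnapshot %}".
SNAPSHOT_BLOCK = re.compile(r"\{%-?\s*snapshot\s+(\w+)\s*-?%\}")


def mismatched_snapshots(snapshots_dir: str) -> List[Tuple[Path, str]]:
    """Return (file, snapshot_name) pairs where the file stem differs from the name."""
    mismatches = []
    for sql_file in sorted(Path(snapshots_dir).rglob("*.sql")):
        for name in SNAPSHOT_BLOCK.findall(sql_file.read_text()):
            # The comparison is case-sensitive, so snaPPy_2.sql vs snappy_2 is reported.
            if name != sql_file.stem:
                mismatches.append((sql_file, name))
    return mismatches


if __name__ == "__main__":
    # Assumes the script is run from the project root with the default snapshots/ path.
    for sql_file, name in mismatched_snapshots("snapshots"):
        print(f"{sql_file}: snapshot is named {name!r}")
```

Any file it reports is one where selecting by filename and selecting by snapshot name can give different results.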

I'm going to close this issue, but please don't think we are closing this altogether: our Cloud team still has visibility on this.

And please feel free to ask any follow-up questions -- although this issue is labeled as "closed", we're listening and will respond 😄

@dbeatty10 dbeatty10 closed this as not planned Apr 15, 2023
@dbeatty10 dbeatty10 added wontfix Not a bug or out of scope for dbt-core and removed triage labels Apr 15, 2023