Fix get columns in relation #197

Merged (3 commits) on Jul 28, 2021

@ali-tny commented Jul 23, 2021

resolves #196 / this slack thread

Description

get_columns_in_relation fails when it's called on a model that was created in the same run.

At the start of the run, the cache is populated, setting Relation.information to be the string output of a DESCRIBE EXTENDED query, which allows columns and metadata to be parsed.

However, when models are created, a Relation with information=None is saved in the cache (since columns and metadata aren't returned from a CREATE TABLE / VIEW statement). This means that an `expected string or bytes-like object` error is raised when attempting to regex-parse None.
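
The failure can be reproduced outside dbt. The following is a minimal sketch only: the regex and the `parse_columns` helper are hypothetical stand-ins for illustration, not the adapter's actual pattern or code.

```python
import re

# Hypothetical pattern for illustration; not the adapter's actual regex.
COLUMN_LINE = re.compile(r"\|-- (\w+): (\w+) \(nullable = (\w+)\)")


def parse_columns(information):
    """Return (name, dtype) pairs parsed from a describe-style string."""
    return [(m[0], m[1]) for m in COLUMN_LINE.findall(information)]


# A relation cached at the start of the run carries a string, so parsing works.
parsed = parse_columns(" |-- id: bigint (nullable = true)\n")

# A relation cached after CREATE TABLE / VIEW carries None, so the regex call raises.
try:
    parse_columns(None)
except TypeError as exc:
    error_message = str(exc)
```

Here `error_message` begins with `expected string or bytes-like object`; newer Python versions append the offending type to the message.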

This change updates get_columns_in_relation to fall back to actually querying the schema when the information attribute isn't set for a relation (in addition to the current behaviour, which falls back if a relation isn't in the cache at all).
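
The fallback described above could look roughly like the sketch below. It is an assumption-laden illustration, not the code from this PR: `CachedRelation`, `parse_columns`, `get_columns_with_fallback`, and `fake_query_schema` are all hypothetical names, and the cache is modelled as a plain dict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CachedRelation:
    # Hypothetical stand-in for the adapter's Relation class.
    name: str
    information: Optional[str] = None  # None for relations created during the run


def parse_columns(information: str) -> List[str]:
    # Placeholder parser (one column name per line), assumed for illustration.
    return [line.strip() for line in information.splitlines() if line.strip()]


def get_columns_with_fallback(
    name: str,
    cache: Dict[str, CachedRelation],
    query_schema: Callable[[str], List[str]],
) -> List[str]:
    cached = cache.get(name)
    # Parse from the cache only when the relation is cached AND has information.
    if cached is not None and cached.information is not None:
        return parse_columns(cached.information)
    # Otherwise (not cached, or cached without information) query the schema.
    return query_schema(name)


queried: List[str] = []


def fake_query_schema(name: str) -> List[str]:
    # Records which relations needed a real query; returns made-up columns.
    queried.append(name)
    return ["id", "created_at"]


cache = {
    "old_model": CachedRelation("old_model", "id\nname\n"),
    "new_model": CachedRelation("new_model"),  # created in this run
}
from_cache = get_columns_with_fallback("old_model", cache, fake_query_schema)
from_query = get_columns_with_fallback("new_model", cache, fake_query_schema)
not_cached = get_columns_with_fallback("missing_model", cache, fake_query_schema)
```

In this sketch only `new_model` and `missing_model` reach `fake_query_schema`. The result of the fallback query is deliberately not written back to the cache, mirroring the choice described in the commit message.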

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

ali-tny added 2 commits July 23, 2021 11:35
@cla-bot added the cla:yes label Jul 23, 2021
@ali-tny force-pushed the fix-get-columns-in-relation branch from ee27c50 to 183557e July 23, 2021 11:10
@jtcohen6 left a comment

Makes a ton of sense @ali-tny. Thanks for the quick work pulling together this fix, and for adding the test!

I'm going to merge this, move the changelog entry up to v0.20.1, and then cherry-pick onto 0.20.latest.

@jtcohen6 merged commit b315008 into dbt-labs:master Jul 28, 2021
jtcohen6 pushed a commit that referenced this pull request Jul 28, 2021
* Add test for failing get_columns_in_relation

Specifically, it fails when it's called on a model that was created in
the same run.

At the start of the run, the cache is populated, setting
Relation.information to be the string output of a DESCRIBE EXTENDED
query, which allows columns and metadata to be parsed.

However, when models are created, a Relation with information=None is
saved in the cache (since columns and metadata aren't returned from a
CREATE TABLE / VIEW statement). This means that an `expected string or
bytes-like object` error is raised when attempting to regex-parse None.

* Only parse cols from cache if there's information

If the `information` attribute is not yet set, we fall back on the
non-cached version to find column information. We could _also_ cache the
output of that query, but given that it wasn't cached originally, I
leave it as it is.

* Add get_columns_in_relation fix to CHANGELOG

Linked issue (may be closed by merging this pull request): get_columns_in_relation fails on the first run, when called on a model created in that run