Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add _raw suffix to non-normalized destination table names #874

Merged
merged 8 commits into from
Nov 10, 2020

Conversation

sherifnada
Copy link
Contributor

@sherifnada sherifnada commented Nov 10, 2020

What

  • Adds the _raw suffix to all non-normalized table names in the destination
  • writes tables in the same schema/dataset as the table being normalized.

I'm pretty unhappy with this implementation and think we should remove it going forward, but putting it up to minimize delivery risk. Some better ways to do this are having DBT rename the raw tables in a pre-hook to add the _raw suffix, or having the destination integration output a "writtenCatalog" which contains the names of the tables it actually wrote. The latter is better in the case that we want to stop overwriting tables when performing full refresh syncs and instead write to a different table e.g table_timestamp.

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know the goal is to get this in right now so feel free to skip this suggestion for now. but i think your life gets a whole lot easier if you just had a helper method that looks like this

  public String getRawTableName(String streamName) {
    return streamName + "_raw";
  }

it saves us needing put "_raw" all over the codebase. ideally that string should be written in exactly one place. (at least in java)

maybe also a function that maps (rawTableName) => streamname.

anyway. go for it.

@@ -35,6 +35,4 @@ quoting:
# are materialized, and more!
models:
airbyte:
# Schema (dataset) defined in profiles.yml is concatenated with schema below for dbt's final output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this suffixed the schema/dataset with _NORMALIZED

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah okay.

@sherifnada
Copy link
Contributor Author

@cgardens good suggestion - implemented

@sherifnada sherifnada merged commit 9305461 into master Nov 10, 2020
@sherifnada sherifnada deleted the sherif/same-normalization-schema branch November 10, 2020 16:31
@jrhizor jrhizor mentioned this pull request Nov 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants