Migrating @ahmedrad's comment from #175, where he said:

Just to add our two cents as new users of this package: we use star schema modelling, but there are still quite a few differences between our warehouse and the models provided by the package, so we decided to do our own modelling in order to stay consistent across the warehouse. I'm not sure that's something you'd ever be able to get away from. For example: TIMESTAMP_NTZ vs TIMESTAMP_TZ types, prefix vs suffix naming (dim_ vs _dim), shortened naming vs not (fct_ vs fact_), column naming conventions, etc.
It would be pretty difficult to provide high-level models that immediately mesh with any existing data warehouse without any manipulation. I would much rather have access to the entire JSON blob of the high-level objects so that I can model them however I want for my needs. For example, if I consider dbt to be broken into sources, models, tests, macros, exposures, and metrics, then for every run I'd want a table for each of those that contains the raw JSON blob of the object, plus some associated metadata about the run it came from.
That way we'd have all the flexibility we need to extract and model the JSON objects however we like, in a way that matches the rest of the warehouse, without having to deal with most of the manifest structure. The blobs of the specific objects may change often and we'd have to deal with that, but the main objects dbt consists of shouldn't change all that often, even across manifest schema changes, so it would spare you having to rework your tables frequently.
And I'd rather get access to your tables as sources when I install the package, because that's what they essentially are: a source of dbt run data that still needs to be staged and modelled in our warehouse according to our conventions.
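To make that concrete, a hypothetical declaration for one of those per-object tables could look like the below; every name in it (raw_models, the column names) is illustrative rather than anything the package actually ships:

```yaml
# Hypothetical schema.yml for one per-object raw table -- all names are illustrative
version: 2

models:
  - name: raw_models
    description: One row per model per dbt invocation, carrying the raw manifest blob
    columns:
      - name: command_invocation_id
        description: Identifier of the dbt invocation that produced the row
      - name: run_started_at
        description: Run-level metadata, e.g. when the invocation started
      - name: raw_json
        description: The unmodified JSON blob for the object, to be parsed downstream
```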
I agree that semantically the data is a source. The reality, though, is that supporting improvements that add columns (like your PR) without maintaining schema migrations would be difficult, so in the (as yet unreleased) #188 we opted to make these models. Off the top of my head, there's nothing to stop you defining them as sources in your project should you prefer that.
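If you do want to treat them that way, a minimal sketch of a source definition in your own project could look like this; the schema and table names are placeholders for wherever the package actually lands its tables:

```yaml
# sources.yml in your own project -- schema and table names are placeholders
version: 2

sources:
  - name: dbt_artifacts
    schema: dbt_artifacts  # the schema the package materialises into
    tables:
      - name: models
      - name: tests
      - name: model_executions
```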
For the other models, you could easily disable them per the below:
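A sketch of that config, assuming the package is installed under the name dbt_artifacts (the nested folder name is a placeholder):

```yaml
# dbt_project.yml -- switch off the packaged presentation models
models:
  dbt_artifacts:
    +enabled: false
    sources:          # placeholder folder: opt the raw/source-style models back in
      +enabled: true
```

Because dbt applies the most specific config path last, you can disable the package wholesale and re-enable just the subfolders or models you want to keep.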
I'm keen to hear any thoughts you have on the above, but my intention in creating this issue is to have a place to discuss adding a JSON column to each row that contains as much data as possible (including what's currently available), so that someone like yourself can model it however they wish. This is, by the way, similar to how pre-1.0 versions worked, so you might be interested in running an earlier version, assuming you're on Snowflake.
One thing we'd have to consider is what would happen to this column on warehouses without first-class JSON support (e.g. Redshift, which is not yet supported but I imagine will be soon).
I've tested pulling in only the sources as you describe above, defining them as dbt sources in our project, and layering our own modelling on top. It works well. All this would need now is the JSON blob column added to the different models, so that whatever we need is available to pull out of it without depending on the package to extract it for us.