Provide source data as JSON blob #199

Closed
jaypeedevlin opened this issue Sep 13, 2022 · 1 comment · Fixed by #280
Labels
enhancement New feature or request

Comments

@jaypeedevlin (Contributor)

Migrating @ahmedrad's comment from #175 where he said:

Just to add our two cents as new users of this package: we use star schema modelling, but there are still quite a few differences between our warehouse and the models provided by the package, so we decided to do our own modelling to stay consistent across the warehouse. I'm not sure that's something you'd ever be able to get away from. For example: TIMESTAMP_NTZ vs TIMESTAMP_TZ types, prefix vs suffix naming (dim_ vs dim), shortened naming vs not (fct vs fact_), column naming conventions, etc.

It would be pretty difficult to provide high-level models that immediately mesh with any existing data warehouse without any manipulation. I would much rather have access to the entire JSON blob of high-level objects so that I can model however I want for my needs. For example, if I consider dbt to be broken into sources, models, tests, macros, exposures, and metrics, then for every run I'd want a table for each that contains the raw JSON blob of the object along with some metadata about the run it was produced in.

That way we'll have all the flexibility we need to extract and model the JSON objects however we like, in a way that matches the rest of the warehouse, without having to deal with most of the manifest structure. The blobs of the specific objects may change often and we'll have to deal with that, but the main objects dbt consists of shouldn't change all that often, even across manifest schema changes, so it'll spare you having to rework your tables frequently.

and I'd rather get access to your tables as sources when I install the package, because that's what they essentially are, a source for dbt run data that still needs to be staged and modelled in our warehouse according to our conventions

I agree that semantically the data is a source. The reality, though, is that supporting improvements that add columns (like your PR) without maintaining schema migrations would be difficult, so in (the as-yet-unreleased) #188 we opted to make these models. Off the top of my head, there's nothing to stop you defining them as sources in your own project should you prefer that.

For the other models, you could easily disable them per the below:

```yaml
# v1.2.0 dbt_project.yml
models:
  dbt_artifacts:
    +enabled: false
```

```yaml
# upcoming-release dbt_project.yml
models:
  dbt_artifacts:
    +enabled: false
    sources:
      +enabled: true
```
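If you then want to treat the package-created tables as dbt sources in your own project, a minimal sketch might look like the below. The schema and table names here are assumptions for illustration; check what the package actually creates in your warehouse.

```yaml
# models/staging/dbt_artifacts/sources.yml (names are illustrative only)
version: 2

sources:
  - name: dbt_artifacts
    schema: dbt_artifacts  # wherever the package writes its tables
    tables:
      - name: dbt_run_results
      - name: dbt_models
```

Defining them this way lets you reference them with `{{ source('dbt_artifacts', ...) }}` and layer your own staging models on top, per your usual conventions.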

I'm keen to hear if you have any thoughts on the above, but my intention in creating this issue is to have a place to discuss adding a JSON column to each row that contains as much data as possible (including what's currently available), so that someone like yourself can model it as they wish. This is, by the way, similar to how pre-1.0 versions worked, so you might be interested in taking a look at running an earlier version, assuming you're on Snowflake.

One thing we'd have to consider is what would happen to this column on warehouses without first-class JSON support (e.g. Redshift, which is not yet supported but I imagine will be soon).
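On such warehouses, one option would be to keep the blob as a plain string column and extract fields at query time. A hedged sketch for Redshift (the column name `raw_json` and the source definition are hypothetical, not part of the package):

```sql
-- Hypothetical staging model; assumes a VARCHAR column `raw_json`
-- holding the serialized node object.
-- json_extract_path_text pulls scalar values out of a JSON string.
select
    json_extract_path_text(raw_json, 'unique_id') as unique_id,
    json_extract_path_text(raw_json, 'schema')    as model_schema,
    json_extract_path_text(raw_json, 'config', 'materialized') as materialization
from {{ source('dbt_artifacts', 'dbt_models') }}
```

The trade-off is that string-based extraction is slower and less type-safe than Snowflake's VARIANT, but it keeps the same raw-blob contract across warehouses.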

@ahmedrad (Contributor) commented Oct 6, 2022

I've tested pulling in only the sources as you describe above, defining them as dbt sources in our project, and layering our own modelling on top. It works well. Now we'd just need the JSON blob column added to the different models so we can pull whatever we need out of it, without depending on the package to extract it for us.
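The staging-on-top pattern described here might look like the following on Snowflake, assuming the proposed blob landed as a VARIANT column called `raw_json` (the column and source names are hypothetical):

```sql
-- Hypothetical Snowflake staging model over the package's table,
-- defined as a source in our own project.
with source as (
    select * from {{ source('dbt_artifacts', 'dbt_run_results') }}
)

select
    raw_json:unique_id::string     as unique_id,
    raw_json:status::string        as status,
    raw_json:execution_time::float as execution_time_s
from source
```

Each team can then rename, type, and filter according to their own warehouse conventions, which is the flexibility the raw blob is meant to provide.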

@glsdown added the enhancement label Apr 28, 2023