Provide source data as JSON blob #199

Closed
jaypeedevlin opened this issue Sep 13, 2022 · 1 comment · Fixed by #280
Labels
enhancement New feature or request

Comments

@jaypeedevlin (Contributor)

Migrating @ahmedrad's comment from #175 where he said:

Just to add our two cents as new users of this package: we use star schema modelling, but there are still quite a few differences between our warehouse and the models provided by the package, so we decided to do our own modelling to stay consistent across the warehouse. I'm not sure that's something you'd ever be able to get away from. For example: TIMESTAMP_NTZ vs TIMESTAMP_TZ types, prefix vs suffix naming (dim_ vs dim), shortened naming vs not (fct vs fact_), column naming conventions, etc.

It would be pretty difficult to provide high-level models that immediately mesh with any existing data warehouse without any manipulation. I would much rather have access to the entire JSON blob of high-level objects so that I can model however I want for my needs. For example, if I consider dbt to be broken into sources, models, tests, macros, exposures, and metrics, then for every run I'd want a table for each that contains the raw JSON blob of the object along with some metadata about the run it was produced in.

That way we'll have all the flexibility we need to extract and model the JSON objects however we like, in a way that matches the rest of the warehouse, without having to deal with most of the manifest structure. The blobs of the specific objects may change often and we'll have to deal with that, but the main objects dbt consists of shouldn't change all that often, even across manifest schema changes, so it'll spare you having to rework your tables frequently.

and I'd rather get access to your tables as sources when I install the package, because that's what they essentially are, a source for dbt run data that still needs to be staged and modelled in our warehouse according to our conventions

I agree that semantically the data is a source. The reality, though, is that supporting improvements that add columns (like your PR) without maintaining schema migrations would be difficult, so in (the as-yet-unreleased) #188 we opted to make these models. Off the top of my head, there's nothing to stop you defining them as sources in your own project should you prefer that.

For the other models, you could easily disable them per the below:

```yaml
# v1.2.0 dbt_project.yml
models:
  dbt_artifacts:
    +enabled: false
```

```yaml
# upcoming-release dbt_project.yml
models:
  dbt_artifacts:
    +enabled: false
    sources:
      +enabled: true
```
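If you then want to treat the package-created tables as dbt sources in your own project, a minimal sketch might look like the below. The schema and table names here are assumptions for illustration; check what the package actually creates in your warehouse.

```yaml
# models/staging/dbt_artifacts/sources.yml (names are illustrative only)
version: 2

sources:
  - name: dbt_artifacts
    schema: dbt_artifacts  # wherever the package writes its tables
    tables:
      - name: dbt_run_results
      - name: dbt_models
```

Defining them this way lets you reference them with `{{ source('dbt_artifacts', ...) }}` and layer your own staging models on top, per your usual conventions.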

I'm keen to hear if you have any thoughts on the above, but my intention in creating this issue is to have a place to discuss adding a JSON column to each row that contains as much data as possible (including what's currently available), so that someone like yourself can model it as they wish. This is, by the way, similar to how pre-1.0 versions worked, so you might be interested in taking a look at running an earlier version, assuming you're on Snowflake.

One thing we'd have to consider is what would happen to this column on warehouses without first-class JSON support (e.g. Redshift, which is not yet supported but I imagine will be soon).
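On such warehouses, one option would be to keep the blob as a plain string column and extract fields at query time. A hedged sketch for Redshift (the column name `raw_json` and the source definition are hypothetical, not part of the package):

```sql
-- Hypothetical staging model; assumes a VARCHAR column `raw_json`
-- holding the serialized node object.
-- json_extract_path_text pulls scalar values out of a JSON string.
select
    json_extract_path_text(raw_json, 'unique_id') as unique_id,
    json_extract_path_text(raw_json, 'schema')    as model_schema,
    json_extract_path_text(raw_json, 'config', 'materialized') as materialization
from {{ source('dbt_artifacts', 'dbt_models') }}
```

The trade-off is that string-based extraction is slower and less type-safe than Snowflake's VARIANT, but it keeps the same raw-blob contract across warehouses.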

@ahmedrad (Contributor) commented Oct 6, 2022

I've tested pulling in only the sources as you describe above, defining them as dbt sources in our project, and layering our own modelling on top. It works well. Now we'd just need the JSON blob column added to the different models so we can pull whatever we need out of it, without depending on the package to extract it for us.
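The staging-on-top pattern described here might look like the following on Snowflake, assuming the proposed blob landed as a VARIANT column called `raw_json` (the column and source names are hypothetical):

```sql
-- Hypothetical Snowflake staging model over the package's table,
-- defined as a source in our own project.
with source as (
    select * from {{ source('dbt_artifacts', 'dbt_run_results') }}
)

select
    raw_json:unique_id::string     as unique_id,
    raw_json:status::string        as status,
    raw_json:execution_time::float as execution_time_s
from source
```

Each team can then rename, type, and filter according to their own warehouse conventions, which is the flexibility the raw blob is meant to provide.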

@glsdown added the enhancement label Apr 28, 2023