Calculate source data freshness #1240
Labels: enhancement (New feature or request)
Status: Closed
beckjake added a commit that referenced this issue on Feb 13, 2019: …ness Feature: source freshness (#1240)
Fixed in #1272
Feature
Calculating Data Freshness
Using the information present in sources, dbt can determine how "fresh" source data is at a given point in time. dbt should provide a command which is capable of snapshotting the data freshness (the max(loaded_at_field) for each table) at a given point in time. dbt should produce a JSON file which contains information about the freshness when this command is invoked.
Example usage
Arguments
--select
This flag allows users to select specific sources to describe. It should accept multiple values, each of which is either:
- a source (e.g. snowplow, quickbooks, etc.)
- a table in a source (e.g. snowplow.event, quickbooks.accounts). This name is generated by concatenating the source and table with a dot.
If no sources are --selected, then dbt should calculate the freshness for all of the sources in a project.
-o
A path to a .json file (relative to the target/ directory?) to write the output to.
Calculating Freshness
This query will vary in all of the usual, unfortunate ways across databases:
- the current-time function (getdate(), now(), current_timestamp, etc.)
- timestamp types (timestamp_tz vs. timestamp_ntz on Snowflake)
- the availability of datediff (namely on postgres)
As such, this command should be implemented using the adapter macro paradigm. Moreover, it would be convenient to support a contract of fields in this query, then let users supply their own macro to calculate the time delta. This is a nice-to-have for the first cut of this feature, but if it's easy to do, we should do it!
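To make the per-database variation concrete, here is a minimal Python sketch of the idea, not dbt's actual macro code. The adapter-to-expression table, the function names, and the field names max_loaded_at and snapshotted_at are all assumptions made for illustration. The time delta is computed client-side, which is one way to avoid depending on datediff.

```python
from datetime import datetime, timezone

# Hypothetical per-adapter expressions for "now". These stand in for
# adapter macros and are NOT taken from dbt itself.
CURRENT_TIMESTAMP_SQL = {
    "postgres": "now()",
    "redshift": "getdate()",
    "snowflake": "current_timestamp",
}


def freshness_query(adapter: str, table: str, loaded_at_field: str) -> str:
    """Build a freshness query that returns an assumed two-field contract."""
    now_expr = CURRENT_TIMESTAMP_SQL[adapter]
    return (
        f"select max({loaded_at_field}) as max_loaded_at, "
        f"{now_expr} as snapshotted_at from {table}"
    )


def age_in_seconds(max_loaded_at: datetime, snapshotted_at: datetime) -> float:
    """Compute the time delta in Python so no datediff function is needed.

    Both arguments must be timezone-aware (or both naive); mixing the two
    raises TypeError in Python.
    """
    return (snapshotted_at - max_loaded_at).total_seconds()
```

A user-supplied delta calculation could replace age_in_seconds as long as it accepts the same two fields, which is the point of agreeing on a contract first.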
Output file format
All times should be UTC
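The exact file format is not reproduced here, so the following is only a sketch of how one record could be serialized with every timestamp normalized to UTC. The record keys and the helper names to_utc_iso and freshness_record are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def to_utc_iso(ts: datetime) -> str:
    """Render a timezone-aware timestamp as an ISO 8601 string in UTC."""
    if ts.tzinfo is None:
        # A naive timestamp has no defined offset, so refuse to guess.
        raise ValueError("naive timestamp; attach a timezone before serializing")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def freshness_record(source: str, table: str,
                     max_loaded_at: datetime, snapshotted_at: datetime) -> dict:
    """Build one JSON-serializable record (hypothetical field names)."""
    return {
        "source": source,
        "table": table,
        "max_loaded_at": to_utc_iso(max_loaded_at),
        "snapshotted_at": to_utc_iso(snapshotted_at),
        "age_seconds": (snapshotted_at - max_loaded_at).total_seconds(),
    }


def dump_records(records: list) -> str:
    """Serialize all records as a single JSON document."""
    return json.dumps({"sources": records}, indent=2)
```

Rejecting naive timestamps rather than assuming an offset keeps a timestamp_ntz-style value from being mislabeled as UTC.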
Stdout
This command should work a lot like the dbt run command, outputting a parallelized list of resource invocations to the console.
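As a rough illustration of parallelized console output, the sketch below checks several source tables on a thread pool and emits one line as each check finishes. It is not dbt's runner: check_freshness is a placeholder for the real warehouse query, and the line format is invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_freshness(name: str) -> str:
    """Placeholder for running the freshness query against the warehouse."""
    return "OK"


def run_all(names, threads=4, check=check_freshness, emit=print):
    """Check every selected source table in parallel, reporting as each finishes."""
    results = {}
    total = len(names)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Map each future back to the source table it belongs to.
        futures = {pool.submit(check, name): name for name in names}
        for done, future in enumerate(as_completed(futures), start=1):
            name = futures[future]
            results[name] = future.result()
            # Hypothetical line format; completion order is not guaranteed.
            emit(f"{done} of {total} {results[name]} freshness of {name}")
    return results
```

Calling run_all(["snowplow.event", "quickbooks.accounts"]) prints one line per table, in whatever order the checks complete.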