Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filter field to source table definitions (#1495) #1776

Merged
merged 3 commits into from
Sep 26, 2019

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Sep 23, 2019

Fixes #1495

Added a filter parameter to source table definitions. It defaults to None. If provided, a where {{ filter }} clause is added to the source freshness calculation select statement.

I also added some tests, did some mypy-related cleanup. While writing the tests, I needed to define some sort of reasonable behavior for when the source freshness query returns no data - I decided that we'll treat your data as if the latest update was on January 1, year 1 at 00:00:00 UTC. Hopefully that's outside your source freshness range. We could alternatively raise an exception about there being no data, but I thought the idea of your data being considered very out of date was more coherent.

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This technically look very good to me - the mypy additions here are great, and I like the approach you implemented for making empty source freshness results appear as "stale".

My one holdup is that a top-level filter: config on the source probably isn't the right way to specify this field. I'd expect a field like that to work globally, but it really only contributes to source freshness queries at the moment. In #1495, there's also a comment about how schema tests don't work properly for tables that require filters. I don't think we should try to address that problem right now - that's going to involve changing schema test logic, and it can be worked around in user-space.

Can we change the filter field to live inside of the freshness: block? I think that will assuage my concerns around the placement of the filter: field and will better clarify to users what it actually does. I'll follow up in #1494 with some approaches (outside the scope of this PR) for adding filters in schema tests on BigQuery.

{% macro default__collect_freshness(source, loaded_at_field) %}
{% call statement('check_schema_exists', fetch_result=True, auto_begin=False) -%}
{% macro default__collect_freshness(source, loaded_at_field, filter) %}
{% call statement('collect_freshness', fetch_result=True, auto_begin=False) -%}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Jacob Beck added 2 commits September 24, 2019 13:41
Added some type annotations
Clean up some mypy issues around the "available" decorators
@beckjake beckjake force-pushed the feature/partition-filters-sources branch from 489f631 to 55a6b9a Compare September 24, 2019 19:45
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great - ship it!

@beckjake beckjake merged commit 23484b1 into dev/louisa-may-alcott Sep 26, 2019
@beckjake beckjake deleted the feature/partition-filters-sources branch September 26, 2019 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow partition filters to be passed to source freshness checks
2 participants