Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for staging #100

Merged
merged 17 commits into from
Oct 9, 2024
Merged

Add tests for staging #100

merged 17 commits into from
Oct 9, 2024

Conversation

amishas157
Copy link
Contributor

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with the jira ticket associated with the PR.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated the docs and README with the added features, breaking changes, new instructions on how to use the repository.

Release planning

  • I've decided if this PR requires a new major/minor/patch version accordingly to
    semver, and I've changed the name of the BRANCH to release/* , feature/* or patch/* .

What

[TODO: Short statement about what is changing.]

Why

[TODO: Why this change is being made. Include any context required to understand the why.]

Known limitations

[TODO or N/A]

@amishas157 amishas157 marked this pull request as ready for review October 2, 2024 16:59
@amishas157 amishas157 requested a review from a team as a code owner October 2, 2024 16:59
Copy link
Contributor

@sydneynotthecity sydneynotthecity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment - How do the recency tests and freshness checks relate to each other? Are we running both freshness and recency tests on source?

If we are only running recency tests, I suggest we move the datepart and interval to something more aggressive so that we are alerted earlier to issues. In theory, the data should never be older than 20 min in the upstream tables.

My other nit is not null tests for asset_code and asset_issuer aren't super effective in my opinion. These fields are always blank for native asset, which is essentially null.

models/staging/stg_account_signers.yml Show resolved Hide resolved
models/staging/stg_claimable_balances.yml Outdated Show resolved Hide resolved
models/staging/stg_contract_code.yml Outdated Show resolved Hide resolved
models/staging/stg_contract_data.yml Outdated Show resolved Hide resolved
Comment on lines +15 to +17
- asset_code
- asset_issuer
- liquidity_pool_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you could use ledger_key instead of these three columns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an expensive test and taking a while to run on 86B records. Wondering should run these tests only on newer data?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 agree. Thank you for catching this. Generally, these tests will be expensive on state tables. This being one of the more expensive tests because of table size. Let's just run on newer data, almost in incremental mode until we come up with a better solution.

@amishas157
Copy link
Contributor Author

General comment - How do the recency tests and freshness checks relate to each other? Are we running both freshness and recency tests on source?

If we are only running recency tests, I suggest we move the datepart and interval to something more aggressive so that we are alerted earlier to issues. In theory, the data should never be older than 20 min in the upstream tables.

The plan is to have tests at two level:

  • Staging
    These tests are run whenever dbt build job is triggered. If the tests fails, the downstream jobs will be blocked and we will receive elementary alerts in slack.

  • Source
    These tests are run asynchronously on test and they are more aggressive and run on a frequency similar to history table export. This will help to catch issues early.

In this PR we are addressing staging part. sources are being separately. freshness checks and recency tests are almost similar looking for stale data to flag, the difference being recency tests are triggered as part of dbt build and freshness checks need to be called explicitly with dbt source freshness

My other nit is not null tests for asset_code and asset_issuer aren't super effective in my opinion. These fields are always blank for native asset, which is essentially null.

Will do

@sydneynotthecity
Copy link
Contributor

Thanks for the explanation, it helps clarify the differences between the two tests

Copy link
Contributor

@sydneynotthecity sydneynotthecity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits, my only other comment is tests that perform full table scans may need adjustment to scan incremental only data. Especially on larger tables like, trust_lines, history_operations and history_transactions

models/staging/stg_config_settings.yml Outdated Show resolved Hide resolved
models/staging/stg_contract_code.yml Outdated Show resolved Hide resolved
models/staging/stg_contract_data.yml Outdated Show resolved Hide resolved
models/staging/stg_trust_lines.yml Show resolved Hide resolved
@amishas157 amishas157 force-pushed the patch/source-quality-tests branch from 3267ba0 to aecc5c8 Compare October 4, 2024 17:45
@amishas157 amishas157 force-pushed the patch/source-quality-tests branch 13 times, most recently from 89488b7 to a2553fa Compare October 4, 2024 20:43
@amishas157 amishas157 force-pushed the patch/source-quality-tests branch 11 times, most recently from a5dc772 to 43f579c Compare October 4, 2024 22:01
update

update

update

update

update

lint okay?

update

update

why does it not ignore

update

update

identify

identify

check

was bool a prob

cheeck?

check

update

Revert

update

udpate

check

update

update

update

update
@amishas157 amishas157 force-pushed the patch/source-quality-tests branch from 43f579c to 2b78a7c Compare October 4, 2024 22:30
@amishas157 amishas157 force-pushed the patch/source-quality-tests branch from 9142e7d to 95e2ffe Compare October 4, 2024 23:08
Comment on lines 55 to 59
- incremental_accepted_values:
date_column_name: "closed_at"
greater_than_equal_to: "2 day"
values: ["credit_alphanum4", "credit_alphanum12", "native"]
quote: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to remove this test for now. Context in https://stellarorg.atlassian.net/browse/HUBBLE-574

@amishas157 amishas157 force-pushed the patch/source-quality-tests branch 2 times, most recently from fa93501 to 2cf340c Compare October 9, 2024 18:03
update

update

update deps
@amishas157 amishas157 force-pushed the patch/source-quality-tests branch from 2cf340c to 886670f Compare October 9, 2024 18:46
@amishas157 amishas157 merged commit b36e80f into master Oct 9, 2024
3 checks passed
@sydneynotthecity sydneynotthecity deleted the patch/source-quality-tests branch November 14, 2024 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants