Add tests for staging #100

amishas157 · 2024-09-26T11:19:56Z

PR Checklist

PR Structure

This PR has reasonably narrow scope (if not, break it down into smaller PRs).
This PR avoids mixing refactoring changes with feature changes (split into two PRs otherwise).
This PR's title starts with the Jira ticket associated with the PR.

Thoroughness

This PR adds tests for the most critical parts of the new functionality or fixes.
I've updated the docs and README with the added features, breaking changes, new instructions on how to use the repository.

Release planning

I've decided if this PR requires a new major/minor/patch version according to semver, and I've changed the name of the BRANCH to release/*, feature/* or patch/*.

What

[TODO: Short statement about what is changing.]

Why

[TODO: Why this change is being made. Include any context required to understand the why.]

Known limitations

[TODO or N/A]

sydneynotthecity

General comment - How do the recency tests and freshness checks relate to each other? Are we running both freshness and recency tests on source?

If we are only running recency tests, I suggest we move the datepart and interval to something more aggressive so that we are alerted earlier to issues. In theory, the data should never be older than 20 min in the upstream tables.

My other nit is not null tests for asset_code and asset_issuer aren't super effective in my opinion. These fields are always blank for native asset, which is essentially null.

models/staging/stg_account_signers.yml

models/staging/stg_claimable_balances.yml

models/staging/stg_contract_code.yml

models/staging/stg_contract_data.yml

sydneynotthecity · 2024-10-03T02:16:00Z

models/staging/stg_trust_lines.yml

+            - asset_code
+            - asset_issuer
+            - liquidity_pool_id


nit: I think you could use ledger_key instead of these three columns

This looks like an expensive test, and it is taking a while to run on 86B records. Should we run these tests only on newer data?

+1, agreed. Thank you for catching this. Generally, these tests will be expensive on state tables, and this is one of the more expensive ones because of the table size. Let's run it only on newer data, almost in incremental mode, until we come up with a better solution.
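
As a rough illustration of the suggestion above, the uniqueness test could be limited to recent rows with dbt's `where` test config. This is a sketch, not the change that was merged: the two-day window, the BigQuery-style timestamp expression, and the use of closed_at as the filter column are assumptions, and any other columns from the original combination are left out because they are not shown in this thread.

```yaml
version: 2

models:
  - name: stg_trust_lines
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            # ledger_key stands in for asset_code, asset_issuer and
            # liquidity_pool_id, as suggested in the review.
            - ledger_key
          config:
            # Assumed filter: scan only rows closed in the last 2 days
            # instead of the full table (BigQuery syntax).
            where: "closed_at >= timestamp_sub(current_timestamp(), interval 2 day)"
```

With a filter like this, the test only checks uniqueness among the recent rows, so duplicates that span the window boundary would not be caught.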

amishas157 · 2024-10-03T17:21:48Z

General comment - How do the recency tests and freshness checks relate to each other? Are we running both freshness and recency tests on source?

If we are only running recency tests, I suggest we move the datepart and interval to something more aggressive so that we are alerted earlier to issues. In theory, the data should never be older than 20 min in the upstream tables.

The plan is to have tests at two levels:

Staging
These tests run whenever the dbt build job is triggered. If a test fails, the downstream jobs are blocked and we receive elementary alerts in Slack.
Source
These tests run asynchronously on test; they are more aggressive and run at a frequency similar to the history table export. This will help catch issues early.

This PR addresses the staging part; sources are being handled separately. Freshness checks and recency tests are almost the same, in that both look for stale data to flag. The difference is that recency tests are triggered as part of dbt build, while freshness checks need to be called explicitly with dbt source freshness.
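
To make the distinction concrete, here is a minimal sketch of the two kinds of checks in one dbt properties file. The source name, table name, model name, and most thresholds are placeholders picked for illustration; only the 20-minute figure comes from the discussion above.

```yaml
version: 2

sources:
  - name: example_source                # hypothetical source name
    tables:
      - name: history_transactions      # assumed table name
        loaded_at_field: closed_at      # assumed timestamp column
        # Evaluated only when `dbt source freshness` is run explicitly.
        freshness:
          warn_after: {count: 20, period: minute}
          error_after: {count: 60, period: minute}   # placeholder threshold

models:
  - name: stg_history_transactions      # hypothetical staging model
    tests:
      # Runs as part of `dbt build`; with severity error, a failure
      # causes downstream models to be skipped.
      - dbt_utils.recency:
          datepart: hour                # placeholder window
          field: closed_at
          interval: 12
          config:
            severity: error
```

Both checks compare the latest closed_at value against the current time; they differ mainly in when they are invoked and in which resource they are attached to.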

My other nit is not null tests for asset_code and asset_issuer aren't super effective in my opinion. These fields are always blank for native asset, which is essentially null.

Will do

sydneynotthecity · 2024-10-04T16:26:32Z

Thanks for the explanation. It helps clarify the differences between the two tests.

sydneynotthecity

A few nits. My only other comment is that tests that perform full table scans may need to be adjusted to scan only incremental data, especially on larger tables like trust_lines, history_operations and history_transactions.

models/staging/stg_config_settings.yml

models/staging/stg_contract_code.yml

models/staging/stg_contract_data.yml

models/staging/stg_trust_lines.yml

amishas157 · 2024-10-08T20:13:30Z

models/staging/stg_contract_data.yml

+          - incremental_accepted_values:
+              date_column_name: "closed_at"
+              greater_than_equal_to: "2 day"
+              values: ["credit_alphanum4", "credit_alphanum12", "native"]
+              quote: true


Need to remove this test for now. Context in https://stellarorg.atlassian.net/browse/HUBBLE-574

amishas157 added 6 commits September 26, 2024 16:49

Add tests for staging

e5bd46b

lint

2c43f93

update col names

3476196

update tests

0ef36d0

Remove recency test fot config setting

addc863

error when staging model fails for recency tests

b13c166

amishas157 marked this pull request as ready for review October 2, 2024 16:59

amishas157 requested a review from a team as a code owner October 2, 2024 16:59

sydneynotthecity reviewed Oct 3, 2024

rework some tests based on feedback

e41f9a2

sydneynotthecity approved these changes Oct 4, 2024

Run test incrementally

aecc5c8

remove quote

amishas157 force-pushed the patch/source-quality-tests branch from 3267ba0 to aecc5c8 Compare October 4, 2024 17:45

amishas157 added 2 commits October 4, 2024 13:22

Support quote

a31b47d

Remove recency test for contract code

faa9ce9

amishas157 force-pushed the patch/source-quality-tests branch 13 times, most recently from 89488b7 to a2553fa Compare October 4, 2024 20:43

amishas157 force-pushed the patch/source-quality-tests branch 11 times, most recently from a5dc772 to 43f579c Compare October 4, 2024 22:01

update

2b78a7c

update update update update update lint okay? update update why does it not ignore update update identify identify check was bool a prob cheeck? check update Revert update udpate check update update update update

amishas157 force-pushed the patch/source-quality-tests branch from 43f579c to 2b78a7c Compare October 4, 2024 22:30

conditional interval

95e2ffe

stringify

amishas157 force-pushed the patch/source-quality-tests branch from 9142e7d to 95e2ffe Compare October 4, 2024 23:08

sydneynotthecity approved these changes Oct 7, 2024

amishas157 commented Oct 8, 2024

Remove acceptable value test for stg_contract_data

59309b0

amishas157 force-pushed the patch/source-quality-tests branch 2 times, most recently from fa93501 to 2cf340c Compare October 9, 2024 18:03

update sqlfluff

886670f

update update update deps

amishas157 force-pushed the patch/source-quality-tests branch from 2cf340c to 886670f Compare October 9, 2024 18:46

amishas157 added 3 commits October 9, 2024 16:33

ignore specific test

e97de28

exclude specific test

eba2b8e

revert all changes made for sqlfluff

b34da8a

amishas157 merged commit b36e80f into master Oct 9, 2024
3 checks passed

sydneynotthecity deleted the patch/source-quality-tests branch November 14, 2024 16:41
