Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source Postgres : Fast query for estimate messages #21683

Merged
merged 7 commits into from
Jan 23, 2023
Merged

Conversation

akashkulk
Copy link
Contributor

As part of #20783, trace estimate messages were added to source-postgres.

One of the follow-up items #21499 identified in that PR was to use a fast query to calculate number of rows instead of select count(*) which can be slow. This was initially thought to not be a problem : see #20783 (comment), however we are seeing user reports of this query being slow : #20783 (comment).

In this PR, we migrate to the fast query. Initially, this was held off because invalid results were returned for smaller tables. But thinking more on this, we can skip estimate traces altogether for small tables (the sync will probably complete quickly anyways). This can be determined if the fast table row count estimate is something invalid (e.g. a negative value)

The only open question here is whether we should disable estimate trace messages for non-CDC incremental mode as I can't think of another way to get around this other than to issue a select count(*) where cursor_id > cursor_value query, which would have the same issues as a select count(*)

@octavia-squidington-iv octavia-squidington-iv added area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/postgres labels Jan 20, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Jan 20, 2023

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (3)

Connector Version Changelog Publish
source-alloydb 1.0.35
source-alloydb-strict-encrypt 1.0.35 🔵
(ignored)
🔵
(ignored)
source-postgres-strict-encrypt 1.0.39 🔵
(ignored)
🔵
(ignored)
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (0)

Connector Version Changelog Publish
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Other Modules (0)

Actionable Items

(click to expand)

Category Status Actionable Item
Version
mismatch
The version of the connector is different from its normal variant. Please bump the version of the connector.

doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog
doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.

changelog missing
There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish
not in seed
The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.

diff seed version
The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.

@akashkulk
Copy link
Contributor Author

akashkulk commented Jan 20, 2023

/test connector=connectors/source-postgres-strict-encrypt

🕑 connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/3972129585
✅ connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/3972129585
No Python unittests run

Build Passed

Test summary info:

All Passed

@akashkulk
Copy link
Contributor Author

akashkulk commented Jan 20, 2023

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/3972132514
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/3972132514
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       141      5    96%   87, 93, 239, 243-244
	 source_acceptance_test/conftest.py                     211     95    55%   36, 42-44, 49, 54, 77, 83, 89-91, 110, 115-117, 123-125, 131-132, 137-138, 143, 149, 158-167, 173-178, 193, 217, 248, 254, 262-267, 275-285, 293-306, 311-317, 324-335, 342-358
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              402    115    71%   53, 58, 93-104, 109-116, 120-121, 125-126, 308, 346-363, 376-387, 391-396, 402, 435-440, 478-485, 528-530, 533, 598-606, 618-621, 626, 682-683, 689, 692, 728-738, 751-776
	 source_acceptance_test/tests/test_incremental.py       160     14    91%   58-65, 70-83, 246
	 source_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 source_acceptance_test/utils/json_schema_helper.py     107     13    88%   30-31, 38, 41, 65-68, 96, 120, 192-194
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1609    339    79%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:94: The previous and actual specifications are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:377: The previous and actual discovered catalogs are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_incremental.py:22: `future_state` has a bypass reason, skipping.
================== 54 passed, 6 skipped in 717.50s (0:11:57) ===================

@akashkulk akashkulk temporarily deployed to more-secrets January 20, 2023 23:44 — with GitHub Actions Inactive
@akashkulk akashkulk temporarily deployed to more-secrets January 20, 2023 23:44 — with GitHub Actions Inactive
@github-actions
Copy link
Contributor

github-actions bot commented Jan 20, 2023

Airbyte Code Coverage

There is no coverage information present for the Files changed

Total Project Coverage 26.72% 🍏

@akashkulk akashkulk marked this pull request as ready for review January 21, 2023 00:07
@akashkulk akashkulk requested a review from a team as a code owner January 21, 2023 00:07
Copy link
Contributor

@evantahler evantahler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from me, but a question below

SELECT (SELECT COUNT(*) FROM %s) AS %s,
SELECT (select reltuples::int8 as count from pg_class c JOIN pg_catalog.pg_namespace n ON n.oid=c.relnamespace where nspname='%s' AND relname='%s') AS %s,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe both versions should be kept and done in a try/catch... I wonder if there are certain older versions of postgres that don't have a pg_catalog table?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 21, 2023

Platform Test Results

   231 files  ±0     231 suites  ±0   26m 12s ⏱️ + 1m 4s
1 606 tests ±0  1 595 ✔️ ±0  11 💤 ±0  0 ±0 
1 626 runs  ±0  1 615 ✔️ ±0  11 💤 ±0  0 ±0 

Results for commit 9973901. ± Comparison against base commit 6026465.

♻️ This comment has been updated with latest results.

@akashkulk akashkulk requested a review from evantahler January 21, 2023 00:43
@akashkulk akashkulk temporarily deployed to more-secrets January 21, 2023 00:56 — with GitHub Actions Inactive
@akashkulk akashkulk temporarily deployed to more-secrets January 21, 2023 00:56 — with GitHub Actions Inactive
@akashkulk akashkulk temporarily deployed to more-secrets January 22, 2023 23:18 — with GitHub Actions Inactive
@akashkulk akashkulk temporarily deployed to more-secrets January 22, 2023 23:18 — with GitHub Actions Inactive
@akashkulk
Copy link
Contributor Author

akashkulk commented Jan 23, 2023

/publish connector=connectors/source-postgres-strict-encrypt run-tests=false

🕑 Publishing the following connectors:
connectors/source-postgres-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/3982262589


Connector Did it publish? Were definitions generated?
connectors/source-postgres-strict-encrypt

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@akashkulk
Copy link
Contributor Author

akashkulk commented Jan 23, 2023

/publish connector=connectors/source-postgres run-tests=false

🕑 Publishing the following connectors:
connectors/source-postgres
https://github.com/airbytehq/airbyte/actions/runs/3982264139


Connector Did it publish? Were definitions generated?
connectors/source-postgres

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 23, 2023 00:34 — with GitHub Actions Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 23, 2023 00:34 — with GitHub Actions Inactive
@akashkulk
Copy link
Contributor Author

akashkulk commented Jan 23, 2023

/publish connector=connectors/source-postgres-strict-encrypt run-tests=false

🕑 Publishing the following connectors:
connectors/source-postgres-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/3982325788


Connector Did it publish? Were definitions generated?
connectors/source-postgres-strict-encrypt

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@akashkulk akashkulk temporarily deployed to more-secrets January 23, 2023 00:37 — with GitHub Actions Inactive
@akashkulk akashkulk temporarily deployed to more-secrets January 23, 2023 00:37 — with GitHub Actions Inactive
@akashkulk akashkulk merged commit 79de89d into master Jan 23, 2023
@akashkulk akashkulk deleted the fast_query branch January 23, 2023 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants