Source Postgres : Fast query for estimate messages #21683
Affected Connector Report
Connector | Version | Changelog | Publish |
---|---|---|---|
source-alloydb | 1.0.35 | ✅ | ✅ |
source-alloydb-strict-encrypt | 1.0.35 | 🔵 (ignored) | 🔵 (ignored) |
source-postgres-strict-encrypt | 1.0.39 | 🔵 (ignored) | 🔵 (ignored) |
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Destinations (0)
Connector | Version | Changelog | Publish |
---|
✅ Other Modules (0)
Actionable Items
Category | Status | Actionable Item |
---|---|---|
Version | ❌ mismatch | The version of the connector is different from its normal variant. Please bump the version of the connector. |
Version | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ❌ changelog missing | There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog. |
Publish | ⚠ not in seed | The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific and only listed in the cloud seed file). Please double-check to make sure that it is not a bug. |
Publish | ❌ diff seed version | The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version. |
/test connector=connectors/source-postgres-strict-encrypt
Build Passed
Test summary info:
/test connector=connectors/source-postgres
Build Passed
Test summary info:
👍 from me, but a question below
- SELECT (SELECT COUNT(*) FROM %s) AS %s,
+ SELECT (select reltuples::int8 as count from pg_class c JOIN pg_catalog.pg_namespace n ON n.oid=c.relnamespace where nspname='%s' AND relname='%s') AS %s,
Maybe both versions should be kept and done in a try/catch... I wonder if there are certain older versions of postgres that don't have a pg_catalog table?
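The try/catch idea above could be sketched roughly as follows. This is a minimal illustration in Python, not the connector's actual code: `estimate_with_fallback` and its two callable parameters are hypothetical names, standing in for whatever runs the catalog lookup and the `select count(*)` query. Note that pg_catalog is a schema rather than a table.

```python
from typing import Callable


def estimate_with_fallback(
    fast_query: Callable[[], int],
    exact_count: Callable[[], int],
) -> int:
    """Try the fast catalog-based estimate first; fall back to an exact count.

    Both parameters are hypothetical callables supplied by the caller:
    fast_query would run the pg_class lookup, exact_count the select count(*).
    """
    try:
        return fast_query()
    except Exception:
        # A real implementation would catch the database driver's specific
        # error type instead of a blanket Exception.
        return exact_count()
```

The cost of this design is that the fallback path still pays for the slow `select count(*)`, so it only helps when the catalog lookup fails outright.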
/publish connector=connectors/source-postgres-strict-encrypt run-tests=false
If you have connectors that successfully published but failed definition generation, follow step 4 here.
/publish connector=connectors/source-postgres run-tests=false
If you have connectors that successfully published but failed definition generation, follow step 4 here.
/publish connector=connectors/source-postgres-strict-encrypt run-tests=false
If you have connectors that successfully published but failed definition generation, follow step 4 here.
As part of #20783, trace estimate messages were added to source-postgres.
One of the follow-up items identified in that PR (#21499) was to use a fast query to estimate the number of rows instead of select count(*), which can be slow. This was initially thought not to be a problem (see #20783 (comment)); however, we are now seeing user reports of this query being slow (see #20783 (comment)).
In this PR, we migrate to the fast query. This was initially held off because the fast query returned invalid results for smaller tables. On reflection, we can skip estimate traces altogether for small tables, since the sync will probably complete quickly anyway. A table falls into this case when its fast row-count estimate is invalid (e.g. a negative value).
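The skip rule described above might look something like the following sketch. It is written in Python for brevity and is not the connector's implementation: `FAST_ESTIMATE_SQL` and `usable_estimate` are hypothetical names, and the query text only mirrors the catalog lookup from this PR's diff, with driver-style placeholders assumed. In newer PostgreSQL versions, the pg_class documentation describes a reltuples value of -1 as meaning the row count is unknown.

```python
from typing import Optional

# Hypothetical template mirroring the catalog lookup in this PR's diff.
# The %s placeholders are assumed to be bound by the database driver.
FAST_ESTIMATE_SQL = (
    "SELECT reltuples::int8 AS count "
    "FROM pg_class c "
    "JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace "
    "WHERE nspname = %s AND relname = %s"
)


def usable_estimate(reltuples: Optional[int]) -> Optional[int]:
    """Return the row estimate, or None when the estimate trace should be skipped.

    A missing row (None) or a negative value is treated as invalid, following
    the rule in the PR description.
    """
    if reltuples is None or reltuples < 0:
        return None
    return reltuples
```

A caller would emit an estimate trace message only when `usable_estimate` returns a number, and otherwise start the sync without one.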
The only open question is whether we should disable estimate trace messages for non-CDC incremental mode. I can't think of a way to get around this other than issuing a select count(*) where cursor_id > cursor_value query, which would have the same performance issues as a plain select count(*).