Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres dv2: add indexes #34225

Closed
wants to merge 13 commits into from
Closed

Conversation

edgao
Copy link
Contributor

@edgao edgao commented Jan 12, 2024

see https://github.com/airbytehq/airbyte-internal-issues/issues/2423 - does not close it b/c that ticket also covers mysql.

based on top of #34177 (at some point I really should try out graphite...)

We'll need to manually drop the existing test tables after releasing this (the existing JdbcDestinationHandler can't detect a change in indexes, and there's no table-change detection logic for raw tables). That's OK, because we haven't released the beta yet, so we're the only users.

Also minorly refactors the final table index creation code - instead of exposing a createIndex method in SqlGenerator, just have PostgresSqlGenerator override createTable to append more statements. I think that's better (reduces the scope of the base JdbcSqlGenerator class)? But open to reverting it if people disagree.

@edgao edgao requested a review from a team as a code owner January 12, 2024 19:08
Copy link

vercel bot commented Jan 12, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 12, 2024 11:30pm

Copy link
Contributor

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@edgao edgao force-pushed the edgao/dv2/postgres/indexes branch from 92bab23 to a5b75fd Compare January 12, 2024 21:14
edgao added 2 commits January 12, 2024 13:19
# Conflicts:
#	airbyte-integrations/connectors/destination-bigquery/build.gradle
#	airbyte-integrations/connectors/destination-redshift/build.gradle
#	airbyte-integrations/connectors/destination-snowflake/build.gradle
#	docs/integrations/destinations/bigquery.md
#	docs/integrations/destinations/snowflake.md
@evantahler
Copy link
Contributor

... should JdbcDestinationHandler detect index changes? We got that working in snowflake. It seems like a feature we'll want in the future maybe?

Comment on lines 29 to 40
@Override
protected List<String> postCreateTableQueries(final String schemaName, final String tableName) {
if (TypingAndDedupingFlag.isDestinationV2()) {
return List.of(
"CREATE INDEX IF NOT EXISTS " + tableName + "_raw_id" + " ON " + tableName + "(_airbyte_raw_id)",
"CREATE INDEX IF NOT EXISTS " + tableName + "_extracted_at" + " ON " + tableName + "(_airbyte_extracted_at)",
"CREATE INDEX IF NOT EXISTS " + tableName + "_loaded_at" + " ON " + tableName + "(_airbyte_loaded_at, _airbyte_extracted_at)"
);
} else {
return Collections.emptyList();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be any way of specifying an index in the CREATE TABLE syntax.

TIL!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are all sorts if indexes we could be creating (Btree, hash, etc), but I think "unspecified" index type like this is the right thing to do, as it will fallback to the default for the database/schema

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be reasonable to make the index on _raw_id UNIQUE, but we make this field and we can trust UUID to be unique, so I don't think we need this validation. Adding a UNIQUE index would also make writes slower, and we don't want that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are good thoughts which I definitely did not think :P

I'll copypaste it into our code as an actual comment for easier future reference

@edgao
Copy link
Contributor Author

edgao commented Jan 12, 2024

should JdbcDestinationHandler detect index changes

I think eventually yes, but not literally today. If any beta users want us to change things indexes, I feel ok asking them to manually drop all their tables? Having autodetections feels like a "after we release to GA and we want to change our indexes" feature

it's probably not a huge lift (something like fetch jdbc metadata -> dump into the TableDefinition struct -> diff appropriately) and is a pure bolt-on addition in the future, so I feel ok deferring it for now?

@edgao
Copy link
Contributor Author

edgao commented Jan 12, 2024

ffs. bad merge. closing this pr and opening a new one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants