
[EPIC] Support schema selection in database and warehouse source connectors #2781

Closed · 2 tasks
davinchia opened this issue Apr 7, 2021 · 10 comments
Labels
area/connectors · area/databases · area/warehouses · Epic · frozen (Not being actively worked on) · team/db-dw-sources · type/enhancement

Comments

davinchia (Contributor) commented Apr 7, 2021

Tell us about the problem you're trying to solve

We currently try to discover every schema in a database when discovering a source's schema. This can cause problems when a source has too many schemas or tables, i.e. the catalog becomes too large to save in our database.

This led to #2619.

This is definitely a nice-to-have rather than a must-have.

Describe the solution you’d like

Expose some sort of schema regex so users can specify what they want included in the discover job.
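
For illustration, a schema-selection property in a source connector's `spec.json` might look like the sketch below. The property name, shape, and examples are assumptions for this issue, not a shipped spec; some connectors later exposed a plain `schemas` array rather than a regex:

```json
{
  "properties": {
    "schemas": {
      "title": "Schemas",
      "description": "Schemas (or patterns) to include in discover; leave empty to discover everything.",
      "type": "array",
      "items": { "type": "string" },
      "examples": [["public", "reporting_.*"]]
    }
  }
}
```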

Describe the alternative you’ve considered or used

Allow users to specify tables to sync in addition to schemas.

TODOs

  • Create a separate issue for each of the high-priority source connectors. Replace the ❌ with the issue ID.
  • DRY the abstract JDBC class and extract the schema selection logic (ref: comment; see the sketch after the table).
| Source Connector | Status | Priority | Note |
| --- | --- | --- | --- |
| Postgres | - | | |
| Oracle | - | | |
| SQL Server | | High | |
| Redshift | #9525 | High | |
| Snowflake | | High | |
| BigQuery | | High | The schema-equivalent concept in BigQuery is datasets. |
| DB2 | | Low | |
| CockroachDB | | Low | |
| Clickhouse | - | - | Clickhouse does not support schemas. |
| MySQL | - | - | MySQL does not support schemas. |
| MongoDB | - | - | MongoDB does not support schemas. |
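
As a rough sketch of the shared schema-selection logic the second TODO asks to extract from the abstract JDBC source: the class, method, and parameter names below are hypothetical, not Airbyte's actual API.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: filter discovered tables down to the schemas the user
// selected in the connector config. All names here are illustrative.
public final class SchemaSelection {

  private SchemaSelection() {}

  /**
   * @param discoveredTables fully qualified "schema.table" names from discovery
   * @param selectedSchemas  schemas listed in the source config; empty means "all"
   */
  public static List<String> filterBySchema(final List<String> discoveredTables,
                                            final Set<String> selectedSchemas) {
    if (selectedSchemas.isEmpty()) {
      return discoveredTables; // no selection configured: keep legacy behavior
    }
    return discoveredTables.stream()
        .filter(fqName -> selectedSchemas.contains(fqName.split("\\.", 2)[0]))
        .collect(Collectors.toList());
  }
}
```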
@davinchia added the type/enhancement label on Apr 7, 2021
Deninc commented Jun 14, 2021

I'm evaluating Airbyte and our source DB has too many tables; discovery fails with "io.temporal.failure.ServerFailure: Complete result exceeds size limit.".

IMO this is the key feature for us to continue using Airbyte.

cgardens (Contributor) commented

@Deninc which database are you using?

Deninc commented Jun 16, 2021

@cgardens I'm using Oracle.

OK, so I was able to get past the issue by raising the limit behind BlobSizeLimitError. However, it's still very slow and laggy to scroll past a thousand tables. I only need 10 tables out of 1000, so this feature will definitely help the user experience.
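
For anyone hitting the same ServerFailure: the cap behind it is Temporal's blob size limit, which the server reads from its dynamic config. A sketch of the override follows; the byte values are assumptions, and raising the limit only works around the symptom without shrinking the catalog:

```yaml
# Temporal server dynamic config, e.g. dynamicconfig/development.yaml
limit.blobSize.warn:
  - value: 10485760   # start warning at 10 MiB (assumed value)
limit.blobSize.error:
  - value: 15728640   # fail with BlobSizeLimitError above 15 MiB (assumed value)
```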

ogirardot commented
Hi there, I think I have the same kind of issue with the latest (0.1.7) MongoDB source: there are a lot of collections, and the 1h timeout is reached with no way to discover the schema or use any kind of fallback.
Is there any countermeasure in the meantime?

cgardens (Contributor) commented Dec 3, 2021

@tuliren for visibility. I'm not sure what priority this should have relative to other DB issues, but I wanted to make sure you saw it.

alafanechere (Contributor) commented
A workaround could be to suggest that users create a Mongo user dedicated to Airbyte, and only discover collections on which that user has read privileges.
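
A sketch of that workaround in mongosh, assuming a hypothetical `mydb` database and two collections Airbyte should see; all names and the password are placeholders:

```javascript
// Run as an admin. Creates a role limited to reading two collections,
// then a dedicated Airbyte user holding only that role.
use mydb

db.createRole({
  role: "airbyteDiscoverRead",
  privileges: [
    { resource: { db: "mydb", collection: "orders" },    actions: ["find"] },
    { resource: { db: "mydb", collection: "customers" }, actions: ["find"] }
  ],
  roles: []
})

db.createUser({
  user: "airbyte",
  pwd: "change-me",
  roles: [{ role: "airbyteDiscoverRead", db: "mydb" }]
})
```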

maharshi-zluri commented
@alafanechere I absolutely agree. It would save us a lot of time and make setting up connections much smoother.

tuliren (Contributor) commented Jan 14, 2022

> @cgardens I'm using Oracle.
>
> OK, so I was able to get past the issue by raising the limit behind BlobSizeLimitError. However, it's still very slow and laggy to scroll past a thousand tables. I only need 10 tables out of 1000, so this feature will definitely help the user experience.

@Deninc, by the way, the Oracle connector has supported schema specification since version 0.3.3.
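
For reference, a source config using that option might look like the sketch below, assuming the connector's `schemas` array; all other values are placeholders:

```json
{
  "host": "db.example.com",
  "port": 1521,
  "sid": "ORCL",
  "username": "airbyte",
  "password": "******",
  "schemas": ["HR", "SALES"]
}
```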

@tuliren changed the title from "Allow configuring what schema to 'discover'." to "[EPIC] Support schema selection in database and warehouse source connectors" on Jan 14, 2022
@grishick added the team/db-dw-sources label and removed the connectors/source/cockroachdb and autoteam labels on Sep 27, 2022
Amphagory commented
Was there a Snowflake workaround for this problem? Is there a way to increase the blob size limit (BlobSizeLimitError) via an argument or in the config file?

@bleonard added the frozen label on Mar 22, 2024
cgardens (Contributor) commented
done
