
[EPIC] Support schema selection in database and warehouse source connectors #2781

Closed · 2 tasks
davinchia opened this issue Apr 7, 2021 · 10 comments
Labels
area/connectors · area/databases · area/warehouses · Epic · frozen (Not being actively worked on) · team/db-dw-sources · type/enhancement

Comments

davinchia (Contributor) commented Apr 7, 2021

Tell us about the problem you're trying to solve

We currently try to discover every schema in a database when discovering a source's schema. This can cause problems when a source has too many schemas or tables, i.e. the catalog becomes too large to save in our database.

This led to #2619.

This is definitely a nice-to-have rather than a must-have.

Describe the solution you’d like

Expose some sort of schema regex so users can specify what they want included in the discover job.
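
For illustration, a schema-selection property in a source connector's `spec.json` might look like the sketch below. The property name, shape, and examples are assumptions for this issue, not a shipped spec; some connectors later exposed a plain `schemas` array rather than a regex:

```json
{
  "properties": {
    "schemas": {
      "title": "Schemas",
      "description": "Schemas (or patterns) to include in discover; leave empty to discover everything.",
      "type": "array",
      "items": { "type": "string" },
      "examples": [["public", "reporting_.*"]]
    }
  }
}
```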

Describe the alternative you’ve considered or used

Allow users to specify tables to sync in addition to schemas.

TODOs

  • Create a separate issue for each of the high-priority source connectors. Replace the ❌ with the issue ID.
  • DRY the abstract JDBC class and extract the schema selection logic (ref: comment; see the sketch after the table).
| Source Connector | Status | Priority | Note |
| --- | --- | --- | --- |
| Postgres | - | | |
| Oracle | - | | |
| SQL Server | | High | |
| Redshift | #9525 | High | |
| Snowflake | | High | |
| BigQuery | | High | The schema-equivalent concept in BigQuery is datasets. |
| DB2 | | Low | |
| CockroachDB | | Low | |
| Clickhouse | - | - | Clickhouse does not support schemas. |
| MySQL | - | - | MySQL does not support schemas. |
| MongoDB | - | - | MongoDB does not support schemas. |
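
As a rough sketch of the shared schema-selection logic the second TODO asks to extract from the abstract JDBC source: the class, method, and parameter names below are hypothetical, not Airbyte's actual API.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: filter discovered tables down to the schemas the user
// selected in the connector config. All names here are illustrative.
public final class SchemaSelection {

  private SchemaSelection() {}

  /**
   * @param discoveredTables fully qualified "schema.table" names from discovery
   * @param selectedSchemas  schemas listed in the source config; empty means "all"
   */
  public static List<String> filterBySchema(final List<String> discoveredTables,
                                            final Set<String> selectedSchemas) {
    if (selectedSchemas.isEmpty()) {
      return discoveredTables; // no selection configured: keep legacy behavior
    }
    return discoveredTables.stream()
        .filter(fqName -> selectedSchemas.contains(fqName.split("\\.", 2)[0]))
        .collect(Collectors.toList());
  }
}
```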
@davinchia added the type/enhancement label on Apr 7, 2021
Deninc commented Jun 14, 2021

I'm evaluating Airbyte and our source DB has too many tables; discovery fails with "io.temporal.failure.ServerFailure: Complete result exceeds size limit.".

IMO this is the key feature for us to continue using Airbyte.

cgardens (Contributor) commented

@Deninc which database are you using?

Deninc commented Jun 16, 2021

@cgardens I'm using Oracle.

OK, so I was able to get past the issue by raising the limit behind BlobSizeLimitError. However, it's still very slow and laggy to scroll past a thousand tables. I only need 10 tables out of 1000, so this feature will definitely help the user experience.
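
For anyone hitting the same ServerFailure: the cap behind it is Temporal's blob size limit, which the server reads from its dynamic config. A sketch of the override follows; the byte values are assumptions, and raising the limit only works around the symptom without shrinking the catalog:

```yaml
# Temporal server dynamic config, e.g. dynamicconfig/development.yaml
limit.blobSize.warn:
  - value: 10485760   # start warning at 10 MiB (assumed value)
limit.blobSize.error:
  - value: 15728640   # fail with BlobSizeLimitError above 15 MiB (assumed value)
```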

ogirardot commented
Hi there, I think I have the same kind of issue with the latest (0.1.7) MongoDB source: there are a lot of collections, and the 1h timeout is reached with no way to discover the schema or use any kind of fallback.
Is there any countermeasure in the meantime?

cgardens (Contributor) commented Dec 3, 2021

@tuliren for visibility. I'm not sure what priority this should have relative to other DB issues, but I wanted to make sure you saw it.

alafanechere (Contributor) commented
A workaround could be to suggest that users create a Mongo user dedicated to Airbyte, and only discover collections on which that user has read privileges.
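
A sketch of that workaround in mongosh, assuming a hypothetical `mydb` database and two collections Airbyte should see; all names and the password are placeholders:

```javascript
// Run as an admin. Creates a role limited to reading two collections,
// then a dedicated Airbyte user holding only that role.
use mydb

db.createRole({
  role: "airbyteDiscoverRead",
  privileges: [
    { resource: { db: "mydb", collection: "orders" },    actions: ["find"] },
    { resource: { db: "mydb", collection: "customers" }, actions: ["find"] }
  ],
  roles: []
})

db.createUser({
  user: "airbyte",
  pwd: "change-me",
  roles: [{ role: "airbyteDiscoverRead", db: "mydb" }]
})
```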

maharshi-zluri commented
@alafanechere I absolutely agree. It would save us a lot of time and make setting up connections much smoother.

tuliren (Contributor) commented Jan 14, 2022

> @cgardens I'm using Oracle.
>
> OK, so I was able to get past the issue by raising the limit behind BlobSizeLimitError. However, it's still very slow and laggy to scroll past a thousand tables. I only need 10 tables out of 1000, so this feature will definitely help the user experience.

@Deninc, by the way, the Oracle connector has supported schema specification since version 0.3.3.
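
For reference, a source config using that option might look like the sketch below, assuming the connector's `schemas` array; all other values are placeholders:

```json
{
  "host": "db.example.com",
  "port": 1521,
  "sid": "ORCL",
  "username": "airbyte",
  "password": "******",
  "schemas": ["HR", "SALES"]
}
```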

@tuliren changed the title from "Allow configuring what schema to 'discover'." to "[EPIC] Support schema selection in database and warehouse source connectors" on Jan 14, 2022
@grishick added the team/db-dw-sources label and removed the connectors/source/cockroachdb and autoteam labels on Sep 27, 2022
Amphagory commented
Was there a Snowflake workaround for this problem? Is there a way to increase the blob size limit (BlobSizeLimitError) via an argument or in the config file?

@bleonard added the frozen label on Mar 22, 2024
cgardens (Contributor) commented
done
