[EPIC] Support schema selection in database and warehouse source connectors #2781
I'm evaluating Airbyte and our source DB has too many tables, which caused "io.temporal.failure.ServerFailure: Complete result exceeds size limit.". IMO this is the key feature for us to continue using Airbyte.
@Deninc which database are you using?
@cgardens I'm using Oracle. OK, I was able to get past the issue by setting BlobSizeLimitError. However, it's still very slow and laggy to scroll through a thousand tables. I only need 10 tables out of the 1,000, so this feature will definitely improve the user experience.
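For context on the workaround above: the size limit comes from Temporal's blob size guard, which can be raised through Temporal's dynamic config file. A hedged sketch, assuming the standard `limit.blobSize.*` dynamic config keys (verify the key names and file location against your Temporal server version and deployment):

```yaml
# Temporal dynamicconfig sketch — values are illustrative, in bytes
limit.blobSize.warn:
  - value: 5242880     # warn at 5 MiB
limit.blobSize.error:
  - value: 10485760    # fail at 10 MiB
```

Raising the limit only postpones the problem for very large catalogs; filtering schemas at discovery time (as this issue proposes) is the more durable fix.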
Hi there, I think I have the same kind of issue with the latest (0.1.7) MongoDB source: there are a lot of collections, and the 1h timeout is reached with no way to discover the schema or use any kind of fallback.
@tuliren for visibility. I'm not sure what priority this should have against other DB issues, but I wanted to make sure you saw it.
A workaround could be to suggest that users create a Mongo user dedicated to Airbyte, and only discover collections on which that user has read privileges.
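The restricted-user workaround above can be sketched in mongosh. This is illustrative only: the database, collection, user, and role names are hypothetical, and your privilege needs may differ.

```javascript
// mongosh sketch: a dedicated Airbyte user that can only read two
// collections, so schema discovery only ever sees those.
use admin
db.createRole({
  role: "airbyteReader",
  privileges: [
    { resource: { db: "sales", collection: "orders" },    actions: ["find"] },
    { resource: { db: "sales", collection: "customers" }, actions: ["find"] }
  ],
  roles: []
})
db.createUser({
  user: "airbyte",
  pwd: "change-me",
  roles: ["airbyteReader"]
})
```

Pointing the MongoDB source at this user keeps the discovered catalog small without any connector-side changes.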
@alafanechere I absolutely agree. It would save us a lot of time in the future and make the connection process much smoother.
@Deninc, by the way, the Oracle connector has supported schema specification since version
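With schema specification in place, the source config can limit discovery to the schemas of interest. A minimal sketch, assuming the connector exposes a `schemas` array in its source configuration (field names and values here are illustrative, not the exact connector spec):

```json
{
  "host": "db.example.com",
  "port": 1521,
  "sid": "ORCL",
  "username": "airbyte",
  "schemas": ["SALES", "HR"]
}
```

With only two schemas listed, discovery would enumerate tens of tables instead of the full thousand.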
Was there a Snowflake workaround for this problem? Is there a way to increase the BlobSizeLimitError via an argument or in the config file?
done |
Tell us about the problem you're trying to solve
We currently try to discover all schemas in a database when discovering a source's schema. This can lead to issues if a source has too many schemas or tables, i.e., the catalog becomes too big and cannot be saved in our database.
This led to #2619.
Definitely a nice-to-have rather than a must-have.
Describe the solution you’d like
Expose some sort of schema regex so users can specify what they want included in the discover job.
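The regex-based filtering could look like the sketch below. This is not Airbyte code: the function name, the `"<schema>.<table>"` naming convention, and the sample streams are all assumptions made for illustration.

```python
import re


def filter_streams(stream_names, schema_pattern):
    """Keep only streams whose schema part matches a user-supplied regex.

    Assumes each stream is qualified as "<schema>.<table>"; a discover
    job would apply this before building (or saving) the catalog.
    """
    rx = re.compile(schema_pattern)
    return [s for s in stream_names if rx.match(s.split(".", 1)[0])]


streams = ["SALES.ORDERS", "SALES.CUSTOMERS", "HR.EMPLOYEES", "TMP.SCRATCH"]
print(filter_streams(streams, r"SALES$"))  # → ['SALES.ORDERS', 'SALES.CUSTOMERS']
```

Filtering before the catalog is persisted would also sidestep the Temporal blob size limit reported above, since only the selected schemas' tables are serialized.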
Describe the alternative you’ve considered or used
Allow users to specify tables to sync in addition to schemas.
TODOs