Tech Spec for Allow checking access to destination schema in replication config UI (check-with-catalog) #20561
Comments
cc @airbytehq/frontend |
Alternative Solution
For CHECK commands which run as part of a sync (because the previous sync attempt has failed - #13459), we provide the
Pros:
Cons:
|
As a note, I find this problem really interesting because this is the first instance I've seen where the configuration of the source can materially affect the success of the destination. I think to date we've generally assumed that Source and Destination configurations are fully independent of each other, and the sync configuration independent of both. The protocol lacks any sort of "compatibility ACK" between source, destination, and sync. |
Also, @grishick - asking the lazy product question: Do we see users who replicate multiple schemas, and if we do, are there regularly conflicting table names? Perhaps "mirror source" doesn't need to imply the schemas as well? |
Alternative Solution... maybe in the docs we say we need an "admin"-level user who can write to all schemas. |
A common pattern is for a workspace to have multiple sources configured to write to a single destination with schema name specified in the replication config. We can say "admin" level in the docs, but technically we don't need that, so I hesitate to make this a requirement. I am OK with changing the signature of the existing
I want to run |
Conflicting names sometimes come up on OC, but I cannot say that it's a regular occurrence. However, it is usual for users to specify a target schema name in the replication configuration. If that schema already exists and the destination connector does not have permissions to write to it, we will detect that only during the second attempt of the first sync. |
Yes. We often see users replicating multiple sources into the same destination (that's kind of the point of a data warehouse). I have seen several occasions where stream names collide at destinations (especially, streams like "accounts", "devices"). Sometimes, users figure it out by themselves, and add prefixes, sometimes this requires handholding, often they just give up. It would be even better if |
It seems that #19998 is also related to this one |
This small issue will make the problem less painful in the meantime: #21030 |
Notes from grooming: There are several trade-offs and alternatives to consider here. Someone from the destinations team will need to sit down with connector-ops and platform and write a tech spec for this change. |
Notes from a conversation with @grishick: I (@evantahler) prefer modifying the existing CHECK method to take an optional second argument
We can then use the existence (or not) of the second |
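The optional-argument idea in the comment above can be sketched roughly as follows. This is only an illustration, not the protocol's actual signature: `check`, `CheckResult`, `catalog_namespaces`, and the `can_write` probe are all hypothetical names, and the destination config is assumed to carry its default schema under a `"schema"` key.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class CheckResult:
    succeeded: bool
    message: str = ""


def check(
    config: dict,
    catalog_namespaces: Optional[Iterable[str]] = None,
    can_write: Callable[[str], bool] = lambda schema: True,
) -> CheckResult:
    """Hypothetical CHECK that optionally also validates catalog namespaces.

    Without `catalog_namespaces` this behaves like a config-only check and
    probes just the default schema. With it, every namespace the sync would
    write to is probed as well.
    """
    # The default schema from the destination config is always checked.
    schemas = [config["schema"]]

    # Add each distinct namespace from the catalog, if one was supplied.
    for namespace in catalog_namespaces or []:
        if namespace and namespace not in schemas:
            schemas.append(namespace)

    # `can_write` stands in for a real permission probe against the warehouse.
    denied = [schema for schema in schemas if not can_write(schema)]
    if denied:
        return CheckResult(False, "no write access to schema(s): " + ", ".join(denied))
    return CheckResult(True)
```

Because the second argument defaults to `None`, existing callers that pass only the connector configuration keep their current behavior, which is the appeal of extending the existing method rather than adding a new one.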
As a note, when talking about this project, it's probably important to socialize the learning that at least for destinations, CHECK cannot be done in isolation (source information matters). |
Scoping this to writing a tech spec |
Removing team labels other than |
Posted the tech spec for review |
An outcome of the tech spec review was that we chose to update the existing |
Closing this issue as the tech spec has been completed and approved. Tickets have been created representing the milestones in the tech spec |
Problem Statement
Currently, we allow users to test their destination configuration only when they are setting up the destination. However, the replication configuration can cause destinations to fail as well. The specific problem we have observed so far is that the Snowflake destination is configured with a default schema, and during connector setup we check for access to that schema as well as for the ability to create new schemas. If the user then sets up replication and selects "Mirror source structure" or "Namespace custom format" under the "Destination Namespace" options, the Snowflake destination can end up trying to write to a schema that it does not have access to. This was observed in several customer accounts when syncing data from a Postgres source whose tables are in the schema "public". The problem is reported as
Furthermore, this problem cannot currently be caught in the CHECK method, because CHECK does not have access to source schema or replication configuration. We already do CHECK for ability to create new schema, so the problem happens only when the target schema already exists and we don’t have permissions to write to it and we don’t know the name of the target schema until the sync starts.
Desired solution
For better UX, I suggest that we add a "check" button to Replication page and add a new call to the protocol, which will allow the connectors (or at least the destination connector) to check if the replication configuration is correct, not just if the connector configuration is correct.
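To make concrete what such a replication-level check would need to know, here is a minimal sketch of how the target schema depends on the replication options. The option values (`"destination"`, `"source"`, `"customformat"`) and the `${SOURCE_NAMESPACE}` placeholder are assumptions used for illustration, and both function names are hypothetical.

```python
from typing import Callable, Iterable, List, Optional

PLACEHOLDER = "${SOURCE_NAMESPACE}"  # assumed placeholder syntax


def resolve_destination_schema(
    namespace_definition: str,
    default_schema: str,
    source_namespace: Optional[str],
    namespace_format: Optional[str] = None,
) -> str:
    """Return the schema a stream would be written to under the given options."""
    if namespace_definition == "source":
        # "Mirror source structure": reuse the source namespace when it exists.
        return source_namespace or default_schema
    if namespace_definition == "customformat":
        # "Namespace custom format": substitute the source namespace into the format.
        resolved = (namespace_format or "").replace(PLACEHOLDER, source_namespace or "")
        return resolved or default_schema
    # Otherwise fall back to the schema from the destination configuration.
    return default_schema


def unwritable_schemas(
    source_namespaces: Iterable[Optional[str]],
    namespace_definition: str,
    default_schema: str,
    can_write: Callable[[str], bool],
    namespace_format: Optional[str] = None,
) -> List[str]:
    """List the distinct target schemas that fail the write probe."""
    targets: List[str] = []
    for namespace in source_namespaces:
        schema = resolve_destination_schema(
            namespace_definition, default_schema, namespace, namespace_format
        )
        if schema not in targets:
            targets.append(schema)
    return [schema for schema in targets if not can_write(schema)]
```

With a Postgres source whose tables live in "public" and a destination that can only write to its default schema, the mirror option resolves to "public" and the probe fails before any sync runs, which is the case the setup-time check misses today.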
Doing so will also require having a new type of check method (checkReplication?) that takes replication options and source schema as arguments in addition to destination configuration.
Current workaround
Currently, the problem is only detected after the failure of the first sync, which has a negative effect on the user's first experience and on activation.