Bmoric/add namespace to protocol #13356
@@ -106,6 +106,9 @@ definitions:
type: string
state:
"$ref": "#/definitions/AirbyteStateBlob"
namespace:
description: Optional Source-defined namespace. Currently only used by JDBC destinations to determine what schema to write to. Airbyte streams from the same sources should have the same namespace.
Currently only used by JDBC destinations to determine what schema to write to.
I'm not sure if we want to put this in the protocol, as more connectors could start using this in the future.
Airbyte streams from the same sources should have the same namespace.
Is this always true? I'm not sure if it exists in prod, but I believe we have tests where a single source has multiple namespaces
Perhaps this could be re-worded to address the original intent and @lmossman's suggestion:
Examples include JDBC destinations, which use this field to determine the destination schema...
I feel like the change is still in progress and its use might change as well. I will just keep it as "Optional Source-defined namespace".
Note: looks like format needs to be run to fix the connectors-base build
Assuming the format & comment is fixed up, this looks good to me!
There may be a spot to talk about this on AirbyteStateMessage, but it's not required.
A question: When looking at the storage designed for per-stream state messages (doc), why was the choice made to store both the stream_name and namespace rather than a "combined" name like name (string) = "${namespace}:#{stream_name}"? I'm a fan of clear column storage, but it may be possible to change the /meaning/ of AirbyteStreamState.name to accomplish the goal without updating the protocol - if that were a goal.
This approach feels more prone to error to me, as it requires everything implementing the protocol to know about this special format, and to only use it in the case where there is a namespace. I think having the namespace be an explicit optional field is a clearer interface.
@@ -106,6 +106,9 @@ definitions:
type: string
state:
We should probably add the following to the AirbyteStreamState object:
required:
- name
- state
to make it clear that those two properties are always required, and namespace is optional.
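As a rough illustration of what that `required` list would mean for consumers, here is a minimal Python sketch. The `missing_required` helper and the `REQUIRED_FIELDS` constant are hypothetical names made up for this example, not part of the Airbyte codebase; the only thing taken from the comment above is that name and state are required while namespace is optional.

```python
from typing import Any, Dict, List

# Assumption: mirrors the proposed `required: [name, state]` list.
REQUIRED_FIELDS = ("name", "state")


def missing_required(stream_state: Dict[str, Any]) -> List[str]:
    """Return the required keys that are absent from a stream state dict."""
    return [field for field in REQUIRED_FIELDS if field not in stream_state]


# namespace is optional, so a stream state without it is still complete.
without_namespace = {"name": "users", "state": {"cursor": 42}}
with_namespace = {"name": "users", "namespace": "public", "state": {"cursor": 42}}

# A stream state lacking `state` would fail the required check.
incomplete = {"name": "users", "namespace": "public"}
```

With this sketch, `missing_required(without_namespace)` and `missing_required(with_namespace)` both return an empty list, while `missing_required(incomplete)` reports the missing `state` key.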
Done
Yes, it is more error prone. The same organization is also used in another part of the protocol (AirbyteCatalog), and we don't want any discrepancy here.
To piggyback on what Lake and Benoit said: because we allow a stream name to include any UTF-8 characters, there is no separator we can use without getting into character-escaping hell. Separate fields avoid that, but it is admittedly more verbose.
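The separator problem described above can be sketched in a few lines of Python. `StreamKey` and `combined_name` are hypothetical helpers written only for this illustration (they are not part of the Airbyte protocol or codebase), and the colon separator is an assumption based on the combined-name format suggested earlier in the thread.

```python
from typing import NamedTuple, Optional


class StreamKey(NamedTuple):
    """Hypothetical pair of explicit fields, as in the approach taken by this PR."""
    name: str
    namespace: Optional[str] = None


def combined_name(key: StreamKey, sep: str = ":") -> str:
    """Join namespace and name into one string, as in the combined-name idea."""
    if key.namespace is None:
        return key.name
    return f"{key.namespace}{sep}{key.name}"


# Two different streams whose fields happen to contain the separator.
a = StreamKey(name="users", namespace="public:archive")
b = StreamKey(name="archive:users", namespace="public")

# Both collapse to the same combined string, so the pair cannot be recovered.
collision = combined_name(a) == combined_name(b)

# With separate fields the two streams remain distinct.
distinct = a != b
```

Any fixed separator has the same issue unless names are escaped, which is the escaping overhead the comment above wants to avoid.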
thanks!
f450a6f to adb912a
@benmoriceau is there a reason we are doing name and namespace as top-level fields as opposed to a stream key object? Michel had mentioned it and I think it's a solid idea. Curious how we picked the current format.
…-namespace-to-protocol
What
Update the protocol to add a namespace associated with the stream name.
A field was missing from the earlier change to the state message in the protocol: a namespace field is needed to make the stream name schema-specific. This is related to this discussion.