-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Connectors using an index yaml file, instead of individual json files #1046
Conversation
|
||
while (elements.hasNext()) { | ||
final JsonNode element = Jsons.clone(elements.next()); | ||
final String name = element.get("name").asText(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See PR comment at the top about how much this class should assume about the structure of the input objects.
} | ||
names.add(name); | ||
|
||
final UUID uuid = UUID.nameUUIDFromBytes(name.getBytes(Charsets.UTF_8)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm on the fence on this. i like the idea of being able to generate the uuids deterministically from the configs in some way. the issue is that if someone ever changes the name of a connector then the uuid will change. right now i don't think that causes any immediate problems, but it could be really confusing in our version migration story. if a connector name ever changes we need to make sure its uuid gets propagated to any already persisted config that uses the old one. a few options... not in love with any of these:
- name field is never allowed to change. if a display name needs to change we add that as an optional attribute. (bad because not immediately intuitive to a contributor, easy to mess up.)
- generate uuid based off of position in the list. if an integration is ever removed then a placeholder must be left in the list. (bad because not immediately intuitive to a contributor, and cluttering up the list with placeholders seems not great, easy to mess up)
- user still needs to generate and hardcode a uuid, but they do it in the yaml and don't need to worry about the file name and the uuid in the object in the file being the same. keeps the yaml which is a bit easier to work with. we still validate that all uuids are unique (still generating uuids is annoying, but ultimately this is the most bullet proof solution, so if we can't think of something better, it is probably the right option).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about we use the image name for the uuid? (airbyte/destination-bigquery
) ?
It gives us two things:
- if someone tweaks the integrations and publishes it on their image repo, it automatically gets a new uuid
- it is deterministic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with this. It is the same problem as using name though. It means we better never change the underlying image name that we are using! I think the cases where that might be something we do is in cases like the file source, were we have considered exposing it at multiple separate sources but all with the same image. there are things we could do to tweak the name, but it seems like weird incidental complexity to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that so then we should have the uuid in the yaml. That gives us the best of both world for the small cost of generating a random uuid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name as more chances to change because it is just a matter of modifying an input whereas the docker repo is a bit more involve. so whichever we choose we should pick the one that is the most stable.
But in the end nothing beats forcing the uuid in the yaml
} | ||
names.add(name); | ||
|
||
final UUID uuid = UUID.nameUUIDFromBytes(name.getBytes(Charsets.UTF_8)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about we use the image name for the uuid? (airbyte/destination-bigquery
) ?
It gives us two things:
- if someone tweaks the integrations and publishes it on their image repo, it automatically gets a new uuid
- it is deterministic
@@ -0,0 +1,16 @@ | |||
- name: Local CSV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this.
@@ -0,0 +1,64 @@ | |||
- name: "exchangeratesapi.io" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this.
@@ -34,3 +34,5 @@ dependencies { | |||
implementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs) | |||
// integrationTestImplementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs) | |||
} | |||
|
|||
compileJava.dependsOn(':airbyte-integrations:connectors:source-mysql:build') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a hack, we should at least have a comment explaining why and how to get rid of it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't be here.
2792379
to
bfa9f0d
Compare
* #1046 source twilio: fix generating date ranges * #1046 source twilio - upd changelog * #1046 source twilio: add freezegun to requirements * #1046 source twilio: fix incremental tests * #1046 source twilio: fix expected records * #1046 source twilio - fix expected records * #1046 source twilio: cut expected records * #1046 source twilio: cut expected records * auto-bump connector version Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Note: Number of files in this PR looks scary, but it is really only 5 files to review. The rest is just changes in generated files.
What
airbyte-config:init
following an exacting format. This includes them needing to generate a unique uuid themselves. That uuid must be placed in the file and also be the name of the file. This burden leads to bug and is unintuitive to a contributor. It is an artifact of the records being written in that directory being in the format that is used by the database.How
Open Questions
RepositorySeed
. It pretends that it is just stupidly adding uuids to objects that are passed to it. But it also assumes that a name attribute on those objects. Either it should be configurable how it determines how to generate its UUID, or it should just go ahead and directly reference source and destination. I think we can probably make the former work.Checklist
Recommended reading order
airbyte-config/init/src/main/java/io/airbyte/config/init/RepositorySeed.java
airbyte-config/init/src/main/resources/seed/source_definitions.yaml
airbyte-config/init/src/main/resources/seed/destination_definitions.yaml
airbyte-config/init/build.gradle