Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

airbyte-config: Implement RemoteDefinitionsProvider #16018

Conversation

alafanechere
Copy link
Contributor

@alafanechere alafanechere commented Aug 26, 2022

What

https://github.com/airbytehq/airbyte-cloud/issues/2402
Epic: https://github.com/airbytehq/airbyte-cloud/issues/2399

To decouple cloud deployments from connector catalog updates we need Airbyte's bootloader to fetch connector configuration (a.k.a definitions) from a remote source. (A JSON file ATM)

How (updated after iterations)

  1. Create a DefinitionsProvider interface
  2. Create RemoteDefinitionsProvider class which retrieves a remote catalog and exposes the definitions with its interface
  3. Create LocalDefinitionsProvider class whose internal logic of parsing YAML seed file is similar to YamlSeedConfigPersistence
  4. Create an adapter class (DefinitionsProviderToConfigPersistence) that can be used in the bootloader to avoid immediately refactoring the bootloader to use the new DefinitionsProvider interface.
  5. Create a new env variable (REMOTE_CONNECTOR_CATALOG_URL) (not used at the moment)

Recommended reading order

  1. DefinitionsProvider.java
  2. RemoteDefinitionsProvider.java
  3. LocalDefinitionsProvider.java
  4. DefinitionProviderToConfigPersistenceAdapterTest.java
  5. EnvConfigs.java
  6. Configs.java

🚨 User Impact 🚨

No impact: new classes are not used yet.

@alafanechere alafanechere temporarily deployed to more-secrets August 26, 2022 16:53 Inactive
@alafanechere alafanechere temporarily deployed to more-secrets August 29, 2022 08:10 Inactive
@alafanechere alafanechere force-pushed the augustin/config/implement-remote-catalog-config-persistence branch from 8360d40 to b0e2572 Compare August 29, 2022 13:18
@alafanechere alafanechere temporarily deployed to more-secrets August 29, 2022 13:20 Inactive
@alafanechere alafanechere marked this pull request as ready for review August 29, 2022 13:32
@alafanechere alafanechere requested review from a team and pedroslopez August 29, 2022 13:33
@alafanechere alafanechere temporarily deployed to more-secrets August 29, 2022 13:41 Inactive
@alafanechere alafanechere temporarily deployed to more-secrets August 29, 2022 13:53 Inactive
@alafanechere alafanechere temporarily deployed to more-secrets August 29, 2022 17:10 Inactive
* This config persistence pulls the connector configurations from a remotely hosted catalog. It is
* read-only.
*/
final public class RemoteConnectorCatalogPersistence implements ConfigPersistence {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It this the right interface for us to be using? I think we used this back in a time on when we had a lot less clarity on what our goals are--it was partially a trick to get platform upgrades to work. Now we have a much more focused version on "catalog persistence". Using this very broad interface means we lose a lot of clarity. As we are refactoring this, it would be a good moment to revisit what this iface should be.

cc @evantahler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cgardens, I don't get if you suggest I use a different already existing interface or if we need to create a new, more focused one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the second option.

In terms of interface, I think what's being abstracted here is the source_definitions and destination_definitions.
Probably an ActorDefinitionsProvider with a listDefinitions method could work.
The current behavior could be wrapped in a SeedResource implementation while your new one would be a RemoteResource.

Injecting this into the the existing implementations would help us avoid adding a new layer of ConfigPersistence and a step forward into breaking this giant into more focused pieces.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had initially suggested implementing a new ConfigPersistence class since that's what we were using for the YamlSeedConfigPersistence, which is doing a similar thing, and would be easy to slot in as a replacement when loading in the definitions for cloud. But I definitely agree that the typing guarantees and how generic ConfigPersistence is isn't ideal... If we want to move away from it anyway, this being an ActorDefinitionsProvider like jimmy proposed sounds great to me.

* This config persistence pulls the connector configurations from a remotely hosted catalog. It is
* read-only.
*/
final public class RemoteConnectorCatalogPersistence implements ConfigPersistence {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I find the name a bit confusing, in the platform code, connectors are mostly called Source/Destination/Actor. At this point, I would try to not introduce a new one. RemoteActorDefinitions might be more in line with the current naming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind... I discovered parts of the platform with "connectors". still hoping we can unify this a bit.

* This config persistence pulls the connector configurations from a remotely hosted catalog. It is
* read-only.
*/
final public class RemoteConnectorCatalogPersistence implements ConfigPersistence {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the second option.

In terms of interface, I think what's being abstracted here is the source_definitions and destination_definitions.
Probably an ActorDefinitionsProvider with a listDefinitions method could work.
The current behavior could be wrapped in a SeedResource implementation while your new one would be a RemoteResource.

Injecting this into the the existing implementations would help us avoid adding a new layer of ConfigPersistence and a step forward into breaking this giant into more focused pieces.

return configsWithMissingField;
}

private static JsonNode addMissingTombstoneField(final JsonNode definitionJson) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the ones below look like something that should be shared with the current implementation.

final JsonNode currPublic = definitionJson.get("public");
if (currPublic == null || currPublic.isNull()) {
// definitions loaded from the cloud connector catalog are by definition public
((ObjectNode) definitionJson).set("public", BooleanNode.TRUE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's anything to change on this end, but just a note that the remote catalog will now have public set to true since the schema is enforced more strictly in https://github.com/airbytehq/airbyte-cloud/pull/2485/commits/86e9dc72a5b7afd821a9f8b3eaa10986e370817f

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so if this is enforced upstream I'll remove the addition of public and custom fields.

@@ -0,0 +1,225 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason you chose to add this as part of OSS and not directly in Cloud? (mainly just curious, with the proposed changes to the interface it might make more sense to keep this in OSS)

Copy link
Contributor Author

@alafanechere alafanechere Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I think nothing logically tights it to the Cloud repo and that this could allow OSS users to declare their remote catalog.

…ence' of github.com:airbytehq/airbyte into augustin/config/implement-remote-catalog-config-persistence
@alafanechere alafanechere temporarily deployed to more-secrets August 30, 2022 13:23 Inactive
@alafanechere alafanechere marked this pull request as draft August 30, 2022 15:52
@alafanechere alafanechere temporarily deployed to more-secrets August 30, 2022 15:56 Inactive
@alafanechere
Copy link
Contributor Author

@gosusnp / @cgardens I made an (incomplete) attempt at refactoring the code in the direction I think you suggest. This is my first Java contribution, so bare with me, please 😄
Following this commit could you please:

  • Let me know if this is going in the direction you suggest
  • Suggest an approach to track all the implications of this refactor if I propagate the same changes to YamlSeedConfigPersistence. In other words: I guess that if I make an interface change, this has downstream implications everywhere the classes implementing it are used. Do you have a convenient approach to suggest to spot these locations?

@cgardens
Copy link
Contributor

cgardens commented Aug 30, 2022

@alafanechere , nice! I agree with the direction. I would say first, let's start with an interface instead of an abstract class. The most important thing to get right here is to define the interface between your team and the workflows team. I'm not convinced yet that what is is in the abstract class is actually that interface yet. After all that, we can figure out how to slot it in.

Specifically for the iface:

  public <T> T getDefinition(final ConfigSchema definitionType, final String definitionId, final Class<T> clazz)

I don't think this is a good method to have in the interface. This clazz pattern is an artifact from an implementation detail in the old ConfigPersistence. It isn't the clearest way for your team to communicate with the platform workflows team. The connectors team should also not need to know about the ConfigSchema concept (that's a lot of baggage from the platform team).

Maybe just? Not sure if there are other public methods you need in the interface?

  public StandardSourceDefinition getSourceDefinition(final UUID definitionId);
  public StandardDestinationDefinition getDestinationDefinition(final UUID definitionId);

If it is important that the single method be able to return both source and destination definitions, we can also just build a wrapper class to handle that.

    public ActorDefinitionWrapper getDefinition(final UUID definitionId);

    public ActorDefinitionWrapper(final ActorDefinitionType defType, final StandardSourceDefinition sourceDef, final StandardSourceDefinition destDef) {
      this.defType = defType;
      this.sourceDef = sourceDef;
      this.destDef = destDef;
    }

    public ActorDefinitionType getDefType() {
      return defType;
    }

    public StandardSourceDefinition getSourceDef() {
      Preconditions.checkState(getDefType() == ActorDefinitionType.SOURCE);
      return sourceDef;
    }

    public StandardSourceDefinition getDestDef() {
      Preconditions.checkState(getDefType() == ActorDefinitionType.DESTINATION);
      return destDef;
    }
  }

@gosusnp
Copy link
Contributor

gosusnp commented Aug 31, 2022

@alafanechere , pretty much agree with @cgardens's comments.
interface would be an easier way to start as it focuses on the interface (feels very rendundant here but so be it :D), we (workflow team) can refactor later into an abstraction when we get to it.

About YamlSeedConfigPersistence, in order to avoid growing the footprint of this PR too much, an option here would be to not refrain from changing the ConfigRepository interface.
In practice, it means extracting code from the current YamlSeedConfigPersistence into an implementation of the ConnectorDefinitionsProvider interface, then inject it back into the YamlSeedConfigPersistence. Most changes should remain in the constructor/initialization parts.
This way, the code is being extracted into a better scoped object without impacting downstream dependencies.

In terms of wrapping up the refactoring, it's something that we (workflow team) would typically take on. You're more than welcome to contribute if you want to take a stab at it.
To complete the refactoring, the next steps would be to then start having clients use the new ConnectorDefinitionsProvider interface directly because at this point, the current YamlSeedConfigPersistence should mostly be a wrapper around some implementation. This is where we'd update downstream dependencies.

@alafanechere
Copy link
Contributor Author

@gosusnp I made the suggested changes following our call:

  1. Create LocalDefinitionProvider that can eventually replace YamlSeedConfigPersistence
  2. Move helpers to a dedicated class

I'll write tests for LocalDefinitionProvider tomorrow. And will try to discard the usage of YamlSeedConfigPersistence to use LocalDefinitionProvider through the DefinitionProviderToConfigPersistenceAdapter.

Copy link
Contributor

@gosusnp gosusnp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I added a few comments.

One thing to keep in mind, if you want to remove YamlSeedConfigPersistence. It is used in cloud, so if you remove it, you'll break cloud. A two step process here might be easier. You can replace the usage of YamlSeedConfigPersistence in this PR but you should update cloud before being able to remove it.

@alafanechere alafanechere temporarily deployed to more-secrets September 1, 2022 07:25 Inactive
@alafanechere alafanechere temporarily deployed to more-secrets September 1, 2022 07:28 Inactive
…ence' of github.com:airbytehq/airbyte into augustin/config/implement-remote-catalog-config-persistence
@alafanechere alafanechere temporarily deployed to more-secrets September 1, 2022 08:09 Inactive
@alafanechere
Copy link
Contributor Author

alafanechere commented Sep 1, 2022

YamlSeedConfigPersistence. It is used in cloud, so if you remove it, you'll break cloud.

I'll definitely do the following before removing YamlSeedConfigPersistence :

  1. merge this PR
  2. publish the changes to cloud
  3. make the changes to BootloaderAppWrapped on cloud, YamlSeedConfigPersistence should not be used anymore on cloud after this step.
  4. Change usage of YamlSeedConfigPersistence to RemoteDefinitionsProvider through the adapter on OSS. YamlSeedConfigPersistence should not be used anymore on OSS after this step.
  5. Remove YamlSeedConfigPersistence
  6. Open an issue to refactor BootloaderApp + ConfigRepositoryclasses to make them use the DefinitionsProvider interface.

@alafanechere alafanechere requested a review from gosusnp September 1, 2022 08:26
@alafanechere alafanechere temporarily deployed to more-secrets September 1, 2022 08:28 Inactive
@alafanechere alafanechere temporarily deployed to more-secrets September 1, 2022 15:31 Inactive
@alafanechere alafanechere temporarily deployed to more-secrets September 1, 2022 18:11 Inactive
@alafanechere alafanechere merged commit be56011 into master Sep 1, 2022
@alafanechere alafanechere deleted the augustin/config/implement-remote-catalog-config-persistence branch September 1, 2022 19:33
robbinhan pushed a commit to robbinhan/airbyte that referenced this pull request Sep 29, 2022
jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants