Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix loading remote definitions via cron #21410

Merged
merged 9 commits into from
Jan 14, 2023

Conversation

pedroslopez
Copy link
Contributor

@pedroslopez pedroslopez commented Jan 13, 2023

What

fixes #21401

After #21073 moved the definitions provider to be injected as a singleton, the behavior of only fetching definitions in the RemoteDefinitionProvider on class initialization meant that we were no longer fetching the latest definitions each time the cron ran.

How

Refactor the RemoteDefinitionsProvider to let use leverage micronaut caching to determine when to re-fetch the data. Each getter now calls the method to fetch the data, but we're caching the results of the method for 15s on airbyte-cron.

@pedroslopez pedroslopez requested review from evantahler, a team and gosusnp January 13, 2023 18:31
@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 18:32 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 18:33 — with GitHub Actions Inactive
Copy link
Contributor

@evantahler evantahler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Just to check my understanding - in Cloud, there will be a PR to call definitionsProvider.loadDefinitions()?

@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 18:41 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 18:41 — with GitHub Actions Inactive
Copy link
Contributor

@gosusnp gosusnp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we could leverage micronaut caching here instead of requiring the client to decide when to actually load the definitions. I added comments trying to highlight what this could look like.
edit: link to micronaut doc

The other missing piece would be updating the application.yml with:

micronaut:
  caches:
    remoteDefinitions: 
      charset: 'UTF-8'
      expire-after-write: 1min

@jdpgrailsdev, do you think that would be a valid approach to manage how frequently we refresh definitions?

this.destinationDefinitions = parseDefinitions(this.seedResourceClass, SeedType.STANDARD_DESTINATION_DEFINITION.getResourcePath(),
SeedType.DESTINATION_SPEC.getResourcePath(), SeedType.STANDARD_DESTINATION_DEFINITION.getIdName(), SeedType.DESTINATION_SPEC.getIdName(),
StandardDestinationDefinition.class);
public void loadDefinitions() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since local definitions are read from the resources, we could annotate here with @PostConstruct

@@ -48,22 +48,26 @@ public LocalDefinitionsProvider(final Class<?> seedResourceClass) throws IOExcep
this.seedResourceClass = seedResourceClass;

// TODO remove this call once dependency injection framework manages object creation
initialize();
loadDefinitions();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need this call here.

StandardDestinationDefinition::getDestinationDefinitionId,
destination -> destination.withProtocolVersion(
AirbyteProtocolVersion.getWithDefault(destination.getSpec() != null ? destination.getSpec().getProtocolVersion() : null).serialize())));
public void loadDefinitions() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a @cACHEpUT("remoteDefinitions") annotation here instead of @PostConstruct

all the getters in this provider should call loadDefinitions()

@jdpgrailsdev
Copy link
Contributor

I am wondering if we could leverage micronaut caching here instead of requiring the client to decide when to actually load the definitions. I added comments trying to highlight what this could look like. edit: link to micronaut doc

The other missing piece would be updating the application.yml with:

micronaut:
  caches:
    remoteDefinitions: 
      charset: 'UTF-8'
      expire-after-write: 1min

@jdpgrailsdev, do you think that would be a valid approach to manage how frequently we refresh definitions?

@gosusnp Yes, but I think it is a little more involved. The Micronaut Cache abstraction is just that -- an abstraction over the creation and management of an in memory cache, such as one you could create using Guava, Ehcache or Caffeine. You would still need to mark the methods that will cache their return values with the @Cachable annotation. This is also not a distributed cache, so its possible that different instances of the application may return different results depending on when they were called even with caching.

@pedroslopez pedroslopez requested a review from a team as a code owner January 13, 2023 21:40
@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 21:44 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 21:44 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 21:55 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 21:56 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 22:40 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 22:40 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 22:47 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 22:47 — with GitHub Actions Inactive
@pedroslopez
Copy link
Contributor Author

Ok! Moved to using micronaut to handle the caching and reverted to the previous interface. Set to cache for 15s (cron runs every 30s).

@pedroslopez pedroslopez enabled auto-merge (squash) January 13, 2023 22:48
@gosusnp
Copy link
Contributor

gosusnp commented Jan 13, 2023

@jdpgrailsdev, I agree, it is a restriction from having an in-memory cache, I don't think it is a blocker for the current use case. The goal is to refresh frequently and I don't think we want to have concurrent updates so it should remain restricted to a single instance.

@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 23:28 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 13, 2023 23:28 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 14, 2023 00:11 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 14, 2023 00:12 — with GitHub Actions Inactive
@github-actions
Copy link
Contributor

File Coverage [96.63%] 🍏
RemoteDefinitionsProvider.java 98.91% 🍏
LocalDefinitionsProvider.java 94.85% 🍏
Total Project Coverage 26.59% 🍏

@pedroslopez pedroslopez merged commit 220f193 into master Jan 14, 2023
@pedroslopez pedroslopez deleted the pedroslopez/fix-updater-cron-refresh branch January 14, 2023 00:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Airbyte Cron really does re-fetch new connector definitions every 30s
4 participants