Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add monitor for event-based reconciliation #2048

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

max-melentyev
Copy link
Contributor

This is a preview-PR, it's not supposed to be merged now.

It's to show a feature that requires crossplane/crossplane#5557 and #2030

Related to #2029

Description of your changes

This PR introduces a monitor of EventBridge events that are published to SQS queue. This allows to keep provider-aws responsive to external changes even when it manages large number of resources that takes long time to reconcile using polling approach. With event based reconciliation polling interval can be increased but resources will still be reconciled when they have updates.

This PR only enables support for ec2.Instance and route53.ResourceRecordSet events. Event based reconciliation is optional and works along with polling.

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

  • new tests
  • staging
  • production

Signed-off-by: Max Melentyev <max.melentyev@reddit.com>
Copy link

Crossplane does not currently have enough maintainers to address every issue and pull request. This pull request has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 14 days if no further activity occurs. Adding a comment starting with /fresh will mark this PR as not stale.

@github-actions github-actions bot added the stale label Jul 23, 2024
@max-melentyev
Copy link
Contributor Author

/fresh

@max-melentyev
Copy link
Contributor Author

/fresh

@github-actions github-actions bot removed the stale label Jul 24, 2024
Copy link

Crossplane does not currently have enough maintainers to address every issue and pull request. This pull request has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 14 days if no further activity occurs. Adding a comment starting with /fresh will mark this PR as not stale.

@github-actions github-actions bot added the stale label Oct 22, 2024
@max-melentyev
Copy link
Contributor Author

/fresh

@max-melentyev
Copy link
Contributor Author

/fresh

@github-actions github-actions bot removed the stale label Oct 23, 2024
@MisterMX
Copy link
Collaborator

While I understand the issue with AWS request throttling, especially regarding Route53, I am very reluctant on merging this as it feels like a very specialized implementation tailored to a very specific use case.

With this we would make the provider depend on external infrastructure, such as SQS queues to do its job. Given that a provider can potentially target multiple AWS accounts this solution might not work in every scenario.

Additionally, the connection to the queue relies on the IRSA credentials injected into the pod. However, there is no guarantee that this will work since a provider / controller receives its credentials through the provider configs referenced in each individual managed resource which also is not necessarily IRSA but can be a file source as well.

Overall I don't we are gonna merge this. Again, I do think that throttling is a relevant issue but I think it is better to try to solve this with K8s tooling or at least something that is kinda agnostic to the underlying technology (i.e. SQS).

@max-melentyev
Copy link
Contributor Author

Hey @MisterMX, let me address your concerns. But first I'd add that we have provider-aws managing 10k instances and 13k resourcerecordsets in multiple AWS accounts for more than 6 months. Before switching to event-based reconciliation we started seeing issues with 2k resources, queue waiting times were more than 10m.

With this we would make the provider depend on external infrastructure, such as SQS queues to do its job.

Event-based reconciliation is optional. It's enabled only If SQS queue address is provided.

Given that a provider can potentially target multiple AWS accounts this solution might not work in every scenario.

It's possible to have a multi-account setup: managed accounts stream stream EventBridge events to an account where provider-aws runs, and SQS is configured only in one account. I can add more details to readme or terraform examples/modules.

Additionally, the connection to the queue relies on the IRSA credentials injected into the pod. However, there is no guarantee that this will work since a provider / controller receives its credentials through the provider configs referenced in each individual managed resource which also is not necessarily IRSA but can be a file source as well.

Provider config is not used to access SQS. As there's a single SQS, IAM policies can be configured to allow access from provider-aws pod.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants