kafka: Cluster resource is not able to get to the Ready state #489

Closed
blakebarnett opened this issue Feb 9, 2023 · 6 comments · Fixed by #877

blakebarnett commented Feb 9, 2023

What happened?

I've been seeing problems with cluster.kafka.aws.upbound.io. When we set spec.forProvider.clusterName, which is required, that value becomes the name of the resource in AWS, and Crossplane then seems to think it no longer owns it. If we set the crossplane.io/external-name annotation to match this name, the cluster can't be created. With external-name set, we get this error:

    Message:               observe failed: cannot run refresh: refresh failed: reading MSK Cluster (services-kafka-cluster): AccessDeniedException:
                               status code: 403, request id: 4f4bf7dc-a24e-450b-8c42-4d516be05743:
    Reason:                ReconcileError
    Status:                False
    Type:                  Synced

When we don't use external-name we get this error:

    Message:               apply failed: creating MSK Cluster (services-kafka-cluster): ConflictException: A resource with this name already exists.
    {
      RespMetadata: {
        StatusCode: 409,
        RequestID: "dfce3934-d9ea-4ec8-885b-7d56b91e95a3"
      },
      InvalidParameter: "clusterName",
      Message_: "A resource with this name already exists."
    }

This makes no sense, because Crossplane has full admin permissions in this account. I'm not sure how this resource can ever get to the Ready state.
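Here is a simplified model of why neither path reaches Ready. A managed-resource reconciler first observes the external resource and only creates it when observation says it does not exist yet. The sketch below is an illustration under stated assumptions, not Crossplane's or the provider's actual code. `FakeMskApi`, `reconcile`, `AccessDenied` and `Conflict` are all hypothetical names. The fake API is deliberately rigged to reproduce the 403 and 409 responses quoted above.

```python
class AccessDenied(Exception):
    """Hypothetical stand-in for the 403 AccessDeniedException quoted above."""


class Conflict(Exception):
    """Hypothetical stand-in for the 409 ConflictException quoted above."""


class FakeMskApi:
    """Hypothetical in-memory API rigged to reproduce the reported responses:
    names in `existing_names` are taken, and every read is denied (403)."""

    def __init__(self, existing_names):
        self.existing_names = set(existing_names)

    def describe(self, identifier):
        # Mirrors the 403 seen when the external-name annotation is set.
        raise AccessDenied(f"reading MSK Cluster ({identifier})")

    def create(self, name):
        # Mirrors the 409 seen when a cluster with this name already exists.
        if name in self.existing_names:
            raise Conflict(f"creating MSK Cluster ({name}): name already exists")
        self.existing_names.add(name)


def reconcile(api, cluster_name, external_name=None):
    """Simplified observe-then-create step; returns (condition, message)."""
    if external_name is not None:
        # In this model, an external name means "observe the existing resource".
        try:
            api.describe(external_name)
        except AccessDenied as err:
            return ("Synced=False", f"observe failed: {err}")
        return ("Ready=True", "observed existing cluster")
    # No external name: nothing to observe yet, so a create is attempted.
    try:
        api.create(cluster_name)
    except Conflict as err:
        return ("Synced=False", f"apply failed: {err}")
    return ("Ready=True", "created")


api = FakeMskApi(existing_names={"services-kafka-cluster"})
print(reconcile(api, "services-kafka-cluster", external_name="services-kafka-cluster"))
print(reconcile(api, "services-kafka-cluster"))
```

In this model, neither branch can return Ready=True for a name that is already taken and whose reads are denied. That matches the dead end described above.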

How can we reproduce it?

Create a Kafka cluster with the resource.

What environment did it happen in?

  • Universal Crossplane Version: 1.10.1
  • Provider Version: v0.26.0
  • Cloud provider or hardware configuration: AWS
  • Kubernetes version (use kubectl version): v1.24.8-eks-ffeb93d
  • Kubernetes distribution (e.g. Tectonic, GKE, OpenShift): EKS
blakebarnett added the bug (Something isn't working) label on Feb 9, 2023

blakebarnett (Author) commented:

Update on this for provider-aws 0.29 and crossplane 1.11, we see this error now, preventing the cluster from becoming ready:

cannot update managed resource: Cluster.kafka.aws.upbound.io "kafka-cluster-58kmx-g4tw9" is invalid: [spec.forProvider.configurationInfo[0].arn: Required value, spec.forProvider.configurationInfo[0].revision: Required value]

We don't specify a Configuration resource, as it's optional...

After adding a Configuration resource, we're back to the 409 and 403 scenario originally reported.
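The validation error above says that once a `configurationInfo` element is present in `spec.forProvider`, both `arn` and `revision` inside it are required. The following is a minimal Python sketch of that rule, assuming the spec is held as a plain dict. `missing_required` is a hypothetical helper, not part of Crossplane or the provider, and it only mimics the shape of the messages quoted above.

```python
# Fields that the quoted error reports as required inside each configurationInfo item.
REQUIRED_CONFIGURATION_FIELDS = ("arn", "revision")


def missing_required(for_provider):
    """Return 'Required value' messages for incomplete configurationInfo items."""
    errors = []
    for i, item in enumerate(for_provider.get("configurationInfo", [])):
        for field in REQUIRED_CONFIGURATION_FIELDS:
            if item.get(field) in (None, ""):
                errors.append(
                    f"spec.forProvider.configurationInfo[{i}].{field}: Required value"
                )
    return errors


# No configurationInfo at all: nothing to validate.
print(missing_required({"clusterName": "services-kafka-cluster"}))

# An empty configurationInfo element fails on both required fields.
print(missing_required({"configurationInfo": [{}]}))
```

In this sketch, leaving the block out entirely passes, but an element without `arn` and `revision` does not.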

svscheg (Contributor) commented Mar 27, 2023

Hi @blakebarnett! Thank you for reporting the issue.
So far I can't reproduce this issue (provider-aws 0.31).

Using only clusterName:
cluster_no_anotation.yaml.txt

Using crossplane.io/external-name and clusterName
cluster_with_anotation.yaml.txt

Could you please try again with provider-aws 0.31? If you can still reproduce this issue, please attach an example of your resource.

blakebarnett (Author) commented Mar 29, 2023

Could this be fixed by newer versions of the underlying Terraform provider? I'm still using provider-aws v0.29; I'll upgrade to v0.32rc and test.

This issue might only be triggered when the resource is modified after it's created, especially if it failed to be created the first time. I saw it happen while developing a composition for it. The resource wasn't configured correctly at first, and then it got into this state.

svscheg (Contributor) commented Mar 30, 2023

Hi @blakebarnett, so the steps to reproduce should be:

  1. Create Kafka cluster
  2. Check that it is created successfully
  3. Then make some updates in this resource and apply the changes

Result: a 409 or 403 error.

One more thing: does it matter which changes are made in step 3, or can I change or add any attribute?

blakebarnett (Author) commented Mar 30, 2023

For example, I think I had an incorrect configuration for spec.forProvider.brokerNodeGroupInfo[0].storageInfo[0].ebsStorageInfo[0].volumeSize (I didn't realize it needed arrays all the way down), and the cluster was not created successfully. Once I corrected the configuration, I started hitting this error.
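The path in that comment uses `[0]` at every level because each nested block in `forProvider` is a list holding a single object, as the `[0]` indexes in the path show. The following is a minimal Python sketch, assuming the manifest is held as a plain dict. The hypothetical `set_path` helper expands such a path into the nested lists and dicts it describes, which makes the required shape visible. Only `brokerNodeGroupInfo`, `storageInfo`, `ebsStorageInfo` and `volumeSize` come from this thread; the value 1000 is purely illustrative.

```python
import re

# Matches one path segment such as "storageInfo[0]" or "volumeSize".
SEGMENT = re.compile(r"^(?P<key>\w+)(?:\[(?P<index>\d+)\])?$")


def set_path(root, path, value):
    """Set `value` at a dotted path like 'a[0].b[0].c', creating lists/dicts as needed."""
    parts = path.split(".")
    node = root
    for position, part in enumerate(parts):
        match = SEGMENT.match(part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        key, index = match.group("key"), match.group("index")
        last = position == len(parts) - 1
        if index is None:
            # Plain key: either the final field or a nested object.
            if last:
                node[key] = value
                return root
            node = node.setdefault(key, {})
            continue
        # Indexed key: a list of objects, padded with empty dicts up to the index.
        items = node.setdefault(key, [])
        i = int(index)
        while len(items) <= i:
            items.append({})
        if last:
            items[i] = value
            return root
        node = items[i]
    return root


for_provider = {}
set_path(
    for_provider,
    "brokerNodeGroupInfo[0].storageInfo[0].ebsStorageInfo[0].volumeSize",
    1000,
)
print(for_provider)
# {'brokerNodeGroupInfo': [{'storageInfo': [{'ebsStorageInfo': [{'volumeSize': 1000}]}]}]}
```

Writing the value directly as `brokerNodeGroupInfo: {storageInfo: {...}}`, without the lists, would not match this shape. That kind of mismatch is the one described in the comment above.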

BTW I think these might be related issues from the contrib provider:
crossplane-contrib/provider-aws#1551
crossplane-contrib/provider-aws#1620
