Wrong password in generated connection-secret for PostgreSQLServer #230

Closed
haraldatbmw opened this issue Mar 12, 2021 · 8 comments · Fixed by #284

@haraldatbmw

What happened?

During creation of the Azure PostgreSQLServer resource, the password inside the generated connection secret changes multiple times, and sometimes (nondeterministically) the final secret contains a wrong password. It seems to be some kind of timing issue.

I observed the secret changes. They looked like this:

  1. The secret was created and contained only the password data field.
  2. 2 seconds after creation, the password changed.
  3. 1 minute after creation, the username, endpoint and port were generated.

If I manually delete the secret, it gets recreated by Crossplane, but the password data field is missing.

How can we reproduce it?

Create an Azure PostgreSQLServer resource multiple times and verify whether a connection can be established with the generated secret.
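
As a rough aid for the verification step, the sketch below builds a PostgreSQL connection URL from the secret's fields and fails early if one is missing. The key names (`username`, `password`, `endpoint`, `port`) follow the fields mentioned in this report; the `postgres` database name, `sslmode=require`, and the sample values are assumptions. Values read from the cluster are base64-encoded and need to be decoded first.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// dsn builds a connection URL from already-decoded secret values.
// The database name and sslmode are assumptions, not taken from the report.
func dsn(secret map[string]string) (string, error) {
	for _, k := range []string{"username", "password", "endpoint", "port"} {
		if secret[k] == "" {
			return "", fmt.Errorf("connection secret is missing %q", k)
		}
	}
	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(secret["username"], secret["password"]),
		Host:     net.JoinHostPort(secret["endpoint"], secret["port"]),
		Path:     "/postgres",
		RawQuery: "sslmode=require",
	}
	return u.String(), nil
}

func main() {
	// Hypothetical sample values for illustration only.
	s := map[string]string{
		"username": "myadmin",
		"password": "s3cret",
		"endpoint": "db.example.com",
		"port":     "5432",
	}
	out, err := dsn(s)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

The resulting URL can be handed to any PostgreSQL client to check whether the password in the secret is actually accepted by the server.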

What environment did it happen in?

Crossplane version: 1.1.0

@haraldatbmw haraldatbmw added the bug Something isn't working label Mar 12, 2021
@muvaf
Member

muvaf commented Mar 24, 2021

Wonder if that's related to this very old PR of mine #88

The root problem is that Azure returns NotFound during the creation, so we have to check if the resource is being created by other means so that we don't call Create more than once.
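
To illustrate the idea, here is a minimal sketch (not the provider's actual code) of a reconcile step that issues Create only once even though the API keeps answering NotFound while provisioning. The `cloud` and `state` types and the `createIssued` flag are hypothetical names introduced for this example.

```go
package main

import "fmt"

// cloud is a hypothetical stand-in for an API that reports NotFound
// until provisioning has finished.
type cloud struct {
	ready       bool
	createCalls int
}

func (c *cloud) Exists() bool { return c.ready }
func (c *cloud) Create()      { c.createCalls++ }

// state is hypothetical local bookkeeping kept alongside the resource.
type state struct {
	createIssued bool
}

// reconcile calls Create only if the resource is not found AND no
// Create call has been recorded before.
func reconcile(c *cloud, s *state) {
	if c.Exists() || s.createIssued {
		return
	}
	c.Create()
	s.createIssued = true
}

func main() {
	c, s := &cloud{}, &state{}
	for i := 0; i < 3; i++ {
		reconcile(c, s) // the API still reports NotFound on every pass
	}
	fmt.Println("create calls:", c.createCalls) // create calls: 1
}
```

The guard only helps if the bookkeeping is reliably persisted and read back between reconciles.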

@muvaf muvaf mentioned this issue Mar 24, 2021
2 tasks
@muvaf
Member

muvaf commented Mar 24, 2021

@haraldatbmw I'm not able to reproduce this using examples/database/postgresqlserver.yaml. Could you post the YAML you used?

@haraldatbmw
Author

@muvaf My YAMLs can be found here: crossplane/crossplane#2192

@muvaf
Member

muvaf commented Mar 24, 2021

Alright, I think this is about the composition you have rather than PostgreSQLServer, similar to this one crossplane-contrib/provider-aws#378

I think what happens is that it patches the whole metadata.annotations instead of only metadata.annotations[crossplane.io/external-name], hence the existing value of the crossplane.io/external-name annotation gets deleted, and when the controller sees it empty, it tries to create a new resource. Could you change the fromFieldPath: metadata.annotations to fromFieldPath: metadata.annotations[crossplane.io/external-name] and see if you still experience the issue?
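
The difference between the two patch paths can be modelled with plain maps. This is a simplified sketch of the suspected behaviour, not Crossplane's patching code; `replaceAll`, `patchKey`, and the sample annotation values are hypothetical.

```go
package main

import "fmt"

const externalName = "crossplane.io/external-name"

// replaceAll models patching the whole metadata.annotations map:
// the target ends up with exactly the source's annotations.
func replaceAll(src map[string]string) map[string]string {
	out := make(map[string]string, len(src))
	for k, v := range src {
		out[k] = v
	}
	return out
}

// patchKey models patching a single annotation: the key is copied only
// if the source has it, and all other target annotations are untouched.
func patchKey(dst, src map[string]string, key string) {
	if v, ok := src[key]; ok {
		dst[key] = v
	}
}

func main() {
	composite := map[string]string{"team": "db"} // no external-name set
	managed := map[string]string{externalName: "my-server"}

	whole := replaceAll(composite)
	fmt.Printf("whole-map patch keeps external-name: %v\n", whole[externalName] != "")

	patchKey(managed, composite, externalName)
	fmt.Printf("single-key patch keeps external-name: %v\n", managed[externalName] != "")
}
```

With the whole-map patch the external name is lost, which is the condition under which the controller would attempt another create.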

@haraldatbmw
Author

Just retried with your hint. I changed the patch inside the composition like this:

  patchSets:
  - name: Metadata
    patches:
    - fromFieldPath: metadata.labels
    - fromFieldPath: metadata.annotations[crossplane.io/external-name] 

I applied the claim and watched the secret postgresqlserver-admin-059ca720-c2b3-470c-af82-832a890f2564 while the PostgreSQL server was being created. I counted 10 password changes during creation. After server creation, the sql-provider had the same issue: cannot select database: pq: password authentication failed for user "myadmin"

@fsniper

fsniper commented May 11, 2021

I have the same issue. The secret changes at least 3 times. In my experiments, sometimes the first password worked and sometimes the last one did.

@fsniper

fsniper commented May 11, 2021

I also tried removing the metadata patches, to no avail. Still the same issue.

@ulucinar ulucinar self-assigned this Aug 18, 2021
ulucinar added a commit to ulucinar/provider-azure that referenced this issue Aug 18, 2021
- Fixes crossplane-contrib#230

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
@ulucinar
Collaborator

Yesterday, I tried to reproduce this issue using the example manifest in the provider-azure repo but in a couple of retries I was not able to. However, when I tried to reproduce it using the claim and composite & composition definitions supplied by @haraldatbmw here I was able to reproduce it. I stripped down the composition so that I could only work with core crossplane & provider-azure controllers.

In a run where I observed this issue, I was able to collect the following provider-azure logs:

...
2021-08-18T07:22:10.753+0300    DEBUG   provider-azure  Cannot create external resource {"controller": "managed/postgresqlserver.database.azure.crossplane.io", "request": "/example-2v2lb-rmlkw", "uid": "610affa2-1010-49ff-baa9-f3e65585a3be", "version": "1330893", "external-name": "example-2v2lb-rmlkw", "error": "cannot create PostgreSQLServer: postgresql.ServersClient#Create: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code=\"ServiceBusy\" Message=\"Service is temporarily busy and the operation cannot be performed. Please try again later.\"", "errorVerbose": "postgresql.ServersClient#Create: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code=\"ServiceBusy\" Message=\"Service is temporarily busy and the operation cannot be performed. Please try again later.\"\ncannot create PostgreSQLServer\ngithub.com/crossplane/provider-azure/pkg/controller/database/postgresqlserver.
...

And this is reflected in the corresponding PostgreSQLServer MR as follows:

status:
  atProvider:
    lastOperation:
      errorMessage: Code="ServiceBusy" Message="Service is temporarily busy and the
        operation cannot be performed. Please try again later."
      method: PUT
      pollingUrl: https://management.azure.com/subscriptions/...
      status: Failed

This breaks our assumption here because a create call has already been issued (and the corresponding future has been stored), but because of a temporary error the resource is not in an in-progress state. Furthermore, as indicated here, the provisioning state values returned are resource-specific. In this specific case, provisioning completed successfully but we had a wrong (re-)generated password stored in the connection secret.

A fix targeted at provisioning status checks referenced above also failed to address the issue because:

  • Sometimes, just after an external.Create call where the password field of the connection secret is populated, we may fail to store the Azure async operation future in the PostgreSQLServer object's status because of conflicts.
  • Controller-runtime client's cache may contain a stale version of the PostgreSQLServer object's status, erroneously leading the managed reconciler to make another external.Create call.

Because of these reasons, I decided to take a different approach that's detailed in #284. After a run of 10 tries using a script developed for reproducing/testing this issue, no inconsistent passwords were observed.
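
The failure mode described above can be reproduced in a toy model. The sketch below is an illustration only and is not necessarily the approach taken in #284: it assumes the server keeps the password from the first Create call and ignores repeats, and it contrasts a create path that regenerates the password on every call with one that reuses a password already present in the secret. All type and function names are hypothetical.

```go
package main

import "fmt"

// cloud is a hypothetical server that keeps the password from the first
// Create call and ignores later ones (an assumption for this model).
type cloud struct {
	created  bool
	password string
}

func (c *cloud) Create(pw string) {
	if !c.created {
		c.created, c.password = true, pw
	}
}

// newGen returns a deterministic stand-in for a password generator.
func newGen() func() string {
	n := 0
	return func() string { n++; return fmt.Sprintf("pw-%d", n) }
}

// createRegenerating overwrites the secret with a fresh password each call.
func createRegenerating(c *cloud, secret map[string]string, gen func() string) {
	pw := gen()
	secret["password"] = pw
	c.Create(pw)
}

// createReusing generates a password only if the secret has none yet.
func createReusing(c *cloud, secret map[string]string, gen func() string) {
	pw, ok := secret["password"]
	if !ok {
		pw = gen()
		secret["password"] = pw
	}
	c.Create(pw)
}

// run issues three create calls, as a reconciler with stale state might,
// and reports whether the secret still matches the server's password.
func run(create func(*cloud, map[string]string, func() string)) bool {
	c, secret, gen := &cloud{}, map[string]string{}, newGen()
	for i := 0; i < 3; i++ {
		create(c, secret, gen)
	}
	return secret["password"] == c.password
}

func main() {
	fmt.Println("regenerating, secret matches server:", run(createRegenerating)) // false
	fmt.Println("reusing, secret matches server:", run(createReusing))           // true
}
```

In this model, repeated create calls are harmless only when the password written to the secret stays stable across calls.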
