Document the crossplane.io/external-create-... annotations #688

Merged
merged 11 commits into from
Feb 1, 2024
148 changes: 136 additions & 12 deletions content/master/concepts/managed-resources.md
@@ -624,7 +624,7 @@ kind: RDSInstance
metadata:
name: my-rds-instance
annotations:
crossplane.io/external-name: my-custom-namee
crossplane.io/external-name: my-custom-name
```

```shell {copy-lines="1"}
@@ -635,21 +635,145 @@ my-rds-instance True True my-custom-name 11m

### Creation annotations

Providers create new managed resources with the
`crossplane.io/external-create-pending` annotation.
In some rare situations a provider can forget that it created a resource. When
Collaborator

I don't like the use of "rare" (subjective) or "forget" (anthropomorphic).

"Rare" sounds non-deterministic. The problem may require a specific sequence of events, but that sequence is known and should be described. It may be rare in general, but when you are hitting it all the time, hearing that it's "rare" is very frustrating.

Member Author

Point taken. Reworded to avoid "rare" and "forgot".

That said, I would prefer if folks did not read this section and think "Crossplane's going to leak my resources all the time, I shouldn't trust it".

this happens the provider can't manage the resource.

The Provider applies the `crossplane.io/external-create-succeeded` or
`crossplane.io/external-create-failed` annotation after making the external API
call and receiving a response.
{{<hint "tip">}}
Crossplane calls resources that a provider creates but doesn't manage _leaked

Suggested change
Crossplane calls resources that a provider creates but doesn't manage _leaked
Crossplane calls resources that a provider created but lost track of and therefore doesn't manage _leaked


I'm trying to word this so that readers understand instantly that this is an error case, without using the words "error", "exception", or "rare".

resources_.
{{</hint>}}

{{<hint "note" >}}
If a Provider restarts before creating the `succeed` or `fail` annotations the
Provider can't reconcile the managed resource.
Providers set three creation annotations to avoid and detect leaked resources:

Read Crossplane [issue #3037](https://github.com/crossplane/crossplane/issues/3037#issuecomment-1110142427)
for more details
{{< /hint >}}
* {{<hover label="creation" line="8">}}crossplane.io/external-create-pending{{</hover>}} -
The last time the provider was about to create the resource.
* {{<hover label="creation" line="9">}}crossplane.io/external-create-succeeded{{</hover>}} -
The last time the provider successfully created the resource.
* `crossplane.io/external-create-failed` - The last time the provider failed to
create the resource.


Use `kubectl get` to view the annotations on a managed resource. For example, an
AWS VPC resource:

```yaml {label="creation" copy-lines="2-9"}
$ kubectl get vpc my-vpc -o yaml
apiVersion: ec2.aws.upbound.io/v1beta1
kind: VPC
metadata:
name: my-vpc
annotations:
crossplane.io/external-name: vpc-1234567890abcdef0
crossplane.io/external-create-pending: "2023-12-18T21:48:06Z"
crossplane.io/external-create-succeeded: "2023-12-18T21:48:40Z"
```

A provider uses the
{{<hover label="creation" line="7">}}crossplane.io/external-name{{</hover>}}
annotation to find a resource in an external system, for example AWS.

If the provider can't find a managed resource in an external system, it thinks
the resource doesn't exist. When the provider thinks a resource doesn't exist
it creates the resource.
Collaborator

Suggested change
If the provider can't find a managed resource in an external system, it thinks
the resource doesn't exist. When the provider thinks a resource doesn't exist
it creates the resource.
If the provider finds a managed resource with a
{{<hover label="creation" line="7">}}crossplane.io/external-name{{</hover>}}
annotation and doesn't find an external resource with a matching name, the
provider creates a new external resource.

Feel free to rework, but I think this should be "dumbed down" a bit and be more explicit.

  • What is the provider looking for in the external system?
  • How does that get connected to an MR?
  • Avoid "think"

Member Author

Reworked, PTAL.

I prefer not to mention external-name again explicitly here. It feels repetitive: how the provider looks up the MR is what the sentence immediately prior already says.

"If the provider finds an MR with..." reads strangely to me too - MRs always have this annotation.


Some external systems don't let a provider specify a resource's name when the
provider creates it. Instead the external system generates an unpredictable name
and returns it to the provider.

When the external system generates the resource's name, it's critical that the
provider saves it to the managed resource's `crossplane.io/external-name`
annotation. If it doesn't, it leaks the resource.

A provider can't guarantee that it can save the annotation. The provider could
restart or lose network connectivity between creating the resource and saving
the annotation.

{{<hint "important">}}
Anytime an external system generates a resource's name there is a risk the
provider could leak the resource.
{{</hint>}}

A provider can detect that it might have leaked a resource. If the provider
thinks it might have leaked a resource, it stops reconciling it until you tell
the provider it's safe to proceed.

When a provider thinks it might have leaked a resource it creates a `cannot
determine creation result` event associated with the managed resource. Use
`kubectl describe` to see the event.

```shell {copy-lines="1"}
kubectl describe queue my-sqs-queue

# Removed for brevity

Events:
  Type     Reason                           Age                 From                                 Message
  ----     ------                           ----                ----                                 -------
  Warning  CannotInitializeManagedResource  29m (x19 over 19h)  managed/queue.sqs.aws.crossplane.io  cannot determine creation result - remove the crossplane.io/external-create-pending annotation if it is safe to proceed
```

{{<hint "important">}}
The safest thing for a provider to do when it detects that it might have leaked
a resource is to stop and wait for human intervention.

This ensures the provider doesn't create duplicates of the leaked resource.
Duplicate resources can be costly and dangerous.
{{</hint>}}

Providers use the creation annotations to detect that they might have leaked a
resource.

Each time a provider reconciles a managed resource it checks the resource's
creation annotations. If the provider sees a create pending time that's more
recent than the most recent create succeeded or create failed time, it knows
that it might have leaked a resource.

Are there any other common cases where this would happen, apart from someone going to the AWS UI and manually removing a reconciled bucket, after which Crossplane tried to recreate it but failed to save the new name in the annotations?

Member Author

No, nothing comes to mind.


{{<hint "note">}}
Providers don't remove the creation annotations. They use the timestamps to
determine which is most recent. It's normal for a managed resource to have
several creation annotations.
{{</hint>}}
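
The timestamp comparison described above can be sketched in Go. This is a minimal illustration and not the provider's actual code: the function names `parseTime` and `createIncomplete` are hypothetical, and treating a missing or malformed annotation as the zero time is an assumption made for the sketch.

```go
package main

import (
	"fmt"
	"time"
)

// Annotation keys taken from the documentation above.
const (
	keyPending   = "crossplane.io/external-create-pending"
	keySucceeded = "crossplane.io/external-create-succeeded"
	keyFailed    = "crossplane.io/external-create-failed"
)

// parseTime returns the RFC 3339 timestamp stored under key, or the zero
// time if the annotation is missing or malformed (an assumption of this
// sketch).
func parseTime(annotations map[string]string, key string) time.Time {
	t, err := time.Parse(time.RFC3339, annotations[key])
	if err != nil {
		return time.Time{}
	}
	return t
}

// createIncomplete (hypothetical name) reports whether the create pending
// time is more recent than both the create succeeded and create failed
// times, which means the result of the last create attempt is unknown.
func createIncomplete(annotations map[string]string) bool {
	pending := parseTime(annotations, keyPending)
	if pending.IsZero() {
		return false // No create attempt was ever recorded.
	}
	succeeded := parseTime(annotations, keySucceeded)
	failed := parseTime(annotations, keyFailed)
	return pending.After(succeeded) && pending.After(failed)
}

func main() {
	healthy := map[string]string{
		keyPending:   "2023-12-18T21:48:06Z",
		keySucceeded: "2023-12-18T21:48:40Z",
	}
	stuck := map[string]string{
		keyPending: "2023-12-18T21:48:06Z",
	}
	fmt.Println(createIncomplete(healthy)) // false
	fmt.Println(createIncomplete(stuck))   // true
}
```

Because the annotations are never removed, the sketch compares timestamps instead of checking which annotations are present.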

The provider knows it might have leaked a resource because it updates all the
resource's annotations at the same time. If the provider couldn't update the
creation annotations after it created the resource, it also couldn't update the
`crossplane.io/external-name` annotation.

{{<hint "tip">}}
Inspect the external system when resources have a `cannot determine creation
result` error.

Use the timestamp from the `crossplane.io/external-create-pending` annotation to
determine when the provider might have leaked a resource. Look for resources
created around this time.

If you find a leaked resource, and it's safe to do so, delete it from the
external system.

Remove the `crossplane.io/external-create-pending` annotation from the managed
resource after you're sure no leaked resource exists. This tells the provider to
resume reconciliation of the managed resource.
{{</hint>}}

Providers also use the creation annotations to avoid leaking resources.

When a provider writes the `crossplane.io/external-create-pending` annotation it
knows it's reconciling the latest version of the managed resource. The write
would fail if the provider was reconciling an old version of the managed
resource.

If the provider reconciled an old version with an outdated
`crossplane.io/external-name` annotation it could mistakenly determine that the
resource didn't exist. The provider would create a new resource, and leak the
existing one.

Some external systems have a delay between when a provider creates a resource
and when the system reports that it exists. The provider uses the most recent
create succeeded time to account for this delay.

If the provider didn't account for the delay, it could mistakenly determine
that the resource didn't exist. The provider would create a new resource, and
leak the existing one.

### Paused
Manually applying the `crossplane.io/paused` annotation causes the Provider to