Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logs and events to reconcilers #108

Merged
merged 1 commit into from
Jan 27, 2020
Merged

Add logs and events to reconcilers #108

merged 1 commit into from
Jan 27, 2020

Conversation

negz
Copy link
Member

@negz negz commented Jan 15, 2020

Description of your changes

This commit introduces new logging and eventing libraries, and adds debug logs and events to all reconcilers. It's guided by the below thinking:

Specifically the approach is to:

  • Have each reconciler scaffold handle the bulk (all?) of the logging and eventing.
  • Avoid plumbing loggers and event recorders down into the pluggable logic the reconciler scaffolds call in order to reconcile stuff (e.g. to CRUD external resources, propagate secrets, etc). Instead have said logic wrap error messages with useful context, and log them from the scaffold.
  • Log assuming Crossplane will be running in production. i.e.
    • 50+ claims and resources under reconciliation.
    • Most users don't have access to Crossplane pod logs.
    • Logs are being sent to a log aggregation platform (Stackdriver, ELK, etc)
    • Timeseries metrics are exposed where appropriate.

It's worth noting that the final point above (assuming production) has thus far meant I haven't yet come across anything that seems worthy of logging at info level, however ~80% of the situations worthy of a debug log (mostly errors or healthy state changes relating to resources) have corresponding Kubernetes events.

Checklist

I have:

  • Run make reviewable to ensure this PR is ready for review.
  • Ensured this PR contains a neat, self documenting set of commits.
  • Updated any relevant documentation, examples, or release notes.
  • Updated the RBAC permissions in clusterrole.yaml to include any new types.

@upbound-bot
Copy link
Collaborator

90% (-0.27%) vs master 90%

@upbound-bot
Copy link
Collaborator

90% (-0.11%) vs master 90%

@upbound-bot
Copy link
Collaborator

90% (-0.11%) vs master 90%

@upbound-bot
Copy link
Collaborator

90% (-0.11%) vs master 90%

Copy link
Member

@suskin suskin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesomeeee 🦑:shipit: . I had some questions, but they're probably not blockers. But it'd be good to at least take a look at them before merging.

I haven't yet come across anything that seems worthy of logging at info level.

I agree with this.

pkg/logging/logging.go Show resolved Hide resolved
pkg/logging/logging.go Show resolved Hide resolved
pkg/reconciler/claimbinding/reconciler.go Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Show resolved Hide resolved
pkg/util/pod.go Show resolved Hide resolved
@negz negz changed the title Log all the things! Add logs and events to reconcilers Jan 24, 2020
@upbound-bot
Copy link
Collaborator

88% (-2.11%) vs master 90%

@upbound-bot
Copy link
Collaborator

88% (-2.11%) vs master 90%

This commit introduces new logging and eventing libraries, and adds debug logs
and events to all reconcilers.

Signed-off-by: Nic Cope <negz@rk0n.org>
@negz negz marked this pull request as ready for review January 27, 2020 03:27
@upbound-bot
Copy link
Collaborator

86% (-4.12%) vs master 90%

@negz
Copy link
Member Author

negz commented Jan 27, 2020

Here are some example events for a newly created resource claim:

# Per kubectl describe postgresqlinstance
Events:
  Type    Reason                  Age                From                                Message
  ----    ------                  ----               ----                                -------
  Normal  SelectedResourceClass   51m                claimscheduling/postgresqlinstance  Selected matching resource class
  Normal  CreatedManagedResource  51m                claimbinding/postgresqlinstance     Successfully created managed resource
  Normal  WaitingToBind           51m (x8 over 51m)  claimbinding/postgresqlinstance     Managed resource is not yet bindable
  Normal  BoundManagedResource    28m                claimbinding/postgresqlinstance     Successfully bound managed resource

...and the corresponding CloudSQLInstance:

# Per kubectl describe cloudsqlinstance
Events:
  Type     Reason                   Age                From                      Message
  ----     ------                   ----               ----                      -------
  Normal   CreatedExternalResource  52m                managed/cloudsqlinstance  Successfully requested creation of external resource
  Normal   UpdatedExternalResource  1s                 managed/cloudsqlinstance  Successfully requested update of external resource
  Warning  CannotConnectToProvider  15s (x3 over 15s)  managed/cloudsqlinstance  provider could not be retrieved: Provider.gcp.crossplane.io "example" not found

It's also possible to view events for all resources in a particular namespace:

# Per kubectl get events --sort-by=.metadata.creationTimestamp
57m         Normal   SelectedResourceClass     postgresqlinstance/app-postgresql               Selected matching resource class
55m         Normal   CreatedManagedResource    postgresqlinstance/app-postgresql               Successfully created managed resource
55m         Normal   WaitingToBind             postgresqlinstance/app-postgresql               Managed resource is not yet bindable
55m         Normal   CreatedExternalResource   cloudsqlinstance/default-app-postgresql-28rwf   Successfully requested creation of external resource
33m         Normal   BoundManagedResource      postgresqlinstance/app-postgresql               Successfully bound managed resource
2m52s       Normal   UpdatedExternalResource   cloudsqlinstance/default-app-postgresql-28rwf   Successfully requested update of external resource
65s         Normal   UnboundManagedResource    postgresqlinstance/app-postgresql               Successfully unbound managed resource
63s         Normal   DeletedExternalResource   cloudsqlinstance/default-app-postgresql-28rwf   Successfully requested deletion of external resource

@negz
Copy link
Member Author

negz commented Jan 27, 2020

Here are some debug logs of the happy path dynamic provisioning of a resource claim:

2020-01-26T19:41:48.654-0800    DEBUG   stack-gcp       Starting        {"sync-period": "1h0m0s"}                                                             
2020-01-26T19:41:59.183-0800    DEBUG   stack-gcp       Reconciling     {"controller": "claimscheduling/postgresqlinstance", "request": "default/app-postgresql"}
2020-01-26T19:41:59.695-0800    DEBUG   stack-gcp       Attempting to set resource class        {"controller": "claimscheduling/postgresqlinstance", "request": "default/app-postgresql", "uid": "a0f5328a-715f-4c12-8926-e414f2e9e35f", "version": "7101", "external-name": "", "class-kind": "CloudSQLInstanceClass", "class-name": "cloudsqlinstancepostgresql-standard"}
2020-01-26T19:41:59.708-0800    DEBUG   stack-gcp       Reconciling     {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql"}  
2020-01-26T19:41:59.818-0800    DEBUG   stack-gcp       Reconciling     {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn"}
2020-01-26T19:41:59.827-0800    DEBUG   stack-gcp       Successfully created managed resource   {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql", "uid": "a0f5328a-715f-4c12-8926-e414f2e9e35f", "version": "7102", "external-name": "", "class-name": "cloudsqlinstancepostgresql-standard", "managed-name": ""}
2020-01-26T19:41:59.827-0800    DEBUG   stack-gcp       Managed resource is not yet bindable    {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql", "uid": "a0f5328a-715f-4c12-8926-e414f2e9e35f", "version": "7102", "external-name": "", "class-name": "cloudsqlinstancepostgresql-standard", "managed-name": ""}
2020-01-26T19:41:59.835-0800    DEBUG   stack-gcp       Reconciling     {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql"}
2020-01-26T19:41:59.835-0800    DEBUG   stack-gcp       Managed resource is not yet bindable    {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql", "uid": "a0f5328a-715f-4c12-8926-e414f2e9e35f", "version": "7106", "external-name": "", "managed-name": "default-app-postgresql-4wsmn"}                   
2020-01-26T19:41:59.931-0800    DEBUG   stack-gcp       Reconciling     {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql"}
2020-01-26T19:41:59.931-0800    DEBUG   stack-gcp       Managed resource is not yet bindable    {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql", "uid": "a0f5328a-715f-4c12-8926-e414f2e9e35f", "version": "7106", "external-name": "", "managed-name": "default-app-postgresql-4wsmn"}                   
2020-01-26T19:42:00.389-0800    DEBUG   stack-gcp       Reconciling     {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql"}
2020-01-26T19:42:00.389-0800    DEBUG   stack-gcp       Managed resource is not yet bindable    {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql", "uid": "a0f5328a-715f-4c12-8926-e414f2e9e35f", "version": "7106", "external-name": "", "managed-name": "default-app-postgresql-4wsmn"}
2020-01-26T19:42:02.772-0800    DEBUG   stack-gcp       Successfully requested creation of external resource    {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn", "uid": "61cd6496-753a-4397-940c-192f20a0ade8", "version": "7104", "external-name": "", "requeue-after": "2020-01-26T19:42:32.772-0800"}
2020-01-26T19:42:02.780-0800    DEBUG   stack-gcp       Reconciling     {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql"}
2020-01-26T19:42:02.780-0800    DEBUG   stack-gcp       Managed resource is not yet bindable    {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql", "uid": "a0f5328a-715f-4c12-8926-e414f2e9e35f", "version": "7106", "external-name": "", "managed-name": "default-app-postgresql-4wsmn"}
2020-01-26T19:42:02.780-0800    DEBUG   stack-gcp       Reconciling     {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn"}
2020-01-26T19:42:02.961-0800    DEBUG   stack-gcp       Reconciling     {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql"}
2020-01-26T19:42:02.961-0800    DEBUG   stack-gcp       Managed resource is not yet bindable    {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql", "uid": "a0f5328a-715f-4c12-8926-e414f2e9e35f", "version": "7106", "external-name": "", "managed-name": "default-app-postgresql-4wsmn"}
2020-01-26T19:42:02.972-0800    DEBUG   stack-gcp       External resource is up to date {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn", "uid": "61cd6496-753a-4397-940c-192f20a0ade8", "version": "7123", "external-name": "default-app-postgresql-4wsmn", "requeue-after": "2020-01-26T19:43:02.972-0800"}
2020-01-26T19:42:02.979-0800    DEBUG   stack-gcp       Reconciling     {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn"}
2020-01-26T19:42:02.980-0800    DEBUG   stack-gcp       Reconciling     {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql"}
2020-01-26T19:42:02.980-0800    DEBUG   stack-gcp       Managed resource is not yet bindable    {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql", "uid": "a0f5328a-715f-4c12-8926-e414f2e9e35f", "version": "7106", "external-name": "", "managed-name": "default-app-postgresql-4wsmn"}
2020-01-26T19:42:03.098-0800    DEBUG   stack-gcp       External resource is up to date {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn", "uid": "61cd6496-753a-4397-940c-192f20a0ade8", "version": "7125", "external-name": "default-app-postgresql-4wsmn", "requeue-after": "2020-01-26T19:43:03.098-0800"}
2020-01-26T19:42:04.108-0800    DEBUG   stack-gcp       Reconciling     {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn"}
2020-01-26T19:42:04.244-0800    DEBUG   stack-gcp       External resource is up to date {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn", "uid": "61cd6496-753a-4397-940c-192f20a0ade8", "version": "7128", "external-name": "default-app-postgresql-4wsmn", "requeue-after": "2020-01-26T19:43:04.244-0800"}
2020-01-26T19:43:04.254-0800    DEBUG   stack-gcp       Reconciling     {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn"}
2020-01-26T19:43:04.385-0800    DEBUG   stack-gcp       External resource is up to date {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn", "uid": "61cd6496-753a-4397-940c-192f20a0ade8", "version": "7128", "external-name": "default-app-postgresql-4wsmn", "requeue-after": "2020-01-26T19:44:04.385-0800"}
2020-01-26T19:43:04.393-0800    DEBUG   stack-gcp       Reconciling     {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql"}
2020-01-26T19:43:04.394-0800    DEBUG   stack-gcp       Managed resource is not yet bindable    {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql", "uid": "a0f5328a-715f-4c12-8926-e414f2e9e35f", "version": "7106", "external-name": "", "managed-name": "default-app-postgresql-4wsmn"}
2020-01-26T19:43:04.397-0800    DEBUG   stack-gcp       Reconciling     {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn"}
2020-01-26T19:43:04.523-0800    DEBUG   stack-gcp       External resource is up to date {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn", "uid": "61cd6496-753a-4397-940c-192f20a0ade8", "version": "7213", "external-name": "default-app-postgresql-4wsmn", "requeue-after": "2020-01-26T19:44:04.523-0800"}
2020-01-26T19:44:04.397-0800    DEBUG   stack-gcp       Reconciling     {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn"}
2020-01-26T19:44:04.524-0800    DEBUG   stack-gcp       External resource is up to date {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn", "uid": "61cd6496-753a-4397-940c-192f20a0ade8", "version": "7213", "external-name": "default-app-postgresql-4wsmn", "requeue-after": "2020-01-26T19:45:04.524-0800"}
2020-01-26T19:45:04.536-0800    DEBUG   stack-gcp       Reconciling     {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn"}
2020-01-26T19:45:04.668-0800    DEBUG   stack-gcp       External resource is up to date {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn", "uid": "61cd6496-753a-4397-940c-192f20a0ade8", "version": "7213", "external-name": "default-app-postgresql-4wsmn", "requeue-after": "2020-01-26T19:46:04.668-0800"}
2020-01-26T19:46:04.680-0800    DEBUG   stack-gcp       Reconciling     {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn"}
2020-01-26T19:46:04.812-0800    DEBUG   stack-gcp       External resource is up to date {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn", "uid": "61cd6496-753a-4397-940c-192f20a0ade8", "version": "7213", "external-name": "default-app-postgresql-4wsmn", "requeue-after": "2020-01-26T19:47:04.812-0800"}
2020-01-26T19:46:04.826-0800    DEBUG   stack-gcp       Reconciling     {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql"}
2020-01-26T19:46:04.831-0800    DEBUG   stack-gcp       Reconciling     {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn"}
2020-01-26T19:46:04.864-0800    DEBUG   stack-gcp       Successfully bound managed resource     {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql", "uid": "a0f5328a-715f-4c12-8926-e414f2e9e35f", "version": "7106", "external-name": "", "managed-name": "default-app-postgresql-4wsmn"}
2020-01-26T19:46:04.872-0800    DEBUG   stack-gcp       Reconciling     {"controller": "claimbinding/postgresqlinstance", "request": "default/app-postgresql"}
2020-01-26T19:46:04.950-0800    DEBUG   stack-gcp       External resource is up to date {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn", "uid": "61cd6496-753a-4397-940c-192f20a0ade8", "version": "7458", "external-name": "default-app-postgresql-4wsmn", "requeue-after": "2020-01-26T19:47:04.950-0800"}
2020-01-26T19:46:05.958-0800    DEBUG   stack-gcp       Reconciling     {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn"}
2020-01-26T19:46:06.086-0800    DEBUG   stack-gcp       External resource is up to date {"controller": "managed/cloudsqlinstance", "request": "/default-app-postgresql-4wsmn", "uid": "61cd6496-753a-4397-940c-192f20a0ade8", "version": "7462", "external-name": "default-apppostgresql-4wsmn", "requeue-after": "2020-01-26T19:47:06.086-0800"}

Copy link
Member

@hasheddan hasheddan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really solid to me @negz! Thanks for the examples to demonstrate. Are there any further steps you anticipate needing to implement?


// ControllerName returns the recommended name for controllers that use this
// package to reconcile a particular kind of resource claim.
func ControllerName(kind string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this addition here :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed there were a bunch of inconsistencies in controller names, and that they were fairly long and verbose in the logs. I figured we could get away with just the kind as there will rarely be more than one controller of a particular flavor for a particular resource kind within a particular binary.

// Both secrets must consent to this process by including propagation
// annotations. The Reconciler assumes it has a watch on both propagating (from)
// and propagated (to) secrets.
type Reconciler struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a nice help with the making the Secret reconciler implementation look more like the others

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it was essentially required now that we want functional arguments.

@negz
Copy link
Member Author

negz commented Jan 27, 2020

Are there any further steps you anticipate needing to implement?

Good question. There's a lot more we could do, namely:

  1. Update all infrastructure stacks to pass in loggers and event records to reconcilers.
  2. Document our required logging practices.
  3. Bring the workload and stack manager loggers into line with this pattern.
  4. Plumb logging and eventing into the various managed.ExternalClient implementations.

I plan to address 1 and 2 before considering crossplane/crossplane#331 and crossplane/crossplane#944 fixed, and will raise a new issue to track 3. I'm not yet sure whether 4 is really necessary/useful.

negz added a commit to negz/provider-gcp that referenced this pull request Jan 27, 2020
crossplane/crossplane-runtime#108

This commit updates all stack-gcp controllers to use the logging and event
recording functionality added in the above crossplane-runtime PR. Legacy
controllers are minimally updated to avoid package scoped loggers.

Signed-off-by: Nic Cope <negz@rk0n.org>
@negz negz merged commit 8661974 into crossplane:master Jan 27, 2020
@negz negz deleted the observe branch January 27, 2020 05:29
negz added a commit to negz/provider-gcp that referenced this pull request Jan 27, 2020
crossplane/crossplane-runtime#108

This commit updates all stack-gcp controllers to use the logging and event
recording functionality added in the above crossplane-runtime PR. Legacy
controllers are minimally updated to avoid package scoped loggers.

Signed-off-by: Nic Cope <negz@rk0n.org>
negz added a commit to negz/provider-gcp that referenced this pull request Jan 27, 2020
crossplane/crossplane-runtime#108

This commit updates all stack-gcp controllers to use the logging and event
recording functionality added in the above crossplane-runtime PR. Legacy
controllers are minimally updated to avoid package scoped loggers.

Signed-off-by: Nic Cope <negz@rk0n.org>
negz added a commit to negz/provider-aws that referenced this pull request Jan 30, 2020
crossplane/crossplane-runtime#108

This commit updates all stack-aws controllers to use the logging and event
recording functionality added in the above crossplane-runtime PR. Legacy
controllers are minimally updated to avoid package scoped loggers.

Signed-off-by: Nic Cope <negz@rk0n.org>
negz added a commit to negz/provider-azure that referenced this pull request Jan 31, 2020
crossplane/crossplane-runtime#108

This commit updates all stack-aws controllers to use the logging and event
recording functionality added in the above crossplane-runtime PR. Legacy
controllers are minimally updated to avoid package scoped loggers.

Signed-off-by: Nic Cope <negz@rk0n.org>
negz added a commit to negz/provider-azure that referenced this pull request Feb 1, 2020
crossplane/crossplane-runtime#108

This commit updates all stack-aws controllers to use the logging and event
recording functionality added in the above crossplane-runtime PR. Legacy
controllers are minimally updated to avoid package scoped loggers.

Signed-off-by: Nic Cope <negz@rk0n.org>
wolffbe pushed a commit to wolffbe/provider-aws that referenced this pull request Feb 12, 2021
crossplane/crossplane-runtime#108

This commit updates all stack-aws controllers to use the logging and event
recording functionality added in the above crossplane-runtime PR. Legacy
controllers are minimally updated to avoid package scoped loggers.

Signed-off-by: Nic Cope <negz@rk0n.org>
namku pushed a commit to namku/provider-aws that referenced this pull request Mar 9, 2021
crossplane/crossplane-runtime#108

This commit updates all stack-aws controllers to use the logging and event
recording functionality added in the above crossplane-runtime PR. Legacy
controllers are minimally updated to avoid package scoped loggers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants