
@ZPascal
Contributor

@ZPascal ZPascal commented Jun 30, 2025

Description

This RFC proposes to introduce support for organization- and space-scoped client certificates for Cloud Foundry Loggregator syslog drains using mutual TLS (mTLS), covering both HTTPS and syslog+TLS protocols. By issuing certificates at the org or space level instead of per application, this initiative will simplify certificate lifecycle management, enable centralized rotation, and facilitate integration with central certificate authorities. The change targets the reduction of operational overhead and the enhancement of tenant-level security.

Involved Working Groups:

@cloudfoundry/toc
@cloudfoundry/wg-app-runtime-platform-logging-and-metrics-approvers
@cloudfoundry/wg-app-runtime-platform-logging-and-metrics-reviewers

@beyhan beyhan requested review from a team, ChrisMcGowan, ameowlia, beyhan, rkoster and stephanme and removed request for a team June 30, 2025 08:51
@beyhan beyhan added toc rfc CFF community RFC labels Jun 30, 2025
Comment on lines +45 to +49
### Configuration Flow

- When an app binds a syslog drain, the Cloud Controller should include org and space GUIDs in the drain metadata.
- The system retrieves a Certificate Authority for that org or space from a binding.
- The drain connection must use this certificate for TLS/mTLS authentication.
Member

Could you please add a few words about the backward compatibility of this flow with the current flow for creating and consuming a syslog drain?


## Summary

This RFC proposes to introduce support for **organization- and space-scoped client certificates** for Cloud Foundry Loggregator syslog drains using mutual TLS (mTLS), covering both **HTTPS** and **syslog+TLS** protocols. By issuing certificates at the org or space level instead of per application, this initiative will simplify certificate lifecycle management, enable centralized rotation, and facilitate integration with central certificate authorities. The change targets the reduction of operational overhead and the enhancement of tenant-level security.
Member

Regarding "By issuing certificates at the org or space level instead of per application". Do you mean for the syslog drain use case? The https://docs.cloudfoundry.org/devguide/deploy-apps/instance-identity.html feature is used in different use cases.

Contributor

IMO, this is only for the certificates used for Syslog drains and has nothing to do with the instance identity certificates.

@ZPascal ZPascal changed the title feat: Add the org and space based certificates for syslog drains rfc feat: Add the org and space based certificates for syslog drains Jun 30, 2025
Contributor

@rkoster rkoster left a comment

How will existing connections be drained and re-established after a certificate is updated?

Member

@stephanme stephanme left a comment

Can you add sections with the proposed changes per CF component (e.g. Cloud Controller, Loggregator, CF CLI)? This helps to better understand the consequences of this RFC and the required effort. It also makes it easier to invite the affected WG areas for a review.

```bash
cf create-user-provided-service SPACE-NAME -p '{"ca":"-----BEGIN CERTIFICATE-----\nMIIH...-----END CERTIFICATE-----", "cert":"-----BEGIN CERTIFICATE-----\nMIIH...-----END CERTIFICATE-----","key":"-----BEGIN PRIVATE KEY-----\nMIIE...-----END PRIVATE KEY-----"}'
```

Member

These example configurations show the creation of user provided service instances that happen to contain certificates. How does it relate to syslog drains as documented in https://docs.cloudfoundry.org/devguide/services/log-management.html ?

Can you provide a complete example that shows e.g. how multiple apps in a space use the same space-scope certificate for a syslog drain binding and maybe even how you would rotate it?

Contributor

@stephanme The call above will create credentials which have to be bound to the app. The problem with this approach is that these credentials are available to the app via VCAP_SERVICES but will not be available to the Syslog Agent when it creates the Syslog Drain. If this is accepted, an app will have to somehow share the credentials with the Syslog Agent, which will be hard to do as the Syslog Agent is a special kind of user provided service implemented as part of CF and not via an external service broker. An external broker could have collected all the needed credentials and opened a connection, but this is not the case.

@rkoster rkoster moved this from Inbox to In Progress in CF Community Jul 1, 2025
@beyhan beyhan requested a review from Gerg July 3, 2025 07:50
Contributor

@chombium chombium left a comment

IMHO, we don't need this change. What we need is better documentation on how to achieve this with the way syslog drains are currently created and work, and with the other CF tools we have at hand.

@chombium
Contributor

chombium commented Jul 4, 2025

I understand the problem and the complexity of managing certificates which this RFC tries to address.

At the moment, with the current implementation, one can create a single user provided service defining a Syslog Drain and bind it to multiple apps.

```mermaid
flowchart LR

app1["app 1"]
ups["User Provided Service"] --> b1["service binding 1"] --> sl1["Syslog Drain 1"]
app1 --> b1
app2["app 2"]
ups["User Provided Service"] --> b2["service binding 2"] --> sl2["Syslog Drain 2"]
app2 --> b2
app3["app 3"]
ups["User Provided Service"] --> b3["service binding 3"] --> sl3["Syslog Drain 3"]
app3 --> b3
appn["app n"]
ups["User Provided Service"] --> bn["service binding n"] --> sln["Syslog Drain n"]
appn --> bn
```

This way one can share the same Syslog Drain certificates between apps.
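
As a rough sketch of the pattern in the diagram, and assuming the drain service already exists in the currently targeted space, a small helper could bind several apps to it. The function name `bind_drain_to_apps` is hypothetical; `cf bind-service` is the only CLI command it relies on.

```bash
# Bind each listed app in the currently targeted space to one drain service.
# Usage: bind_drain_to_apps DRAIN_SERVICE APP...
bind_drain_to_apps() {
  drain="$1"
  shift
  for app in "$@"; do
    # Stop at the first failed binding so the caller can investigate.
    cf bind-service "$app" "$drain" || return 1
  done
}
```

After `cf target -o org1 -s space1`, a call such as `bind_drain_to_apps mydrain app1 app2 app3` would create one binding per app, all pointing at the same service instance and therefore the same certificates.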

We also used to have a cf drain CLI plugin for managing single and space drains, which we deprecated a few years ago because of low usage.

Syslog Drains are created as user-provided services and hence are scoped to a space. One can share a service between spaces in an org with cf share-service; if the service is shared with all of the spaces in an organization, the organization effectively has an organization-wide syslog drain.

The only possible downsides of this approach are:

  • one user-provided service Syslog Drain is needed per Syslog Drain URL. From my personal experience, I have rarely seen a single organization use different URLs for its syslog drains
  • possible duplication if the same certificates (ca, cert and key) are used. IMO, we should not stress much about a few more bytes in the Cloud Controller's database

This RFC:

  • opens up the possibility of reusing credentials with different Syslog Drain URLs, which in practice means that different client connections can be identified with the same client certificates, which is a bad security practice
  • breaks the cf cups call: to create a Syslog Drain, the command should be used with the specific -l parameter followed by the Syslog Drain URL, e.g. cf cups mydrain -l https://my-drain.example.com -p {...}

IMHO, we don't need this change. What we need is better documentation on how to achieve this.

Here is an example of the whole flow:
Let's say we have an organization org1 with two spaces, space1 and space2 and an org2 with space3. There is an app1 in the space1, app2 in space2 and app3 in space3.

```bash
# target org1 and space1
cf target -o org1 -s space1

# create the syslog drain
cf cups mydrain -l https://my-drain.example.com -p {...}

# bind app1 to mydrain
cf bind-service app1 mydrain

# share the service with space2
cf share-service mydrain -s space2

# share the service with space3 in org2
cf share-service mydrain -s space3 -o org2

# target space2
cf target -s space2

# bind app2 to mydrain
cf bind-service app2 mydrain

# target org2 space3
cf target -o org2 -s space3

# bind app3 to mydrain
cf bind-service app3 mydrain
```

@beyhan beyhan requested review from cweibel and removed request for ChrisMcGowan July 15, 2025 05:56
@stephanme
Member

Based on the discussion above, I suggest closing this RFC. There is no need for an RFC to improve the documentation :-)

@beyhan
Member

beyhan commented Jul 15, 2025

@ZPascal it would be great if the outcome of this were an improvement to the existing docs.

@chombium
Contributor

Hi,

I ran a few more tests of what I wrote previously and found out that I was wrong about service instance sharing: I had mixed up "normal" platform/CF marketplace-provided services with user-provided services :-/ The diagram about reusing an existing service instance is still valid within a single space.

I've tried two more things:

  1. Sharing a user-provided service instance with a cf share-service call. I found out that user-provided services cannot be shared. The error comes from a validation in the Cloud Controller.
  2. Adding a service key to a user-provided service. That is also not supported. The error was "Binding parameters are not supported for user-provided service instances" and comes from the Cloud Controller's service_credential_bindings_controller.

This means that the RFC is still valid.

IMO, as the Syslog Drains are only one type of user-provided-services we should definitely not change the behavior for all user-provided-services or even worse, add special handling for Application Syslog Drains.

The only way forward with this RFC that I see is to adjust the whole Syslog Drain creation process via cf create-user-provided-service, so that we add a parameter which defines whether the drain is valid for a space or an organization.

  • A cf user would call something like cf create-user-provided-service mydrain -l "https://mydrain.com:5678" -p '{"ca":"ca_cert", "cert":"cert_data", "key":"key_data", "drain_validity":"org"}'

  • drain_validity is the "code name" (we need to find something better); its value will be either "space" or "org". The default value will be "" to keep things backward compatible.

  • The Cloud Controller should parse the value of drain_validity and:

    • "" (empty) - don't create service bindings
    • "space" - create bindings for all apps in the space
    • "org" - create bindings for all apps in the org
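
To make the proposed semantics concrete, here is a minimal sketch of how the three values could select the apps that receive a binding. It is only an illustration in shell, not Cloud Controller code: the function name `apps_to_bind` and the "org space app" input format are assumptions made for this example.

```bash
# Print the apps that would receive a binding for a drain created in ORG/SPACE.
# Reads "org space app" triples from stdin, one per line.
# Usage: apps_to_bind DRAIN_VALIDITY ORG SPACE
apps_to_bind() {
  validity="$1"
  org="$2"
  space="$3"
  while read -r app_org app_space app_name; do
    case "$validity" in
      "")
        # Default: no automatic bindings, which keeps today's behavior.
        ;;
      space)
        if [ "$app_org" = "$org" ] && [ "$app_space" = "$space" ]; then
          echo "$app_name"
        fi
        ;;
      org)
        if [ "$app_org" = "$org" ]; then
          echo "$app_name"
        fi
        ;;
      *)
        echo "unsupported drain_validity: $validity" >&2
        return 1
        ;;
    esac
  done
  return 0
}
```

Fed with the three triples `org1 space1 app1`, `org1 space2 app2` and `org2 space3 app3`, the call `apps_to_bind org org1 space1` would select app1 and app2, while `apps_to_bind space org1 space1` would select only app1.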

The Cloud Controller's syslog drain url controller will be unchanged.

The Syslog Agent won't be adjusted as all the heavy lifting will be done by the Cloud Controller. The only possible thing to adjust would be error handling.

There are three major challenges with this approach:

  1. The Cloud Controller will have to track what's happening with the apps (push, delete and similar) in a space or org and what's happening with the service instance (create, update, delete), and create, delete or update many service bindings at once, depending on the number of apps.
  2. As the Syslog Drain validation is done later, in the Loggregator's Syslog Agent, the Cloud Controller may end up creating many bindings which won't work.
  3. The Syslog Agent will create a Syslog Drain for each application in a space or an org. Making exceptions for a specific app might be useful: an app dev should be able to create a separate Syslog Drain for a particular app and not use the space or org Syslog Drain. This could be done by deleting the binding for the app, but it would complicate things in the Cloud Controller with some sort of exclude list.

Whatever we do, non-trivial changes in the Cloud Controller will be needed. We should keep the implementation effort and complexity in mind, as well as the added value for app devs. This is a nice-to-have feature, but I cannot estimate its possible adoption and usage. Tbh, I understand that having org/space Syslog Drains will reduce the management effort for app devs, but given the implementation effort, I'm not totally convinced that we need this. What do you think? It would be great to hear from someone on the CAPI team.

We have one issue about org and space Syslog drains from @Benjamintf1, the former lead of the CF Logging and Metrics Team. I don't know if he had some other ideas about the implementation which he would like to share...

@chombium
Contributor

I had a chat with @stephanme today, and we've concluded that my suggestion is not acceptable, as the Cloud Controller should not act on the values of the service credentials passed with the -p parameter in the cf cups call.

We see two possible ways to proceed with this:

  1. Discuss whether making user-provided services shareable is possible. I've tracked down the change to this commit. It would be good if someone could explain to us why sharing is prohibited and whether we can change that.
  2. Leave the Cloud Controller in its current state and build a service broker which will manage Syslog Drains and their credentials on a higher level. It will be a bit more work, but we'll have full control of the Syslog Drain user-provided-service creation, and we can validate the parameter values upfront rather than at the end, when the Syslog Agent creates the connections to the remote Syslog servers.

@beyhan
Member

beyhan commented Jul 22, 2025

Hi @chombium,

My opinion about the listed options in your last comment is:

  1. The service instance sharing discussion for UPS is a more generic topic which could be addressed in a different RFC or change request. The challenge described in this RFC could be one argument for looking into that topic.
  2. I don't think that we need an RFC to discuss the implementation of a service broker for Syslog Drain forwarding. That is a decision in the scope of the ARP WG.

@stephanme
Member

I don't think that there is a blocker to making UPS instances shareable or to supporting service keys. It may become more complex to find all bindings that need to be updated when the UPS instance is updated. Bindings to a UPS are updated directly when the credentials of the UPS are updated, in contrast to service bindings to regular service instances, where a new binding has to be created to update credentials.
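
To illustrate the in-place update behavior, a rotation of a drain's client certificate might look like the sketch below. It assumes the drain was created as a user-provided service and that the new `ca`, `cert` and `key` values are stored in a JSON file; the helper name `rotate_drain_credentials` and the file path are hypothetical. How quickly the Syslog Agent picks up the new credentials is not covered in this thread.

```bash
# Replace the credentials of an existing user-provided drain service in place.
# Usage: rotate_drain_credentials DRAIN_SERVICE CREDENTIALS_JSON_FILE
rotate_drain_credentials() {
  drain="$1"
  creds_file="$2"
  if [ ! -r "$creds_file" ]; then
    echo "cannot read credentials file: $creds_file" >&2
    return 1
  fi
  # -p accepts inline JSON or a path to a file containing JSON.
  cf update-user-provided-service "$drain" -p "$creds_file"
}
```

A call such as `rotate_drain_credentials mydrain ./new-drain-creds.json` would update the one service instance, so every app bound to it would share the rotated certificate without creating new bindings.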

I think this would be a good step towards reducing the differences between user-provided services and managed services. We had a similar discussion in the [RFC] Service Credential Binding Rotation for Apps.

@ZPascal
Contributor Author

ZPascal commented Jul 30, 2025

Thank you all very much for your valuable input and constructive contributions to this discussion. We agree with the suggestion from @stephanme and will move forward with exploring the shareability of User-Provided Services (UPS) as a next step. If needed, we will open a dedicated RFC to address the open questions and implementation details together with the community.

Enabling shareability for UPS would also help to generalize and align this functionality more closely with managed services, further reducing the differences between the two.

As the main points have been addressed here, we will close this RFC for now. We sincerely apologize for the delayed response and appreciate your patience and engagement throughout this process.

Thanks again to everyone who contributed their time and ideas — your input is greatly appreciated!

@ZPascal ZPascal closed this Jul 30, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in CF Community Jul 30, 2025
@ZPascal ZPascal deleted the org-space-based-certificates-for-syslog-drains branch July 30, 2025 14:25