
PIP-257: Add Open ID Connect Support to Server Components #19771

Closed
michaeljmarshall opened this issue Mar 9, 2023 · 13 comments
@michaeljmarshall
Member

michaeljmarshall commented Mar 9, 2023

Discussion: https://lists.apache.org/thread/ttdh74t6v5nznmtw436otk8qdymjvrfk
Vote: https://lists.apache.org/thread/h2d48g7s58w1n6426hxlv9kwrj09kczz
Status: accepted

Motivation

Apache Pulsar does not yet support a server-side AuthenticationProvider that implements the OpenID Connect spec for a relying party, as defined by https://openid.net/connect/. The only token-based authentication is provided via the AuthenticationProviderToken class. Given that we already have clients that implement the OAuth 2.0 protocol, which integrates easily with an OpenID Connect AuthenticationProvider, it would be very helpful to add this support to the Pulsar server components.

Goal

In implementing the OIDC spec, we will fulfill both the core (https://openid.net/specs/openid-connect-core-1_0.html) and the discovery (https://openid.net/specs/openid-connect-discovery-1_0.html) portions of the spec in the AuthenticationProvider implementation.

The end result will be a plugin that:

  • supports multiple token issuers

  • retrieves the JWKS uri for each issuer from the token issuer's /.well-known/openid-configuration endpoint

  • retrieves and caches the JWKS when a client attempts to connect using a token issued by one of the trusted issuers

  • refreshes the JWKS after a configured amount of time, which allows for seamless key rotation without needing to restart the proxy, broker, function worker, or websocket proxy. (Restarts are still needed to mitigate problems like leaked private keys.)

  • verifies that a token's signature and claims are valid
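The discovery-and-caching flow in the bullets above can be sketched roughly as follows. This is a minimal Python sketch, not the actual implementation (which is Java); the class name `IssuerJwksCache` and the injected `fetch_json` callable are hypothetical:

```python
import time

class IssuerJwksCache:
    """Per-issuer JWKS cache: discovers the jwks_uri via the issuer's
    well-known endpoint and refreshes the keys after a TTL."""

    def __init__(self, trusted_issuers, fetch_json, ttl_seconds=3600, now=time.monotonic):
        self.trusted_issuers = set(trusted_issuers)
        self.fetch_json = fetch_json          # callable(url) -> parsed JSON
        self.ttl = ttl_seconds
        self.now = now
        self._cache = {}                      # issuer -> (expires_at, jwks)

    def jwks_for(self, issuer):
        if issuer not in self.trusted_issuers:
            raise ValueError("untrusted issuer: " + issuer)
        entry = self._cache.get(issuer)
        if entry and entry[0] > self.now():
            return entry[1]                   # still fresh: no network call
        # Discovery: the well-known endpoint advertises the JWKS URI.
        config = self.fetch_json(issuer.rstrip("/") + "/.well-known/openid-configuration")
        jwks = self.fetch_json(config["jwks_uri"])
        self._cache[issuer] = (self.now() + self.ttl, jwks)
        return jwks
```

After the TTL elapses, the next lookup transparently re-fetches the keys, which is what makes key rotation seamless without a broker restart.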

API Changes

There will be two new public classes:

  1. Implement AuthenticationProvider with AuthenticationProviderOpenIDConnect.
  2. Implement KubernetesFunctionAuthProvider with KubernetesSecretsAuthProvider.

Implementation

Add a module to the apache/pulsar repo named pulsar-openid-connect where we will implement the logic for AuthenticationProviderOpenIDConnect. The core additions will include the ability to discover the JWKS URI for each token issuer, following the OpenID Connect Discovery protocol, and then to retrieve and cache the JWKS.

Create a class named KubernetesSecretsAuthProvider that mounts a pre-existing secret into the Pulsar Function pod. This class serves two purposes. First, the function worker does not need to have the credentials used by the function pod. Second, the current authentication flow in the function worker is to forward the authentication data used to create the function. Because OpenID Connect supports short lived tokens, it is not valid to assume that the function pod can operate with the supplied authentication data. Instead, a user will be able to configure the function pod with the client id and the client secret necessary to retrieve the authentication token from the OAuth 2 Authorization Server.

Alternatives

There was an initial attempt to implement this feature here: #11794. The PR was never completed, so it is closed now.

Anything else?

We will need to address the proxy authentication handling in order for users to avoid encountering some of the issues documented here #10816. I plan to follow up with a fix for this issue.

@EronWright
Contributor

EronWright commented Mar 10, 2023

One of the practical use cases for this is in combination with Kubernetes Service Account Token Projection. We could configure Pulsar to accept OIDC tokens from one or more Kubernetes clusters, based on the discovery feature.

This would allow a Pulsar function pod to authenticate based on its Kubernetes Service Account (KSA) using the standard token plugin. Kubernetes automatically mounts an OIDC token file into the pod, and rotates the token to keep it fresh. I believe that the token plugin reloads the token file automatically. In this scenario, there's no need for KubernetesSecretsAuthProvider.

@michaeljmarshall may I suggest that the new plugin be tested for compatibility with this feature of Kubernetes? In practice, it's very easy to use. There's one caveat that I'm aware of: the token's iss claim should be validated against the value of the issuer field of the discovery document, as opposed to a configured value.
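The caveat above amounts to a single check. A hedged sketch (the function name `validate_issuer` is hypothetical, not from the implementation):

```python
def validate_issuer(token_claims, discovery_document):
    """Validate the token's iss claim against the issuer advertised by the
    discovery document itself, rather than a separately configured value."""
    expected = discovery_document["issuer"]
    actual = token_claims.get("iss")
    if actual != expected:
        raise ValueError(f"iss mismatch: {actual!r} != {expected!r}")
    return True
```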

Here's a snippet showing how the Kubernetes API server exposes an OIDC discovery endpoint:

$ curl -k https://kubernetes.default.svc/.well-known/openid-configuration
{
    "issuer": "https://container.googleapis.com/v1/projects/xyz/locations/us-central1/clusters/test-oidc",
    "jwks_uri": "https://35.223.170.12:443/openid/v1/jwks",
    "response_types_supported": [
        "id_token"
    ],
    "subject_types_supported": [
        "public"
    ],
    "id_token_signing_alg_values_supported": [
        "RS256"
    ]
}

Here's how a function pod may acquire an OIDC token for a certain audience (i.e. the Pulsar cluster):

apiVersion: v1
kind: Pod
metadata:
  name: my-pulsar-function-0
  namespace: pulsar
spec:
  serviceAccountName: my-pulsar-function
  containers:
  - name: default
    volumeMounts:
    - name: pulsar-serviceaccounttoken
      mountPath: /var/run/secrets/pulsar/serviceaccount
  volumes:
  - name: pulsar-serviceaccounttoken
    projected:
      sources:
      - serviceAccountToken:
          audience: my-pulsar-cluster
          expirationSeconds: 3600
          path: token

@michaeljmarshall
Member Author

@EronWright - great suggestions, thank you for sharing! I was actually just looking into the k8s service accounts feature earlier today, and I completely agree that we should add support to make the function pods easily integrate with them. It seems like a great way to make it easier to leverage OIDC without needing to run your Authorization Server.

One reason I chose not to reference the k8s service account feature in this proposal is because these docs (https://kubernetes.io/docs/concepts/security/service-accounts/#authenticating-in-code) indicate that it is recommended to use the TokenReview API (https://kubernetes.io/docs/reference/kubernetes-api/authentication-resources/token-review-v1/) instead of OIDC because the TokenReview API prevents the usage of tokens generated for pods that are terminated.

That being said, relying on the TokenReview API seems like it would introduce latency and could couple the broker to the API Server in an unnecessary way. A short time to live could also limit how useful a token is after a pod terminates.

Another option could be to use the TokenReview API and then to fall back to the OIDC Authentication Provider to allow for better availability in the event of network connectivity issues when connecting to the api server.

What do you think?

@EronWright
Contributor

I agree that first-class support for "kubernetes" authentication on both the client and broker would be cool, but the beauty of the OIDC route is three-fold: it teaches the broker to interoperate with many identity providers in a common way, it works well with the token authentication client plugin, which is universally supported in the Pulsar ecosystem, and it works with long-running applications (e.g. functions and flink jobs).

Also, one can control the token TTL in the function pod spec, so you can limit the exposure.

@michaeljmarshall
Member Author

Great points about the integration. At this point, I think it makes sense to skip the integration with the TokenReview API. The design could be such that a future addition could add it if deemed necessary/valuable.

> Also, one can control the token TTL in the function pod spec, so you can limit the exposure.

Makes sense. Are you thinking we should let the function worker create the service accounts? I hadn't considered that option. In your opinion, how configurable should that be? Creating service accounts requires an increased permission on the function worker within the k8s cluster, which introduces a new risk because a function worker could then potentially be used to create service accounts with excessive permissions.

@michaeljmarshall
Member Author

Note: for functions, they will use the SerializableURITokenSupplier for the AuthenticationProviderToken. This supplier is automatically used when the token path starts with the file: prefix, and it results in reading the file from the filesystem each time the token is needed (including when the broker challenges the client upon token expiration).
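The re-read-on-every-use behavior described above can be sketched as follows (a hedged Python sketch; the real supplier is Java, and `file_token_supplier` is a hypothetical name):

```python
from pathlib import Path

def file_token_supplier(uri):
    """Return a supplier that re-reads the token file on every call, so a
    rotated (re-projected) token is picked up without restarting the client,
    including when the broker challenges the client on token expiration."""
    if not uri.startswith("file:"):
        raise ValueError("expected a file: URI, got " + uri)
    path = Path(uri[len("file:"):])
    return lambda: path.read_text().strip()
```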

@michaeljmarshall
Member Author

While testing this out in Azure, I discovered that the Azure provider requires authentication to retrieve the public keys while AWS does not. That means we will need to modify #19849 to allow an issuer to require authentication. Is it common to protect this kind of endpoint with authentication?

@michaeljmarshall
Member Author

michaeljmarshall commented Mar 21, 2023

For future reference, I checked out some implementations, and below are the claims sections of the JWT created when you specify the expiration and not the audience.

They all appear the same, which is good. I haven't found any k8s documentation that they will appear this way, but it makes sense. One challenge we'll face by integrating with k8s is making sure that we only give permission to the correct pods. This might be challenging. I'll be thinking about this some more.

A sample token from Azure Kubernetes service:

{
  "aud": [
    "https://some.info.azmk8s.io",
    "\"some.info.azmk8s.io\""
  ],
  "exp": 1679444659,
  "iat": 1679437459,
  "iss": "https://some.info.azmk8s.io",
  "kubernetes.io": {
    "namespace": "michael-test",
    "pod": {
      "name": "nginx",
      "uid": "6ad4f05d-64a9-44c7-9acd-fa2f34eb6c96"
    },
    "serviceaccount": {
      "name": "default",
      "uid": "4e618f02-2ce3-45a4-b623-fcc806a16cdf"
    }
  },
  "nbf": 1679437459,
  "sub": "system:serviceaccount:michael-test:default"
}

Sample from EKS:

{
  "aud": [
    "https://kubernetes.default.svc"
  ],
  "exp": 1710969822,
  "iat": 1679433822,
  "iss": "https://oidc.eks.us-east-2.amazonaws.com/id/some-id",
  "kubernetes.io": {
    "namespace": "michael-test",
    "pod": {
      "name": "nginx",
      "uid": "fbac8f9e-a47d-4ad7-a8f0-cc9a65d1331c"
    },
    "serviceaccount": {
      "name": "default",
      "uid": "5964f9d3-3dce-467c-8dbe-d0f463063d7a"
    },
    "warnafter": 1679437429
  },
  "nbf": 1679433822,
  "sub": "system:serviceaccount:michael-test:default"
}

GKE:

{
  "aud": [
    "https://container.googleapis.com/v1/projects/some-info"
  ],
  "exp": 1710973269,
  "iat": 1679437269,
  "iss": "https://container.googleapis.com/v1/projects/some-info",
  "kubernetes.io": {
    "namespace": "michael-test",
    "pod": {
      "name": "nginx",
      "uid": "b75e88ae-753b-4eb8-854e-488c04fc12bc"
    },
    "serviceaccount": {
      "name": "default",
      "uid": "32732d79-3203-4f4a-a343-69215edf1a2a"
    },
    "warnafter": 1679440876
  },
  "nbf": 1679437269,
  "sub": "system:serviceaccount:michael-test:default"
}

@EronWright
Contributor

EronWright commented Mar 23, 2023

@michaeljmarshall sorry for the delayed response.

About the function pod's service account, I would like to share my thoughts about that. The pod's KSA can be understood as the 'workload identity' of the function. It is the identity that the function will have when talking to Pulsar, and to other cloud services too. For example, a "Cloud Storage Sink" connector (which is a type of function) would naturally authenticate to S3 using its workload identity (see IRSA).

It seems important that the function submitter not be able to escalate their privileges by selecting an arbitrary service account. I feel that the function worker service (e.g. the function mesh) should assign a service account. I would hesitate to automatically create service accounts, because of the complexity of setting up the workload identity.

@EronWright
Contributor

On the topic of authorization: as you've shown, the sub claim conveys the name of the KSA, and that would presumably be used as the Pulsar role name. One would then set Pulsar permissions based on the name, for example in broker.conf:

authorizationEnabled=true
superUserRoles=system:serviceaccount:michael-test:default

Using the KSA name in this way is actually quite convenient, but may benefit from an authorization plugin that would allow for group-based policies, e.g. based on the namespace claim.

Here's how it would work in a multi-k8s-cluster environment. Imagine that the broker was configured to trust the tokens from a number of Kubernetes clusters, collectively known as a cluster-set. By the principle of namespace sameness, a service account with a given name would be considered equivalent across the whole cluster-set. That's a Good Thing, in my opinion.
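The group-based policy idea above can be sketched as a small check: split the KSA subject into namespace and name, then authorize by namespace so the same policy applies across the cluster-set. This is a hedged Python sketch; `parse_ksa_role` and `is_authorized` are hypothetical names, not part of any plugin:

```python
def parse_ksa_role(sub):
    """Split a KSA subject like 'system:serviceaccount:<namespace>:<name>'."""
    prefix, kind, namespace, name = sub.split(":", 3)
    if (prefix, kind) != ("system", "serviceaccount"):
        raise ValueError("not a service account subject: " + sub)
    return namespace, name

def is_authorized(claims, allowed_namespaces):
    """Group-style policy: authorize any service account from an allowed
    Kubernetes namespace, regardless of which cluster in the set issued
    the token (namespace sameness)."""
    namespace, _name = parse_ksa_role(claims["sub"])
    return namespace in allowed_namespaces
```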

@michaeljmarshall
Member Author

> It seems important that the function submitter not be able to escalate their privileges by selecting an arbitrary service account. I feel that the function worker service (e.g. the function mesh) should assign a service account. I would hesitate to automatically create service accounts, because of the complexity of setting up the workload identity.

I completely agree. I had asked the question about this above, but it cannot be correct to let the client select the service account.

> may benefit from an authorization plugin that would allow for group-based policies, e.g. based on the namespace claim.

Yes, I've been thinking that because the tokens have other informative claims, it might make sense to provide a more nuanced way to check authorization with a provider that understands the claims.

That cluster example is very interesting, thanks for sharing!

@michaeljmarshall
Member Author

After additional checks, looks like this call consistently fails when no authentication is provided:

$ curl -k https://kubernetes.default.svc/.well-known/openid-configuration
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/.well-known/openid-configuration\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

(The -k is simply used for convenience.)

@michaeljmarshall
Member Author

The following works:

curl --cacert /run/secrets/kubernetes.io/serviceaccount/ca.crt \
    -H "Authorization: Bearer $(cat /run/secrets/kubernetes.io/serviceaccount/token)"  \
    https://kubernetes.default.svc/.well-known/openid-configuration

@michaeljmarshall michaeljmarshall added this to the 3.0.0 milestone Apr 6, 2023
michaeljmarshall added a commit that referenced this issue Apr 6, 2023
…#19888)

PIP: #19771  

### Motivation

In order to make OIDC work with functions, we must give them a way to authenticate with the broker using tokens that can be validated using an Authorization Server. This PR introduces the `KubernetesServiceAccountAuthProvider`.

### Modifications

* Create a `KubernetesServiceAccountAuthProvider` implementation. It adds a service account token volume projection as defined in the k8s docs [here](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection). The implementation provides a way to specify the expiration time that the token will receive.

* Instead of creating a secret with the broker's trusted `ca.crt` in it, this new `KubernetesServiceAccountAuthProvider` expects a secret to already exist with the `ca.crt`. The major advantage of this implementation is that when the `ca.crt` is rotated, we can refresh it (assuming the client is configured to observe the updated file).

* Add support for specifying the token's expiration and audience.

* One point of divergence from the `KubernetesSecretsTokenAuthProvider` implementation is that I did not provide a way for functions to authenticate as the anonymous role. It seems like a stretch that functions would use such authentication because it will not be multi-tenant. However, if that is a concern, we can add the support.

* The feature will be configured with the following yaml:

```yaml
functionRuntimeFactoryConfigs:
  kubernetesFunctionAuthProviderConfig:
    brokerClientTrustCertsSecretName: "secret-name"
    serviceAccountTokenExpirationSeconds: 3600
    serviceAccountTokenAudience: "pulsar-cluster-audience"
```

### Verifying this change

I verified the correctness of the code with unit tests. I'll verify the integration with k8s once we've determined this PR's design is correct.

### Does this pull request potentially affect one of the following parts:

This adds new configuration options to the function worker.

### Documentation

- [x] `doc-required`

### Matching PR in forked repository

PR in forked repository: michaeljmarshall#36
michaeljmarshall added a commit that referenced this issue Apr 10, 2023
PIP: #19771 

### Motivation

This is the primary PR for PIP 257 (#19771). It adds an OpenID Connect `AuthenticationProvider` implementation. The implementation is intended to be compliant with the OpenID specs defined here: https://openid.net/developers/specs/. We specifically implement these two:

> * [OpenID Connect Core](https://openid.net/specs/openid-connect-core-1_0.html) – Defines the core OpenID Connect functionality: authentication built on top of OAuth 2.0 and the use of claims to communicate information about the End-User
> * [OpenID Connect Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) – Defines how clients dynamically discover information about OpenID Providers

### Modifications

* Add new module `pulsar-broker-auth-oidc`
* Add implementation that relies on auth0 client libraries to verify the signature and claims of the JWT
* Use async http client for all http requests
* Cache the provider metadata and the JWKS results
* Support different types of `FallbackDiscoveryMode`s, as documented in the code. Essentially, this setting allows users to more easily integrate with k8s. We need this coupling with kubernetes to deal with some of the nuances of the k8s implementation. Note that this part of the code is experimental and is subject to change as requirements and cloud provider implementations change. One important reason we use the k8s client is because the API Server requires special configuration for authentication and TLS. Since these do not appear to be generic requirements, the k8s client simplifies this integration. Here is a reference to the decision that requires authentication by default for getting OIDC info: kubernetes/kubernetes#80724. That discussion also indicates to me that this is an isolated design decision in k8s. If we find that authentication is a generic requirement, we should easily be able to expand the existing feature at a later time.
* Add metrics to help quantify success and failure. (I had thought I would add audit logging, but that is an independent feature that we can add to the Pulsar framework. It seems outside the scope of an Authentication Provider implementation to implement this feature.)

### Verifying this change

There are many new tests to cover this new implementation. Some of the tests are unit tests while others are integration tests that rely on Wire Mock to return the public key information.

### Documentation

- [x] `doc-required` 

This feature will need new docs.

### Matching PR in forked repository

PR in forked repository: michaeljmarshall#35
@RobertIndie
Member

This PIP has been implemented in #19849.
@michaeljmarshall I am closing it now. Feel free to reopen it and move it to the 3.1.0 milestone if any work is left for this PIP.

Demogorgon314 pushed a commit to Demogorgon314/pulsar that referenced this issue Apr 11, 2023
michaeljmarshall added a commit that referenced this issue Apr 11, 2023
Fixes: #10816
PIP: #19771
Supersedes: #19026
Depends on: #20062

### Motivation

The Pulsar Proxy does not properly handle authentication data refresh when in state `ProxyLookupRequests`. The consequence is described in #10816. Essentially, the problem is that the proxy caches stale authentication data and sends it to the broker leading to connection failures.

#17831 attempted to fix the underlying problem, but it missed an important edge case. Specifically, it missed the case where the `ConnectionPool` will have multiple connections when a lookup gets redirected. As such, the following problem exists (and is fixed by this PR):

1. Client opens connection to perform lookups.
2. Proxy connects to broker 1 to get the topic ownership info.
3. Time passes.
4. Client does an additional lookup, and this topic is on a newly created broker 2. In this case, the proxy opens a new connection with the stale client auth data.
5. Broker 2 rejects the connection because it fails with expired authentication.

### Modifications

* Remove some of the implementation from #17831. This new implementation still allows a broker to challenge the client through the proxy, but notably, it limits the number of challenges sent to the client. Further, the proxy does not challenge the client when the auth data is not expired.
* Introduce authentication refresh in the proxy so that the proxy challenges the client any time the auth data is expired.
* Update the `ProxyClientCnx` to get the `clientAuthData` from the `ProxyConnection` to ensure that it gets new authentication data.
* Add clock skew to the `AuthenticationProviderToken`. This is necessary to make some of the tests less flaky, and it will also be necessary for users to configure in their clusters.
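The clock-skew idea in the last bullet is simple: accept a token whose expiry is slightly in the past, to tolerate clock differences between issuer and broker. A hedged sketch (the function name and parameter are hypothetical, not the actual configuration keys):

```python
import time

def check_not_expired(claims, allowed_skew_seconds=0, now=time.time):
    """Accept a token whose exp claim is within allowed_skew_seconds in the
    past, tolerating small clock differences between issuer and broker."""
    return now() <= claims["exp"] + allowed_skew_seconds
```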

### Verifying this change

The `ProxyRefreshAuthTest` covers the existing behavior and I expanded it to cover the edge case described above.

Additionally, this part of the code will be much easier to test once we implement #19624.

### Documentation

- [x] `doc-not-needed`

### Matching PR in forked repository

PR in forked repository: the relevant tests pass locally, so I am going to skip the forked tests.
RobertIndie pushed a commit that referenced this issue Apr 19, 2023
…ist (#20132)

PIP: #12105 and #19771 

### Motivation

With the implementation of asynchronous authentication in PIP 97, I missed a case in the `AuthenticationProviderList` where we need to implement the `authenticateAsync` methods. This PR is necessary for making the `AuthenticationProviderToken` and the `AuthenticationProviderOpenID` work together, which is necessary for anyone transitioning to `AuthenticationProviderOpenID`.

### Modifications

* Implement `AuthenticationListState#authenticateAsync` using a recursive algorithm that first attempts to authenticate the client using the current `authState` and then tries the remaining options.
* Implement `AuthenticationProviderList#authenticateAsync` using a recursive algorithm that attempts each provider sequentially.
* Add test to `AuthenticationProviderListTest` that exercises this method. It didn't technically fail previously, but it's worth adding.
* Add test to `AuthenticationProviderOpenIDIntegrationTest` to cover the exact failures that were causing problems.
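The try-each-provider-in-order behavior described above can be sketched as follows. This is a hedged Python sketch of the idea only: the actual Java implementation is recursive over async results, and `AuthenticationFailed` and `authenticate_async` here are hypothetical names:

```python
import asyncio

class AuthenticationFailed(Exception):
    pass

async def authenticate_async(providers, auth_data):
    """Try each provider sequentially; return the first successful role,
    raising only if every provider rejects the credentials."""
    last_error = None
    for provider in providers:
        try:
            return await provider(auth_data)
        except AuthenticationFailed as e:
            last_error = e
    raise last_error or AuthenticationFailed("no providers configured")
```

This is what lets `AuthenticationProviderToken` and `AuthenticationProviderOpenID` coexist during a migration: clients with either credential type fall through to the provider that accepts them.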
RobertIndie pushed a commit that referenced this issue Apr 19, 2023
michaeljmarshall added a commit to michaeljmarshall/pulsar that referenced this issue Apr 19, 2023
michaeljmarshall added a commit to michaeljmarshall/pulsar that referenced this issue Apr 19, 2023
michaeljmarshall added a commit to michaeljmarshall/pulsar that referenced this issue Apr 20, 2023
michaeljmarshall added a commit to michaeljmarshall/pulsar that referenced this issue Apr 20, 2023