Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

first version of alb cache #176

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from
Draft

Conversation

NicolasViaud
Copy link

Draft PR

#109

Description of changes:
Proposition for an AwsAlbJwksCache implementation.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@ottokruse
Copy link
Contributor

THANKS for your work so far!

I have a preference to do the JWKS URI thing slightly different: let’s keep the file jwk.ts and the type JwksParser unchanged from how they were. Then, let’s treat the full URL to a kid as the JWKS uri, so https://public-keys.auth.elb.eu-west-1.amazonaws.com/abcdefghijklmnop123--> let’s treat such a URL as a JWKS uri, for a JWKS with just 1 kid in it.

This will entail we cannot re-use the SimpleJwksCache (I think) and we should build something else (similar but different), a cache that has an LRU strategy of size 2 for JWKS uris.

For verifying, make sure we check:

  • signer --> this must match the ALB arn provided by the developer
  • iss --> this must match the User Pool provided by the developer (same as CognitoJwtVerifier checks that)
  • clientid --> this must match the User Pool Client ID provided by the developer (same as CognitoJwtVerifier checks that)

@NicolasViaud
Copy link
Author

NicolasViaud commented Oct 1, 2024

No worries, and thank for leading this feature!

Ok to keep the AlbJwksParser as is.

And Ok for reimplemeting the AwsAlbJwksCache without depending of the SimpleJwksCache. I didn't really like neither the AwsAlbJwksCache "URI expand" mechanism to stick to the SimpleJwksCache implementation.
We just have to keep in mind that we will have to reuse the PenaltyBox feature like the SimpleJwksCache.

If I understand well your previous comment, you would prefer having the cache looking like the example below (with a LRU strategy):

Key (=kid) Value (JWKS with only one JWK)
1111-2222-3333 { key:[alg:RS256,kid:1111-2222-3333,...]}
4444-5555-6666 { key:[alg:RS256,kid:4444-5555-6666,...]}

To clarify and keep writing record why I believe it's unnecessary to include the region value in the cache key, here's my reasoning:

  • An instance of an application behind an ALB, using this library to be protected, can't receive token from other regions because the ALB is regional (the ALB can forward requests to target groups only in its region)
  • The kid seems to be a UUID that is unique by definition, and 2 differents JWKS in 2 differents regions can't have the same kid

I just wanted to confirm that we are align before starting updating the code.

@ottokruse
Copy link
Contributor

Thanks mate!

Ok so before I was thinking this would be the contents of the LRU cache (sample, it may have 0, 1 or max 2 entries like here):

Key (=JWKS URI with kid) Value (JWKS with only one JWK in the keys array)
https://public-keys.auth.elb.eu-west-1.amazonaws.com/1111-2222-3333 { "keys":["alg":"RS256","kid":"1111-2222-3333",...]}
https://public-keys.auth.elb.eu-west-1.amazonaws.com/4444-5555-6666 { "keys":["alg":"RS256","kid":"4444-5555-6666",...]}

But you're probably right that you only need key id as cache key though. On the other hand, is there a benefit to doing that? Does that make your implementation simpler?

Note that this choice should be internal to the AwsAlbJwksCache anyway, because the generic interface for JwksCache is:

export interface JwksCache {
  getJwk(jwksUri: string, decomposedJwt: DecomposedJwt): Promise<JwkWithKid>;
  getCachedJwk(jwksUri: string, decomposedJwt: DecomposedJwt): JwkWithKid;
  addJwks(jwksUri: string, jwks: Jwks): void;
  getJwks(jwksUri: string): Promise<Jwks>;
}

So the AwsAlbJwksCache must implement all these methods. I think here, jwksUri must be the base path without kid, so one of these formats (per the docs):

But since that does not work as cache key for us in this case, we need to either add kid to the url, or just use kid as the internal cache key (as you suggest).

Sorry for the rambling, hope this helps.

@ottokruse
Copy link
Contributor

Idea to make headway, let's work backwards. Let's first now only design the interface of AwsAlbJwtVerifier and AwsAlbJwksCache.

First stab:

import { AwsAlbJwtVerifier } from "aws-jwt-verify";

const verifier = AwsAlbJwtVerifier.create({
  issuer: "https://cognito-idp.{region}.amazonaws.com/{userPoolId}",
  clientId: "{clientId}",
  loadBalancerArn: "{loadBalancerArn}",
  jwksUri: "https://public-keys.auth.elb.region.amazonaws.com", // optional, can default if off of the loadBalancerArn
});

Mmm I now realize this already makes the choice to keep the interface generic over IDPs, and not take a userPoolId parameter. Well, may be simplest at least to start like this.

@ottokruse
Copy link
Contributor

Thoughts?

@NicolasViaud
Copy link
Author

NicolasViaud commented Oct 3, 2024

Oh, I am just learning about the AWS US gov cloud, thank you for sharing this knowledge.

My thoughts about the AwsAlbJwtVerifier interface that you have proposed:

  • clientId: OK
  • loadBalancerArn: Good idea in order to check the signer field
  • jwksUri: Ok for being optional. In that case, we will have to parse the loadBalancerArn (or userPoolId) to extract the region, and assume that the developer is hosted on the normal cloud (not the gov cloud)
  • issuer: I am not sure to understand the reason why we can't just have the userPoolId like the CognitoJwtVerifier (Ex of userPoolId: eu-west-1_XXXXX). Even if there is an IDP federation with cognito, the issuer will remain https://cognito-idp.{region}.amazonaws.com/{userPoolId} and we can easely generate this URI. And the logic will remain the same than CognitoJwtVerifier

Except the little detail ont the issuer parameter, I agree with your interface proposal.

@NicolasViaud
Copy link
Author

Finally after more thoughts, you were right, we should keep the issuer as an URI for the use case when the ALB is configured with an custom OIDC Authorization server rather than cognito. So, I 100% agree with your interface proposal

@ottokruse
Copy link
Contributor

ottokruse commented Oct 3, 2024

Yeah and I dont want to have to create a AwsAlbJwtVerifier and a AwsAlbCognitoJwtVerifier :)

Ok let's stick to that interface for now then. Maybe we should not have the jwksUri param, as we can determine it off of the ALB arn anyway. But I suspect that it may come in handy to have the ability to override the default value though.

How about the interface of the AwsAlbJwksCache? Proposal:

const cache = new AwsAlbJwksCache({
    fetcher: myCustomFetcher // optional, default to the same fetcher that is used by `SimpleJwksCache`
});

I don't think we need a jwksParser parameter as the SimpleJwksCache has, because the AwsAlbJwksCache is already specific for AWS ALB JWTs. Same reason for penaltyBox, I don't think we need to support overriding it, so let's not add that as a param to AwsAlbJwksCache.

About the 4 methods it should expose (per the interface JwksCache, seems consistent to use this again although the AwsAlbJwksCache is specific to AWS ALB implementation anyway):

getJwk(jwksUri: string, decomposedJwt: DecomposedJwt): Promise<JwkWithKid>

Uses the cached JWK is available, otherwise fetches it and caches it.
For fetching, it takes the (base jwksUri) and appends the kid from the decomposedJwt.

getCachedJwk(jwksUri: string, decomposedJwt: DecomposedJwt): JwkWithKid

Same as getJwk but raises an error if the Jwk is not in the cache

addJwks(jwksUri: string, jwks: Jwks): void;

Add a JWKS to the cache. I think it may be helpful to actually implement this (vs throwing a NotImplemented error). If only for testing scenarios where you don't want the test code to actually get the JWK over the internet from the ALB URL.
It should throw an error though if the JWKS has more than 2 JWKs?

getJwks(jwksUri: string): Promise<Jwks>;

This we cannot implement, because we need the kid, i.e. from the decomposedJwt. We could change this interface, but for now I vote to instead just throw a NotImplemented error or so, to keep things simple.

@NicolasViaud
Copy link
Author

Hi Otto,
To stick to the incremental approach, I have updated and commit a new part of the code.
I have kept the alb version with the SimpleJwksCache and added a new version from scratch. We will be able to compare more easely the code between these 2 versions. I am just not so sure finally about the LRU feature because it's a relative "complex" mechanism, but overall, only 1 or maximum 2 JWK will be cached by instance.

Anyway, this is not the final version of the code. If you validate the code direction, here are the missing parts:

  • add unit test for albv2
  • manage the fetch error case for albv2
  • remove assert from cache
  • AND the most important, implement the AlbVerifier

@ottokruse
Copy link
Contributor

To stick to the incremental approach, I have updated and commit a new part of the code.

Thanks, and that's great!

Mmmmm I see what you mean with "complex". I think instead of a linked list to implement the LRU cache we can also use a JS Map, because it stores keys in insertion order, so you can always get the oldest key to remove by doing const oldestKey = this.cache.keys().next().value;. Using a JS Map will likely be more simple?

Let's also add the Penaltybox now. That will also help figuring out the right cache key --> I believe it should be the JWKS uri without kid, because it must not be under attacker control and the kid is under attacker control

@NicolasViaud
Copy link
Author

NicolasViaud commented Oct 13, 2024

Thank you Otto for the JS Map order tips. In fact, the LRU cache implementation is much more simplier by using the natural Map order (I have commited the new LRU cache implementation).

For the reflexion about what would be a usefull and safe key cache, here is my thought:
If we are taking the hypothesis that an attacker can control the kid by sending to the backend, a forged JWT token, the question is how he could exploit it ?

  1. Implementation of cache with the key corresponding to the JWKS URI with kid: an attacker could spam requests with a forged JWT token having a new kid value each time. The penality box wont be effective because it's correspond to a legitimate new request. The impact of such an attack could potentially lead to a Denial of Service by saturating the memory and CPU of the backend.
  2. Implementation of cache with the key crresponding to the JWKS URI without kid: if an attacker spam requests before that the server coul retrieve the legitimate JWK, the penality box will continuesly add a timeout, and the legitimate request will never be able to go throw. It can lead to a Denial of Service also

However, I think there is a possible mitigation. We need to remember that this is an ALB feature that we are developing. In term of architecture, it's the ALB that will send the JWT token. The backend, if the infrastructure is clean, won't be accessible from outside (security group, private subnet, ..). It means that it's impossible for an attacker to send directly any forged token to the backend (again, only if the infrastructure is clean).

In conclusion, for me, the solution 1 or 2 are acceptable. And personally, the solution 1 make more sens to me.

And of course, in any case, we will have to validate that the kid is at the good format (UUID) if I remeber well to avoid URI injection.

@ottokruse
Copy link
Contributor

Think you're right. It's a DOS potential either way, and saturating the memory and CPU will be a lot harder to pull off than sending in a non-existing kid every X seconds.

But then this means, we don't need or want to add a penaltybox. Let's add the reasoning as a comment in the code then.

@NicolasViaud
Copy link
Author

NicolasViaud commented Oct 27, 2024

Hi Otto,
I am working on the AlbVerifier this time, in order to see the integration between the AlbVerifier and the AlbCache.
To start reasoning, let's draw an architectural diagram first, from a non common use case (but maybe existing):

architecture-beta
    group aws(cloud)[AWS]
    group public[Public subnets] in aws
    group private[Private subnets] in aws
    service user(internet)[user]
    service as(server)[Cognito OR OIDC AS] in public
    service alb1(server)[ALB 1] in public
    service alb2(server)[ALB 2] in public
    service api(server)[Server protected with aws jwt verifier] in private

    user:R --> L:alb1
    user:R --> L:alb2
    alb1:R --> L:api
    alb2:R --> L:api
    alb1:R -- L:as
    api:B -- T:as

Loading

(Some link are missing like between ALB2 and the AS, or between the user and the AS but the render was messy with them)

There is a specificity with this architecture, compare to the classic architecture with several AS: the data token received by the server can have 2 signers (ALB1 and ALB2) and the same issuer (Cognito or AS OIDC).

Ex of token (obfuscated data token extract from a real case, with 2 ALBs authenticated with the same Cognito User Pool):

{
    "header": {
        "typ":"JWT",
        "kid":"keyId",
        "alg":"ES256",
        "iss":"https://cognito-idp.us-west-2.amazonaws.com/us-west-2_123456",
        "client":"clientId",
        "signer":"arn:aws:elasticloadbalancing:us-west-2:123456789012:loadbalancer/app/AAAAAAAAAAA/AAAAAAAAAAA",
        "exp": 1516239022
    },
    "payload": {
        "sub": "1234567890",
        "email": "",
        "username": "1234567890",
        "exp": 1516239022,
        "iss": "https://cognito-idp.us-west-2.amazonaws.com/us-west-2_123456"
    }
}

Data token received by the server, when the user request was sent to the ALB1

{
    "header": {
        "typ":"JWT",
        "kid":"keyId",
        "alg":"ES256",
        "iss":"https://cognito-idp.us-west-2.amazonaws.com/us-west-2_123456",
        "client":"clientId",
        "signer":"arn:aws:elasticloadbalancing:us-west-2:123456789012:loadbalancer/app/BBBBBBBBBBB/BBBBBBBBBBB",
        "exp": 1516239022
    },
    "payload": {
        "sub": "1234567890",
        "email": "",
        "username": "1234567890",
        "exp": 1516239022,
        "iss": "https://cognito-idp.us-west-2.amazonaws.com/us-west-2_123456"
    }
}

Data token received by the server, when the user request was sent to the ALB2

Only the signer header parameter changed.

The question is, do we want to to have a MultiProperty feature for the AlbJwtVerifier, like with other verifiers ?
I would like to answer, yes !

If yes, when multi properties are declared, each parameter correspond to an AS or ALB ?
I would like to answer that each parameter should correspond to an ALB if we want to be compatible with the architecture diagram above

What are your thought about these 2 questions please ?

Below, the hypothesis is that your answer are the same than mine

The JwtVerifierbase works well when there is several issuers to manage multi user pool properties for example. If we want that each property correspond to an ALB, we need to use the field signer/loadBalancerArn to index property (instead of issuer).

I can see 2 options:

Option 1: develop the AlbJwtVerifier by extendening the JwtVerifierBase (like CognitoJwtVerifier)

I think this solution can work, but I haven't really tested it.
This solution is really similar to the CognitoJwtVerifier. This solution is possible by redefining the protected getVerifyParameters function.

Pros:

  • Reusing the JwtVerifierBase

Cons:

  • Try to fit in the feature developed for multi issuer
  • Still some code duplicate (Like CognitoJwtVerifier)
  • Can't inforce loadBalancerArn parameter non null
  • Function hydrate will throw an error NotImplemented

Option 2: develop the AlbJwtVerifier from scratch

The development from scratch is not too diffucult. It will be really similar to the CognitoJwtVerifier.
The JWT token verification will be done by calling the already existing function verifyDecomposedJwt and verifyDecomposedJwtSync from "./jwt-verifier"

Pros:

  • Easier to develop specific code
  • Function hydrate not defined

Cons:

  • A little bit more code duplicated than option 1: functions getAlbConfig, expectedLoadBalancerArn, verifyDecomposedJwt, verifyDecomposedJwtSync are duplicated and inspired from JwtVerifierBase

Note

I am currently trying to understand how KeyObjectCache works, because these 2 options use it and it seems to be really linked to the multi issuer feature

@ottokruse
Copy link
Contributor

Thanks for the work, and laying out the options! Super helpful.
Agree if we can do a multi ALB verifier that would be great, and consistent with the library.

Option 1 would be "most consistent" and the only reason my brain was favoring option 2 (from scratch / separate) was because I thought it would be simplest. But if you can see how feasible option 1 is, maybe it's doable enough.

KeyObjectCache was meant to cache the creation of a Node.js native KeyObject from the JWK, so you wouldn't need to do this over and over (which you would if you would just cache the JWK). However it also stems from a time when Node.js crypto didn't support using JWKs natively, which is why we had the custom ASN1 encoder in pure JS. Maybe we should get rid of KeyObjectCache now, if it adds complexity without meaningful speedup.

@ottokruse
Copy link
Contributor

The question is, do we want to to have a MultiProperty feature for the AlbJwtVerifier, like with other verifiers ?
I would like to answer, yes !

me too

If yes, when multi properties are declared, each parameter correspond to an AS or ALB ?
I would like to answer that each parameter should correspond to an ALB if we want to be compatible with the architecture diagram above

me too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants