aip | title | author | discussions-to (*optional) | Status | type | created |
---|---|---|---|---|---|---|
67 |
Native Consensus for JSON Web Key (JWK) |
Zhoujun Ma (zhoujun@aptoslabs.com) |
Accepted |
<Standard (Core, Networking, Framework)> |
<02/10/2024> |
OpenID Connect (OIDC) orchestrates authentication by enabling a user to prove their identity to a client application, through the mediation of a trusted identity provider, leveraging the OAuth 2.0 framework for secure interactions. Typically, this process involves verifying a signature of the provider with its cryptographic public keys, which are published in format of JSON Web Key (JWK). For security purpose, JWKs are rotated periodically, but providers may each have its own rotation schedule, and providers typically do not provide official documentation or notification: client apps are expected to fetch JWKs in an ad-hoc manner.
AIP-61: keyless accounts introduced a new type of Aptos accounts that are secured through the owner’s existing OIDC accounts (i.e., their Web2 account with an OIDC provider such as Google, GitHub or Apple), and verifying a transaction from such an OIDC account involves verifying a signature of the provider with its JWK. This requires that validators agree on the latest JWKs of every provider that needs to supported.
This AIP proposes a solution where validators:
- monitor the OIDC providers' JWKs by directly fetching them;
- once a JWK change is detected, collaborate with peers to form a quorum-certified JWK update;
- publish the update on-chain through a validator transaction.
To have a framework that replicate external JWKs on chain.
- Functional: for every provider in a given provider set (probably in an on-chain resource), ensure validators agree on its latest JWKs (and probably publish it as another on-chain resource).
- Non-functional:
- Security and decentralization: the JWKs should be agreed on by the validator network in a secure and decentralized way, ideally introduce zero extra security assumptions.
- Low replication latency: once a provider rotates its JWKs, the blockchain should pick them up as soon as possible.
- Without the latest JWKs from provider X, transactions issued by X-based keyless accounts may get rejected.
- Provider-independence: extra operation required on an OIDC provider's end should be minimized (ideally zero), so new OIDC providers can be supported at a very low cost.
- DevOps complexity: solutions with lower development & operation complexity is considered cheaper and less risky.
The following details are not covered in this AIP and probably have their own AIPs.
- The list of supported OIDC providers.
- The actual Open ID configuration of each OIDC provider.
- The actual JWK operation of each OIDC provider.
- The actual impact and recommended operations for operators.
As explained in the summary, AIP-61: keyless accounts re quires validators agree on the latest JWKs of the OIDC providers. This AIP proposes a solution of the functional goal that balances all non-functional needs:
- the OIDC provider set is assumed to be maintained in an on-chain resource;
- validators take the OIDC provider set as an input and for each provider, monitor and publish the quorum-certified JWK updates on chain;
- relatively small and practical security assumptions on the node environment are introduced;
- replication latency of a few seconds can be achieved;
- there is no requirement on the OIDC provider (other than they are OIDC-compliant);
- DevOps complexity is relatively low.
Validator operators need to ensure their validator nodes live in a environment that allow the nodes to securely access OIDC providers' APIs.
This configuration needs to happen every time we update the supported OIDC provider list or any of their OpenID configuration URL. But hopefully each operation should be trivial.
One alternative is to run a separate trusted oracle to watch JWKs (or a separate trusted oracle network to watch JWks and quorum-certify updates) and publish updates on chain through regular Aptos transactions.
Reasons to not consider: security and decentralization + DevOps complexity. This solution introduces a new type of node/network that needs to be trusted by the validator network, which are extra security assumptions and operation complexity.
Another alternative: once a JWK update is detected off-chain in any way, publish using a governance proposal and wait to be voted by the majority of the stake.
Reasons to not consider: latency. The voting can last up to 7 days (by the current setting), which means the new JWKs may not be available on chain until they are 7 days old, which further means the keyless transactions signed by the new key won't be accepted for 7 days.
One alternative is to have OIDC providers each create their own Aptos account and publish their new keys on-chain whenever they publish them at its own JWK URL.
Reasons to not consider: security + provider-independence. This alternative requires that OIDC providers are sufficiently incentivized to participate in Aptos protocol, which is not the case today and it is unclear how to ensure that. Even if the providers are incentivized, any of their Aptos account keys/JWKs effectively become a new "single point of compromise" for the whole system: if compromised, all associated keyless accounts are compromised. This is some new (and likely undesired) security assumption to make. Additionlly, existing OIDC providers will be required to update their key rotation logic.
Alternative 4: let validators fetch JWKs during execution and discard transaction if results are split
One alternative is to fetch JWKs as part of the trasaction verification and discard the transaction if the verification results are split.
Reasons to not consider: DevOps complexity. This requires refactoring of some fundamental building blocks of Aptos (e.g., decoupled execution), which might be too much implementation/testing complexity.
An OIDC provider may rotate its JWKs in any frequency. But as long as it eventually stays on a version for long enough, and this version can be observed by a subset of honest validators that hold more than 2/3 of the total voting power, this version can be successfully replicated on chain.
This assumption is valid for most OIDC providers today. (The fastest rotator known is Google, who rotates their JWK nearly every week.)
Here are some examples when JWK consensus may get stuck on a provider.
- When validators have evenly split views on the JWKs of this provider.
- When the provider rotates too fast.
- When 1/3 of the total voting power is controlled by malicious validator, and at least one honest validator does not have access to the provider's JWK API.
SupportedOIDCProviders
: a map from an OIDC provider (e.g.,https://accounts.google.com
) to watch, to its OpenID configuration URL.ObservedJWKs
: a map from a provider to a pair of(jwk_set, version)
that validators observed and agreed on.- A feature flag to turn on/off the feature.
ObservedJWKsUpdated
- Can contain a copy of
ObservedJWKs
. - Should be emitted when
ObservedJWKs
is updated by a transaction. - Validators should subscribe to reset its in-mem state accordingly.
- Can contain a copy of
- Turn on/off the feature.
- Start/stop watching an OIDC provider (i.e., update
SupportedOIDCProviders
). - Delete all observed JWKs of an OIDC provider.
- Typically used together with "stop watching an OIDC provider" to fully disable a provider (from the perspective of keyless accounts). This can make validator-side logic simpler.
- Patch the on-chain JWK collection.
- Example use case: a single key is found compromised but the provider has not yet rotate it.
A validator, if a member of the validator set of the current epoch, should have the following states for every provider
in the on-chain SupportedOIDCProviders
.
latest_on_chain: LastestOnChainValue { jwks, version }
: should be maintained to be the latest value ofprovider
in on-chainObservedJWKs
and used as the baseline for detecting JWK updates.consensus_state: PerProviderConsensusState
: should capture the per-provider JWK consensus state. Below are the variants.NotStarted
: waiting for a new version of JWKs to be observed.InProgress{ob_jwks, version, task_handle}
: a new version of JWKs was observed (value beingob
), running the certifying process with peers in a parallel tasktask_handle
.Finished{ob_jwks, version}
: update(ob_jwks, version)
was certified and proposed as a transaction to update the on-chainObservedJWKs
.
A validator, if a member of the validator set of the current epoch, should do the following for each provider
in the on-chain SupportedOIDCProviders
, a validator
- On a new epoch:
- start periodically fetching the latest JWKs from
provider
's OpenID configuration; - load
ObservedJWKs
and initializelatest_on_chain
with the according value of keyprovider
. - initialize
consensus_state
toNotStarted
.
- start periodically fetching the latest JWKs from
- On a fetch result
ob_jwks
, ifob_jwks != latest_on_chain.jwks
:- Start an async task
task_handle
to run the quorum-certifying process: broadcast to request every peer to sign its observed(ob_jwks, version)
ofprovider
and collect enough responses(ob_jwks, version, sig)
that matches the local(ob_jwks, latest_on_chain.version + 1)
to form a quorum-cert (a multi-signature of more than 2/3 of the epoch total voting power); - switch
consensus_state
toInProgress(ob, latest_on_chain.version + 1, task_handle)
.
- Start an async task
- Once a quorum-cert is formed for
(ob_jwks, version)
, if the state isInProgress{ob_jwks, version, _}
:- propose a transaction to update the on-chain
ObservedJWKs
with the observed and quorum-certified(ob_jwks, version)
ofprovider
.- can leverage validator transactions.
- switch
consensus_state
toFinished{ob_jwks, version}
.
- propose a transaction to update the on-chain
- On an
ObservedJWKsUpdated
event where the value ofprovider
is different fromlatest_on_chain
:- reinitialize
latest_on_chain
with the new on-chain values; - switch
consensus_state
toNotStarted
.
- reinitialize
- Whenever switching
consensus_state
away fromInProgress(ob, version, task_handle)
, abort the certifying processtask_handle
. - On a certifying request from a peer, if
consensus_state
is eitherInProgress{ob_jwks, version, task_handle}
orFinished{ob_jwks, version}
, sign(ob_jwks, version)
and respond with(ob_jwks, version, sig)
wheresig
is the signature.
A JWK update transaction should take the following parameters:
epoch
: which epoch this update is generated and certified.provider
: which provider's JWKs are being updated.ob_jwks
: the new value of JWKs.version
: the new version number that is supposed to be the old version number + 1.multi_sig
: the set of signers that signed the update(ob_jwks, version)
and their signatures aggreagted.
The execution should do the following check before updating the map ObservedJWKs
.
epoch
is current.version == on_chain_version + 1
, whereon_chain_version
is the one currently associated withprovider
inObservedJWKs
, or 0 otherwise.- The signer set in
multi_sig
is a valid sub-set of the current validator set, and the voting power of the sub-set is more than 2/3 of the total voting power. - The
multi_sig
itself is a valid signature of the update(ob_jwks, version)
by the signer set.
Validators do not need to persist any off-chain states.
Some new assumptions on the node environment are introduced. See the security section for more details.
Actions are required on operators' end, but hopefully they are trivial.
The purpose of this AIP is to support the implementation of AIP-61: keyless accounts.
However, this AIP paves the way towards native support for HTTP oracles in Move. Specifically, the JWK consensus infrastructure could be used to empower Move contracts to asynchronously fetch the contents of an HTTP page. This could open up many applications, such as price oracles, native bridges, prediction markets, and more.
The implementation and testing should be completed by the end of Feb 2024.
This is primarily a validator change + new on-chain states, no SDK/API/CLI/Indexer support is required.
Release 1.10
Below, we suppose the following 2 fundamental assumptions are valid and only discuss the assumtpions required for JWK consensus to work.
- The assumptions for blockchain main consensus.
- OIDC provider itself (mainly, its JWKs and HTTPS private keys) are secure.
- Compromised OIDC providers are discussed in AIP-61: keyless accounts.
The proposed solution should work if more than 2/3 of the total validator voting power is staked in nodes that run in a secure environment for accessing the provider's APIs (typically in HTTPS), where a secure environment for the provider means that:
- the DNS servers being used to resolve the domain name of the provider's APIs (and the connection to them) are secure;
- the CA certs used to validate the HTTPS connections are secure.
More implications:
- If no more than 2/3 of the total validator voting power is held in a secure environment for the provider, new keys of the provider cannot be published on chain. (From the perspective of keyless accounts, it is a liveness issue.)
- If more than 2/3 of the total validator voting power is held in a malicious environment for the provider (where DNS server and CA certs are both controlled by an attacker), malicious JWKs can be published on chain. (From the perspective of keyless accounts, it means any user account associated with this provider can be corrupted.)
Some realistic scenarios where the assumptions may be invalid.
- Extremely imbalanced stake distribution. If a large amount of stake goes to a small set of validator nodes, it is in general easier to compromise.
- Geo-restriction. If too much stake goes to a validator node that runs in a region where the environment is blocking the communication to the provider API(or even manipulating), then JWK consensus can stall (or even let malicious JWKs through).
- DNS server/package manager as a potential single point of compromise. Even without imbalanced stake distribution or geo-restriction, validator nodes are still very likely to share the same DNS server/package manager configurations, leaving the DNS server/package manager as the potential single point of compromise.
Because the solution can be implemented in a per-provider manner, the assumption for supporting multiple providers can be simply the combination of the assumption for each individual provider. The only caveat is that 2 providers may be geo-restricted in different ways (e.g., provider 1 is blocked in region 2 and provider 2 is block in region 1) and thus can not be supported simultaneiously. In that case, we may want to switch to per-provider oracle networks.