Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trust-Quorum: Messages, Config, Crypto #7859

Open
wants to merge 9 commits into
base: ajs/shamir
Choose a base branch
from

Conversation

andrewjstone
Copy link
Contributor

This is the initial commit of the "real" trust quorum code as described in RFD 238. This commit provides some of the foundation needed to implement the trust quorum reconfiguration protocol.

The protocol itself will be implemented in a Node type in a sans-io style with property based tests for the full protocol, similar to LRTQ. Async code will utilize the protocol and forward messages over sprockets channels, and handle requests from Nexus.

This initial code was split out to keep the PR small, although it has been used in some preliminary protocol code already that will come in a follow up PR.

This is the initial commit of the "real" trust quorum code as described
in RFD 238. This commit provides some of the foundation needed to
implement the trust quorum reconfiguration protocol.

The protocol itself will be implemented in a `Node` type in a
[sans-io](https://sans-io.readthedocs.io/) style with property based
tests for the full protocol, similar to LRTQ. Async code will utilize
the protocol and forward messages over sprockets channels, and handle
requests from Nexus.

This initial code was split out to keep the PR small, although
it has been used in some preliminary protocol code already that will
come in a follow up PR.
@andrewjstone andrewjstone requested a review from inickles March 21, 2025 23:26
@hawkw hawkw assigned hawkw and unassigned hawkw Mar 24, 2025
@hawkw hawkw self-requested a review March 24, 2025 20:19
Comment on lines +90 to +91
pub part_number: String,
pub serial_number: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking ahead a little, this is a sample cert chain

[INFO  verifier_ipcc] Certificate => CN=alias,O=Oxide Computer Company,C=US
[INFO  verifier_ipcc] Certificate => CN=device-id,O=Oxide Computer Company,C=US
[INFO  verifier_ipcc] Certificate => CN=PDV1:913-0000019:006:BRM42220062,O=Oxide Computer Company,C=US
[INFO  verifier_ipcc] Certificate => CN=Platform Identity Staging Intermediate 20780377,O=Oxide Computer Company,C=US

I'm guessing it makes sense for sprockets or another library to be responsible for splitting up the PDV1:913-0000019:006:BRM42220062 vs adding that logic here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think sprockets should split that and return a PlatformId. Then trust-quorum should probably use the sprockets type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or alternatively, some shared library used by both sprockets and trust-quorum can implement PlatformId

Copy link

@inickles inickles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave it an initial pass. I didn't see anything that I think needs changing, just some questions and comments.

Comment on lines +90 to +91
pub part_number: String,
pub serial_number: String,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it make sense to not have these field be public and to have a constructor that takes platform ID cert that it validates? The idea being it would be harder to accidentally create one of these that is invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I anticipate having some sort of From implementation or using a type provided by sprockets here in the future. However, I don't think it's harmful to provide access to the fields. I'll consider in a follow up.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the potential harm is just that it makes for a way to not use that From impl to construct one of these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a fair assessment. I actually expect this whole type to go away and come from elsewhere. I can make the fields non-pub and return &str in the follow up PR I'm working on now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change in #7929. I still think eventually, the PlatformId will come from elsewhere, but this at least makes it a touch less harmful to use.

This change removes the requirement to carry encrypted rack shares in
every `Prepare` message. gfss can generate them given enough encrypted
shares because its interpolate method is more generic than other
implementations and doesn't only calculate the secret.

Also fix up based on some other review comments. This removes the
`Error` type altogether as it's no longer used.
@andrewjstone andrewjstone changed the base branch from main to ajs/shamir April 2, 2025 16:53
@andrewjstone
Copy link
Contributor Author

This PR now builds upon #7891. We'll have to change the target to main when ready to merge.

andrewjstone added a commit that referenced this pull request Apr 5, 2025
This code builds upon #7859, which itself builds upon #7891. Those
need to be merged in order first.

A `Node` is the sans-io entity driving the trust quorum protocol. This
PR starts the work of creating the `Node` type and coordinating an
initial configuration.

There's a property based test that generates an initial `ReconfigureMsg`
that is then used to call `Node::coordinate_reconfiguration`, which will
internally setup a `CoordinatorState` and create a bunch of messages
to be sent. We verify that a new `PersistentState` is returned and that
messages are sent.

This needs a bit more documentation and testing, so it's still WIP.
andrewjstone added a commit that referenced this pull request Apr 5, 2025
This code builds upon #7859, which itself builds upon #7891. Those
need to be merged in order first.

A `Node` is the sans-io entity driving the trust quorum protocol. This
PR starts the work of creating the `Node` type and coordinating an
initial configuration.

There's a property based test that generates an initial `ReconfigureMsg`
that is then used to call `Node::coordinate_reconfiguration`, which will
internally setup a `CoordinatorState` and create a bunch of messages
to be sent. We verify that a new `PersistentState` is returned and that
messages are sent.

This needs a bit more documentation and testing, so it's still WIP.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants