Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TrafficControl] Support allowlisting #19242

Merged
merged 3 commits into from
Sep 25, 2024
Merged

Conversation

williampsmith
Copy link
Contributor

@williampsmith williampsmith commented Sep 6, 2024

Description

In some cases, we may want to enable a more restrictive policy wherein the node must explicitly specify all IP's from which it will accept requests. Because TrafficController policies do not easily support allowlisting, this is instead supported by introducing a separate mode of operation for traffic controller, enabled by providing allow_list field in the PolicyConfig, which should map to a list of strings all parseable to IpAddr.

When this config is present, we skip spawning a tally thread and ignore all calls to TrafficController::tally, and instead initialize an in memory allowlist of IPs. On subsequent calls to TrafficController::check, we check this list against the requestor IP.

Dry run mode in this mode still works as expected, as do block metrics (any request that is not in the allowlist is tallied against the block metric).

Test plan

Added simtest


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

Copy link

vercel bot commented Sep 6, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 9, 2024 9:53pm
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Sep 9, 2024 9:53pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Sep 9, 2024 9:53pm
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Sep 9, 2024 9:53pm

Copy link
Contributor

@longbowlu longbowlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm with a couple of questions/nits

.ok()
.map(|socket_addr| socket_addr.ip())
.or_else(|| {
error!("Failed to parse value of {:?} to ip address or socket.", ip,);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
error!("Failed to parse value of {:?} to ip address or socket.", ip,);
error!("Failed to parse value of {:?} to ip address and socket.", ip);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is correct. The parsing can take either an ip address or a socket object (ip + port). Will fix the trailing comma in the other PR since it's minor and want to get this landed


match &self.acl {
Acl::Allowlist(allowlist) => {
let allowed = client.is_none() || allowlist.contains(&client.unwrap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be client be None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the client should never be None in the happy path. This is just edge case handling for the case where the client parsing fails earlier in the flow. In such a case, we don't want to panic, since there have been some interesting cases in the past where this has failed in production after rollout, so we just set the client to None and essentially allow everything. There's logging and a metric that we bump (again, earlier in the flow) if that happens though so that we can investigate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder when it's None and it's in allowlist mode whether we should reject it as opposed to let it pass. Will leave it up to you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a way that this can be made None by an attacker. It would generally mean that we have a bug in parsing, or traffic controller on the server itself is misconfigured. So it feels safer in this case to allow. If it's a bug, then we definitely don't want to be sinking all requests on the node, and if it's a misconfigured server, then that should be isolated to just that one node, so it should be safe from the perspective of the bridge network to just allow it. The worst case scenario if we reject in this case is that it's a bug that affects all nodes, and we effectively halt the bridge.

@longbowlu
Copy link
Contributor

longbowlu commented Sep 12, 2024

and @williampsmith just to make sure we are on the same page - as your summarized in that slack channel, we will use key based allowlist v.s. ip based (what this PR is about). But the latter will be useful so we will add it any way. Is this correct?

@williampsmith
Copy link
Contributor Author

and @williampsmith just to make sure we are on the same page - as your summarized in that slack channel, we will use key based allowlist v.s. ip based (what this PR is about). But the latter will be useful so we will add it any way. Is this correct?

Yes, we'll be using key based allowlist. The integration here is not concerned with that, and the generalization to support that will happen in a followup PR

@williampsmith williampsmith merged commit c76e196 into main Sep 25, 2024
50 of 51 checks passed
@williampsmith williampsmith deleted the william/allowlist-policy branch September 25, 2024 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants