
Implement resharding and sharded shuffle based on it. #1014

Merged · 9 commits merged into private-attribution:main on May 1, 2024

Conversation

@akoshelev (Collaborator) commented Apr 16, 2024

This change adds a generic `reshard` functionality that allows shards to redistribute their shares according to some logic based on the share value, the index in the input, or the context that provides access to PRSS.

Most commonly, the redistribution logic is either deterministic (send all shares to shard 0) or based on PRSS sampling. Sharded shuffle uses the latter; the future sharded attribution will make use of the deterministic reshard.
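As a rough usage sketch of the two policies (the argument order of `reshard` and the `Share` type here are illustrative, not the actual API; the real signature lives in `ipa-core/src/protocol/context/mod.rs`):

```rust
// Hypothetical usage sketch only: `Share` is a placeholder type and the
// argument order of `reshard` is assumed.
async fn examples<C: ShardedContext + Clone>(ctx: C, shares: Vec<Share>) -> Result<(), Error> {
    // Deterministic policy: every record goes to shard 0
    // (what the planned sharded attribution would use).
    let _on_shard_zero =
        reshard(ctx.clone(), shares.clone(), |_ctx, _record_id, _share| {
            ShardIndex::from(0u32)
        })
        .await?;

    // PRSS-driven policy: sample a destination shard per record,
    // as the sharded shuffle does via its `pick_shard` helper.
    let _scattered = reshard(ctx, shares, |ctx, record_id, _share| {
        ctx.pick_shard(record_id, Direction::Left)
    })
    .await?;

    Ok(())
}
```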

To prove that resharding works, the sharded shuffle protocol was implemented in this change as well. It is basically the same protocol as we implemented back in October (#816), with one caveat: shards on H1 do not know the cardinality of `C`, and they can't set their shares without knowing it.

The protocol was amended to account for that. It was decided (for no particular reason) that H2 shards will inform H1 about $|C|$.


codecov bot commented Apr 16, 2024

Codecov Report

Attention: Patch coverage is 96.31491% with 22 lines in your changes missing coverage. Please review.

Project coverage is 90.25%. Comparing base (e1bc038) to head (83fb57b).

| File | Patch % | Lines missing |
|------|---------|---------------|
| ipa-core/src/protocol/ipa_prf/shuffle/sharded.rs | 98.23% | 6 ⚠️ |
| ipa-core/src/helpers/buffers/unordered_receiver.rs | 54.54% | 5 ⚠️ |
| ipa-core/src/helpers/gateway/receive.rs | 55.55% | 4 ⚠️ |
| ipa-core/src/protocol/context/mod.rs | 97.94% | 4 ⚠️ |
| ipa-core/src/helpers/mod.rs | 25.00% | 3 ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1014      +/-   ##
==========================================
+ Coverage   90.06%   90.25%   +0.19%
==========================================
  Files         171      172       +1
  Lines       25157    25727     +570
==========================================
+ Hits        22658    23221     +563
- Misses       2499     2506       +7
```

☔ View full report in Codecov by Sentry.

@@ -0,0 +1,522 @@
#![allow(dead_code)] // until sharded shuffle is used in OPRF
akoshelev (Collaborator Author):

@danielmasny please take a look at the protocol and see if it makes sense to you

ipa-core/src/helpers/gateway/receive.rs (outdated, resolved)
ipa-core/src/helpers/gateway/send.rs (outdated, resolved)
/// closed, even if nothing has been communicated between that pair.
///
/// ## Panics
/// It does not panic
Collaborator:

It would panic if `shard_picker` returned an out-of-range shard index, which seems like it is possible given that `ShardIndex` allows construction from arbitrary integers.

(This is admittedly a nitpick -- perhaps a more salient question is whether the clippy lints for error / panic docs are worthwhile.)

akoshelev (Collaborator Author):

It is annoying sometimes, but it can be useful to see all the places where things may panic, often without the intention to have it.

///
/// ## Errors
/// If cross-shard communication fails
pub async fn reshard<L, K, C, S>(
Collaborator:

I am curious why this ended up in the context module?

I worry a bit that shard location could leak information, which will depend on how this function is used. Gluing it together with the local shuffle (which is the only way it is used currently) might mitigate that risk.

akoshelev (Collaborator Author):

I foresee this function being used in OPRF and other parts where we need shards to redistribute the shares. I don't see anything that cannot be otherwise done manually by opening a shard channel and sending data there.

I also worry about timing attacks, as intra-helper communication is now exposed, and I was thinking that we need to add some protection at the network layer - not sure if anything can be done in protocol code to prevent that.

ipa-core/src/protocol/context/mod.rs (outdated, resolved)
@danielmasny (Collaborator) left a comment:

The shuffle makes sense to me and looks pretty clean. Thanks! I have a couple of questions, see below.

ipa-core/src/protocol/context/mod.rs (outdated, resolved)
ipa-core/src/protocol/context/mod.rs (outdated, resolved)
ipa-core/src/protocol/context/mod.rs (resolved)
ipa-core/src/protocol/ipa_prf/shuffle/sharded.rs (outdated, resolved)
@benjaminsavage (Collaborator) left a comment:

This is a beautiful PR. I love it. I only have one major concern: are we improperly using PRSS? Specifically, are the shards all re-using the same PRSS for different things? If that's the case, it seems like a potential security issue.

ipa-core/src/protocol/context/mod.rs (outdated, resolved)
/// When `shard_picker` returns an out-of-bounds index.
///
/// ## Errors
/// If cross-shard communication fails
Collaborator:

If there are $N^2$ channels, it seems like the probability of an error might get pretty high. I assume there is some kind of retry logic automatically built into the communication layer to mitigate sporadic failures, right?

akoshelev (Collaborator Author):

Yea, we don't have retry mechanisms built in at the application layer right now; we rely on the transport layer (TCP) to provide reliable channels. We may build something for that purpose if we see that TCP does not satisfy our needs, but it shouldn't be visible to MPC because there isn't anything you can do here that you can't do at the infrastructure layer.

ipa-core/src/protocol/context/mod.rs (outdated, resolved)

// Open communication channels to all shards on this helper and keep track of records sent
// through any of them.
let mut sending_ends = ctx
Collaborator:

I don't understand this name. What do you mean by "ends"?

Collaborator:

`SendingEnd` is the type returned by `shard_send_channel`. You can think of it as a "transmit handle" (in the sense of `let (tx, rx) = mpsc::channel()`).

akoshelev (Collaborator Author):

For every channel there is always at least one sender and one receiver. The sending end of a channel is what we give to the sender(s), and the receiving end is owned by the receiver(s).

To avoid confusion here, let me rename it to `send_channels`.
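As a plain standard-library analogy of the tx/rx split described above (nothing IPA-specific):

```rust
use std::sync::mpsc;

fn main() {
    // `tx` is the sending end of the channel (what `SendingEnd` is
    // analogous to); `rx` is the receiving end.
    let (tx, rx) = mpsc::channel::<u32>();
    tx.send(42).unwrap();
    assert_eq!(rx.recv().unwrap(), 42);
}
```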

.send(*record_id, val)
.await
.map_err(crate::error::Error::from)
.map(|()| None);
Collaborator:

What does this do?

akoshelev (Collaborator Author):

The result of the send operation is a unit `()`, and we need to map it to `Option<Value>` to conform to the receive API. The goal is to handle "receive from other shards" and "receive from this shard" operations in a uniform way (line 518).
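A self-contained toy illustrating that shape (the types here are stand-ins, not the real IPA ones):

```rust
fn main() {
    // Stand-in for a cross-shard send that returns `()` on success.
    fn fake_send(_v: u32) -> Result<(), String> {
        Ok(())
    }

    let val = 7u32;

    // Record routed to another shard: send it, then map `()` -> None so
    // the result has the same `Result<Option<_>, _>` shape as a receive.
    let sent: Result<Option<u32>, String> = fake_send(val).map(|()| None);

    // Record kept on this shard: just wrap the value.
    let kept: Result<Option<u32>, String> = Ok(Some(val));

    assert_eq!(sent, Ok(None));
    assert_eq!(kept, Ok(Some(7)));
}
```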

Comment on lines +433 to +436
I: IntoIterator<Item = S>,
I::IntoIter: Send + ExactSizeIterator,
C: ShardedContext,
S: Shuffleable,
Collaborator:

Elegant! These are great trait bounds. Only 4 of them, the names make sense... great work!

// Process more data as it comes in, or close the sending channels, if there is nothing
// left.
if let Some(((i, val), ctx)) = input.next() {
let dest_shard = shard_picker(ctx, RecordId::from(i), &val);
Collaborator:

What's the use-case for when the value itself is used by the shard picker? Is this for the OPRF part?

akoshelev (Collaborator Author):

Yes that would be one use case - for OPRF resharding we will use some bits of the value itself to determine a destination shard.
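A hedged sketch of what such a value-driven picker could look like (the real `shard_picker` receives the context, a `RecordId`, and a reference to the value, and returns a `ShardIndex`; everything below is illustrative):

```rust
const SHARD_COUNT: u64 = 8; // illustrative shard count

/// Route a record by the low bits of its (pseudorandom) value; if the
/// value is uniformly distributed, so is the destination shard.
fn pick_by_value(val: u64) -> u64 {
    val % SHARD_COUNT
}

fn main() {
    assert_eq!(pick_by_value(42), 42 % SHARD_COUNT);
}
```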

if dest_shard == my_shard {
Some(((my_shard, Ok(Some(val.clone()))), (input, shard_ends)))
} else {
let (record_id, se) = shard_ends.get_mut(&dest_shard).unwrap();
Collaborator:

I don't love the variable name `se`. What does it mean / stand for? Is this a term of art I'm just unfamiliar with?

akoshelev (Collaborator Author):

Too abbreviated, I agree. Fixed it.

fn pick_shard(&self, record_id: RecordId, direction: Direction) -> ShardIndex {
// FIXME: update PRSS trait to compute only left or right part
let (l, r): (u128, u128) = self.prss().generate(record_id);
let shard_index = u32::try_from(
Collaborator:

Won't this cause every single shard to select the same destination shard index for its first record? I think they're all narrowed to the same step, and all start from `RecordId::FIRST` and count up. If this is the case, it feels like a security issue.

akoshelev (Collaborator Author):

We discussed this below; to summarize, it shouldn't be a security issue because all shards run MPC circuits independently from each other. Each circuit runs an independent Diffie-Hellman protocol between each pair of helpers, so the probability of them negotiating the same shared secret is $\frac{N}{2^{256}}$, where $N$ is the total number of shards.
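In symbols, that estimate is a union bound over the $N$ independently negotiated 256-bit secrets $k_1, \dots, k_N$ against any fixed target $k^\star$ (assuming each $k_i$ is uniform):

$$
\Pr\bigl[\exists\, i \le N : k_i = k^\star\bigr]
\;\le\; \sum_{i=1}^{N} \Pr\bigl[k_i = k^\star\bigr]
\;=\; \frac{N}{2^{256}}
$$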

Comment on lines 22 to 23
use futures::Stream;
use futures_util::stream;
Collaborator:

Suggested change
use futures::Stream;
use futures_util::stream;
use futures::{Stream, StreamExt, stream};

(and delete the import on line 13)

akoshelev (Collaborator Author):

My IDE keeps failing me on this one. It always prefers `futures_util` over `futures` - I don't know why.

Comment on lines +68 to +69
/// The destination shard for each masked row is decided based on value obtained from sampling
/// PRSS. Which value to use (left or right) is decided based on `direction` parameter.
Collaborator:

Suggested change
/// The destination shard for each masked row is decided based on value obtained from sampling
/// PRSS. Which value to use (left or right) is decided based on `direction` parameter.
/// The mask value, and destination shard for each masked row, are determined by
/// sampling PRSS. `Direction::Left` means to use the PRSS shared with the left
/// helper, and `Direction::Right` means to use the PRSS shared with the right
/// helper.

This is similar to a suggestion Ben left elsewhere.

Since this code (more precisely, the `h#_shuffle` routines that call this one) relies on the fact that "left" means "peer $i - 1$" and "right" means "peer $i + 1$", maybe it's worth adding a static assertion to check that? Or alternatively, we could put a comment on `Role::peer` referencing this protocol, but the static assertion seems better unless it can't be written against public APIs or something like that.

akoshelev (Collaborator Author):

Yep, added a set of static checks in the `Role` module.
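A sketch of the kind of check this could be (assuming `Role::peer` is a `const fn`; otherwise the same assertions work as a unit test; treat the exact form as illustrative):

```rust
// Pin down the orientation the shuffle relies on: "left" is the previous
// helper, "right" is the next one. Evaluated at compile time.
const _: () = {
    assert!(matches!(Role::H1.peer(Direction::Right), Role::H2));
    assert!(matches!(Role::H2.peer(Direction::Left), Role::H1));
    assert!(matches!(Role::H3.peer(Direction::Right), Role::H1));
};
```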


/// Picks a shard according to the value obtained from sampling PRSS shared with the given helper.
fn pick_shard(&self, record_id: RecordId, direction: Direction) -> ShardIndex {
// FIXME: update PRSS trait to compute only left or right part
Collaborator:

My initial reaction was that this isn't that hard (maybe something like `impl FromPrss for Left<T>`), but upon going to try and mock it up, I realized it's a bit harder than that because it needs to propagate through a few layers of functions in the PRSS implementation.

Since the PRSS stuff needs to be monomorphized anyway, maybe the compiler will figure out that it can skip half of the PRSS generation?

akoshelev (Collaborator Author):

My worry is that, as you said, there are quite a few layers before we get to AES, so the compiler may not have enough memory to keep track of things and/or time to do that. Cutting 50% of the CPU time spent generating these masks seems important enough to want some guarantee of it happening.



}
}

impl<C: Context + ShardConfiguration> ShardedContext for C {}

impl ShardConfiguration for Base<'_, Sharded> {
Collaborator:

This isn't directly about this PR, but if we're settling on a "protocols should be written against upgraded contexts" policy (see discussion in #1021), then maybe `ShardConfiguration` shouldn't be implemented for the base context.

akoshelev (Collaborator Author):

I merged this PR and then I saw this comment - for some reason it wasn't showing up on the "Changes" tab. I could be wrong, but I think we don't expose the `Base` context directly - it is always wrapped into a semi-honest or malicious version.

Collaborator:

Yeah, I was mixing things up when I made this comment. By "base context" I meant `semi_honest::Context`, which is not the same as `struct Base`.

@akoshelev akoshelev merged commit 762393b into private-attribution:main May 1, 2024
11 checks passed