Implement resharding and sharded shuffle based on it. #1014
Conversation
This change adds a generic `reshard` functionality that allows shards to redistribute their shares according to some logic based on the share value, the index in the input, or the context that provides access to PRSS. Most commonly, the redistribution logic is either deterministic (send all shares to shard 0) or based on PRSS sampling. Sharded shuffle uses the latter; the future sharded attribution will make use of the deterministic reshard. To prove that resharding works, the sharded shuffle protocol was implemented in this change as well. It is basically the same [protocol](https://private-user-images.githubusercontent.com/230930/278093571-5757ba9e-c2ae-4a2b-8ce9-3065291749e2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTMyOTQ1MjYsIm5iZiI6MTcxMzI5NDIyNiwicGF0aCI6Ii8yMzA5MzAvMjc4MDkzNTcxLTU3NTdiYTllLWMyYWUtNGEyYi04Y2U5LTMwNjUyOTE3NDllMi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNDE2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDQxNlQxOTAzNDZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0wMmU0N2YxYmEzMmRkZjMwMzdiMTlhMzIyM2RkMzg3ZGE1MTY0N2U5NzNkMzQ2OTgxOTQyNDA0M2MxMWU1ZDRkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.UB6IZf7Be9iu4CASw-Sm_DrqS8A-guSJnUHG312HKEE) as we implemented back in October (#816), with one caveat: shards on H1 do not know the cardinality of `C` and cannot set their shares without knowing it. The protocol was amended to account for that. It was decided (for no particular reason) that H2 shards will inform H1 about $|C|$.
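For orientation, here is a self-contained sketch of the two redistribution patterns described above. The type aliases, function names, and bucketing logic are simplified stand-ins for illustration only, not the actual ipa-core `reshard` API.

```rust
use std::collections::HashMap;

type ShardIndex = u32;

/// Deterministic picker: every record goes to shard 0
/// (the pattern a future sharded attribution step could use).
#[allow(dead_code)] // shown for illustration; not called in main below
fn to_shard_zero<T>(_record_index: usize, _value: &T) -> ShardIndex {
    0
}

/// Pseudorandom picker: the destination is derived from a random-looking sample
/// (sharded shuffle derives this sample from PRSS rather than from the value itself).
fn by_sample(sample: u128, shard_count: u32) -> ShardIndex {
    u32::try_from(sample % u128::from(shard_count)).expect("remainder fits in u32")
}

fn main() {
    let shard_count = 4;
    let rows: Vec<u64> = vec![17, 42, 7, 100];

    // Locally bucket rows by destination shard; the real reshard sends each
    // bucket to its destination over intra-helper channels instead.
    let mut buckets: HashMap<ShardIndex, Vec<u64>> = HashMap::new();
    for (i, row) in rows.iter().enumerate() {
        // Swap in `to_shard_zero(i, row)` to get the deterministic behaviour.
        let dest = by_sample(u128::from(*row) + i as u128, shard_count);
        buckets.entry(dest).or_default().push(*row);
    }
    println!("{buckets:?}");
}
```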
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1014      +/-   ##
==========================================
+ Coverage   90.06%   90.25%   +0.19%
==========================================
  Files         171      172       +1
  Lines       25157    25727     +570
==========================================
+ Hits        22658    23221     +563
- Misses       2499     2506       +7

☔ View full report in Codecov by Sentry.
@@ -0,0 +1,522 @@
#![allow(dead_code)] // until sharded shuffle is used in OPRF
@danielmasny please take a look at the protocol and see if it makes sense to you
ipa-core/src/protocol/context/mod.rs
Outdated
/// closed, even if nothing has been communicated between that pair.
///
/// ## Panics
/// It does not panic
It would panic if `shard_picker` returned an out-of-range shard index, which seems like it is possible given that `ShardIndex` allows construction from arbitrary integers.
(This is admittedly a nitpick -- perhaps a more salient question is whether the clippy lints for error / panic docs are worthwhile.)
It is annoying sometimes, but it can be useful to see all the places where things may panic, often without intending to.
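For context, here is a minimal example of the kind of lint and doc section being discussed, assuming the standard `clippy::missing_panics_doc` / `clippy::missing_errors_doc` lints are the ones meant; the function and names are made up for the sketch.

```rust
// With these lints enabled, clippy asks for a panics/errors section on public
// items that can panic or that return `Result`.
#![warn(clippy::missing_panics_doc, clippy::missing_errors_doc)]

/// Returns the element at `index`.
///
/// ## Panics
/// Panics if `index` is out of range for `values`.
pub fn pick(values: &[u32], index: usize) -> u32 {
    values[index]
}

fn main() {
    println!("{}", pick(&[1, 2, 3], 1));
}
```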
///
/// ## Errors
/// If cross-shard communication fails
pub async fn reshard<L, K, C, S>(
I am curious why this ended up in the context module?
I worry a bit that shard location could leak information, which will depend on how this function is used. Gluing it together with the local shuffle (which is the only way it is used currently) might mitigate that risk.
I foresee this function being used in OPRF and other parts where we need shards to redistribute shares. I don't see anything here that cannot otherwise be done manually by opening a shard channel and sending data through it.
I also worry about timing attacks now that intra-helper communication is exposed. I was thinking that we need to add some protection at the network layer; I'm not sure anything can be done in protocol code to prevent that.
The shuffle makes sense to me and looks pretty clean. Thanks! I have a couple of questions, see below.
This is a beautiful PR. I love it. I only have one major concern: are we improperly using PRSS? Specifically, are the shards all re-using the same PRSS for different things? If that's the case, it seems like a potential security issue.
/// When `shard_picker` returns an out-of-bounds index.
///
/// ## Errors
/// If cross-shard communication fails
If there are N^2 channels, it seems like the probability of an error might get pretty high. I assume there is some kind of retry logic automatically built into the communication layer to mitigate sporadic failures, right?
Yeah, we don't have retry mechanisms built in at the application layer right now; we rely on the transport layer (TCP) to provide reliable channels. We may build something for that purpose if we see that TCP does not satisfy our needs, but it shouldn't be visible to the MPC code because there isn't anything you can do here that you can't do at the infrastructure layer.
ipa-core/src/protocol/context/mod.rs
Outdated
// Open communication channels to all shards on this helper and keep track of records sent
// through any of them.
let mut sending_ends = ctx
I don't understand this name. What do you mean by "ends"?
`SendingEnd` is the type returned by `shard_send_channel`. You can think of it as a "transmit handle" (in the sense of `let (tx, rx) = mpsc::channel()`).
For every channel there is always at least one sender and one receiver. The sending end of a channel is what we give to the sender(s), and the receiving end is owned by the receiver(s).
To avoid confusion here, let me rename it to `send_channels`.
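For anyone unfamiliar with the analogy, here is a minimal standard-library example of a sending end versus a receiving end; this is plain `std::sync::mpsc`, not IPA code.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // `tx` is the sending end of the channel and `rx` is the receiving end;
    // `SendingEnd` in this PR plays the `tx` role for a shard-to-shard channel.
    let (tx, rx) = mpsc::channel();

    let sender = thread::spawn(move || {
        for value in 0..3u32 {
            tx.send(value).expect("receiver is still alive");
        }
        // `tx` is dropped here, which closes the channel and ends the loop below.
    });

    for value in rx {
        println!("received {value}");
    }
    sender.join().expect("sender thread finished");
}
```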
.send(*record_id, val)
.await
.map_err(crate::error::Error::from)
.map(|()| None);
What does this do?
The result of the send operation is a unit `()` and we need to map it to `Option<Value>` to conform to the receive API. The goal is to handle "receive from other shards" and "receive from this shard" operations in a uniform way (line 518).
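A small sketch of the unification being described, with simplified stand-in types rather than the real async network send path:

```rust
// Simplified stand-ins; in the PR the remote path is an async cross-shard send.
type Value = u32;
type Error = String;

fn send_to_other_shard(_val: Value) -> Result<(), Error> {
    Ok(()) // the value will show up at the destination shard's receiver
}

fn keep_on_this_shard(val: Value) -> Result<Option<Value>, Error> {
    Ok(Some(val))
}

fn main() -> Result<(), Error> {
    // The remote send yields `()`, so map it to `None` to get the same
    // `Result<Option<Value>, Error>` shape as the local branch.
    let remote = send_to_other_shard(1).map(|()| None);
    let local = keep_on_this_shard(2);

    for result in [remote, local] {
        match result? {
            Some(v) => println!("kept locally: {v}"),
            None => println!("sent to another shard"),
        }
    }
    Ok(())
}
```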
I: IntoIterator<Item = S>,
I::IntoIter: Send + ExactSizeIterator,
C: ShardedContext,
S: Shuffleable,
Elegant! These are great trait bounds. Only 4 of them, the names make sense... great work!
// Process more data as it comes in, or close the sending channels, if there is nothing
// left.
if let Some(((i, val), ctx)) = input.next() {
    let dest_shard = shard_picker(ctx, RecordId::from(i), &val);
What's the use-case for when the value itself is used by the shard picker? Is this for the OPRF part?
Yes, that would be one use case - for OPRF resharding we will use some bits of the value itself to determine the destination shard.
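As an illustration of that use case, a hypothetical value-based picker might look like the following; the bit-selection rule and the names are assumptions for the sketch, not the actual OPRF design.

```rust
type ShardIndex = u32;

/// Hypothetical value-based picker: take a pseudorandom tag carried by the row
/// (e.g. some bits of an OPRF output) and reduce it modulo the shard count.
fn pick_shard_by_value(tag: u64, shard_count: u32) -> ShardIndex {
    u32::try_from(tag % u64::from(shard_count)).expect("remainder fits in u32")
}

fn main() {
    let shard_count = 8;
    for tag in [0x1234_5678_9abc_def0_u64, 42, 7] {
        println!("tag {tag:#x} -> shard {}", pick_shard_by_value(tag, shard_count));
    }
}
```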
ipa-core/src/protocol/context/mod.rs
Outdated
if dest_shard == my_shard {
    Some(((my_shard, Ok(Some(val.clone()))), (input, shard_ends)))
} else {
    let (record_id, se) = shard_ends.get_mut(&dest_shard).unwrap();
I don't love the variable name `se`. What does it mean / stand for? Is this a term of art I'm just unfamiliar with?
Too abbreviated, I agree. Fixed it.
fn pick_shard(&self, record_id: RecordId, direction: Direction) -> ShardIndex {
    // FIXME: update PRSS trait to compute only left or right part
    let (l, r): (u128, u128) = self.prss().generate(record_id);
    let shard_index = u32::try_from(
Won't this cause every single shard to select the same destination shard index for its first record? I think they're all narrowed to the same step, and all start from RecordId::FIRST and count up. If this is the case, it feels like a security issue.
We discussed this below; to summarize, it shouldn't be a security issue because all shards run their MPC circuits independently of each other. Each circuit runs an independent Diffie-Hellman protocol between each pair of helpers, so the probability of them negotiating the same shared secret is negligible.
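To make the reduction concrete, here is a simplified sketch of this picker; the PRSS sample is faked with a deterministic filler so the example runs, whereas the real code obtains it from `self.prss().generate(record_id)` and the left/right selection mirrors the `Direction` parameter.

```rust
type ShardIndex = u32;

#[allow(dead_code)] // `Right` is unused in this tiny example
enum Direction {
    Left,
    Right,
}

/// Stand-in for a PRSS sample: one u128 shared with the left helper and one
/// shared with the right helper. The real values come from AES-based PRSS.
fn fake_prss_generate(record_id: u64) -> (u128, u128) {
    let r = u128::from(record_id) + 1;
    (r.wrapping_mul(0x9e37_79b9), r.wrapping_mul(0x85eb_ca6b))
}

/// Reduce the chosen 128-bit sample to a shard index. The modulo bias is
/// negligible because realistic shard counts are tiny relative to 2^128.
fn pick_shard(record_id: u64, direction: Direction, shard_count: u32) -> ShardIndex {
    let (l, r) = fake_prss_generate(record_id);
    let sample = match direction {
        Direction::Left => l,
        Direction::Right => r,
    };
    u32::try_from(sample % u128::from(shard_count)).expect("remainder fits in u32")
}

fn main() {
    for record_id in 0..4u64 {
        println!(
            "record {record_id} -> shard {}",
            pick_shard(record_id, Direction::Left, 16)
        );
    }
}
```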
ipa-core/src/protocol/context/mod.rs
Outdated
use futures::Stream;
use futures_util::stream;
Suggested change:
use futures::{Stream, StreamExt, stream};
(and delete the import on line 13)
My IDE keeps failing me on this one. It always prefers `futures_util` over `futures` - I don't know why.
/// The destination shard for each masked row is decided based on value obtained from sampling
/// PRSS. Which value to use (left or right) is decided based on `direction` parameter.
Suggested change:
/// The mask value, and destination shard for each masked row, are determined by
/// sampling PRSS. `Direction::Left` means to use the PRSS shared with the left
/// helper, and `Direction::Right` means to use the PRSS shared with the right
/// helper.
This is similar to a suggestion Ben left elsewhere. Since this code (more precisely, the `h#_shuffle` routines that call this one) relies on the fact that "left" means the peer returned by `Role::peer` for that direction, it could be documented by referencing this protocol, but the static assertion seems better unless it can't be written against public APIs or something like that.
Yep, added a set of static checks in the `Role` module.
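For readers unfamiliar with the pattern, here is a minimal sketch of what such a compile-time check can look like; the `Role` enum and the ring orientation are hypothetical stand-ins, not the actual ipa-core definitions.

```rust
// Encode the ring orientation as const fns and assert it at compile time,
// so a future re-ordering of helpers breaks the build instead of the protocol.
#[derive(Clone, Copy, Debug)]
enum Role {
    H1,
    H2,
    H3,
}

impl Role {
    /// The helper sitting to the "left" of this one in the (hypothetical) ring.
    const fn left_peer(self) -> Role {
        match self {
            Role::H1 => Role::H3,
            Role::H2 => Role::H1,
            Role::H3 => Role::H2,
        }
    }
}

// Compile-time assertions: if the orientation changes, compilation fails here.
const _: () = assert!(matches!(Role::H1.left_peer(), Role::H3));
const _: () = assert!(matches!(Role::H2.left_peer(), Role::H1));

fn main() {
    println!("left peer of H2 is {:?}", Role::H2.left_peer());
}
```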
ipa-core/src/protocol/context/mod.rs
Outdated
/// Picks a shard according to the value obtained from sampling PRSS shared with the given helper.
fn pick_shard(&self, record_id: RecordId, direction: Direction) -> ShardIndex {
    // FIXME: update PRSS trait to compute only left or right part
My initial reaction was that this isn't that hard (maybe something like `impl FromPrss for Left<T>`), but upon going to try and mock it up, I realized it's a bit harder than that because it needs to propagate through a few layers of functions in the PRSS implementation.
Since the PRSS stuff needs to be monomorphized anyways, maybe the compiler will figure out that it can skip half of the PRSS generation?
My worry is that, as you said, there are quite a few layers before we get to AES, so the compiler may not have enough memory to keep track of things and/or time to do it. Cutting 50% of the CPU time spent generating these masks seems important enough to want some guarantee of it happening.
    }
}

impl<C: Context + ShardConfiguration> ShardedContext for C {}

impl ShardConfiguration for Base<'_, Sharded> {
This isn't directly about this PR, but if we're settling on a "protocols should be written against upgraded contexts" policy (see discussion in #1021), then maybe `ShardConfiguration` shouldn't be implemented for the base context.
I merged this PR and then I saw this comment - for some reason it wasn't showing up on the "Changes" tab. I could be wrong, but I think we don't expose the `Base` context directly - it is always wrapped into a semi-honest or malicious version.
Yeah, I was mixing things up when I made this comment. By "base context" I meant `semi_honest::Context`, which is not the same as `struct Base`.