Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[oprf][shuffle] OPRF Shuffle using a 2-round 4-message shuffle protocol #816

Merged
merged 14 commits into from
Nov 7, 2023

Conversation

cryo28
Copy link
Contributor

@cryo28 cryo28 commented Oct 25, 2023

  1. Implementation of the protocol (not sharded)
  2. New query type
  3. Record structs and implt for the new query type row: probably not needed or should be changed. I had to implement them without knowing what they should actually be, in order to implement a new query type.

screenshot_20231018_140210

1. New query type
2. Implementation of the protocol (not sharded)
@@ -0,0 +1,403 @@
use std::ops::{Add, AddAssign};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that Daniel pointed this out, but this has nothing to do with the OPRF. It should replace our existing shuffle.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this should be the shuffle that we use in both approaches. Alex and I discussed this. Alex seemed to prefer that we still put it in an OPRF related module even when it is a basic protocol (this is also the case for operations on Boolean shares). The main reason is that we have a more clear separation between old and new components in case we want to migrate away from the old IPA.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually want to add to this that sorting also requires an unshuffle operation that undoes a shuffle. In the new IPA approach, we dont need that. This makes everything simpler since we do not need to store the permutation explicitly in memory and then apply it (we can just run a seeded shuffling of a vector). Nevertheless if we want to have compatibility of the new shuffle with the old IPA, more specifically radix sort, we would need to implement a less efficient version of the shuffle.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think shuffle implementations are at the state where we can switch between them. The new one requires special input format (moving from struct { secret_share1, secret_share2, ... } to secret_share { struct1, struct2 }) while the old one requires them to be reshareable only. It would be ideal if shuffle is generalized over some sort of a trait, but I don't think it is atm.

So we may want to keep OPRF-specific things inside a different module

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't OPRF-specific, it's just a different shuffle. It's OK if we have two shuffles for now, but this shouldn't have to be bundled in with the OPRF.

Copy link
Collaborator

@danielmasny danielmasny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks! Looks good to me (ignoring the query parts). I haven't checked any of the shuffling parts except that you are using sharedvalue. Is there a way to see only your changes with respect to my previous review?

@@ -5,12 +5,9 @@ pub use shuffle::shuffle_shares;
use crate::{
error::Error,
protocol::{
basics::Reshare,
basics::{apply_permutation::apply_inv, Reshare},
context::Context,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to separate apply_permutation from sort. As Alex pointed out, we can shuffle the data directly using the random seed. If we wanted to have compatibility with the old IPA including the sorting, we would need to use apply_permutation in the new shuffle such that we can undo the shuffle

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could potentially leave it if we want to make the new shuffle compatible with radix sort at some point

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - apply_permutation is much less efficient than shuffling directly - #817

@@ -0,0 +1,403 @@
use std::ops::{Add, AddAssign};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually want to add to this that sorting also requires an unshuffle operation that undoes a shuffle. In the new IPA approach, we dont need that. This makes everything simpler since we do not need to store the permutation explicitly in memory and then apply it (we can just run a seeded shuffling of a vector). Nevertheless if we want to have compatibility of the new shuffle with the old IPA, more specifically radix sort, we would need to implement a less efficient version of the shuffle.

}
}

impl SharedValue for ShuffleShare {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great!

src/protocol/oprf/shuffle/mod.rs Outdated Show resolved Hide resolved

impl ShuffleShare {
#[must_use]
pub fn from_input_row(input_row: &ShuffleInputRow, shared_with: Direction) -> Self {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this, could you explain me why we need these functions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, lets just inline them

/// streams.
///
/// <https://github.com/rust-lang/rust/issues/102211#issuecomment-1367900125>
pub fn assert_stream_send<'a, T>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 . would be even better if we put it inside lib.rs

gateway,
input,
move |prss, gateway, config, input| {
let ctx = BaseContext::new(prss, gateway);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should construct semi honest context here - base is for internal use

@@ -5,12 +5,9 @@ pub use shuffle::shuffle_shares;
use crate::{
error::Error,
protocol::{
basics::Reshare,
basics::{apply_permutation::apply_inv, Reshare},
context::Context,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - apply_permutation is much less efficient than shuffling directly - #817

Self { _config: config }
}

#[tracing::instrument("ipa_query", skip_all, fields(sz=%query_size))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[tracing::instrument("ipa_query", skip_all, fields(sz=%query_size))]
#[tracing::instrument("shuffle_query", skip_all, fields(sz=%query_size))]

pub async fn execute<'a, C: Context + Send>(
self,
ctx: C,
query_size: QuerySize,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to use this parameter to truncate the inputs -

pub async fn execute<'a>(


// ------------------ Pseudorandom permutations functions -------------------- //

fn apply_permutation<R: Rng, S>(rng: &mut R, items: &mut [S]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason for having this function

l.into_iter().zip(r).map(|(a, b)| a + b)
}

fn add_single_shares_in_place<S, R>(mut items: Vec<S>, r: R) -> Vec<S>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this implies () as output of this function

where
C: Context,
I: IntoIterator<Item = AdditiveShare<S>>,
S: SharedValue + Add<Output = S> + Message,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
S: SharedValue + Add<Output = S> + Message,
S: SharedValue + Add<Output = S>,

ctx: &C,
step: &OPRFShuffleStep,
direction: Direction,
items: Vec<S>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this contract causes extra clones of the size N on the caller's side. It is possible to avoid by taking an iterator of S: SharedValue.

Direction::Left,
c_hat_2.clone(),
),
receive_from_peer(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should receive this to y_3 allocation.

Artem Ignatyev and others added 7 commits October 30, 2023 11:04
1. Moved assert_stream_send into lib.rs
2. Used SemiHonest context in oprf_shuffle/query.rs
3. tracing instrumentation "shuffle_query" in oprf_shuffle query runner
4. Truncate input to batch_size
5. Got rid of apply_permutation function. Replaced it with inline calls
   to Vec::shuffle
6. add_single_shares_in_place does not take ownership of the first
   argument and does not return a new Vec
7. Removed unnecessary Message trait bound on run_h2
8. Manually preallocating memory for receives in order to enable reusing
   of preallocated memory
9. reuse y_3 allocation to receive c_hat_1 in run_h3
10. reuse x_3 to receive c_hat_2 in run_h1
11. reuse y_1 too add compute y_2
12. Inline from_input_row/to_inpput_row functions
1. Reduced the number of allocated "tables" in H1 to 1
2. Reduced the number of allocated "tables in H2 to 3

This was possible by passing a references to items to send_to_peer
function and adding S: Copy trait bound to it
1. Renamed "receive_from_peer" function to "receive_from_peer_into" to
   denote that it receive data into a given buffer
2. Updated signatures of "send_to_peer" and "receive_from_peer_into"
   functions to accept buffer of data and batch_size arguments first
3. Introduced "repurpose_allocation" function to distinguish the place
   where I want to preserve the data for subsequent additions from where
   I want to "clear" it.
* Running them under randomized executor is not helpful as there is no concurrency/parallelism there. We should write a prop test instead
* Add on references can be generic for all GF
Copy link
Collaborator

@akoshelev akoshelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is very close. I pushed some polishing and added a TODO - lmk if they don't make sense to you

add_single_shares_in_place(&mut x_3, z_23);
x_3.shuffle(&mut rng_perm_r);

let mut c_hat_1 = repurpose_allocation(y_1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be cheap but it has a non-trivial cost of zeroing out memory. I don't see a need to do that, you can just replace elements in that vector from the iterator.

) -> Result<(), Error>
where
C: Context,
S: Copy + Message,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use SharedValue, Message is not an appropriate abstraction for it

Artem Ignatyev added 2 commits November 1, 2023 15:56
1. Used SharedValue instead of Message on trait bounds
Rsolved conflicts in:
	src/helpers/transport/query/mod.rs
	src/net/http_serde.rs
	src/query/executor.rs
	src/query/runner/mod.rs
Copy link
Collaborator

@akoshelev akoshelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, I think we can address other things in a separate PR

@akoshelev akoshelev merged commit 650cb4b into private-attribution:main Nov 7, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants