Daniel Masny (@danielmasny)
We describe the high-level flow of IPA using a PRF, i.e. the Dodis-Yampolskiy PRF, to shard events by match key. This allows us to group user events efficiently. However, it leaks the number of events per user. We can use DP to hide this leakage.
This document only describes the high-level flow; it covers neither the specific protocol used for evaluating the PRF nor a scalable shuffle protocol.
The main bottleneck for the horizontal scalability of this approach is currently the shuffle protocol that happens at the end of the second stage.
A query from the report collector to the helper servers contains the following information:
- How much DP budget is spent on this query, $dpb$
- A DP sensitivity cap, $cap$
- A bound on the number of events per session, $N_s$
- A bound on the number of sessions per user, $U_s$
- The number of events, $N$
- A list of $N$ encrypted events (this list does not need to be sorted by timestamp). The encryption contains the following information:
  - match key: arithmetic share over $\mathbb{Z}_p$, where $p$ is the number of points of the PRF elliptic curve
  - timestamp: Boolean share
  - trigger bit: Boolean share
  - trigger value: Boolean share (s.t. the shared value is guaranteed to be $< cap$)
  - breakdown key: Boolean share
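To make the query format concrete, the following is a minimal sketch in Python. All names (`Query`, `EncryptedEvent`, and their fields) are hypothetical, the per-field encryption layer is elided, and this document does not specify a wire format.

```python
from dataclasses import dataclass

@dataclass
class EncryptedEvent:
    # Each field is a secret share; the surrounding encryption is elided.
    match_key_share: int        # arithmetic share over Z_p
    timestamp_share: bytes      # Boolean share
    trigger_bit_share: bytes    # Boolean share
    trigger_value_share: bytes  # Boolean share; shared value guaranteed < cap
    breakdown_key_share: bytes  # Boolean share

@dataclass
class Query:
    dpb: float  # DP budget spent on this query
    cap: int    # DP sensitivity cap
    n_s: int    # bound on events per session (N_s)
    u_s: int    # bound on sessions per user (U_s)
    events: list[EncryptedEvent]  # N encrypted events, not necessarily time-sorted
```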
The MPC helper servers do the following:
- Check whether the report collector has enough budget. If not, abort; otherwise subtract $dpb$ from the budget.
- Event Clean-Up:
  - Each helper decrypts all events and marks all events for which decryption fails.
  - The helpers check whether the shares are consistent across helpers.
  - The helpers delete all events with decryption failures or inconsistent shares.
- Enforcing the Session Bound (bounding the number of events per “session”):
  - Each helper groups events by shares, i.e., events with the same shares are identified as belonging to the same session.
  - The helpers coordinate to delete events of each session such that at most $N_s$ events per session remain (see the sketch below).
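A minimal sketch of the session bound, reusing the hypothetical `EncryptedEvent` from the sketch above and assuming the match key share is the value that repeats within a session; the cross-helper coordination is reduced to a comment.

```python
from collections import defaultdict

def enforce_session_bound(events: list[EncryptedEvent], n_s: int) -> list[EncryptedEvent]:
    # Group events by their locally visible shares; events with the same
    # shares are identified as belonging to the same session.
    sessions = defaultdict(list)
    for ev in events:
        sessions[ev.match_key_share].append(ev)
    kept = []
    for group in sessions.values():
        # The helpers must coordinate so that they all keep and drop the
        # same events, leaving at most n_s events per session.
        kept.extend(group[:n_s])
    return kept
```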
The MPC helper servers do the following:
- Generating fake events (see the sketch after this list):
  - Each pair of helper servers $[i,j]$ in $([1,2],[1,3],[2,3])$ generates, for each $k$ in $(1,\ldots,N_s \cdot U_s)$:
    - a noise value $e_{i,j,k}$, sampled from a DP noise distribution parameterized with sensitivity 1
    - for each $m$ in $(1,\ldots,e_{i,j,k})$: pick a random match key from the entire match key space (pick a random share $sh_{i,j}$) and generate $k$ fake events for this match key. We can use identical shares for all of these events (the shuffle operation that follows will randomize them).
  - Each pair informs the remaining helper how many fake events have been generated in total, i.e. $N_{i,j} = \sum_{k=1}^{N_s \cdot U_s} k \cdot e_{i,j,k}$. Afterwards, this helper generates $N_{i,j}$ events and initializes its shares with $0$.
- Run an MPC shuffle protocol to shuffle all events, including the fake events. The total number of events is $N_t = N + N_{1,2} + N_{1,3} + N_{2,3}$.
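A minimal sketch of the fake-event generation for one helper pair $[i,j]$, modeling match key shares as plain integers. The noise sampler is a placeholder, since this document does not fix a concrete DP distribution, and the shuffle itself is out of scope here.

```python
import secrets

def sample_dp_noise(sensitivity: int = 1) -> int:
    # Placeholder for a non-negative DP noise distribution with the given
    # sensitivity (e.g. a truncated discrete Laplace mechanism).
    return secrets.randbelow(4)

def generate_fake_events(n_s: int, u_s: int, p: int) -> tuple[list[int], int]:
    fake_shares = []  # one random match key share per fake event
    n_ij = 0          # N_{i,j} = sum_{k=1}^{N_s*U_s} k * e_{i,j,k}
    for k in range(1, n_s * u_s + 1):
        e_ijk = sample_dp_noise(sensitivity=1)
        for _ in range(e_ijk):
            # Pick a random match key share from the whole space; all k fake
            # events for this key reuse the same share, since the later
            # shuffle re-randomizes the shares anyway.
            sh_ij = secrets.randbelow(p)
            fake_shares.extend([sh_ij] * k)
        n_ij += k * e_ijk
    # Only n_ij is revealed to the remaining helper, which then creates
    # n_ij events with all of its shares initialized to 0.
    return fake_shares, n_ij
```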
The MPC helpers learn a histogram. The bins are event counts; the value of each bin indicates how many match keys (including fake match keys) exist that have that number of events.
The MPC helper servers do the following:
- For each event $k$ in $(1,\ldots,N_t)$: evaluate the OPRF on the match key shares using the OPRF MPC protocol. The OPRF MPC protocol reveals the OPRF evaluation of the match key to all helper servers. We denote this OPRF evaluation as the match key pseudonym (which changes for a match key across different IPA queries).
- Group the events by match key pseudonym (e.g. by sorting in the clear). After the events are grouped by match key pseudonym, we delete the match key pseudonym of each event.
- Check whether any group has more than $N_s \cdot U_s$ events. If this is the case, there is a privacy violation, since no fake events are generated for groups with more than $N_s \cdot U_s$ events (see the sketch below).
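A minimal sketch of the grouping step and the leaked histogram, assuming the revealed pseudonyms are available as opaque byte strings; function and variable names are hypothetical.

```python
from collections import Counter, defaultdict

def group_by_pseudonym(pseudonyms: list[bytes], n_s: int, u_s: int):
    # pseudonyms[i] is the revealed OPRF evaluation of event i's match key.
    groups = defaultdict(list)
    for idx, prf_value in enumerate(pseudonyms):
        groups[prf_value].append(idx)
    # The leaked histogram: group size -> number of match keys of that size.
    histogram = Counter(len(members) for members in groups.values())
    # Privacy check: fake events only cover group sizes up to n_s * u_s.
    if any(size > n_s * u_s for size in histogram):
        raise RuntimeError("privacy violation: a group exceeds N_s * U_s events")
    # The pseudonyms are deleted; only the grouping of event indices remains.
    return list(groups.values()), histogram
```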
The MPC helper servers do the following:
- For every group of events (i.e., events with the same match key pseudonym):
  - run an MPC sorting protocol to sort the events within the group by timestamp
  - run an MPC attribution protocol that assigns breakdown keys to trigger events when a trigger event can be attributed to a preceding source event (a cleartext sketch follows below)
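A cleartext stand-in for the per-group logic; the real protocol runs on secret shares inside MPC, and this document leaves the attribution rule abstract, so the last-touch rule below is only one plausible instantiation.

```python
from dataclasses import dataclass

@dataclass
class ClearEvent:
    # Cleartext stand-in for a secret-shared event.
    timestamp: int
    trigger_bit: int   # 0 = source event, 1 = trigger event
    trigger_value: int
    breakdown_key: int

def attribute_group(group: list[ClearEvent]) -> list[ClearEvent]:
    group.sort(key=lambda ev: ev.timestamp)  # the MPC sorting protocol
    attributed = []
    last_breakdown = None
    for ev in group:
        if ev.trigger_bit == 0:
            # Remember the breakdown key of the most recent source event.
            last_breakdown = ev.breakdown_key
        elif last_breakdown is not None:
            # A trigger event preceded by a source event: attribute it.
            ev.breakdown_key = last_breakdown
            attributed.append(ev)
    return attributed
```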
The MPC helper servers do the following:
- [optional:] prune the number of events in each group (by keeping only a fixed number of attributed events)
- merge the events of all groups
- [optional:] prune the total number of events (by keeping only attributed events)
- run an MPC aggregation protocol
The MPC helpers send their secret shares of the sum of attributed trigger values per breakdown to the report collector, who can recombine the shares and learn the sums per breakdown (see the recombination sketch below).
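A minimal sketch of the final recombination at the report collector, assuming additive arithmetic sharing modulo a prime $q$ for the aggregates (the actual sharing scheme and aggregation protocol are left abstract in this document).

```python
def recombine(helper_shares: list[dict[int, int]], q: int) -> dict[int, int]:
    # helper_shares[i] maps breakdown key -> helper i's share of the sum
    # of attributed trigger values for that breakdown.
    sums: dict[int, int] = {}
    for shares in helper_shares:
        for breakdown, share in shares.items():
            sums[breakdown] = (sums.get(breakdown, 0) + share) % q
    return sums  # aggregated sum per breakdown, in the clear
```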