
Create logistic_regression.md #83

Merged 6 commits into main on Oct 6, 2023

Conversation

benjaminsavage
Collaborator

Share WALR algorithm as a readme file

Fixed a few minor bugs.
$X = (X^{(1)}, ..., X^{(N)}) \in [0, 1]^{N*k}$ and $y = (y^{(1)}, ..., y^{(N)})^T \in \\{0, 1\\}^N$ \
denote the *(N \* k)*-dimensional feature *matrix* and *N*-dimensional label vector respectively, then we have

$\text{dot-product} = \frac{1}{N} \sum\limits_{i=1}^{N} y^{(i)} X^{(i)} = \frac{1}{N} \cdot Xy$ ,\
Collaborator

Suggested change
$\text{dot-product} = \frac{1}{N} \sum\limits_{i=1}^{N} y^{(i)} X^{(i)} = \frac{1}{N} \cdot Xy$ ,\
$\text{dot-product} = \frac{1}{N} \sum\limits_{i=1}^{N} y^{(i)} X^{(i)} = \frac{C\cdot y}{N}$ ,\
Suggested change
$\text{dot-product} = \frac{1}{N} \sum\limits_{i=1}^{N} y^{(i)} X^{(i)} = \frac{1}{N} \cdot Xy$ ,\
$\text{dot-product} = \frac{1}{N} \sum\limits_{i=1}^{N} y^{(i)} X^{(i)} = \frac{1}{N}\cdot X \cdot y$ ,\

I found the positioning of the dot a little annoying, because you are saying dot-product but then putting the dot elsewhere.
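
As an aside (not part of the PR), the equivalence being discussed is easy to check numerically. In the sketch below, `features` and `labels` are illustrative stand-ins for $X$ and $y$, with each row of `features` holding one feature vector $X^{(i)}$:

```python
import numpy as np

# Illustrative data: N examples with k features in [0, 1] and binary labels.
rng = np.random.default_rng(0)
N, k = 1000, 5
features = rng.uniform(0.0, 1.0, size=(N, k))    # row i is X^(i)
labels = rng.integers(0, 2, size=N)              # y^(i) in {0, 1}

# dot-product = (1/N) * sum_i y^(i) X^(i), written as an explicit sum ...
dot_product_sum = sum(labels[i] * features[i] for i in range(N)) / N

# ... and as a single matrix-vector product (X^T y when rows hold X^(i)).
dot_product_matvec = features.T @ labels / N

assert np.allclose(dot_product_sum, dot_product_matvec)
```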

bmcase and others added 3 commits October 5, 2023 20:53
removed a stray sigma and made Martin's suggested changes including adding authors
@bmcase (Collaborator) left a comment

Looks good. Made a few formatting suggestions.

Any additional computations used to update $\text{noisy-} \nabla L(\theta)$ (as in a gradient descent procedure) will still be label-DP (with the same privacy parameters) due to the Post Processing Theorem of DP (see Dwork+11 text).

- **Question:** is $\text{noisy-dot-product}$ efficiently computable?\
**Answer:** Yes, computing this vector requires just one pass through the set of feature vectors, and $k$ random draws from a Gaussian distribution.
Collaborator

Probably should at least mention here that, when implementing this in MPC, we will compute the division by N outside the MPC.
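
For illustration only (not the actual MPC implementation), the sketch below mirrors both points: the aggregate needs just one pass over the feature vectors plus $k$ Gaussian draws, and the division by $N$ can be applied afterwards as post-processing. The variable `noise_scale` is a placeholder for whatever the label-DP budget dictates:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 1000, 5
features = rng.uniform(0.0, 1.0, size=(N, k))    # row i is X^(i)
labels = rng.integers(0, 2, size=N)              # y^(i) in {0, 1}
noise_scale = 1.0  # placeholder; set by the (epsilon, delta) label-DP budget

# One pass over the feature vectors: accumulate sum_i y^(i) X^(i).
aggregate = np.zeros(k)
for x_i, y_i in zip(features, labels):
    aggregate += y_i * x_i

# k Gaussian draws, added before the aggregate is revealed.
noisy_aggregate = aggregate + rng.normal(0.0, noise_scale, size=k)

# The division by N is plain post-processing and can happen afterwards
# (e.g. outside the MPC, as the comment above suggests).
noisy_dot_product = noisy_aggregate / N
```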


First, we define the terms $\text{LHS}$ and $\text{RHS}$ as\
$\sum\text{LHS} = \sum\limits_{i=1}^{N} p_i X^{(i)}$\
$\sum\text{RHS} = \sum\limits_{i=1}^{N} y^{(i)} X^{(i)}$
Collaborator

Consider using $:=$ for these two lines defining new notation.

$\text{noisy-} \nabla L(\theta) = (\frac{1}{N}) \cdot (\sum \text{LHS}) - (\frac{1}{N}) \cdot (\sum \text{RHS}) - \text{gaussian-noise}$.

To avoid computing $\text{LHS}$ at every optimization step, we approximate this term using a minibatch of size $m$. Specifically, at every gradient descent step, we sample a minibatch $M$ of size $m$, and we compute\
$\text{mini-}\sum\text{LHS} = \sum\limits_{j=1}^{m} p_j X^{(j)}$,\
Collaborator

Here also use $:=$ when defining new terminology.
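
A hedged NumPy sketch of these quantities (illustrative data, variable names, and noise scale; not code from the PR): it forms $\sum\text{LHS}$ and $\sum\text{RHS}$ over all $N$ examples, the noisy full gradient, and the minibatch term $\text{mini-}\sum\text{LHS}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, m = 1000, 5, 64                            # m = minibatch size (illustrative)
features = rng.uniform(0.0, 1.0, size=(N, k))    # row i is X^(i)
labels = rng.integers(0, 2, size=N)              # y^(i) in {0, 1}
theta = np.zeros(k)                              # current model vector
noise_scale = 1.0                                # placeholder noise scale

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p = sigmoid(features @ theta)                    # p_i = sigma(theta^T X^(i))

# Full-data terms.
sum_lhs = features.T @ p                         # sum_i p_i X^(i)
sum_rhs = features.T @ labels                    # sum_i y^(i) X^(i)
noise = rng.normal(0.0, noise_scale, size=k)     # zero-mean, so its sign is immaterial
noisy_grad = sum_lhs / N - sum_rhs / N - noise

# Minibatch approximation of the LHS term over a sample of size m.
batch = rng.choice(N, size=m, replace=False)
mini_sum_lhs = features[batch].T @ sigmoid(features[batch] @ theta)
```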

1. initialize model vector $\theta$
2. while not converged:\
sample minibatch of size $m$, and\
$\text{set } \theta = \theta - lr \cdot ( \text{noisy-hybrid-} \nabla L(\theta))$
Collaborator

$\theta := ...$
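
Putting the pieces together, here is a rough sketch of the hybrid training loop. The exact definition of $\text{noisy-hybrid-} \nabla L(\theta)$ is not quoted in this conversation, so the sketch assumes it is the minibatch average $\frac{1}{m}\cdot\text{mini-}\sum\text{LHS}$ minus the precomputed $\text{noisy-dot-product}$; data and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, m = 1000, 5, 64
lr, steps, noise_scale = 0.1, 200, 1.0           # illustrative hyperparameters
features = rng.uniform(0.0, 1.0, size=(N, k))    # row i is X^(i)
labels = rng.integers(0, 2, size=N)              # y^(i) in {0, 1}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# noisy-dot-product is computed once, up front, and reused at every step.
noisy_dot_product = (features.T @ labels
                     + rng.normal(0.0, noise_scale, size=k)) / N

theta = np.zeros(k)
for _ in range(steps):                           # stand-in for "while not converged"
    batch = rng.choice(N, size=m, replace=False)
    mini_sum_lhs = features[batch].T @ sigmoid(features[batch] @ theta)
    # Assumed form of the hybrid gradient: minibatch LHS average minus the
    # fixed noisy-dot-product (the README's exact definition is not quoted here).
    noisy_hybrid_grad = mini_sum_lhs / m - noisy_dot_product
    theta = theta - lr * noisy_hybrid_grad       # i.e. theta := theta - lr * grad
```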

- In the absence of any computational or privacy constraints, the model can be trained via full-batch gradient descent of the following form, where $\text{lr}$ is the learning rate:
1. initialize model vector $\theta$
2. while not converged: \
$\text{set } \theta = \theta - \text{lr} \cdot ((\frac{1}{N} \cdot \sum\limits_{i=1}^{N} \sigma(\theta^T X^{(i)}) X^{(i)}) - (\frac{1}{N} \cdot \sum\limits_{i=1}^{N} y^{(i)} X^{(i)}))$
Collaborator

$\theta :=$


1. initialize model vector $\theta$
2. while not converged:\
$\text{set } \theta = \theta - lr \cdot ((\frac{1}{N} \cdot \sum\limits_{i=1}^{N} p_i X^{(i)} ) - \text{noisy-dot-product})$
Collaborator

Suggested change
$\text{set } \theta = \theta - lr \cdot ((\frac{1}{N} \cdot \sum\limits_{i=1}^{N} p_i X^{(i)} ) - \text{noisy-dot-product})$
$\text{set } \theta := \theta - lr \cdot ((\frac{1}{N} \cdot \sum\limits_{i=1}^{N} p_i X^{(i)} ) - \text{noisy-dot-product})$
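
For comparison, a short sketch of the two full-batch updates quoted above: the non-private update with the exact label term, and the label-DP variant with $\text{noisy-dot-product}$ swapped in (setup and hyperparameters are illustrative, not from the PR):

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 1000, 5
lr, steps, noise_scale = 0.1, 200, 1.0           # illustrative hyperparameters
features = rng.uniform(0.0, 1.0, size=(N, k))    # row i is X^(i)
labels = rng.integers(0, 2, size=N)              # y^(i) in {0, 1}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Computed once and reused; stands in for noisy-dot-product.
noisy_dot_product = (features.T @ labels
                     + rng.normal(0.0, noise_scale, size=k)) / N

theta_exact = np.zeros(k)   # non-private full-batch gradient descent
theta_dp = np.zeros(k)      # label-DP variant with the noisy label term
for _ in range(steps):      # stand-in for "while not converged"
    # Non-private update: both averages use the raw labels.
    lhs_exact = features.T @ sigmoid(features @ theta_exact) / N
    theta_exact = theta_exact - lr * (lhs_exact - features.T @ labels / N)
    # Label-DP update: the exact label term is replaced by noisy-dot-product.
    lhs_dp = features.T @ sigmoid(features @ theta_dp) / N
    theta_dp = theta_dp - lr * (lhs_dp - noisy_dot_product)
```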

1. initialize model vector $\theta$
2. while not converged:\
sample minibatch of size $m$, and\
$\text{set } \theta = \theta - lr \cdot ( \text{noisy-hybrid-} \nabla L(\theta))$
Collaborator

Suggested change
$\text{set } \theta = \theta - lr \cdot ( \text{noisy-hybrid-} \nabla L(\theta))$
$\text{set } \theta := \theta - lr \cdot ( \text{noisy-hybrid-} \nabla L(\theta))$

$\text{noisy-} \nabla L(\theta) = (\frac{1}{N}) \cdot (\sum \text{LHS}) - (\frac{1}{N}) \cdot (\sum \text{RHS}) - \text{gaussian-noise}$.

To avoid computing $\text{LHS}$ at every optimization step, we approximate this term using a minibatch of size $m$. Specifically, at every gradient descent step, we sample a minibatch $M$ of size $m$, and we compute\
$\text{mini-}\sum\text{LHS} = \sum\limits_{j=1}^{m} p_j X^{(j)}$,\
Collaborator

Suggested change
$\text{mini-}\sum\text{LHS} = \sum\limits_{j=1}^{m} p_j X^{(j)}$,\
$\text{mini-}\sum\text{LHS} := \sum\limits_{j=1}^{m} p_j X^{(j)}$,\

To circumvent this computational issue, we use a simple **"hybrid"-minibatch gradient** computation that we describe here.

First, we define the terms $\text{LHS}$ and $\text{RHS}$ as\
$\sum\text{LHS} = \sum\limits_{i=1}^{N} p_i X^{(i)}$\
Collaborator

Suggested change
$\sum\text{LHS} = \sum\limits_{i=1}^{N} p_i X^{(i)}$\
$\sum\text{LHS} := \sum\limits_{i=1}^{N} p_i X^{(i)}$\


First, we define the terms $\text{LHS}$ and $\text{RHS}$ as\
$\sum\text{LHS} = \sum\limits_{i=1}^{N} p_i X^{(i)}$\
$\sum\text{RHS} = \sum\limits_{i=1}^{N} y^{(i)} X^{(i)}$
Collaborator

Suggested change
$\sum\text{RHS} = \sum\limits_{i=1}^{N} y^{(i)} X^{(i)}$
$\sum\text{RHS} := \sum\limits_{i=1}^{N} y^{(i)} X^{(i)}$

@benjaminsavage merged commit ef8fcc2 into main on Oct 6, 2023
1 check passed
3 participants