Create logistic_regression.md #83
Conversation
Share WALR algorithm as a readme file
Fixed a few minor bugs.
logistic_regression.md (Outdated)
$X = (X^{(1)}, ..., X^{(N)}) \in [0, 1]^{N*k}$ and $y = (y^{(1)}, ..., y^{(N)})^T \in \\{0, 1\\}^N$ \
denote the *(N \* k)*-dimensional feature *matrix* and *N*-dimensional label vector respectively, then we have

$\text{dot-product} = \frac{1}{N} \sum\limits_{i=1}^{N} y^{(i)} X^{(i)} = \frac{1}{N} \cdot Xy$ ,\
Suggested change:
- $\text{dot-product} = \frac{1}{N} \sum\limits_{i=1}^{N} y^{(i)} X^{(i)} = \frac{1}{N} \cdot Xy$ ,\
+ $\text{dot-product} = \frac{1}{N} \sum\limits_{i=1}^{N} y^{(i)} X^{(i)} = \frac{C\cdot y}{N}$ ,\

Suggested change:
- $\text{dot-product} = \frac{1}{N} \sum\limits_{i=1}^{N} y^{(i)} X^{(i)} = \frac{1}{N} \cdot Xy$ ,\
+ $\text{dot-product} = \frac{1}{N} \sum\limits_{i=1}^{N} y^{(i)} X^{(i)} = \frac{1}{N}\cdot X \cdot y$ ,\
I found the positioning of the dot a little annoying, because you are saying dot-product but then putting the dot elsewhere.
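For concreteness, here is a minimal NumPy sketch (not part of the PR) checking that the summed form of the dot-product term matches the matrix-vector form, under the assumed convention that the rows of the array `X` are the feature vectors $X^{(i)}$ (so the product is written `X.T @ y`):

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 1000, 5
X = rng.uniform(0.0, 1.0, size=(N, k))        # feature vectors X^(i) in [0, 1]^k, one per row
y = rng.integers(0, 2, size=N).astype(float)  # binary labels y^(i) in {0, 1}

# (1/N) * sum_i y^(i) * X^(i), written out explicitly ...
dot_product_sum = (y[:, None] * X).sum(axis=0) / N
# ... and as a single matrix-vector product over the whole dataset.
dot_product_matmul = X.T @ y / N

assert np.allclose(dot_product_sum, dot_product_matmul)
```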
removed a stray sigma and made Martin's suggested changes including adding authors
Looks good. I made a few formatting suggestions.
logistic_regression.md (Outdated)
Any additional computations used to update $\text{noisy-} \nabla L(\theta)$ (as in a gradient descent procedure) will still be label-DP (with the same privacy parameters) due to the Post Processing Theorem of DP (see Dwork+11 text).

- **Question:** is $\text{noisy-dot-product}$ efficiently computable?\
**Answer:** Yes, computing this vector requires just one pass through the set of feature vectors, and $k$ random draws from a Gaussian distribution.
We probably should at least mention here that, when implementing this in MPC, we will compute the division by $N$ outside the MPC.
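A hedged sketch of what that could look like, with a hypothetical `noise_scale` standing in for the Gaussian sigma dictated by the label-DP analysis (the exact calibration is not shown in this excerpt), and with the division by $N$ applied as a post-processing step outside the (simulated) MPC, per the comment above:

```python
import numpy as np

def noisy_dot_product(X, y, noise_scale, rng):
    """One pass over the N feature vectors plus k Gaussian draws.

    The un-normalized noisy sum is what an MPC deployment would output;
    dividing by N happens afterwards, outside the MPC (post-processing).
    """
    N, k = X.shape
    noisy_sum = X.T @ y + rng.normal(0.0, noise_scale, size=k)  # inside MPC
    return noisy_sum / N                                        # outside MPC
```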
logistic_regression.md (Outdated)

First, we define the terms $\text{LHS}$ and $\text{RHS}$ as\
$\sum\text{LHS} = \sum\limits_{i=1}^{N} p_i X^{(i)}$\
$\sum\text{RHS} = \sum\limits_{i=1}^{N} y^{(i)} X^{(i)}$
consider using
logistic_regression.md (Outdated)
$\text{noisy-} \nabla L(\theta) = (\frac{1}{N}) \cdot (\sum LHS) - (\frac{1}{N}) \cdot (\sum RHS) - \text{gaussian-noise}$.

To avoid computing $\text{LHS}$ at every optimization step, we approximate this term using a minibatch of size $m$. Specifically, at every gradient descent step, we sample a minibatch $M$ of size $m$, and we compute\
$\text{mini-}\sum\text{LHS} = \sum\limits_{j=1}^{m} p_j X^{(j)}$,\
here also
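To make the hybrid term concrete, here is a hedged sketch (mine, not from the PR) of a hypothetical `noisy_hybrid_grad` helper; the $1/m$ normalization of $\text{mini-}\sum\text{LHS}$ and the reuse of a precomputed noisy-dot-product vector for the RHS term are assumptions based on the surrounding text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def noisy_hybrid_grad(theta, X, noisy_dp, m, rng):
    """Minibatch estimate of the LHS term minus the precomputed noisy RHS."""
    N = X.shape[0]
    batch = rng.choice(N, size=m, replace=False)  # sample minibatch M of size m
    p = sigmoid(X[batch] @ theta)                 # p_j = sigma(theta^T X^(j))
    mini_sum_lhs = X[batch].T @ p                 # sum_j p_j X^(j)
    # noisy_dp already holds (1/N) * sum RHS plus Gaussian noise.
    return mini_sum_lhs / m - noisy_dp
```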
logistic_regression.md (Outdated)
1. initialize model vector $\theta$
2. while not converged:\
sample minibatch of size $m$, and\
$\text{set } \theta = \theta - lr \cdot ( \text{noisy-hybrid-} \nabla L(\theta))\$
logistic_regression.md (Outdated)
- In the absence of any computational or privacy constraints, the model can be trained via full-batch gradient descent of the form, where here $\text{lr}$ is the learning rate:
1. initialize model vector $\theta$
2. while not converged: \
$\text{set } \theta = \theta - \text{lr} \cdot ((\frac{1}{N} \cdot \sum\limits_{i=1}^{N} \sigma(\theta^T X^{(i)}) X^{(i)} ) - \frac{1}{N} \cdot \sum\limits_{i=1}^{N} y^{(i)} X^{(i)} )$
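As a point of reference, a minimal non-private NumPy sketch of this full-batch update (a fixed step count stands in for the convergence test; `lr` and `n_steps` are placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def full_batch_gd(X, y, lr=0.1, n_steps=1000):
    N, k = X.shape
    theta = np.zeros(k)                      # 1. initialize model vector theta
    for _ in range(n_steps):                 # 2. "while not converged" stand-in
        lhs = X.T @ sigmoid(X @ theta) / N   # (1/N) * sum_i sigma(theta^T X^(i)) X^(i)
        rhs = X.T @ y / N                    # (1/N) * sum_i y^(i) X^(i)
        theta = theta - lr * (lhs - rhs)
    return theta
```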
logistic_regression.md (Outdated)

1. initialize model vector $\theta$
2. while not converged:\
$\text{set } \theta = \theta - lr \cdot ((\frac{1}{N} \cdot \sum\limits_{i=1}^{N} p_i X^{(i)} ) - \text{noisy-dot-product})$
Suggested change:
- $\text{set } \theta = \theta - lr \cdot ((\frac{1}{N} \cdot \sum\limits_{i=1}^{N} p_i X^{(i)} ) - \text{noisy-dot-product})$
+ $\text{set } \theta := \theta - lr \cdot ((\frac{1}{N} \cdot \sum\limits_{i=1}^{N} p_i X^{(i)} ) - \text{noisy-dot-product})$
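A hedged sketch of this label-DP variant: the exact RHS term is replaced by a precomputed noisy-dot-product vector, while the LHS term is still a full pass over `X` (hyperparameters and the fixed step count are placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def noisy_full_lhs_gd(X, noisy_dp, lr=0.1, n_steps=1000):
    """noisy_dp is the precomputed noisy-dot-product vector (length k)."""
    N, k = X.shape
    theta = np.zeros(k)
    for _ in range(n_steps):
        lhs = X.T @ sigmoid(X @ theta) / N     # (1/N) * sum_i p_i X^(i)
        theta = theta - lr * (lhs - noisy_dp)  # theta := theta - lr * (...)
    return theta
```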
logistic_regression.md (Outdated)
1. initialize model vector $\theta$
2. while not converged:\
sample minibatch of size $m$, and\
$\text{set } \theta = \theta - lr \cdot ( \text{noisy-hybrid-} \nabla L(\theta))\$
Suggested change:
- $\text{set } \theta = \theta - lr \cdot ( \text{noisy-hybrid-} \nabla L(\theta))\$
+ $\text{set } \theta := \theta - lr \cdot ( \text{noisy-hybrid-} \nabla L(\theta))\$
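And a sketch of the hybrid training loop itself, reusing the hypothetical `noisy_hybrid_grad` helper sketched earlier (loop length, `lr`, and `m` are placeholder hyperparameters; a real implementation would use a proper convergence test):

```python
import numpy as np

def train_hybrid(X, noisy_dp, lr=0.1, m=128, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])            # 1. initialize model vector theta
    for _ in range(n_steps):                # 2. "while not converged" stand-in
        # sample a minibatch of size m and take a noisy-hybrid gradient step
        g = noisy_hybrid_grad(theta, X, noisy_dp, m, rng)
        theta = theta - lr * g              # theta := theta - lr * noisy-hybrid-grad
    return theta
```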
logistic_regression.md (Outdated)
$\text{noisy-} \nabla L(\theta) = (\frac{1}{N}) \cdot (\sum LHS) - (\frac{1}{N}) \cdot (\sum RHS) - \text{gaussian-noise}$.

To avoid computing $\text{LHS}$ at every optimization step, we approximate this term using a minibatch of size $m$. Specifically, at every gradient descent step, we sample a minibatch $M$ of size $m$, and we compute\
$\text{mini-}\sum\text{LHS} = \sum\limits_{j=1}^{m} p_j X^{(j)}$,\
Suggested change:
- $\text{mini-}\sum\text{LHS} = \sum\limits_{j=1}^{m} p_j X^{(j)}$,\
+ $\text{mini-}\sum\text{LHS} := \sum\limits_{j=1}^{m} p_j X^{(j)}$,\
logistic_regression.md (Outdated)
To circumvent this computational issue, we use a simple **"hybrid"-minibatch gradient** computation that we describe here.

First, we define the terms $\text{LHS}$ and $\text{RHS}$ as\
$\sum\text{LHS} = \sum\limits_{i=1}^{N} p_i X^{(i)}$\
Suggested change:
- $\sum\text{LHS} = \sum\limits_{i=1}^{N} p_i X^{(i)}$\
+ $\sum\text{LHS} := \sum\limits_{i=1}^{N} p_i X^{(i)}$\
logistic_regression.md (Outdated)

First, we define the terms $\text{LHS}$ and $\text{RHS}$ as\
$\sum\text{LHS} = \sum\limits_{i=1}^{N} p_i X^{(i)}$\
$\sum\text{RHS} = \sum\limits_{i=1}^{N} y^{(i)} X^{(i)}$
Suggested change:
- $\sum\text{RHS} = \sum\limits_{i=1}^{N} y^{(i)} X^{(i)}$
+ $\sum\text{RHS} := \sum\limits_{i=1}^{N} y^{(i)} X^{(i)}$
A few more := suggestions.