About gradient descent on the client side #2

JackFroster · 2022-09-16T17:01:39Z

Hi, Jiahao Tan.
Thanks for your work.

I'm confused by line 98 of "per-fedavg /perfedavg.py":
param.data.sub_(self.beta * grad1 - self.beta * self.alpha * grad2)
According to the formula in the paper, the term "self.beta * self.alpha * grad2" seems to be missing a factor of "grad1".

KarhouTam · 2022-09-17T01:56:49Z

Hi, Jack.
Thanks for your attention to my reproduction.

Actually, the formula for computing $\tilde{\nabla}^2$ comes from another paper by the authors of PerFedAvg: https://arxiv.org/abs/1908.10400

Given that formula and the way grad2 is computed, the update should be correct.

grad2 is $\nabla^2$, and it already takes grad1 as the $v$ in the formula, so there is no need to multiply grad2 by grad1 again. 😏
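
For readers who do not have the formula at hand: the central-difference form below is an assumption, reconstructed from the code excerpt in this comment rather than copied from the paper. The symbols $f$, $w$, $v$, $g$ and $\delta$ are notation chosen here for illustration. It approximates a Hessian-vector product using two gradient evaluations:

$$\tilde{\nabla}^2 f(w)\, v = \frac{\nabla f(w + \delta v) - \nabla f(w - \delta v)}{2\delta} \approx \nabla^2 f(w)\, v$$

With $v = g$, where $g$ is the gradient that plays the role of grad1, the update in the excerpt reads as

$$w \leftarrow w - \beta\, g + \beta \alpha\, \tilde{\nabla}^2 f(w)\, g$$

so the second term already contains $g$ through the product, which would explain why grad2 is not multiplied by grad1 again.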

To fully address your concern, here is part of the source code I received from the author of PerFedAvg. I don't know whether the author wants the source code shared, so I'm showing only an excerpt rather than the whole file.

        for t = 1:tau

            B_1 = randperm(user_l(i), D_i); % get data batch

            [lgw12, lgw23, lgw34, lgb12, lgb23, lgb34] = grad_batch (lw12, lw23, lw34, lb12, lb23, lb34, Dat(:, B_1, i), Lab(:, B_1, i), D_i);

            B_2 = randperm(user_l(i), D_o);

            [lgw12, lgw23, lgw34, lgb12, lgb23, lgb34] = grad_batch (lw12 - al * lgw12, lw23 - al * lgw23, lw34 - al * lgw34, lb12 - al * lgb12, lb23 - al * lgb23, lb34 - al * lgb34, Dat(:, B_2, i), Lab(:, B_2, i), D_o);

            B_3 = randperm(user_l(i), D_h);
            % NOTE: batch_3's size is 20, not 40; v is the grads produced by batch_1 and _2, not 1!
            [lh1w12, lh1w23, lh1w34, lh1b12, lh1b23, lh1b34] = grad_batch (lw12 - de * lgw12, lw23 - de * lgw23, lw34 - de * lgw34, lb12 - de * lgb12, lb23 - de * lgb23, lb34 - de * lgb34, Dat(:, B_3, i), Lab(:, B_3, i), D_h);
            [lh2w12, lh2w23, lh2w34, lh2b12, lh2b23, lh2b34] = grad_batch (lw12 + de * lgw12, lw23 + de * lgw23, lw34 + de * lgw34, lb12 + de * lgb12, lb23 + de * lgb23, lb34 + de * lgb34, Dat(:, B_3, i), Lab(:, B_3, i), D_h);

            lw12 = lw12 - be * lgw12 + be * al / (2 * de) * (lh2w12 - lh1w12);
            lw23 = lw23 - be * lgw23 + be * al / (2 * de) * (lh2w23 - lh1w23);
            lw34 = lw34 - be * lgw34 + be * al / (2 * de) * (lh2w34 - lh1w34);
            lb12 = lb12 - be * lgb12 + be * al / (2 * de) * (lh2b12 - lh1b12);
            lb23 = lb23 - be * lgb23 + be * al / (2 * de) * (lh2b23 - lh1b23);
            lb34 = lb34 - be * lgb34 + be * al / (2 * de) * (lh2b34 - lh1b34);

        end
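
The excerpt above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from perfedavg.py or from the PerFedAvg author. The names `grad`, `hvp_fd` and `hf_step` are hypothetical, and a toy quadratic objective stands in for the network loss so that the Hessian is known exactly. The three separate mini-batches (B_1, B_2, B_3) are replaced by one deterministic gradient function.

```python
def grad(w, A, b):
    """Gradient of the toy objective f(w) = 0.5 * w^T A w - b^T w, i.e. A w - b."""
    n = len(w)
    return [sum(A[i][j] * w[j] for j in range(n)) - b[i] for i in range(n)]


def hvp_fd(w, v, A, b, delta=1e-3):
    """Central-difference estimate of the Hessian-vector product H(w) v.

    Mirrors (lh2 - lh1) / (2 * de) in the MATLAB excerpt.
    """
    w_plus = [wi + delta * vi for wi, vi in zip(w, v)]
    w_minus = [wi - delta * vi for wi, vi in zip(w, v)]
    g_plus = grad(w_plus, A, b)
    g_minus = grad(w_minus, A, b)
    return [(gp - gm) / (2 * delta) for gp, gm in zip(g_plus, g_minus)]


def hf_step(w, A, b, alpha, beta, delta=1e-3):
    """One local step: w <- w - beta * g + beta * alpha * (H g)."""
    g_first = grad(w, A, b)                                  # gradient at w
    w_adapted = [wi - alpha * gi for wi, gi in zip(w, g_first)]
    g = grad(w_adapted, A, b)                                # gradient at the adapted point
    hg = hvp_fd(w, g, A, b, delta)                           # g is used as v here
    return [wi - beta * gi + beta * alpha * hi
            for wi, gi, hi in zip(w, g, hg)]


if __name__ == "__main__":
    A = [[2.0, 0.0], [0.0, 4.0]]   # Hessian of the toy objective
    b = [1.0, 1.0]
    w = [1.0, 1.0]
    print(hvp_fd(w, grad(w, A, b), A, b))
    print(hf_step(w, A, b, alpha=0.1, beta=0.5))
```

Because `hvp_fd` already receives the gradient as its second argument, its return value is the product of the Hessian and that gradient, so the update does not multiply by the gradient a second time.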

JackFroster · 2022-09-17T14:55:05Z

Thanks for your answer. I understand it now.

KarhouTam · 2022-09-17T15:00:36Z

Glad I could help. I'll keep this issue open for anyone else who is confused about this. 😏

KarhouTam added the "good first issue" label Sep 17, 2022

JackFroster closed this as completed Sep 17, 2022

KarhouTam reopened this Sep 17, 2022

KarhouTam pinned this issue Sep 20, 2022

KarhouTam closed this as completed Sep 21, 2022
