
Fixed Workflow_Interface_Mnist_Implementation_2.py #1227

Open: wants to merge 2 commits into base: develop
Conversation

agnivac123

Hi @scngupta-dsp, @teoparvanov, and @psfoley, here is the PR that I created for the GlobalDP tutorial. Please let me know your thoughts on whether we can remove the customized __getstate__ and __setstate__ methods.

@agnivac123 agnivac123 marked this pull request as draft December 23, 2024 09:40
@agnivac123 agnivac123 marked this pull request as ready for review December 23, 2024 09:50
…ements_global_dp.txt

Signed-off-by: Agniva Chowdhury <agniva.chowdhury@intel.com>
Signed-off-by: Agniva Chowdhury <agniva.chowdhury@intel.com>
@scngupta-dsp
Contributor

Hi @agnivac123

I spent some time investigating this tutorial and my observations are as follows:

  1. The tutorial fails even with backend = single_process, with the following error:

File "/home/scngupta/openfl_scngupta/openfl/openfl-tutorials/experimental/workflow/Global_DP/Workflow_Interface_Mnist_Implementation_2.py", line 205, in FedAvg
global_model_tools.global_optimizer.step()
File "/home/scngupta/miniforge-pypy3/envs/env_openfl_scngupta_globaldp/lib/python3.10/site-packages/torch/optim/optimizer.py", line 375, in wrapper
for pre_hook in chain(_global_optimizer_pre_hooks.values(), self._optimizer_step_pre_hooks.values()):
AttributeError: 'DPOptimizer' object has no attribute '_optimizer_step_pre_hooks'. Did you mean: '_optimizer_step_code'?

This was quite unexpected, because our previous investigations had indicated that there were no issues with the single-process backend.

  2. To investigate further, I reverted the tutorial's dependencies to their original versions; with this change I was able to run the tutorial with backend = single_process.

  3. With the above change, the tutorial still fails with backend = ray, with the following error:

File "/home/scngupta/openfl_scngupta/openfl/openfl-tutorials/experimental/workflow/Global_DP/Workflow_Interface_Mnist_Implementation_2.py", line 201, in FedAvg
global_model_tools.global_optimizer.zero_grad()
File "/home/scngupta/miniforge-pypy3/envs/env_openfl_scngupta_globaldp/lib/python3.10/site-packages/opacus/optimizers/optimizer.py", line 474, in zero_grad
if not self._is_last_step_skipped:
AttributeError: 'DPOptimizer' object has no attribute '_is_last_step_skipped'

  4. As discussed previously, this appears to be a serialization issue with PrivacyEngine. To overcome it, I modified the tutorial to define global_model_tools as an aggregator private attribute (similar to what has been done in this PR).

With these changes the tutorial seems to work fine (reference: scngupta-dsp#1) with both backends. Refer to the attached logs.
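The serialization issue and the private-attribute workaround can be illustrated without OpenFL or Opacus. The sketch below uses a hypothetical stand-in class (GlobalModelToolsStandIn is not the real object from the tutorial): an attribute holding a closure makes the object unpicklable, which mimics the hooks PrivacyEngine attaches to the optimizer. Objects passed through flow state must be pickled by the ray backend, whereas a private-attribute factory runs inside the participant's own process, so nothing crosses a pickle boundary:

```python
import pickle

# Hypothetical stand-in for GlobalModelTools: Opacus' PrivacyEngine
# attaches hook closures to the optimizer, which the pickling done by
# the ray backend cannot serialize.
class GlobalModelToolsStandIn:
    def __init__(self):
        # A lambda attribute makes the instance unpicklable,
        # mimicking the unpicklable hooks.
        self.hook = lambda grad: grad * 0.5

def serializes(obj):
    """Return True if obj survives a pickle round-trip attempt."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

# As ordinary flow state, the object would have to be pickled -> fails:
assert not serializes(GlobalModelToolsStandIn())

# As an aggregator private attribute, a factory callable is invoked
# in-process instead, so the object is never pickled:
def aggregator_private_attributes():
    return {"global_model_tools": GlobalModelToolsStandIn()}

private = aggregator_private_attributes()  # runs in-process, no pickling
assert isinstance(private["global_model_tools"], GlobalModelToolsStandIn)
```

This is the same reason the Workflow Interface guideline below recommends private attributes for non-serializable state: the object is constructed where it is used rather than shipped between processes.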

Summary:
My current understanding is that there are two issues with this tutorial:

a) An issue with DPOptimizer under the single-process backend. Since the tutorial works with previous versions of PyTorch and Opacus, the tutorial may need to be adapted to the latest versions.

b) A serialization issue with DPOptimizer under the ray backend. This should be resolved by defining global_model_tools as an aggregator private attribute.

For both issues, I do not see why there would be a need to add __getstate__ and __setstate__ methods to GlobalModelTools(). WDYT?

GlobalDP_ray_logs.txt
GlobalDP_single_process_logs.txt

@scngupta-dsp
Contributor

Hi @agnivac123,

Some more updates addressing the issue observed with the latest Torch versions and backend = single_process:

Error:
File "/home/scngupta/openfl_scngupta/openfl/openfl-tutorials/experimental/workflow/Global_DP/Workflow_Interface_Mnist_Implementation_2.py", line 205, in FedAvg
global_model_tools.global_optimizer.step()
File "/home/scngupta/miniforge-pypy3/envs/env_openfl_scngupta_globaldp/lib/python3.10/site-packages/torch/optim/optimizer.py", line 375, in wrapper
for pre_hook in chain(_global_optimizer_pre_hooks.values(), self._optimizer_step_pre_hooks.values()):
AttributeError: 'DPOptimizer' object has no attribute '_optimizer_step_pre_hooks'. Did you mean: '_optimizer_step_code'?

I updated GlobalModelTools and manually defined _optimizer_step_pre_hooks and _optimizer_step_post_hooks as attributes of global_optimizer. With these changes, the tutorial appears to work fine (reference: scngupta-dsp#2) with both backends and the latest Torch versions.
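The failure mode and the manual patch can be sketched with a minimal mock, without importing torch or opacus (MockDPOptimizer below is a stand-in, not the real Opacus class). Torch's Optimizer.step() wrapper iterates self._optimizer_step_pre_hooks, but an optimizer object built against an older torch never initialized that attribute; assigning empty hook dicts after construction, as done in scngupta-dsp#2, restores the expected interface:

```python
from collections import OrderedDict

# Hypothetical mock of the failure: torch's step() wrapper iterates
# self._optimizer_step_pre_hooks, which DPOptimizer (built against an
# older torch) never defined.
class MockDPOptimizer:
    def step(self):
        # Mimics the loop in torch/optim/optimizer.py's wrapper.
        for hook in self._optimizer_step_pre_hooks.values():
            hook(self)
        return "stepped"

opt = MockDPOptimizer()
try:
    opt.step()
    raise RuntimeError("expected AttributeError")
except AttributeError:
    # 'MockDPOptimizer' object has no attribute '_optimizer_step_pre_hooks'
    pass

# The workaround: manually define the missing hook containers on the
# wrapped optimizer after it is created.
opt._optimizer_step_pre_hooks = OrderedDict()
opt._optimizer_step_post_hooks = OrderedDict()
assert opt.step() == "stepped"
```

The patch works because the wrapper only needs the attributes to exist and be iterable; empty OrderedDicts simply mean "no registered hooks", which matches how newer torch initializes a fresh optimizer.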


Summary:

  1. Issue with DPOptimizer:

    • The DPOptimizer object created by the Opacus library appears to have compatibility issues with the latest Torch versions.
    • Previously, there were serialization issues with this object; now, certain attributes expected by the native Torch optimizer (_optimizer_step_pre_hooks) are also missing.
  2. Serialization issue:

    • Serialization issues with DPOptimizer can be mitigated by defining GlobalModelTools as an aggregator private attribute.
    • There is no need to create custom __setstate__ and __getstate__ methods.
    • Guideline: for the Workflow Interface, objects that are not serializable should be created as private attributes of participants (Aggregator or Collaborator).
  3. Attribute mismatch:

    • The attributes missing from DPOptimizer (_optimizer_step_pre_hooks) indicate a mismatch between the Opacus library and the latest PyTorch versions.

Recommendation:

Due to the compatibility issues between the Opacus library and the latest PyTorch versions observed in this tutorial, I would recommend continuing with the earlier versions of Opacus and PyTorch.
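To make the version recommendation actionable, a tutorial could warn at startup when installed packages are newer than the versions it was validated against. This is a sketch only: the version bounds below are placeholders, not tested claims; the authoritative pins are whatever the tutorial's requirements_global_dp.txt specifies.

```python
from importlib.metadata import version, PackageNotFoundError

def parse(v):
    # "2.1.0+cu118" -> (2, 1, 0); tolerant of local version suffixes.
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

# Placeholder bounds for illustration only; substitute the versions
# actually pinned in the tutorial's requirements file.
MAX_VALIDATED = {"torch": (2, 0, 0), "opacus": (1, 4, 0)}

def check_compat():
    """Return a list of warnings for packages newer than validated."""
    warnings = []
    for pkg, bound in MAX_VALIDATED.items():
        try:
            installed = parse(version(pkg))
        except PackageNotFoundError:
            continue  # package not installed; nothing to check
        if installed > bound:
            warnings.append(
                f"{pkg} {installed} is newer than the version this "
                f"tutorial was validated with; expect API mismatches"
            )
    return warnings
```

A check like this would have surfaced the _optimizer_step_pre_hooks mismatch as an explicit warning instead of a mid-training AttributeError.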
