感谢作者提供的框架。当我运行FedAsync时，出现了一个错误，似乎是由于异步通信引起的 #37

YourPeer · 2023-08-08T08:29:15Z

Traceback (most recent call last):
File "/home/archlab/lzr/easyFL-FLGo/main.py", line 31, in
runner.run()
File "/home/archlab/lzr/easyFL-FLGo/flgo/algorithm/fedbase.py", line 243, in run
updated = self.iterate()
File "/home/archlab/lzr/easyFL-FLGo/flgo/algorithm/fedasync.py", line 25, in iterate
res = self.communicate(self.selected_clients, asynchronous=True)
File "/home/archlab/lzr/easyFL-FLGo/flgo/simulator/base.py", line 481, in communicate_with_dropout
return communicate(self, selected_clients, mtype, asynchronous)
File "/home/archlab/lzr/easyFL-FLGo/flgo/simulator/base.py", line 481, in communicate_with_dropout
return communicate(self, selected_clients, mtype, asynchronous)
File "/home/archlab/lzr/easyFL-FLGo/flgo/simulator/base.py", line 573, in communicate_with_clock
if pkgs[0].get('__cid', None) is None:
IndexError: list index out of range

YourPeer · 2023-08-08T15:27:42Z

另一个问题是当我运行Hierfl时，错误提示：
2023-08-08 23:23:46,263 simple_logger.py log_once [line:14] INFO Current_time:0
Traceback (most recent call last):
File "/home/archlab/lzr/easyFL-FLGo/main.py", line 31, in
runner.run()
File "/home/archlab/lzr/easyFL-FLGo/flgo/algorithm/hierarchical.py", line 21, in run
self.gv.logger.log_once()
File "/home/archlab/lzr/easyFL-FLGo/flgo/experiment/logger/simple_logger.py", line 19, in log_once
val_metrics = self.coordinator.global_test(flag='val')
File "/home/archlab/lzr/easyFL-FLGo/flgo/algorithm/hierarchical.py", line 48, in global_test
es_metrics = es.global_test(model, flag)
AttributeError: 'BasicClient' object has no attribute 'global_test'

WwZzz · 2023-08-09T12:06:47Z

你好，hierFL因为它的架构跟传统一对多的中心服务器架构不同，因此benchmark设置跟横向联邦不是共享的。hierFL目前还没有写tutorial，后面会写相关的tutorial。若你想要尝试hierFL的运行，可以运行以下代码

import flgo
import flgo.experiment.logger
import os
bmkname = 'hier_mnist' # benchmark的名称
bmk_config = './config_hier_mnist.py' # 准备的配置文件路径
if not os.path.exists(bmkname):
    bmk = flgo.gen_hierarchical_benchmark(bmkname, bmk_config, target_path='.', data_type='cv', task_type='classification')
import flgo.algorithm.hierarchical as hierFL
bmk = bmkname
task = './my_test_hier'
task_config = {
    'benchmark':bmk,
}
if not os.path.exists(task): flgo.gen_task(task_config, task_path=task)
# 运行fedavg算法
runner = flgo.init(task, hierFL, {'gpu':[0,],'log_file':True, 'learning_rate':0.1,'num_steps':1, 'batch_size':128, 'num_rounds':10000, 'proportion':1.0, 'train_holdout':0.2,'local_test':True,'eval_interval':1,},scene='hierarchical')
runner.run()

WwZzz · 2023-08-09T12:09:16Z

Traceback (most recent call last): File "/home/archlab/lzr/easyFL-FLGo/main.py", line 31, in runner.run() File "/home/archlab/lzr/easyFL-FLGo/flgo/algorithm/fedbase.py", line 243, in run updated = self.iterate() File "/home/archlab/lzr/easyFL-FLGo/flgo/algorithm/fedasync.py", line 25, in iterate res = self.communicate(self.selected_clients, asynchronous=True) File "/home/archlab/lzr/easyFL-FLGo/flgo/simulator/base.py", line 481, in communicate_with_dropout return communicate(self, selected_clients, mtype, asynchronous) File "/home/archlab/lzr/easyFL-FLGo/flgo/simulator/base.py", line 481, in communicate_with_dropout return communicate(self, selected_clients, mtype, asynchronous) File "/home/archlab/lzr/easyFL-FLGo/flgo/simulator/base.py", line 573, in communicate_with_clock if pkgs[0].get('__cid', None) is None: IndexError: list index out of range

该bug因为没有检查pkgs是否是空数组引起的。我已在最新版本中修复该bug，感谢你的反馈。
注：异步算法的尝试需要配合系统异构性模拟器（用户响应时间异构）。这里给出一段测试代码：

import flgo
import os

# the target path of the task
task = './my_first_task'

# create task configuration
task_config = {'benchmark':{'name': 'flgo.benchmark.mnist_classification'}, 'partitioner':{'name':'IIDPartitioner', 'para':{'num_clients':100}}}

# generate the task if the task doesn't exist
if not os.path.exists(task):
    flgo.gen_task(task_config, task)

import flgo.algorithm.fedasync as fedasync
# create fedavg runner on the task
runner = flgo.init(task, fedasync, {'gpu':[0,],'log_file':True, 'num_steps':5, 'responsiveness':'UNI-5-1000'})
runner.run()

WwZzz · 2023-08-09T12:15:22Z

HierFL里的

配置文件config_hier_mnist.py的内容如下：

import os
import torchvision
import torch
import flgo.benchmark

transform = torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(),
     torchvision.transforms.Normalize((0.1307,), (0.3081,))]
)
path = os.path.join(flgo.benchmark.path, 'RAW_DATA', 'MNIST')
# 定义训练集实例，并命名为train_data
train_data = torchvision.datasets.MNIST(root=path, train=True, download=True, transform=transform)
# 定义测试集实例，并命名为test_data
test_data = torchvision.datasets.MNIST(root=path, train=False, download=True, transform=transform)

class mlp(torch.nn.Module):
    def __init__(self):
        super(mlp, self).__init__()
        self.fc1 = torch.nn.Linear(784, 200)
        self.fc2 = torch.nn.Linear(200, 200)
        self.fc3 = torch.nn.Linear(200, 10)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = x.view(-1, x.shape[1] * x.shape[-2] * x.shape[-1])
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# 定义返回模型的函数，并命名为get_model
def get_model():
    return mlp()

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

感谢作者提供的框架。当我运行FedAsync时，出现了一个错误，似乎是由于异步通信引起的 #37

感谢作者提供的框架。当我运行FedAsync时，出现了一个错误，似乎是由于异步通信引起的 #37

YourPeer commented Aug 8, 2023

YourPeer commented Aug 8, 2023

WwZzz commented Aug 9, 2023 •

edited

Loading

WwZzz commented Aug 9, 2023

WwZzz commented Aug 9, 2023

感谢作者提供的框架。当我运行FedAsync时，出现了一个错误，似乎是由于异步通信引起的 #37

感谢作者提供的框架。当我运行FedAsync时，出现了一个错误，似乎是由于异步通信引起的 #37

Comments

YourPeer commented Aug 8, 2023

YourPeer commented Aug 8, 2023

WwZzz commented Aug 9, 2023 • edited Loading

WwZzz commented Aug 9, 2023

WwZzz commented Aug 9, 2023

WwZzz commented Aug 9, 2023 •

edited

Loading