ModuleNotFoundError: No module named 'federatedml' with docker-deploy #863

0kuang · 2023-03-03T04:24:50Z

I deployed FATE following the guide "Deploying FATE with Docker Compose" (使用Docker Compose 部署 FATE).

After deployment, I use the following command to enter the client container:
docker exec -it confs-10000_client_1 bash

But when executing ./examples/benchmark_quality/homo_nn/fate-homo_nn.py the following error was reported:

Traceback (most recent call last):
  File "./fate-homo_nn.py", line 25, in <module>
    from federatedml.evaluation.metrics import classification_metric
ModuleNotFoundError: No module named 'federatedml'

How do I import the federatedml package in the client container?

Also, I am a beginner and not familiar with the FATE framework. In a Docker deployment, how can I use Python or Jupyter to develop federated learning code (for example, to run the Resnet example or build a custom dataset) instead of using the flow command?

Thanks!


zhihuiwan · 2023-03-03T04:31:53Z

The environment needs to be loaded before use:

source /data/projects/fate/bin/init_env.sh

0kuang · 2023-03-03T04:36:18Z

root@bf1b603f8015:/data/projects/fate# cd bin
bash: cd: bin: No such file or directory

My FATE version is v1.10.0

It seems that there is no such script.

owlet42 · 2023-03-03T06:09:20Z

I ran a test and got the same error. This looks like a bug in the client image: the examples were not fully tested against it, and dependencies such as federatedml and fate_test are not included.

0kuang · 2023-03-03T06:25:44Z

How can I install these two packages manually?

zhihuiwan · 2023-03-03T06:28:24Z

You can try setting PYTHONPATH and running it again:

export PYTHONPATH=/data/projects/fate/fate/python

0kuang · 2023-03-03T09:14:23Z

root@ff9d37a0afb0:/data/projects/fate# cd /data/projects/fate/fate/python
bash: cd: /data/projects/fate/fate/python: No such file or directory

It seems that the federatedml package and the Python source folders are missing from the client container.
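A quick way to confirm what the interpreter in the container can and cannot see is to ask Python directly. The following is a minimal sketch, not part of FATE: the helper name `missing_modules` is made up for illustration, and it only checks the two packages named in this thread.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # federatedml and fate_test are the packages reported missing in this thread.
    print("missing:", missing_modules(["federatedml", "fate_test"]))
    print("search path:")
    for entry in sys.path:
        print("  ", entry)
```

If a package is listed as missing and none of the printed search path entries contains it, the source has to be brought into the container before PYTHONPATH can help.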

0kuang · 2023-03-04T14:52:37Z

@owlet42

Sorry to bother you again. Is there a way for me to install federatedml manually? I would like to continue my studies.

Thanks.

owlet42 · 2023-03-06T07:29:35Z

@0kuang

A simple way is to add a volume mount for federatedml, and add the federatedml path to the PYTHONPATH environment variable.

After I tried it, I found that there are other dependencies that need to be resolved.

0kuang · 2023-03-06T14:57:59Z

I solved the dependency problem as you said:

Set PYTHONPATH
Clone the code of the missing packages from the GitHub repo
Copy a service_conf.yaml
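For a notebook or script, the PYTHONPATH step can also be done from inside Python. This is only a sketch under assumptions: `add_source_root` is a hypothetical helper, and the directory used in the example is the PYTHONPATH value suggested earlier in the thread, which exists only if the cloned code was copied or mounted there.

```python
import sys
from pathlib import Path


def add_source_root(root, package="federatedml"):
    """Put `root` at the front of sys.path if it contains `package`.

    Returns True when the directory was usable, False otherwise.
    """
    root = Path(root)
    if not (root / package).is_dir():
        return False
    entry = str(root)
    if entry not in sys.path:
        sys.path.insert(0, entry)
    return True


if __name__ == "__main__":
    # Assumed location of the cloned FATE "python" directory in the container.
    print(add_source_root("/data/projects/fate/fate/python"))
```

Returning False instead of silently editing sys.path makes a wrong mount point visible immediately, rather than surfacing later as another ModuleNotFoundError.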

Now I have a new problem: an error occurs when executing pipeline.fit():

ValueError: job submit failed, err msg: {'jobId': '202303062227458326320', 'retcode': 103, 'retmsg': 'Traceback (most recent call last):
  File "/data/projects/fate/fateflow/python/fate_flow/scheduler/dag_scheduler.py", line 142, in submit
    raise Exception("create job failed", response)
Exception: (\'create job failed\', {\'guest\': {9999: {\'retcode\': <RetCode.FEDERATED_ERROR: 104>, \'retmsg\': \'Federated schedule error, <_InactiveRpcError of RPC that terminated with:\
\\tstatus = StatusCode.UNKNOWN\
\\tdetails = "\
[Roll Site Error TransInfo] \
 location msg=java.lang.String cannot be cast to java.lang.Integer \
 stack info=java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer\
\\tat scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)\
\\tat com.webank.eggroll.rollsite.Router$.query(Router.scala:80)\
\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:80)\
\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\
\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\
\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\
\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\
\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\
\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\
\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\
\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\
\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\
\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\
\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\
\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\
\\tat java.lang.Thread.run(Thread.java:750)\
 \
\
exception trans path: rollsite(10000)"\
\\tdebug_error_string = "{"created":"@1678112871.934791845","description":"Error received from peer ipv4:192.167.0.5:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\\
[Roll Site Error TransInfo] \\\
 location msg=java.lang.String cannot be cast to java.lang.Integer \\\
 stack info=java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer\\\
\\\\tat scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)\\\
\\\\tat com.webank.eggroll.rollsite.Router$.query(Router.scala:80)\\\
\\\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:80)\\\
\\\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\\\
\\\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\\\
\\\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\\\
\\\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\\\
\\\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\\\
\\\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\\\
\\\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\\\
\\\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\\\
\\\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\\\
\\\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\\\
\\\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\\\
\\\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\\\
\\\\tat java.lang.Thread.run(Thread.java:750)\\\
 \\\
\\\
exception trans path: rollsite(10000)","grpc_status":2}"\
>\'}}, \'host\': {10000: {\'data\': {\'components\': {\'eval_0\': {\'need_run\': True}, \'nn_0\': {\'need_run\': True}, \'reader_0\': {\'need_run\': True}, \'reader_1\': {\'need_run\': True}}}, \'retcode\': 0, \'retmsg\': \'success\'}}, \'arbiter\': {10000: {\'data\': {\'components\': {\'eval_0\': {\'need_run\': True}, \'nn_0\': {\'need_run\': True}, \'reader_0\': {\'need_run\': False}, \'reader_1\': {\'need_run\': False}}}, \'retcode\': 0, \'retmsg\': \'success\'}}})
'}

I think the key lies in rollsite. I have extracted the relevant part below in case it helps with the diagnosis.

# key
exception trans path: rollsite(10000)"\
\\tdebug_error_string = "{"created":"@1678112871.934791845","description":"Error received from peer ipv4:192.167.0.5:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\\
[Roll Site Error TransInfo] \\\
 location msg=java.lang.String cannot be cast to java.lang.Integer \\\
 stack info=java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer\\\

Thank you for your reply.

owlet42 · 2023-03-07T05:01:25Z

Please make sure that all components of your FATE are working properly and can complete unilateral and multilateral toy tests.

flow test toy -gid 9999 -hid 9999    # unilateral
flow test toy -gid 9999 -hid 10000   # multilateral
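To run both checks in order from a script or notebook, the two commands above can be wrapped with subprocess. This is a sketch under assumptions: the helper names are made up, and it treats a non-zero exit code as failure, which may not match how the flow CLI actually reports a failed toy test, so the printed output should still be read.

```python
import shutil
import subprocess


def toy_test_command(guest_id, host_id):
    """Build the `flow test toy` command line for a guest/host party pair."""
    return ["flow", "test", "toy", "-gid", str(guest_id), "-hid", str(host_id)]


def run_toy_tests(pairs):
    """Run each toy test in order; stop after the first non-zero exit code."""
    results = {}
    for guest_id, host_id in pairs:
        completed = subprocess.run(toy_test_command(guest_id, host_id))
        results[(guest_id, host_id)] = completed.returncode
        if completed.returncode != 0:
            break
    return results


if __name__ == "__main__":
    if shutil.which("flow") is None:
        print("flow CLI not found on PATH; run this where the flow command is installed")
    else:
        # Unilateral first, then multilateral, as in the commands above.
        print(run_toy_tests([(9999, 9999), (9999, 10000)]))
```

Running the unilateral test first separates a broken local party from a broken connection between parties.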

0kuang · 2023-03-15T06:45:50Z

I can now run the example code for Resnet with homo-nn correctly.

How can I use a GPU to accelerate training in a Docker deployment of FATE? Are there any tutorials you recommend?

In addition, which container does a task submitted through Jupyter on confs_10000_client-1 ultimately run in?

Thanks for your answer.

owlet42 · 2023-03-16T06:36:32Z

GPU deployment is not currently supported. FATE tasks mainly run in fateflow; for the detailed process, see https://federatedai.github.io/FATE-Flow/latest/fate_flow/

0kuang · 2023-03-23T07:42:09Z

Which deployment method supports GPU?

The FedAvgTrainer in the FATE framework supports cuda=True. Is this parameter useful?

owlet42 · 2023-03-24T05:35:46Z

FedAvgTrainer has this option, and you can try setting cuda=True to use the GPU.
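Before setting cuda=True, it may help to check whether a GPU is visible at all. The sketch below assumes the trainer is PyTorch-based and that the check is run in the container where the task executes (mainly fateflow, per the reply above); `cuda_usable` is a hypothetical helper, not a FATE function.

```python
def cuda_usable():
    """Return True only if PyTorch is importable and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    use_cuda = cuda_usable()
    print("value to pass as the cuda option:", use_cuda)
```

A False result from inside the container suggests that the container was started without GPU access, in which case cuda=True in the trainer is unlikely to help.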

zhihuiwan transferred this issue from FederatedAI/FATE Mar 3, 2023

owlet42 added the bug label Mar 3, 2023

owlet42 self-assigned this Mar 3, 2023

owlet42 added this to the v1.11.0 milestone Apr 17, 2023

owlet42 mentioned this issue Apr 26, 2023 in "Support fate v1.11.1" (FederatedAI/FATE-Builder#30, merged)
