ModuleNotFoundError: No module named 'federatedml' with docker-deploy #863

Open
0kuang opened this issue Mar 3, 2023 · 14 comments

@0kuang

0kuang commented Mar 3, 2023

I deployed FATE by following the "Deploy FATE with Docker Compose" guide.

After deployment, I used the following command to enter the client container:
docker exec -it confs-10000_client_1 bash

But when I run ./examples/benchmark_quality/homo_nn/fate-homo_nn.py, the following error is reported:

Traceback (most recent call last):
  File "./fate-homo_nn.py", line 25, in <module>
    from federatedml.evaluation.metrics import classification_metric
ModuleNotFoundError: No module named 'federatedml'

How do I import the federatedml package in the client container?

Also, I am a beginner and not yet familiar with the FATE framework. With a Docker deployment, how can I develop federated learning code in Python or Jupyter (for example, running the ResNet example or building a custom dataset) instead of using the flow command?

Thanks!

@zhihuiwan

The environment needs to be loaded before use:

source /data/projects/fate/bin/init_env.sh

@0kuang
Author

0kuang commented Mar 3, 2023

root@bf1b603f8015:/data/projects/fate# cd bin
bash: cd: bin: No such file or directory

My FATE version is v1.10.0

It seems that there is no such script.

zhihuiwan transferred this issue from FederatedAI/FATE on Mar 3, 2023

@owlet42
Collaborator

owlet42 commented Mar 3, 2023

I did a test and got the same error. This should be a bug in the client image. The client image does not fully test the examples. Dependent packages such as federatedml and fate_test are not included.
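
One way to confirm which packages the client image is missing is to ask Python's import system directly. The following is a minimal sketch, meant to be run inside the client container. The two module names come from the comment above; the helper name `missing_modules` is hypothetical and not part of FATE.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    missing = []
    for name in names:
        # find_spec returns None when nothing on sys.path provides the module.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Packages reported as absent from the client image in this thread.
    print(missing_modules(["federatedml", "fate_test"]))
```

An empty list would mean both packages are importable in the current environment.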

owlet42 added the bug label on Mar 3, 2023
owlet42 self-assigned this on Mar 3, 2023

@0kuang
Author

0kuang commented Mar 3, 2023

How can I install these two packages manually?

@zhihuiwan

You can try setting PYTHONPATH and running it again:

export PYTHONPATH=/data/projects/fate/fate/python

@0kuang
Author

0kuang commented Mar 3, 2023

root@ff9d37a0afb0:/data/projects/fate# cd /data/projects/fate/fate/python
bash: cd: /data/projects/fate/fate/python: No such file or directory

It seems that the federatedml and Python-related folders are missing from the client container.

@0kuang
Author

0kuang commented Mar 4, 2023

> I did a test and got the same error. This should be a bug in the client image. The client image does not fully test the examples. Dependent packages such as federatedml and fate_test are not included.

@owlet42

Sorry to bother you again, is there a way for me to manually install federatedml? I hope to continue my studies.

Thanks.

@owlet42
Collaborator

owlet42 commented Mar 6, 2023

> > I did a test and got the same error. This should be a bug in the client image. The client image does not fully test the examples. Dependent packages such as federatedml and fate_test are not included.
>
> @owlet42
>
> Sorry to bother you again, is there a way for me to manually install federatedml? I hope to continue my studies.
>
> Thanks.

@0kuang

A simple way is to add a volume mount for federatedml, and add the federatedml path to the PYTHONPATH environment variable.

[image]

After I tried it, I found that there are other dependencies that need to be resolved.

@0kuang
Author

0kuang commented Mar 6, 2023

I solved the dependency problem as you said:

  1. Set PYTHONPATH.
  2. Clone the code of the missing packages from the GitHub repo.
  3. Copy a service_conf.yaml.
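
Step 1 can also be done from inside a script or notebook instead of the shell. The following is a minimal sketch, assuming the FATE Python sources have been mounted or cloned somewhere in the container. The path `/data/projects/fate/fate/python` is the one suggested earlier in this thread, and it did not exist in the v1.10.0 client container, so substitute the directory where the code actually lives. The helper `add_fate_source` is hypothetical.

```python
import sys
from pathlib import Path


def add_fate_source(root):
    """Put `root` on sys.path if it contains a federatedml package directory."""
    root = Path(root)
    if not (root / "federatedml").is_dir():
        # Fail early with a clear message instead of a later ModuleNotFoundError.
        raise FileNotFoundError(f"no federatedml directory under {root}")
    path = str(root)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


if __name__ == "__main__":
    try:
        # Assumed location; adjust to wherever the sources were mounted or cloned.
        add_fate_source("/data/projects/fate/fate/python")
    except FileNotFoundError as exc:
        print(exc)
```

Checking for the `federatedml` subdirectory first avoids silently adding a path that does not exist, which is what happened with the earlier `export PYTHONPATH` attempt.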

Now I have a new problem: an error occurs when executing pipeline.fit():

ValueError: job submit failed, err msg: {'jobId': '202303062227458326320', 'retcode': 103, 'retmsg': 'Traceback (most recent call last):
  File "/data/projects/fate/fateflow/python/fate_flow/scheduler/dag_scheduler.py", line 142, in submit
    raise Exception("create job failed", response)
Exception: (\'create job failed\', {\'guest\': {9999: {\'retcode\': <RetCode.FEDERATED_ERROR: 104>, \'retmsg\': \'Federated schedule error, <_InactiveRpcError of RPC that terminated with:\
\\tstatus = StatusCode.UNKNOWN\
\\tdetails = "\
[Roll Site Error TransInfo] \
 location msg=java.lang.String cannot be cast to java.lang.Integer \
 stack info=java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer\
\\tat scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)\
\\tat com.webank.eggroll.rollsite.Router$.query(Router.scala:80)\
\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:80)\
\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\
\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\
\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\
\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\
\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\
\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\
\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\
\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\
\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\
\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\
\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\
\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\
\\tat java.lang.Thread.run(Thread.java:750)\
 \
\
exception trans path: rollsite(10000)"\
\\tdebug_error_string = "{"created":"@1678112871.934791845","description":"Error received from peer ipv4:192.167.0.5:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\\
[Roll Site Error TransInfo] \\\
 location msg=java.lang.String cannot be cast to java.lang.Integer \\\
 stack info=java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer\\\
\\\\tat scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)\\\
\\\\tat com.webank.eggroll.rollsite.Router$.query(Router.scala:80)\\\
\\\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:80)\\\
\\\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\\\
\\\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\\\
\\\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\\\
\\\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\\\
\\\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\\\
\\\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\\\
\\\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\\\
\\\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\\\
\\\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\\\
\\\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\\\
\\\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\\\
\\\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\\\
\\\\tat java.lang.Thread.run(Thread.java:750)\\\
 \\\
\\\
exception trans path: rollsite(10000)","grpc_status":2}"\
>\'}}, \'host\': {10000: {\'data\': {\'components\': {\'eval_0\': {\'need_run\': True}, \'nn_0\': {\'need_run\': True}, \'reader_0\': {\'need_run\': True}, \'reader_1\': {\'need_run\': True}}}, \'retcode\': 0, \'retmsg\': \'success\'}}, \'arbiter\': {10000: {\'data\': {\'components\': {\'eval_0\': {\'need_run\': True}, \'nn_0\': {\'need_run\': True}, \'reader_0\': {\'need_run\': False}, \'reader_1\': {\'need_run\': False}}}, \'retcode\': 0, \'retmsg\': \'success\'}}})
'}

I think the key lies in rollsite. I hope the excerpt below helps with the diagnosis.

# key
exception trans path: rollsite(10000)"\
\\tdebug_error_string = "{"created":"@1678112871.934791845","description":"Error received from peer ipv4:192.167.0.5:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\\
[Roll Site Error TransInfo] \\\
 location msg=java.lang.String cannot be cast to java.lang.Integer \\\
 stack info=java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer\\\

Thank you for your reply.
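
The key lines quoted above can be pulled out of the long `retmsg` string automatically. The following is a minimal sketch, assuming the message keeps the `location msg=` and `exception trans path:` markers seen in this trace. The function name `summarize_rollsite_error` is hypothetical, and the sketch only summarizes the error; it does not diagnose the cause.

```python
import re

# Capture text up to the next backslash or newline after each marker.
LOCATION_RE = re.compile(r'location msg=([^\\\n]+)')
PATH_RE = re.compile(r'exception trans path:\s*([^"\\\n]+)')


def summarize_rollsite_error(retmsg):
    """Return the first 'location msg' and 'exception trans path' found, if any."""
    location = LOCATION_RE.search(retmsg)
    path = PATH_RE.search(retmsg)
    return {
        "location_msg": location.group(1).strip() if location else None,
        "trans_path": path.group(1).strip() if path else None,
    }


if __name__ == "__main__":
    # Two lines copied from the excerpt in this comment.
    sample = (
        " location msg=java.lang.String cannot be cast to java.lang.Integer \\\n"
        'exception trans path: rollsite(10000)"\\\n'
    )
    print(summarize_rollsite_error(sample))
```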

@owlet42
Collaborator

owlet42 commented Mar 7, 2023

Please make sure that all components of your FATE deployment are working properly and that both the unilateral and multilateral toy tests complete:

flow test toy -gid 9999 -hid 9999    # unilateral
flow test toy -gid 9999 -hid 10000   # multilateral
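
The two checks can also be driven from Python, for example from a notebook in the client container. The following is a minimal sketch that only wraps the two commands shown above. The helper names are hypothetical, and whether the `flow` exit code reflects test success is an assumption, so the printed output of each run should still be read.

```python
import shutil
import subprocess


def toy_test_command(guest_id, host_id):
    """Build the `flow test toy` command line for a guest/host party pair."""
    return ["flow", "test", "toy", "-gid", str(guest_id), "-hid", str(host_id)]


def run_toy_tests(pairs):
    """Run each toy test in order and return the exit code of every run."""
    if shutil.which("flow") is None:
        raise RuntimeError("the flow CLI is not on PATH in this environment")
    results = {}
    for guest_id, host_id in pairs:
        completed = subprocess.run(toy_test_command(guest_id, host_id))
        results[(guest_id, host_id)] = completed.returncode
    return results


if __name__ == "__main__":
    # Same party IDs as the two commands above: unilateral, then multilateral.
    try:
        print(run_toy_tests([(9999, 9999), (9999, 10000)]))
    except RuntimeError as exc:
        print(exc)
```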

@0kuang
Author

0kuang commented Mar 15, 2023

I can now run the example code for Resnet with homo-nn correctly.

How can I use a GPU to accelerate training in a Docker-deployed FATE? Do you have any recommended tutorials?

In addition, in which container does a task submitted through Jupyter on confs_10000_client-1 eventually run?

Thanks for your answer.

@owlet42
Collaborator

owlet42 commented Mar 16, 2023

GPU deployment is not currently supported. FATE tasks mainly run in fateflow; for the detailed process, see https://federatedai.github.io/FATE-Flow/latest/fate_flow/

@0kuang
Author

0kuang commented Mar 23, 2023

Which deployment method supports GPU?

The FedAvgTrainer in the FATE framework supports cuda=True. Is this parameter useful?

@owlet42
Collaborator

owlet42 commented Mar 24, 2023

FedAvgTrainer does have this option; you can try setting cuda=True to use the GPU.
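
Before setting cuda=True, it may help to check whether PyTorch can see a GPU at all. The following is a minimal sketch. It assumes PyTorch is the backend in use and falls back to False if it is not installed. Since tasks mainly run in fateflow according to the earlier reply, the check is only meaningful when run in the container where training actually executes. The `trainer_kwargs` dictionary is hypothetical; only the `cuda` key comes from this thread.

```python
def cuda_usable():
    """Return True only if PyTorch is installed and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


# Hypothetical keyword arguments to pass on to the trainer configuration;
# only the `cuda` key is taken from this thread.
trainer_kwargs = {"cuda": cuda_usable()}

if __name__ == "__main__":
    print(trainer_kwargs)
```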
